Memory problem with storing unicode characters



  • It seems that if you store Unicode characters in Memory, additional characters are occasionally added to the string on their own. An easy way to reproduce this is to run the following code:

    Memory.test = 'য়জঝঞঠডঢড়৞ৠৢચਜਝਡਢ੝੡જઝડઢછ૜૞ૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎ࢐࢒࢔Ҡૉ઒૓ઑટ੠࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ૑଒॑࢑ऐऒ঎ঐ঒ঔৎ৏৐৒৓৔ਟ࣑੒૔੓ઔটਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ';

    Wait a while, and eventually `Memory.test` will contain additional characters. For example:

    য়জঝঞঠডঢড়৞ৠৢચਜਝਡਢ੝੡જઝડઢછ૜૞ૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎ࢐࢒࢔Ҡૉ઒૓ઑટ੠࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ૑଒॑࢑ऐऒ঎ঐ঒ঔৎ৏৐৒৓৔ਟ࣑੒૔੓ઔ���ਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ

    These always seem to be the Unicode replacement character, � (U+FFFD). They appear in memory on their own and wreak havoc. This seems to happen only on private servers, though I haven't tested it extensively on the main server. I posted this in Slack, and several other people confirmed that they've noticed the issue too.
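    One mechanism that produces exactly this pattern (offered as a hypothesis; nothing in this thread confirms it is what the server does) is UTF-8 bytes being decoded in separate pieces. Comparing the two strings above, `ઔটਚ` became `ઔ���ਚ`: the single character ট (U+099F), which takes three bytes in UTF-8, was replaced by three replacement characters. The Node.js sketch below shows that decoding those bytes one at a time yields one U+FFFD per byte, while a `StringDecoder` reassembles them. `decodeInFragments` and `decodeWithStringDecoder` are hypothetical helpers for illustration.

```javascript
// Hypothesis sketch (not confirmed as the server's actual behavior):
// decoding UTF-8 bytes in separate pieces turns any character whose bytes
// are split apart into U+FFFD replacement characters. Node.js only.
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: split the UTF-8 bytes of `text` into `chunkSize`-byte
// slices and decode each slice on its own (the problematic pattern).
function decodeInFragments(text, chunkSize) {
  const bytes = Buffer.from(text, 'utf8');
  let out = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Each slice is decoded independently, so a character whose bytes
    // straddle a slice boundary cannot be reassembled.
    out += bytes.subarray(i, i + chunkSize).toString('utf8');
  }
  return out;
}

// Same slicing, but a StringDecoder holds back incomplete multi-byte
// sequences until the rest of their bytes arrive.
function decodeWithStringDecoder(text, chunkSize) {
  const bytes = Buffer.from(text, 'utf8');
  const decoder = new StringDecoder('utf8');
  let out = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    out += decoder.write(bytes.subarray(i, i + chunkSize));
  }
  return out + decoder.end();
}

// U+099F (ট) is three bytes in UTF-8: E0 A6 9F.
const sample = 'a\u099Fb';
console.log(decodeInFragments(sample, 1));       // a���b
console.log(decodeWithStringDecoder(sample, 1)); // aটb
```

    Note that the broken string is two characters longer than the original, which matches the "additional characters" described above.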



    When storing data as Unicode, there's a specific bit you have to ignore to prevent this kind of thing from happening. You can ask the people in #diplomacy for details.
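    The thread doesn't say which bit is meant. One common reading (an assumption here) is the top bit of each 16-bit UTF-16 code unit: code units 0xD800 to 0xDFFF are surrogates, a lone surrogate is not valid text, and converting one to UTF-8 replaces it with U+FFFD. Packing only 15 bits of data per character keeps every code unit at or below 0x7FFF, clear of that range. `packValues`, `unpackValues`, and `utf8RoundTrip` are hypothetical names, and the sketch uses Node.js `Buffer` to simulate the UTF-8 conversion.

```javascript
// Sketch, assuming the "bit to ignore" is the 16th bit of each UTF-16
// code unit. Each character carries 15 bits (0..0x7FFF), so no code unit
// can land in the surrogate range 0xD800..0xDFFF.
const MAX_15_BIT = 0x7fff;

// Hypothetical encoder: one 15-bit integer per character.
function packValues(values) {
  return values
    .map((v) => {
      if (!Number.isInteger(v) || v < 0 || v > MAX_15_BIT) {
        throw new RangeError(`value out of 15-bit range: ${v}`);
      }
      return String.fromCharCode(v);
    })
    .join('');
}

// Hypothetical decoder: read the code units back as integers.
function unpackValues(packed) {
  const values = [];
  for (let i = 0; i < packed.length; i++) {
    values.push(packed.charCodeAt(i));
  }
  return values;
}

// Simulate a round trip through UTF-8 (Node.js Buffer).
// Lone surrogates come back as U+FFFD.
function utf8RoundTrip(text) {
  return Buffer.from(text, 'utf8').toString('utf8');
}

const packed = packValues([0, 42, 0x7fff]);
console.log(utf8RoundTrip(packed) === packed);       // true
console.log(utf8RoundTrip('\uD800') === '\uFFFD');   // true
```

    Values near zero map to control characters, which `JSON.stringify` escapes (for example as `\u0000`), so a practical encoder would likely add an offset; that refinement is left out to keep the sketch short.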



  • Thanks, tedivm. I hopped over to #diplomacy to discuss it, and it appears I'm already ignoring the bit that causes problems. My serialization/deserialization logic is exactly the same as dissi's, so this seems to be a genuine issue with private servers.



  • I checked my memory this morning and it was at 750 KB. I then ran a line of code to remove all � characters from RawMemory, and it dropped to 100 KB! These characters just keep piling up.
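    The exact line isn't shown. A minimal sketch, assuming the Screeps `RawMemory.get()` and `RawMemory.set()` calls, might look like the following; the helper names are hypothetical, and the pure functions can be checked outside the game. This only removes the symptom: the original character that a run of � replaced (like ট in the first post) is not recovered.

```javascript
// Sketch: count and strip U+FFFD replacement characters from a
// serialized memory string. Helper names are hypothetical.
const REPLACEMENT_CHARS = /\uFFFD/g;

function countReplacementChars(raw) {
  const matches = raw.match(REPLACEMENT_CHARS);
  return matches ? matches.length : 0;
}

function stripReplacementChars(raw) {
  return raw.replace(REPLACEMENT_CHARS, '');
}

// Size in UTF-8 bytes (Node.js only); each U+FFFD costs 3 bytes.
function utf8Size(raw) {
  return Buffer.byteLength(raw, 'utf8');
}

// In-game usage (assumes the Screeps RawMemory API; not runnable in Node):
//   const raw = RawMemory.get();
//   console.log(`found ${countReplacementChars(raw)} U+FFFD characters`);
//   RawMemory.set(stripReplacementChars(raw));
```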



    Can confirm; I have also seen this on the private server.