Memory problem with storing unicode characters

kshepard

It seems that if you store Unicode characters in Memory, additional characters are occasionally added to the string automatically. An easy way to reproduce this is to run the following code:

Memory.test = 'য়জঝঞঠডঢড়৞ৠৢચਜਝਡਢ੝੡જઝડઢછ૜૞ૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎ࢐࢒࢔Ҡૉ઒૓ઑટ੠࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ૑଒॑࢑ऐऒ঎ঐ঒ঔৎ৏৐৒৓৔ਟ࣑੒૔੓ઔটਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ';

Wait a while, and eventually `Memory.test` will include additional characters. For example:

য়জঝঞঠডঢড়৞ৠৢચਜਝਡਢ੝੡જઝડઢછ૜૞ૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎ࢐࢒࢔Ҡૉ઒૓ઑટ੠࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ૑଒॑࢑ऐऒ঎ঐ঒ঔৎ৏৐৒৓৔ਟ࣑੒૔੓ઔ��ਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ

These always seem to be the Unicode 'replacement character': � (U+FFFD). They appear in memory all by themselves and wreak havoc. This only seems to be happening on private servers, though I haven't tested it extensively on the main server. I posted this in Slack, and several other people confirmed that they've also noticed the issue.
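
For anyone who wants to check whether their own memory is affected, here is a minimal sketch of a counter. `countReplacementChars` is a hypothetical helper name, not part of the Screeps API, and the commented usage line assumes the in-game `RawMemory` global is available.

```javascript
// Hypothetical helper: counts occurrences of U+FFFD (the Unicode
// replacement character) in a string.
function countReplacementChars(str) {
  const matches = str.match(/\uFFFD/g);
  // String.prototype.match returns null when nothing matches.
  return matches === null ? 0 : matches.length;
}

// In-game usage (assumes the Screeps global RawMemory exists):
// console.log(countReplacementChars(RawMemory.get()));
```

A count that grows from tick to tick without your code writing that character would match the behavior described above.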

tedivm

When storing things in Unicode, there's a specific bit you have to ignore to prevent things like that from happening. You can poke the people in #diplomacy for details.

kshepard

Thanks tedivm. I hopped over to #diplomacy to discuss, and it appears I am already ignoring the bit that causes problems. My serialization/deserialization logic is exactly the same as dissi's, so it seems this is a genuine issue for private servers.

kshepard

Checked my memory this morning and it was at 750KB. Then ran a line of code to remove all � characters from RawMemory and it was reduced to 100KB! These just keep piling up.
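
A cleanup along those lines could look roughly like the sketch below. This is not necessarily the exact line that was run; `stripReplacementChars` is a hypothetical helper name, and the commented usage assumes the in-game `RawMemory.get()` and `RawMemory.set()` calls are available.

```javascript
// Hypothetical helper: returns a copy of the string with every U+FFFD
// (replacement character) removed.
function stripReplacementChars(raw) {
  return raw.replace(/\uFFFD/g, '');
}

// In-game usage (assumes the Screeps global RawMemory exists):
// RawMemory.set(stripReplacementChars(RawMemory.get()));
```

Note that this removes every U+FFFD, including any that were stored on purpose, so it only makes sense if your own data never contains that character.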

tedivm

Can confirm, I have also seen this on the private server.