Memory problem with storing unicode characters
-
It seems that if you store Unicode characters in Memory, additional characters are occasionally added to the string. An easy way to reproduce this is to run the following code:

```javascript
Memory.test = 'য়জঝঞঠডঢড়ৠৢચਜਝਡਢજઝડઢછૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎҠૉઑટ࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ॑ऐऒঐঔৎਟ࣑ઔটਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ';
```

After a while, `Memory.test` will contain additional characters. For example:

```
য়জঝঞঠডঢড়ৠৢચਜਝਡਢજઝડઢછૠૢଜଝଞଡଢਫ਼ਙࡎࡏࡐࡒࡓࡔࢎҠૉઑટ࣏࣓࣎ࣔएओॎॏ॓॔ਖ਼ଓ॑ऐऒঐঔৎਟ࣑ઔ���ਚਜ਼ઊଊԡӡաӢԞԝԟ՟ਗ਼ਛਖ਼છਙ
```
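To catch the corruption when it happens instead of spotting it by eye, a check along these lines could run every tick and compare what comes back from Memory with the string that was written. This is a hypothetical helper (`describeCorruption` is not part of the Screeps API), written in plain JavaScript so it also runs outside the game.

```javascript
// Hypothetical helper (not part of the Screeps API): compares the string read
// back from Memory with the string originally written and summarizes the damage.
function describeCorruption(expected, actual) {
  // Count U+FFFD REPLACEMENT CHARACTERs in the value that was read back.
  const replacementCount = (actual.match(/\uFFFD/g) || []).length;

  // Find the first UTF-16 code unit where the two strings diverge (-1 if equal).
  let firstDiff = -1;
  const shared = Math.min(expected.length, actual.length);
  for (let i = 0; i < shared; i++) {
    if (expected[i] !== actual[i]) {
      firstDiff = i;
      break;
    }
  }
  // One string is a prefix of the other: they diverge where the shorter ends.
  if (firstDiff === -1 && expected.length !== actual.length) {
    firstDiff = shared;
  }

  return { corrupted: expected !== actual, replacementCount, firstDiff };
}
```

In a main loop this might be called as `describeCorruption(EXPECTED, Memory.test)`, where `EXPECTED` is a hypothetical constant holding the original string, logging the report with `console.log(JSON.stringify(report))` whenever `report.corrupted` is true.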
The added characters always seem to be the Unicode replacement character, � (U+FFFD). They appear in Memory on their own and wreak havoc. This only seems to happen on private servers, although I haven't tested it extensively on the main server. I posted this in Slack, and several other people confirmed that they've noticed the issue too.
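One well-known way U+FFFD can appear without any code writing it is a UTF-8 byte stream being decoded in chunks that split a multi-byte character, so each incomplete piece is decoded on its own and replaced. In the example above, ট (U+099F, three bytes in UTF-8) became exactly three � characters, which is consistent with (but does not prove) its bytes being decoded separately. Whether the private server does anything like this is an assumption; the sketch below only demonstrates the mechanism with the standard `TextEncoder`/`TextDecoder` globals, and `decodeInChunks`/`decodeStreaming` are hypothetical names.

```javascript
// Hypothetical helper: decodes each chunk with a fresh decoder, so a multi-byte
// character that straddles a chunk boundary is decoded as invalid fragments,
// and each fragment is replaced with U+FFFD.
function decodeInChunks(bytes, chunkSize) {
  let out = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    out += new TextDecoder().decode(bytes.subarray(i, i + chunkSize));
  }
  return out;
}

// Hypothetical helper: same chunking, but one decoder with { stream: true }
// carries incomplete byte sequences over to the next chunk.
function decodeStreaming(bytes, chunkSize) {
  const decoder = new TextDecoder();
  let out = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    out += decoder.decode(bytes.subarray(i, i + chunkSize), { stream: true });
  }
  return out + decoder.decode(); // flush any bytes still buffered
}

const bytes = new TextEncoder().encode('\u099F'); // ট -> E0 A6 9F
console.log(decodeInChunks(bytes, 1));  // three U+FFFD characters
console.log(decodeStreaming(bytes, 1)); // ট
```

In Node.js, `StringDecoder` from the `string_decoder` module and `readable.setEncoding('utf8')` handle chunk boundaries the same way the streaming version does.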
-
When storing data as Unicode, there's a specific bit you have to ignore to prevent this from happening. You can poke the people in #diplomacy for details.
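The thread doesn't say which bit is meant. One plausible reading, assumed here, is the high bit of each 16-bit code unit: if packed values stay below 0x8000, no code unit can land in the surrogate range 0xD800 to 0xDFFF. An unpaired surrogate cannot be encoded as UTF-8, so an encoder such as `TextEncoder` substitutes U+FFFD for it. `packValues`, `unpackValues`, and `utf8RoundTrip` are hypothetical helpers, not dissi's actual code, and UTF-8 storage on the server side is likewise an assumption.

```javascript
// Hypothetical helper: packs 15-bit integers (0..0x7FFF) into one UTF-16 code
// unit each. Staying below 0x8000 keeps every code unit out of the surrogate
// range 0xD800-0xDFFF, which UTF-8 cannot represent unpaired.
function packValues(values) {
  return values
    .map((v) => {
      if (!Number.isInteger(v) || v < 0 || v > 0x7fff) {
        throw new RangeError(`value out of 15-bit range: ${v}`);
      }
      return String.fromCharCode(v);
    })
    .join('');
}

// Hypothetical helper: reverses packValues by reading each code unit back.
function unpackValues(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    out.push(str.charCodeAt(i));
  }
  return out;
}

// Simulates storing a string as UTF-8 and reading it back. An unpaired
// surrogate comes back as U+FFFD; 15-bit packed strings come back unchanged.
function utf8RoundTrip(str) {
  return new TextDecoder().decode(new TextEncoder().encode(str));
}
```

For example, `utf8RoundTrip('a\uD800b')` returns `'a\uFFFDb'`, while a string produced by `packValues` survives the round trip intact.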
-
Thanks tedivm. I hopped over to #diplomacy to discuss, and it appears I am already ignoring the bit that causes problems. My serialization/deserialization logic is exactly the same as dissi's, so it seems this is a genuine issue for private servers.
-
I checked my Memory this morning and it was at 750KB. I then ran a line of code to remove all � characters from RawMemory, and it dropped to 100KB! These characters just keep piling up.
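The one-off cleanup described above could look something like this sketch. JSON's structural characters are all ASCII, so a U+FFFD can only sit inside a string, and removing it leaves the serialized Memory parseable; note that it also deletes any U+FFFD a bot stored on purpose. `stripReplacementChars` is a hypothetical name; `RawMemory.get()` and `RawMemory.set()` are the Screeps calls for reading and writing the raw Memory string.

```javascript
// Hypothetical helper: strips every U+FFFD REPLACEMENT CHARACTER from a raw
// (serialized JSON) Memory string and reports how many were removed.
function stripReplacementChars(raw) {
  const cleaned = raw.replace(/\uFFFD/g, '');
  return {
    cleaned,
    removed: raw.length - cleaned.length, // U+FFFD is one UTF-16 code unit
  };
}

// In the Screeps console (assumption: run once, by hand):
//   RawMemory.set(stripReplacementChars(RawMemory.get()).cleaned);
```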
-
Can confirm; I have also seen this on the private server.