Difficult to implement reasonable mutual exclusion when shard stops executing code immediately after creep leaves shard



  • The intershard segment is not safe for concurrent usage and requires some system of signalling to control access to it, per the docs:

    Warning: this segment is not safe for concurrent usage! All shards have shared access to the same instance of data. When the segment contents is changed by two shards simultaneously, you may lose some data, since the segment string value is written all at once atomically. You must implement your own system to determine when each shard is allowed to rewrite the inter-shard memory, e.g. based on mutual exclusions.

    I've observed shard0 stop executing my code on the same tick that the creep leaves the shard, which, in my opinion, makes it unreasonable to expect players to implement the mutex control needed for cross-shard data transfer.

    Although there are various concurrency primitives that could be used, if you want to send a message containing the creep's memory to another shard, the shortest it can possibly take is 2 ticks:

    1) If no other shard is controlling the intershard segment, write your request to control the segment, then wait 1 tick.

    2) Check that no other shard simultaneously attempted to gain control of the segment; if none did, write your message.
    

    But if at any stage another shard is controlling the segment or attempts to gain control at the same time as you, it will take longer than 2 ticks for you to gain control and send your message.
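
    The two steps above can be sketched roughly as follows. This is only a minimal illustration, not a tested Screeps module: the function names `parseSegment` and `trySend` and the `{ owner, messages }` layout are made up for the example, and it assumes the segment text is read from and written back to `RawMemory.interShardSegment` each tick, with `Game.shard.name` passed in as the shard name.

    ```javascript
    // Hypothetical helpers; the { owner, messages } segment layout is an assumption.
    // In game code, segmentText would come from RawMemory.interShardSegment and
    // shardName from Game.shard.name. Here they are plain arguments so the
    // sketch can run outside the game.

    function parseSegment(segmentText) {
      try {
        const data = JSON.parse(segmentText || '{}');
        return { owner: data.owner || null, messages: data.messages || [] };
      } catch (e) {
        // Empty or corrupt segment: treat it as unowned with no messages.
        return { owner: null, messages: [] };
      }
    }

    // Call once per tick on the sending shard. Returns the text to write back
    // (null means "write nothing this tick") and whether the message went out.
    function trySend(segmentText, shardName, message) {
      const seg = parseSegment(segmentText);

      if (seg.owner === null) {
        // Step 1: nobody holds the segment, so request control and wait a tick.
        seg.owner = shardName;
        return { write: JSON.stringify(seg), sent: false };
      }

      if (seg.owner === shardName) {
        // Step 2: our request is still there, so no other shard overwrote it.
        seg.messages.push({ from: shardName, body: message });
        seg.owner = null; // release control in the same write
        return { write: JSON.stringify(seg), sent: true };
      }

      // Another shard holds the segment: write nothing and retry next tick.
      return { write: null, sent: false };
    }

    // Best case: two ticks on shard0.
    const afterRequest = trySend('', 'shard0', { creep: 'explorer' });
    const afterSend = trySend(afterRequest.write, 'shard0', { creep: 'explorer' });

    // Failure case: shard0 stops running after its request, so every other
    // shard keeps reading shard0 as the owner and can never write.
    const blocked = trySend(afterRequest.write, 'shard2', { creep: 'other' });
    ```

    In the failure case `blocked.write` stays `null` on every tick until shard0 runs again, because only the owner ever releases the segment in this sketch.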

    There might be ways to unreliably work around this, e.g. trying to send the message 10 ticks before the creep enters the portal, which would give you a little time, but the same problem would occur if the creep was killed the following tick and the shard stopped processing your code.

    I think it would be reasonable for a shard to carry on processing your code for a period of time, e.g. 1 minute, after the last owned objects have disappeared from the shard.

    These are the logs of the shard stopping processing, for what they're worth:

    2018-11-21 01:56:24 shard0: Inmate_1552296_exp explorer [room E50S0 pos 7,25] 349ttl About to enter inter-shard portal to shard1 E30S0
    2018-11-21 01:56:24 shard0: send_creep_memory ....
    2018-11-21 01:56:24 shard0: ISM attempted to gain control of intershard segment
    

    Oops, shard0 stops running, leaving the intershard segment locked to it and causing problems for other shards!

    2018-11-21 01:56:25 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:28 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:31 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:35 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:37 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:40 shard1: Inmate_1552296_exp explorer [room E30S0 pos 44,12] 345ttl selected inter-shard portal to shard0 E50S0
    2018-11-21 01:56:40 shard1: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:41 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:43 shard1: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:43 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:46 shard1: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:47 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:49 shard1: Inmate_1552296_exp explorer [room E30S0 pos 41,15] 342ttl About to enter inter-shard portal to shard0 E50S0
    2018-11-21 01:56:49 shard1: send_creep_memory ...
    2018-11-21 01:56:49 shard1: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:50 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:53 shard2: ISM intershard segment currently owned by shard0
    2018-11-21 01:56:56 shard2: ISM intershard segment currently owned by shard0
    
    Finally, shard0 starts running again when a creep is sent back into the shard a few ticks later, so the ownership is cleared:
    2018-11-21 01:56:57 shard0: tick skipped - we went from 28812812 to 28812818 a jump of 6. bucket is 9463