When shards are known to be broken...



  • ... can they be seriously slowed or paused? Shard 2 has been ticking along for an hour now in a broken state: global is resetting every tick, and creeps are spawning either without memory or with corrupt memory. It's still going at 3.2 seconds/tick, collapsing people's colonies.

    Furthermore, some sort of fault detection and "safe mode" for the server might be a good idea. These memory corruptions and constant global resets are obvious to users, and I'm sure they must be leaving some sort of trace in the server log. Automatically dropping the tick rate in these circumstances until an admin can come along and fix things seems like it would be fairly easy to do.

    I'm sure the new servers will get more stable with time, but we're currently in the middle of the 5th or 6th major issue since the server move. It's getting rather frustrating.

    EDIT: To be clear, in case you aren't aware, shard 2 is resetting global every tick (which is costing me 80 CPU/tick before we take into account wiped caches), creeps are spawning with no memory (which some people don't have any recovery code for), and creeps are spawning with corrupt memory (which is just annoying). It's been like this since ~6:20 UTC.
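
    As an aside, for anyone who doesn't have recovery code: here's a minimal sketch of the kind of thing I mean. If a creep shows up with empty memory, rebuild the essentials from its name. The "<role>_<tick>" naming convention is just an assumption for illustration; adapt it to your own scheme.

    ```javascript
    // Run once per tick from the main loop.
    module.exports.restoreCreepMemory = function () {
        for (const name in Game.creeps) {
            const creep = Game.creeps[name];
            // An empty memory object means this creep spawned without (or lost) its memory.
            if (Object.keys(creep.memory).length === 0) {
                // Hypothetical naming convention "<role>_<tick>"; derive the role from the name.
                const role = name.split('_')[0];
                creep.memory.role = role || 'idle';       // fall back to a safe default role
                creep.memory.homeRoom = creep.room.name;  // reassign a home room
                console.log(`Recovered memory for ${name} as role "${creep.memory.role}"`);
            }
        }
    };
    ```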



  • So it turns out the specific issue I have right now is parallel global states. I have more than one global state (not sure how many), and the game is alternating which one it gives me, which breaks any kind of between-tick heap usage. It's been like this for 3.5 hours now. The point remains, though.
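
    For anyone wanting to spot this themselves, a minimal sketch (my own, not an official API): tag each global with a random id and track the last tick it ran, so an unexpected reset or an alternating global shows up in the console instead of silently serving stale heap caches. The __globalId and __lastTick names are made up for illustration.

    ```javascript
    // Call at the top of the main loop, every tick.
    module.exports.checkGlobal = function () {
        if (global.__globalId === undefined) {
            // Fresh (or freshly served) global: tag it so different globals are distinguishable.
            global.__globalId = Math.random().toString(36).slice(2, 8);
            console.log(`New global ${global.__globalId} created on tick ${Game.time}`);
        }
        if (global.__lastTick !== undefined && global.__lastTick !== Game.time - 1) {
            // This global skipped ticks: either it was reset, or another global ran in between.
            console.log(`Global ${global.__globalId} last ran on tick ${global.__lastTick}, ` +
                        `now running on ${Game.time} - heap caches may be stale`);
        }
        global.__lastTick = Game.time;
    };
    ```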


  • Dev Team

    We're working on it; sorry for the inconvenience.


  • Dev Team

    Please report if you still have any unexpected behavior.

    As for the auto-detect and self-healing mechanisms: yes, they still need to be improved, but that will take time, so I can't make any promises here.



  • Looks good from here - thanks - 300 ticks with a stable global.



  • I think it just went again. I have creeps with impossible TTLs and corrupt memory. E.g. this guy appeared with 1600 TTL (above the normal 1500 maximum): https://screeps.com/a/#!/room/shard2/W7S30
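
    In case it helps anyone else spot this, a quick sanity-check sketch: flag creeps whose ticksToLive exceeds the engine maximum for normal creeps (the CREEP_LIFE_TIME constant, 1500), which should be impossible and points at server-side corruption.

    ```javascript
    // Run from the main loop; logs creeps whose TTL should not be possible.
    module.exports.flagImpossibleCreeps = function () {
        for (const name in Game.creeps) {
            const creep = Game.creeps[name];
            // ticksToLive is undefined while a creep is still spawning.
            if (creep.ticksToLive !== undefined && creep.ticksToLive > CREEP_LIFE_TIME) {
                console.log(`${name} in ${creep.room.name} has an impossible TTL of ${creep.ticksToLive}`);
            }
        }
    };
    ```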


  • Dev Team

    Well, I don't think we can figure it out now; it's 1:20 AM here. Something is badly misconfigured in our Redis instances. We'll investigate tomorrow morning.