Solution to CPU lost to garbage collection (GC)



  • A few players who have been analyzing script performance have noticed the problem with Node GC and multitenant applications:

    Node's GC regularly has small and large pauses. The statistics on these are better explained by larger, more experienced players.

    Artem has done some great things to triage this, for example observing and experimenting with optimistic GC / global.gc() calls.


    As an alternative to attempting to solve the problem of multitenant realtime fairness, why don't we simply accept that Node is not a hard realtime system, and instead deal with this as a billing problem?

     

    I propose that the Screeps server should give players a CPU bucket credit relating to the GC delays that they suffer relative to their own memory allocation.

    For example, the memory usage statistics are recorded before Player A's tick begins. If possible, GC events should be detected and accounted for as the tick progresses. After Player A's tick ends, the memory usage statistics are recorded again and compared.

     

    Memory usage would then be compared across the players running on a server instance, and players who allocate less memory but still experience GC events would be given some amount of CPU in their bucket. This would be similar to the CPU bucket credit granted during an instance reset.

    The objective is to simplify server development while making the system relatively fair for competitive players: treat CPU lost to GC like a billing problem, not like a real-time software problem.
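
    A minimal sketch of how such a credit could be computed, assuming the server already has per-tick allocation and GC pause figures for each player. The function name, the field names, and the "fair share proportional to allocation" rule are all hypothetical; nothing here is an actual Screeps server API.

```javascript
// Hypothetical sketch: none of these names exist in the Screeps server.
// Each entry describes one player's tick on a shared server instance.
function computeGcCredits(ticks) {
  const totalAllocated = ticks.reduce((sum, t) => sum + t.allocatedBytes, 0);
  const totalGcMs = ticks.reduce((sum, t) => sum + t.gcPauseMs, 0);
  const credits = {};
  for (const t of ticks) {
    // Assumed rule: a player "owns" GC time in proportion to what they allocated.
    const share = totalAllocated > 0 ? t.allocatedBytes / totalAllocated : 0;
    const fairGcMs = totalGcMs * share;
    // Credit only the pause time suffered beyond the player's own share.
    credits[t.player] = Math.max(0, t.gcPauseMs - fairGcMs);
  }
  return credits;
}

// Made-up numbers: A allocates little but is hit by most of the GC pauses.
const credits = computeGcCredits([
  { player: 'A', allocatedBytes: 1000000, gcPauseMs: 30 },
  { player: 'B', allocatedBytes: 9000000, gcPauseMs: 10 },
]);
console.log(credits);
```

    With these made-up numbers, A would be credited about 26 ms and B nothing, since B's allocations account for more GC time than B actually suffered.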


  • Culture

    The issue here is that this depends on knowing when GC events happen, and from my understanding that information isn't exposed by V8 in a reasonable way.



  • Yahoo Inc has open-sourced the monitr NPM package, which adds this capability to Node.

    Implementation: https://github.com/yahoo/monitr/blob/master/src/monitor.cc

     


  • Culture

    It would be interesting if that module could be used to detect when hard resets are caused by the GC and prevent the penalty from being applied.


  • YP

    I'm not sure if it's feasible to compare memory allocations between players. I guess the huge number of objects created is not really under the player's control. If you compare a GCL 25 player with a GCL 2 player, there is a huge number of objects created just because of roads, walls and stuff like that. On the other hand, stuff stored in a few big strings might have bigger memory usage but should not really take a long time to clean up.

     

    I think it would be interesting to have more statistics available, maybe to optimize one's own code... 😉



  • This seems to be a good suggestion. I just read the README at https://github.com/yahoo/monitr and it seems to be just the sort of thing that is needed in the game server to equalize the inherent issues of a non-real-time language.


  • Dev Team

    I’ve checked this module and this phrase is what worries me:

    Spawns a thread and monitors the process. Writes process stats every second to the socket path.

    So we only have info with 1-second granularity, which is obviously not enough to track individual player script runs.



  • Whoops, looks like that was a wild goose chase.

    I spent a little more time on it and found some V8 accessor methods to get heap statistics: https://nodejs.org/api/v8.html

     

    I have made no attempt to test how these perform, though. In Java these features are all just built in to the runtime. 😛
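
    A rough sketch of what sampling those heap statistics around one player's tick could look like. `v8.getHeapStatistics()` and its `used_heap_size` field come from the Node v8 module linked above; `measureTick` and the sample tick function are hypothetical stand-ins for however the server actually runs player code.

```javascript
const v8 = require('v8');

// Hypothetical helper: runs one player's tick and reports heap growth.
function measureTick(runTick) {
  const before = v8.getHeapStatistics().used_heap_size;
  const startNs = process.hrtime.bigint();
  runTick();
  const elapsedNs = process.hrtime.bigint() - startNs;
  const after = v8.getHeapStatistics().used_heap_size;
  return {
    // Can be negative if a collection ran during the tick.
    heapDeltaBytes: after - before,
    elapsedMs: Number(elapsedNs) / 1e6,
  };
}

// Stand-in for a player script that allocates a lot of small objects.
const result = measureTick(() => {
  const junk = [];
  for (let i = 0; i < 100000; i++) junk.push({ i });
  return junk.length;
});
console.log(result);
```

    A shrinking heap hints that a collection happened somewhere inside the tick, but the delta alone says nothing about how long that collection took.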


  • Dev Team

    I don't think heap usage statistics can be used to detect GC runs reliably. And we cannot tell how long the GC run has taken in milliseconds based on this metric alone.
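
    For comparison, a sketch of a different technique that is not mentioned elsewhere in this thread: Node's `perf_hooks` module can report `'gc'` performance entries, each with a `duration` in milliseconds. Whether this is available and cheap enough in the Node version the game server runs is an assumption, and `gcStats` and `churn` are hypothetical names. Entries are delivered asynchronously, so attributing them to a single player's tick would still need extra bookkeeping.

```javascript
const { PerformanceObserver } = require('perf_hooks');

// Hypothetical accumulator, not part of any Screeps API.
const gcStats = { events: 0, totalMs: 0 };

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    gcStats.events += 1;
    gcStats.totalMs += entry.duration; // duration is in milliseconds
  }
});
observer.observe({ entryTypes: ['gc'] });

// Allocate short-lived objects so that a GC is likely (but not guaranteed).
function churn() {
  let keep = 0;
  for (let i = 0; i < 200000; i++) keep += [i, { i }].length;
  return keep;
}
churn();

// Entries arrive asynchronously, so read the totals on a later turn.
setImmediate(() => {
  console.log(gcStats);
  observer.disconnect();
});
```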