IVM heap usage & game objects
-
I had a bit of an empire collapse due to a big war. I've recovered a bit now, and recently added some code that leads to quite a bit more room visibility (checking for power banks pre-observer with static creeps).
This has led to my heap hitting the memory limit repeatedly once more. It seems that game objects are still a major issue when it comes to heap.
@stuy486 mentioned in Slack that maybe this is due to JS GC, and maybe it could be tweaked. He said:
Now armed with 15 minutes of reading on Node GC and nothing else... I wonder if, for the Screeps use case, the new-space heap size that's being used isn't large enough? This would cause a bunch of tick-only aged stuff to get moved to old-space, which might be the reason we see the heap size step up like that... This would affect larger players much more since the amount of heap used to generate Game is much higher.
I do hope this is being looked at. I don't feel we should be penalized by resets/skipped ticks just because we can see a lot of game objects. Room visibility often seems to be the culprit.
-
I'm not sure about the new-space/old-space thing; as I mentioned in my comment, I just read a quick post about it. It seems like something that should be investigated if it hasn't already been.
My current experience is that I'm getting 5-10 "CPU time limit reached" messages an hour, and they are very clearly occurring when GC hits in the middle of my tick. For a period of about a week ending a couple of days ago, it seemed that the 500 CPU/tick limit was relaxed. During that time I never got errors, but I frequently saw ticks that took 700-800 CPU. Given that I short-circuit my code at 350 CPU and average ~215 CPU, this means that GC is taking 400-500 CPU when it hits my code mid-tick.
I suspect these heap/GC issues have always been around and are just more obvious now with IVM and the addition of Game.cpu.getHeapStatistics(). I also suspect that if we're able to figure out what's going on, the servers as a whole will really benefit. Similar to @demawi, I see my used heap climb ~70MB every tick before my code even runs. It only increases 3-5MB while my code is running.
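(For anyone who wants to reproduce that measurement, here's a minimal sketch, assuming IVM semantics where module-level variables survive between ticks on the same global; the 350 CPU cut-off is just the figure from this post, and the creep loop body is a placeholder for your own logic:)

```
// Track how much the heap grows before and during our own code each tick.
let heapAtLastTickEnd; // survives between ticks on the same global under IVM

module.exports.loop = function () {
    const atStart = Game.cpu.getHeapStatistics().used_heap_size;
    if (heapAtLastTickEnd !== undefined) {
        // Growth that happened before this code ran (Game object construction etc.)
        console.log(`heap grew ${((atStart - heapAtLastTickEnd) / 1048576).toFixed(1)} MB before my code`);
    }

    for (const name in Game.creeps) {
        if (Game.cpu.getUsed() > 350) break; // short-circuit non-essential work past a CPU threshold
        // ... run this creep's logic here ...
    }

    const atEnd = Game.cpu.getHeapStatistics().used_heap_size;
    console.log(`heap grew ${((atEnd - atStart) / 1048576).toFixed(1)} MB while my code ran`);
    heapAtLastTickEnd = atEnd;
};
```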
-
I have more problems with this (I don't know if it's also a heap problem?):
[00:06:38][shard0]Script execution timed out ungracefully
[00:06:43][shard0]Script execution timed out ungracefully
[00:06:48][shard0]Script execution timed out ungracefully
[00:06:53][shard0]Script execution timed out ungracefully
[00:06:58][shard0]Script execution timed out ungracefully
[00:07:03][shard0]Script execution timed out ungracefully
[00:07:07][shard0]Script execution timed out ungracefully
[00:07:12][shard0]Script execution timed out ungracefully
Currently this happens every tick, without it even getting to my code. I already had such a phase a few days ago, where it lasted over 200 ticks.
-
Since Wednesday it went back to the previously described heap problems, and it doesn't stabilize any more. Currently I have to call gc() myself every 3 ticks, but I'm still regularly getting "Isolate was disposed during execution due to memory limit" errors. Frustrating...
With an empty main file:
[20:00:57][shard0]{"total_heap_size":140148736,"total_heap_size_executable":3670016,"total_physical_size":131834272,"total_available_size":252732968,"used_heap_size":115534184,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":3157248,"does_zap_garbage":0,"externally_allocated_size":104036698}
[20:01:01][shard0]{"total_heap_size":234536960,"total_heap_size_executable":3670016,"total_physical_size":225970904,"total_available_size":175003304,"used_heap_size":195191992,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:06][shard0]{"total_heap_size":323624960,"total_heap_size_executable":3670016,"total_physical_size":322073392,"total_available_size":76740336,"used_heap_size":291719864,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104047668}
[20:01:10][shard0]{"total_heap_size":206999552,"total_heap_size_executable":3670016,"total_physical_size":202776616,"total_available_size":182927992,"used_heap_size":192229592,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:14][shard0]{"total_heap_size":314101760,"total_heap_size_executable":3670016,"total_physical_size":303882056,"total_available_size":87456760,"used_heap_size":280090032,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104045242}
[20:01:19][shard0]{"total_heap_size":205426688,"total_heap_size_executable":3670016,"total_physical_size":201582624,"total_available_size":182969808,"used_heap_size":192228672,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:23][shard0]{"total_heap_size":308334592,"total_heap_size_executable":3670016,"total_physical_size":301953816,"total_available_size":89031408,"used_heap_size":279926960,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104045242}
[20:01:27][shard0]{"total_heap_size":205950976,"total_heap_size_executable":3670016,"total_physical_size":201367544,"total_available_size":182943552,"used_heap_size":192254952,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:31][shard0]{"total_heap_size":312528896,"total_heap_size_executable":3670016,"total_physical_size":302439016,"total_available_size":88842216,"used_heap_size":280037088,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104045242}
[20:01:36][shard0]{"total_heap_size":206475264,"total_heap_size_executable":3670016,"total_physical_size":202223896,"total_available_size":182919864,"used_heap_size":192277880,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:40][shard0]{"total_heap_size":312528896,"total_heap_size_executable":3670016,"total_physical_size":302239240,"total_available_size":86718288,"used_heap_size":280431392,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104045242}
[20:01:45][shard0]{"total_heap_size":208048128,"total_heap_size_executable":3670016,"total_physical_size":203117480,"total_available_size":182891712,"used_heap_size":192277784,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104040970}
[20:01:49][shard0]{"total_heap_size":310681600,"total_heap_size_executable":3670016,"total_physical_size":301337400,"total_available_size":85588216,"used_heap_size":280931136,"heap_size_limit":380633088,"malloced_memory":8192,"peak_malloced_memory":7507824,"does_zap_garbage":0,"externally_allocated_size":104045242}
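(A rough sketch of what such a manual gc() guard might look like, assuming the gc() hook referenced in this thread is actually exposed to user code; the 85% threshold is an arbitrary example value, not a recommendation:)

```
// Log heap statistics every tick and force a collection when usage gets
// close to the isolate limit, instead of waiting for the out-of-memory reset.
module.exports.loop = function () {
    const stats = Game.cpu.getHeapStatistics();
    console.log(JSON.stringify(stats));

    const used = stats.total_heap_size + stats.externally_allocated_size;
    if (typeof gc === 'function' && used > stats.heap_size_limit * 0.85) {
        gc(); // trades some CPU now for not hitting "Isolate was disposed" mid-tick
    }
};
```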
-
Me again!
It's better than it was, but I'm still seeing heap limits hit when I get too much room visibility. Up to 200 rooms visible before it starts crapping out now.
I still feel fairly strongly that we shouldn't be seeing errors as users just because we can see too much stuff. It's a really weird game mechanic.
-
I've been moving rooms from shard0 to shard1/2 to reduce heap size, and have a similar experience as Tigga: visible rooms was close to 200. I've now reduced it to below 180 and no longer have problems with heap size. It's probably a combination of your improvements and my room reduction.
I'll add a graph of global age for the last month, where most resets over the last 10 days have been code uploads.
-
As long as you're tweaking the server and those heap spikes are a regular problem, it would be nice if you could grant more heap memory (in general, GCL-dependent, or per claimed room; I don't know what's possible), so nobody has to worry about being thrown out of the game after a server tweak.
-
Still having issues at about 200 rooms. It's usually around 202/203 it starts going.
Is there an ETA for a fix? I heard rumours persistent game objects were planned. It'd be nice to get this sooner rather than later, or maybe a +20% heap allocation while we're waiting.
-
maybe a +20% heap allocation while we're waiting
That would just shift the visible rooms limit to ~240. Do you think the 20% increase solves the issue for you?
-
@artch I guess there are other things going into the equation (such as creeps/structures).
I think it would probably be sufficient until we get persistent game objects. My understanding was that the ETA for that was not too far away. I'm GCL 24 right now, so there's only 20% more CPU I can get.
-
Alright, we've temporarily increased the heap limit by 64 MB. We'll keep an eye on our servers to see if they can actually maintain such memory usage.
-
Just some individual feedback: for me, 64MB is not enough to absorb the ~100MB spikes. In the period before the tweak my memory had magically stabilized; currently it doesn't. I no longer have to call gc() myself to prevent out-of-memory IVM resets, but the in-tick auto-GC triggers regularly and consumes a significant amount of CPU. Before, I averaged ~265 CPU/tick; currently it's over 300.
-
@demawi Please elaborate: which tweak are you referring to?
-
I meant the increase of the heap limit by 64MB (I have no knowledge about other changes today)
-
My heap is a lot more unstable after today. It reached thousands of ticks on the same global (purple line) before, but now it's resetting a lot more often.
-
@geir1983 said in IVM heap usage & game objects:
My heap is a lot more unstable after today. It reached thousands of ticks on the same global (purple line) before, but now it's resetting a lot more often.
I wonder what's going on there. The change was made at 12:30, right? I didn't see the 9:30 change on my plots, but I think you're on a different shard. It's the 9:30 change that seems to have messed something up for you.
-
Just wait a day or two, it should calibrate itself after some time and become more stable.
-
My heap is still very unstable since the change 5 days ago. My CPU isn't really taking a hit, as my global is not completely reset, but somehow the heap is no longer stable for me. Before this change my heap was stable and could reach thousands (20k+) of ticks between global resets. Edit: I'm now reaching around 500 ticks before my global is reset.
I have vision in about 230 rooms, 30 owned rooms and about 1000 creeps, if this is related.
-
@artch said in IVM heap usage & game objects:
Just wait a day or two, it should calibrate itself after some time and become more stable.
So my heap is getting worse and worse instead. Right now my globals last an average of 100 ticks, down from 20k+ before the size limit was increased. I am pretty sure I did not introduce any new caching etc. on my end. This is now resulting in error messages and loss of ticks. I have attempted to disable my path caching without any noticeable effect. My heap usage ((total_heap_size + externally_allocated_size) / heap_size_limit) jumps between 60% and 90% every other tick.
[8:31:40 AM][shard1]Error: Isolate was disposed during execution due to memory limit
    at ()
    at runUser (/opt/engine/dist/core/runtime/make.js:164:34)
    at _tickDomainCallback (internal/process/next_tick.js:228:7)
[8:31:43 AM][shard1]Script execution has been terminated: your isolate disposed unexpectedly, restarting virtual machine
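(A minimal sketch of how that usage figure and the global age can be logged each tick, using the same formula as above, in case anyone wants to compare numbers; nothing here is specific to any particular bot:)

```
// Log global age and heap usage each tick, to see how often the global resets
// and how full the heap is.
let globalBornAt; // reinitialized whenever this global is rebuilt

module.exports.loop = function () {
    if (globalBornAt === undefined) globalBornAt = Game.time;

    const stats = Game.cpu.getHeapStatistics();
    const usage = (stats.total_heap_size + stats.externally_allocated_size) / stats.heap_size_limit;
    console.log(`global age: ${Game.time - globalBornAt} ticks, heap usage: ${(usage * 100).toFixed(0)}%`);
};
```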