Heap Problems are back with 50 Claimed Rooms

Totalschaden

Heap Problems are back with 50 Rooms on Shard1. Can we finally get a more fair System regarding Heap ? A Slider like for CPU per Shard or Heap per Claimed Rooms on the Shard ?! I dont want to move any Rooms to another Shard!!!

Staying on the same Shard is currently a massive drawback.

alt text

deft-code

I have a patch that could help if @o4kapuk is ready for the PR.

Currently the spatial index for each room is 2500 array. Swapping that index for a Map (or another sparse index) should help a lot since most rooms are far from full.

o4kapuk

@deft-code I'm ready to test the PR, please ping me with the link.

o4kapuk

But I must say that any solution that implies optimization or heap increase can only be temporary. It's always just a matter of time when Totalschaden (or AzuraStar, or anybody) overgrows everything in the realm of reasonable possibilities then comes back asking for more heap. We'll discuss what we can do to solve it permanently somehow.

deft-code

@o4kapuk Here is a minimal change. https://github.com/screeps/engine/pull/110.

I think style wise using a Map would be better but a sparse array should also avoid the junk memory.

Roughly I expect this to save about 100KiB per room per tick. In @Totalschaden's 50 room case that would be 5MiB.

The savings should compound since the the entire Game object is junked every tick, but only garbage collected every few ticks.

Xenofix

Every bot should just get more memory instead of optimizing for less memory consumption at the cost of cpu cycles.

deft-code

My fix doesn't cost CPU. It simply avoids wasting memory that was going to be unused anyway.

Another tradeoff that might help is Lazy calculation of the the FIND and LOOK registers. Construct each Room object with a list of unindexed room objects, then create the indices as needed. That change potentially decreases CPU and RAM. However I think most players run at least a Room.find over every room every tick anyways, so the FIND registers saving are ephemeral. But I think it might be a net win for the LOOK registers.

Xenofix

Ok, after looking up some details. new Array(2500) would also create a sparse array. So it is no difference. Array.apply(null, Array(2500)) would create an efficient dense array. However, I think a sparse array was wanted here, because this spatial is used for look and you already said, it's usually far from full. I've read about sparse arrays and tested it: new Array(0x7FFFFFFF) should use up all your heap, but it doesn't. That means, I don't expect a positive impact by changing it to [].

I've spend some time now looking at the engines code for that lazy look/find-idea: What I see in code looks like look and lookFor are fully precached, it's recreated at every request and uses that spatial array, but it doesn't need to search for anything thanks to that sparse array. See addObjectToRegister. But look for area is a different story, it uses a different cache. find is cached and only for exitTiles lazily evaluated. The findCache is initialized when the Game-Object is constructed. See addObjectToFindCache. So calling find just reads directly from the prefilled cache, it's cached for maximum performance. But I'm sure it's not needed to precompute it for every type of find. Maybe a Map helps for the findCache.... which is already an object. In fact, every object that has roomNames as keys could use Map instead. But I don't know if Map uses less memory than object, I just know that it performs better. But I'm not sure if Map also performs better than a sparse array, after all a sparse array can be optimized for integer keys. However, conclusion: creating spatial on demand might be less performant if it's used on every room, but it's a win for those who rarely look. However, there is at least one thing which can only be seen by look and not by find, like spawning hostile creeps, but also here, look would only be used in some hostile rooms.

I think a big impact could have a compressed terrain data, which is in runtimeData.staticTerrainData. Terrain data is very big and the usage looks like it's an staticTerrainData[id][y*50+x] with values 0,1,2,3 each. I've found staticTerrainData in the driver. It's an Uint8Array. If every tile is compressed to 2 bits in that Uint8Array, then the size for terrain decreases from 2500 bytes + overhead to 625 bytes + overhead per room. That's something. Compressing terrain data will come with a performance cost.

However, I'm really not sure if such optimizations (as well as the packed RoomPosition fix) should be done without profiling how much memory is really used and where optimizations can be done.

Talking about packed RoomPositions... to pack pos of RoomObject could also save a little memory, likely even more than the last change of RoomPosition. An access of the property pos would unpack the RoomPosition... but since RoomPosition itself is already packed, it would effectively just wrap the integer in an instance of RoomPosition.

One last note: deft-code mentioned that lazy initialization of look could also decrease CPU. I think that depends if the cpu time for the creation Game counts to the users cpu or not. Means, if the user doesn't pay cpu for the creation of Game right now, then it's obvious what happens on lazy creation of spatial.

Edit: What about giving bots more memory? Obviously the problem here are temporary objects which exist only during one tick. The memory of ivm is persistent, but it doesn't need to be persistent for these temporary objects. So my idea is, to allow all bots to use more memory during his execution time, but at the end of the bots loop, free up the space used by temporary engine objects, garbage collect and if then the memory still exceeds the current ivm memory limit, then reset the ivm. This way we introduce a hard limit, which is what the ivm can persist, and a soft limit which is what the ivm can use while active.

I looked up the isolated-vm from laverdet. If more than 100% of memory is used, then that's already possible for a very short amount of time. So you would install the soft limit and hard limit here, so that it's just checked for the soft limit during execution and there is one explicit check for the hard limit after the temporary objects are released. Maybe that's also possible without changing the code of ivm, if the memory limit can be changed on the fly. Then you can set the higher soft limit before Game is initialized and the lower hard limit after Game is destructed.

deft-code

Sparse array means two things. One a usage pattern. The current spatial index already uses the sparse array usage pattern. The second is a common implementation detail. The key difference in my change that new Array(2500) creates an array with 2500 slots filled with undefined. My patch skips that pre-allocation. The behavior is unchanged all indices are undefined until assigned to. I have also read some docs that suggest Map might be faster than a sparse array (the optimizer can make more assumptions with a Map and therefore make faster code).

As for lazy init. I was referring to how the runner builds the spatial index at the start of every tick for each room. With zero overhead we could lazily create that index for each kind of LOOK_* the first time it is need on a room. This reduces CPU because we avoid wasting CPU on spatial indices that will never be queried. HOWEVER, the little profiling I've done didn't show that the construction of Game was a huge CPU drain. My procedure was to grab Game.cpu.getUsed() at the top of my loop, access Memory and grab it again. I should repeat that profile, but when I did it with 10 rooms my Memory was <200k and cost twice as much as the Game construction.

Xenofix

new Array(2500) does not create a dense array with 2500 prefilled slots. new Array(2500) just creates a sparse array with a fixed length. But that doesn't mean a dense array is used internally. Maybe that's an implementation detail which differs with different engines, but in V8 it is a sparse array. It will be changed into a dense array if it has no holes, but spatial always has holes. So you can change it to [], but I believe it wouldn't help, because the array is already sparse. spatial and the findCache could be replaced by Map, that might help a little. The elements of findCache can also be replaced by Map, but I'm not sure if it helps to replace the elements of spatial with Map, but we could have a try.

You can't create the indices for look and find with zero overhead. Game is not created per room, it's created per player. The loop iterates over each object once and fills all the caches for all rooms at once. If you create them lazily, then you must iterate over all objects for each requested room. This is better in case of very rare uses of look, but it's worse if you use look in several rooms. So this problem must be fixed. One possible way to fix it is, to group the objects by room and use these grouped objects to create spatial lazily, maybe register.objectsByRoom is already such a structure. If look is used much, then this would need just a little more cpu, but if rarely used, then overall memory and engine cpu time is reduced.

One question still is (and I didn't find it in code, yet), if Game.cpu.getUsed() === 0 starts after Game is created or before Game is created. I suspect it's after Game is created. Because my bot needs at minimum only 0,03 ms to the beginning of loop. It's very unlikely that Game is created in just 0,03 ms with JavaScript for 23 rooms. That means, if parts of the cache will be created lazily, then they will drain the users cpu time instead of the engines cpu time. So that change would increase cpu consumption of all players who use any kind of look and they will start complaining about this change again. In other words: Such a change helps few and punishes many. Keeping that in mind, it would be very necessary to count how much cpu is needed to initialize the spatial cache lazily and subtract that time from the users cpu time, so that it comes with almost zero cost for the user again, just like now with pre-initialized spatial which is initialized outside the users cpu time.

And don't forget the other options which I mentioned before. Compressing terrain data can save memory at the cost of little cpu, compressing positions of RoomObjects can save memory at the cost of little cpu and allowing much temporary memory for each bot could even be a long term fix for the problem with no cpu cost. Maybe it's an option for huge players to switch back to the legacy model, so that they can access more memory.

deft-code

The documentation for Array implies that the slots are pre-allocated. The documentation agrees with you that the array is not prefilled.

When I fire up the javascript console in Chrome. I can see that new Array(1024*1024) increases the heap dump by a little more than 8 MiB.

However the documentation is interpreted it's clear the V8 engine allocates 8 bytes per item.

Xenofix

That documentation says JavaScript arrays are not guaranteed to be dense; this depends on how the programmer chooses to use them.

I've tried it in screeps: {let x = new Array(1000000 * 4000);x[1000000 * 4000 - 1]=123;console.log(x.length);} works, but I shouldn't be able to reserve 32 GB of ram and use some of it. So it can only be a sparse array. I've tried the same in my Chrome console and it also works. It just fails if I enter new Array(big_number); because then the array is stringified and that string exhausts my memory.

Xenofix

Summary of some memory optimizations:

Replace spatial and findCache by Map instead of {}. Likely increases performance, impact on memory consumption is unknown. Easy to implement.
Replace elements of spatial: new Array(2500) by Map. Maybe increases performance, impact on memory consumption is unknown. Easy to implement.
Lazy creation of spatial and subtracting the cpu cost from user cpu time. Can decrease engine cpu comsumption for bots which don't depend on look. Decreases memory consumption.
Compress terrain data from 2500 bytes down to 625 bytes. Decreases memory consumption, may increase cpu cost a little, breaking change for Room.Terrain.getRawBuffer, but that's already documented.
Replace pos of all RoomObjects by a property which wraps the new integer field __packedPos. into a RoomPosition. Just like the previous change of RoomPosition this decreases memory consumption a little and adds a little more cpu cost for accessing pos.
Write about memory problems with large empires in the documentation, so that players are prepared that intersharding is a 'must have' for large empires.

Every of these points can be done independently. @o4kapuk Which changes would you support?

deft-code

@Xenofix in V8 new Array(N) only pre-allocates when N < 2^31-1. Try again, with a billion and you should see the memory show up.

Replacing new Array(2500) with a plain array will reduce memory by a little less than 200KiB per room. It's a tiny PR. Worse case scenario is I'm wrong and memory usage doesn't decrease.

Some thoughts on your suggestions.

1 and 2: Change all the sparse arrays to use Map instead looks like a good idea. I don't know if the ROI is worth dev time. I'd hope @o4kapuk would be willing merge in a PR that did the work for them. But you also have remember the current code works. Changing the code at every point that touches an index is a riskier change. There is nothing we can do to make that easier on o4kapuk short of getting screepsplus to run the patch for a few weeks to prove it's stable.

3: I looked into the code after your previous post. The player cpu usage doesn't account for Game initialization, (at least in the open source screeps driver). That makes this change much trickier from the PR perspective. It would reduce overall CPU usage but would increase individual player cpu usage.

4: I don't think compressing the terrain buffer will help much. Due to how it's set up the terrain buffer ram is shared across the entire runner. The shared ram isn't accounted to the per player ram limit.

5: I suspect any gains from not storing a RoomPosition object will get washed out by recreating one every time a player accesses it.

6: Maybe not official documentation (it only affects a very small subset of players) but a statement from the devs about the expected room count would probably be a good idea. If I were them I'd just say up to 45 rooms will work, more than than and you might need to get creative with memory.

TuN9aN0

I am running 50 claims with ~205 rooms vision, my heap jumps up around 100~150mb each tick. With 608mb and coming in at around 250mb on the first tick it has barely any slack for garbage collection. It basically flip-flops from tick to tick between 420mb and 550mb.

100~150mb of released heap space each tick is a lot of RAM, dividing this by the 205 rooms I have visible this would mean 600k of memory per room. Not sure if a 2500 element array is significant enough in this regard.. What or why is the server shoving this much data into my IVM each tick, can't it hold on to some of this stuff at least (assuming it's largely the same data every tick anyhow, especially if it's terrain data that is the main contributor)?

Totalschaden

Running 52 Rooms now, cannot use full CPU anymore due to Heap Limitations (cant setup new Remotes). Disabled Observer rescans yesterday to get some more tolerance, not helping as much as I was hoping though.

Totalschaden

So any reason we cant have a Slider for Heap per Shard ? Why do Players who spread on all shards get combined 4x the Heap I get ?

Xenofix

No, that's the very wrong direction. Memory should not be another limiting factor. We are not writing code in C++ here where we have full control of the memory. If it's just limited because of limitations of the node servers then that's pity for you but that's not a reason to make memory a limiting factor for all the other players!

Btw. intersharding is more complex than staying in one shard, it also adds more overhead for each shard. The extra memory can be seen as an incentive for intersharding. Maybe you take the challenge? What about this idea: Every player gets 20+GCL*10 cpu without a cap, but can only assign max. 300 cpu to one shard. Would that be incentive enough to go intershard? Then players wouldn't hit the memory limit as they strive for intersharding empires to receive full cpu.

One last thing I would like to know from you Totalschaden: How much code can be executed per tick? Is the first line of your main.js executed? Is the first line in your loop executed? How much memory is used before you initialize any modules at the very first lines of your main.js and what's the memory consumption there? That first line should just be executed once per vm reset and the memory consumption there, before you initialize anything shows how much the engine really takes.

wtfrank

@totalschaden If there were a slider to allocate heap between shards, it would presumably cap out at the current heap limit for obvious reasons

TuN9aN0

@xenofix I think you miss the point that memory already is a limiting factor such that when you keep claiming rooms with your 300 cpu eventually a significant portion of it gets eaten by garbage collection. It's just the Game objects graphs the game throws up that does not fit in the heap anymore, or it stuffs it so much that the garbage collectors enters a more aggressive schedule making it very costly. Partially migrating to another shard creates more heap slack and restores the garbage collector to an easier more CPU friendly schedule.

A user on 3 shards gets 1800MB of RAM for his code. I only get 600mb cause I stay on one. This is where I agree with Totalschaden, this is not entirely fair. What I also don't understand is why someone with say 16 rooms gets the same 608mb where in this case it's a massive over allocation.

A cleaner alternative is perhaps a hard cap on claims, perhaps with a slider per shard. Scale heap according to what is chosen there.

Either way let's wait and see what they come up with, I understand they are changing some things around in the architecture.