Heap Problems are back with 50 Claimed Rooms
-
Ok, after looking up some details: `new Array(2500)` would also create a sparse array, so it makes no difference. `Array.apply(null, Array(2500))` would create an efficient dense array. However, I think a sparse array was wanted here, because this `spatial` is used for `look`, and you already said it's usually far from full. I've read about sparse arrays and tested it: `new Array(0x7FFFFFFF)` should use up all your heap, but it doesn't. That means I don't expect a positive impact from changing it to `[]`.

I've spent some time now looking at the engine's code for that lazy `look`/`find` idea. What I see in the code looks like `look` and `lookFor` are fully precached: the result is recreated at every request and uses that `spatial` array, but it doesn't need to search for anything thanks to that sparse array (see `addObjectToRegister`). Look for an area is a different story; it uses a different cache. `find` is cached and only lazily evaluated for exit tiles. The `findCache` is initialized when the `Game` object is constructed (see `addObjectToFindCache`). So calling `find` just reads directly from the prefilled cache; it's cached for maximum performance. But I'm sure it's not needed to precompute it for every type of `find`. Maybe a `Map` helps for the `findCache`... which is already an object. In fact, every object that has room names as keys could use a `Map` instead. But I don't know if `Map` uses less memory than an object; I just know that it performs better. I'm also not sure if `Map` performs better than a sparse array; after all, a sparse array can be optimized for integer keys. Conclusion: creating `spatial` on demand might be less performant if it's used in every room, but it's a win for those who rarely `look`. There is at least one thing which can only be seen by `look` and not by `find`, like spawning hostile creeps, but even here `look` would only be used in some hostile rooms.

I think compressed terrain data could have a big impact; it lives in `runtimeData.staticTerrainData`. Terrain data is very big, and the usage looks like `staticTerrainData[id][y*50+x]` with values 0, 1, 2 or 3 each. I've found `staticTerrainData` in the driver; it's a `Uint8Array`. If every tile is compressed to 2 bits in that `Uint8Array`, then the size for terrain decreases from 2500 bytes + overhead to 625 bytes + overhead per room. That's something. Compressing terrain data will come with a performance cost, though.

However, I'm really not sure if such optimizations (as well as the packed `RoomPosition` fix) should be done without profiling how much memory is really used and where optimizations can be done.

Talking about packed RoomPositions... packing `pos` of `RoomObject` could also save a little memory, likely even more than the last change of `RoomPosition`. An access of the property `pos` would unpack the `RoomPosition`... but since `RoomPosition` itself is already packed, it would effectively just wrap the integer in an instance of `RoomPosition`.

One last note: deft-code mentioned that lazy initialization of look could also decrease CPU. I think that depends on whether the CPU time for the creation of `Game` counts towards the user's CPU or not. That is, if the user doesn't pay CPU for the creation of `Game` right now, then it's obvious what happens on lazy creation of `spatial`.

Edit: What about giving bots more memory? Obviously the problem here is temporary objects which exist only during one tick. The memory of ivm is persistent, but it doesn't need to be persistent for these temporary objects. So my idea is to allow all bots to use more memory during their execution time, but at the end of the bot's `loop`, free up the space used by temporary engine objects, garbage collect, and if the memory still exceeds the current ivm memory limit, reset the ivm. This way we introduce a hard limit, which is what the ivm can persist, and a soft limit, which is what the ivm can use while active. I looked up isolated-vm from laverdet: using more than 100% of memory is already possible for a very short amount of time. So you would install the soft limit and hard limit there, such that only the soft limit is checked during execution and there is one explicit check for the hard limit after the temporary objects are released. Maybe that's even possible without changing the code of ivm, if the memory limit can be changed on the fly: then you set the higher soft limit before `Game` is initialized and the lower hard limit after `Game` is destructed.
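The 2-bit terrain compression mentioned above could look roughly like this. `packTerrain` and `unpackTile` are hypothetical helper names, not engine code; the real driver keeps one byte per tile in `staticTerrainData`:

```javascript
// Pack 2500 terrain values (each 0..3) into 625 bytes, 4 tiles per byte.
function packTerrain(raw) {            // raw: Uint8Array(2500)
    const packed = new Uint8Array(625);
    for (let i = 0; i < 2500; i++) {
        packed[i >> 2] |= (raw[i] & 3) << ((i & 3) * 2);
    }
    return packed;
}

// Read back one tile, using the same y*50+x layout as staticTerrainData.
function unpackTile(packed, x, y) {
    const i = y * 50 + x;
    return (packed[i >> 2] >> ((i & 3) * 2)) & 3;
}
```

Every read now costs a shift and two masks instead of a plain index, which is where the performance cost would come from.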
-
Sparse array means two things. One is a usage pattern: the current spatial index already uses the sparse-array usage pattern. The second is a common implementation detail. The key difference is that `new Array(2500)` creates an array with 2500 slots filled with undefined, while my patch skips that pre-allocation. The behavior is unchanged: all indices are undefined until assigned to. I have also read some docs suggesting `Map` might be faster than a sparse array (the optimizer can make more assumptions with a `Map` and therefore generate faster code).

As for lazy init: I was referring to how the runner builds the spatial index at the start of every tick for each room. With zero overhead we could lazily create that index for each kind of LOOK_* the first time it is needed in a room. This reduces CPU because we avoid wasting it on spatial indices that will never be queried. HOWEVER, the little profiling I've done didn't show that the construction of `Game` was a huge CPU drain. My procedure was to grab `Game.cpu.getUsed()` at the top of my loop, access `Memory`, and grab it again. I should repeat that profile, but when I did it with 10 rooms, my Memory was <200k and cost twice as much as the `Game` construction.
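The behavioral equivalence claimed above is easy to check; the only observable difference between the two variants is the initial `length`:

```javascript
// The spatial-index usage pattern: assign a few indices, read many.
// Unassigned indices read as undefined with or without pre-sizing.
const preSized = new Array(2500);  // current code: length fixed up front
const bare = [];                   // the patch: no pre-allocation

preSized[1234] = 'creep';
bare[1234] = 'creep';

console.log(preSized[1234] === bare[1234]);  // true
console.log(preSized[99] === bare[99]);      // true (both undefined)
console.log(preSized.length, bare.length);   // 2500 1235
```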
-
`new Array(2500)` does not create a dense array with 2500 prefilled slots; it just creates a sparse array with a fixed `length`. That doesn't mean a dense array is used internally. Maybe that's an implementation detail which differs between engines, but in V8 it is a sparse array. It will be changed into a dense array if it has no holes, but `spatial` always has holes. So you can change it to `[]`, but I believe it wouldn't help, because the array is already sparse. `spatial` and the `findCache` could be replaced by `Map`; that might help a little. The elements of `findCache` can also be replaced by `Map`. I'm not sure if it helps to replace the elements of `spatial` with `Map`, but we could give it a try.

You can't create the indices for `look` and `find` with zero overhead. `Game` is not created per room, it's created per player. The loop iterates over each object once and fills all the caches for all rooms at once. If you create them lazily, then you must iterate over all objects for each requested room. This is better in case of very rare uses of `look`, but it's worse if you use `look` in several rooms. So this problem must be fixed. One possible way to fix it is to group the objects by room and use these grouped objects to create `spatial` lazily; maybe `register.objectsByRoom` is already such a structure. If `look` is used a lot, this would need just a little more CPU, but if it's rarely used, then overall memory and engine CPU time are reduced.

One question remains (and I didn't find it in the code yet): whether `Game.cpu.getUsed() === 0` starts after `Game` is created or before. I suspect it's after `Game` is created, because my bot needs at minimum only 0.03 ms to reach the beginning of `loop`, and it's very unlikely that `Game` is created in just 0.03 ms in JavaScript for 23 rooms. That means, if parts of the cache are created lazily, they will drain the user's CPU time instead of the engine's. So that change would increase the CPU consumption of all players who use any kind of `look`, and they would start complaining about this change again. In other words: such a change helps few and punishes many. Keeping that in mind, it would be very necessary to count how much CPU is needed to initialize the `spatial` cache lazily and subtract that time from the user's CPU time, so that it comes at almost zero cost for the user again, just like now with the pre-initialized `spatial`, which is initialized outside the user's CPU time.

And don't forget the other options which I mentioned before: compressing terrain data can save memory at the cost of a little CPU, compressing the positions of RoomObjects can save memory at the cost of a little CPU, and allowing much more temporary memory for each bot could even be a long-term fix for the problem with no CPU cost. Maybe it's an option for huge players to switch back to the legacy model, so that they can access more memory.
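A rough sketch of that lazy per-room construction, assuming a structure like `register.objectsByRoom` exists. Names and shapes here are illustrative, not the actual engine code:

```javascript
// Build a per-room spatial index only when a room is first queried,
// reusing objects already grouped by room name.
function makeLazySpatial(objectsByRoom) {
    const cache = new Map();                         // roomName -> index
    return function spatialFor(roomName) {
        let index = cache.get(roomName);
        if (!index) {
            index = [];                              // sparse: only occupied tiles get a slot
            for (const obj of objectsByRoom[roomName] || []) {
                const key = obj.y * 50 + obj.x;
                (index[key] = index[key] || []).push(obj);
            }
            cache.set(roomName, index);
        }
        return index;
    };
}
```

Rooms that are never queried never pay for an index; rooms queried repeatedly pay the construction cost once per tick, which is the trade-off described above.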
-
The documentation for Array implies that the slots are pre-allocated, while agreeing with you that the array is not prefilled with values.
When I fire up the JavaScript console in Chrome, I can see that `new Array(1024*1024)` increases the heap dump by a little more than 8 MiB. However the documentation is interpreted, it's clear the V8 engine allocates 8 bytes per item.
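The same measurement can be reproduced in Node, which also runs on V8 (exact numbers vary by engine version, and per the observation above they come out near 8 bytes per slot):

```javascript
// Measure how much heap new Array(1024*1024) costs in this V8 build.
const before = process.memoryUsage().heapUsed;
const a = new Array(1024 * 1024);     // length set; slots are holes, not values
const after = process.memoryUsage().heapUsed;

console.log(((after - before) / (1024 * 1024)).toFixed(1) + ' MiB for length ' + a.length);
```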
-
That documentation says: "JavaScript arrays are not guaranteed to be dense; this depends on how the programmer chooses to use them."
I've tried it in Screeps: `{let x = new Array(1000000 * 4000); x[1000000 * 4000 - 1] = 123; console.log(x.length);}` works, but I shouldn't be able to reserve 32 GB of RAM and use some of it, so it can only be a sparse array. I've tried the same in my Chrome console and it also works. It just fails if I enter `new Array(big_number);` on its own, because then the array is stringified for display and that string exhausts my memory.
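The same test is runnable in Node (assuming V8): the length is set, but no 32 GB backing store exists, so the array must be sparse:

```javascript
// A length of 4e9 can't be backed by real storage (~32 GB at 8 bytes/slot),
// so V8 keeps the array in sparse (dictionary) mode.
const x = new Array(1000000 * 4000);
x[1000000 * 4000 - 1] = 123;

console.log(x.length, x[1000000 * 4000 - 1]);  // 4000000000 123
```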
-
Summary of some memory optimizations:
- Replace `spatial` and `findCache` by `Map` instead of `{}`. Likely increases performance; impact on memory consumption is unknown. Easy to implement.
- Replace the elements of `spatial` (`new Array(2500)`) by `Map`. Maybe increases performance; impact on memory consumption is unknown. Easy to implement.
- Lazy creation of `spatial`, subtracting the CPU cost from user CPU time. Can decrease engine CPU consumption for bots which don't depend on `look`. Decreases memory consumption.
- Compress terrain data from 2500 bytes down to 625 bytes. Decreases memory consumption, may increase CPU cost a little; breaking change for `Room.Terrain.getRawBuffer`, but that's already documented.
- Replace `pos` of all `RoomObject`s by a property which wraps the new integer field `__packedPos` into a `RoomPosition`. Just like the previous change of `RoomPosition`, this decreases memory consumption a little and adds a little CPU cost for accessing `pos`.
- Write about memory problems with large empires in the documentation, so that players are prepared that intersharding is a 'must have' for large empires.

Each of these points can be done independently. @o4kapuk Which changes would you support?
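The `pos` item in the list could look roughly like this. The bit layout and class names are illustrative only; the real packed `RoomPosition` also encodes the room coordinates:

```javascript
// Materialize a position object only on access; the object itself stores
// just one packed integer. 6 bits each for x and y (0..49 fits in 6 bits).
class PackedRoomPosition {
    constructor(packed) { this.__packedPos = packed; }
    get x() { return (this.__packedPos >> 6) & 0x3f; }
    get y() { return this.__packedPos & 0x3f; }
}

class RoomObjectSketch {
    constructor(x, y) { this.__packedPos = (x << 6) | y; }          // no position object stored
    get pos() { return new PackedRoomPosition(this.__packedPos); }  // wrap on demand
}
```

Each access allocates a fresh wrapper, which is the "little more CPU cost" the list item mentions.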
-
@Xenofix in V8, `new Array(N)` only pre-allocates when N < 2^31-1. Try again with a billion and you should see the memory show up.

Replacing `new Array(2500)` with a plain array will reduce memory by a little less than 200 KiB per room. It's a tiny PR. Worst case scenario is that I'm wrong and memory usage doesn't decrease.

Some thoughts on your suggestions.
1 and 2: Changing all the sparse arrays to use `Map` instead looks like a good idea. I don't know if the ROI is worth the dev time. I'd hope @o4kapuk would be willing to merge a PR that did the work for them. But you also have to remember the current code works. Changing the code at every point that touches an index is a riskier change. There is nothing we can do to make that easier on o4kapuk, short of getting screepsplus to run the patch for a few weeks to prove it's stable.
3: I looked into the code after your previous post. The player CPU usage doesn't account for `Game` initialization (at least in the open-source Screeps driver). That makes this change much trickier from the PR perspective. It would reduce overall CPU usage but would increase individual players' CPU usage.
4: I don't think compressing the terrain buffer will help much. Due to how it's set up, the terrain buffer RAM is shared across the entire runner. The shared RAM isn't accounted to the per-player RAM limit.
5: I suspect any gains from not storing a `RoomPosition` object will get washed out by recreating one every time a player accesses it.
6: Maybe not official documentation (it only affects a very small subset of players), but a statement from the devs about the expected room count would probably be a good idea. If I were them, I'd just say up to 45 rooms will work; more than that and you might need to get creative with memory.
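For reference, points 1 and 2 would amount to swapping the sparse array for a `Map` keyed by `y*50+x`, something like the following sketch (illustrative, not engine code):

```javascript
// Same behavior as the sparse array, different lookup structure:
// only occupied tiles cost an entry.
const spatial = new Map();

function addObject(x, y, obj) {
    const key = y * 50 + x;
    const bucket = spatial.get(key);
    if (bucket) bucket.push(obj);
    else spatial.set(key, [obj]);
}

function lookAt(x, y) {
    return spatial.get(y * 50 + x) || [];
}
```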
-
I am running 50 claims with ~205 rooms of vision, and my heap jumps up around 100~150MB each tick. With 608MB total and coming in at around 250MB on the first tick, it has barely any slack for garbage collection. It basically flip-flops between 420MB and 550MB from tick to tick.
100~150MB of released heap space each tick is a lot of RAM; dividing this by the 205 rooms I have visible, it would mean ~600k of memory per room. Not sure if a 2500-element array is significant in this regard. Why is the server shoving this much data into my IVM each tick? Can't it hold on to some of this stuff at least (assuming it's largely the same data every tick anyhow, especially if terrain data is the main contributor)?
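Checking that per-room arithmetic, assuming the midpoint of the 100~150MB range:

```javascript
const releasedMiB = 125;                        // ~midpoint of 100~150MB released per tick
const visibleRooms = 205;
const perRoomKiB = (releasedMiB * 1024) / visibleRooms;

console.log(Math.round(perRoomKiB) + ' KiB per room');  // 624 KiB, matching the ~600k figure
```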
-
Running 52 rooms now; I cannot use full CPU anymore due to heap limitations (can't set up new remotes). I disabled observer rescans yesterday to get some more tolerance, but it's not helping as much as I was hoping.
-
So is there any reason we can't have a slider for heap per shard? Why do players who spread across all shards get 4x the heap I get, combined?
-
No, that's the very wrong direction. Memory should not be another limiting factor. We are not writing code in C++ here, where we have full control of the memory. If it's limited only because of limitations of the node servers, then that's a pity for you, but it's not a reason to make memory a limiting factor for all the other players!
Btw., intersharding is more complex than staying in one shard, and it also adds more overhead for each shard. The extra memory can be seen as an incentive for intersharding. Maybe you take the challenge? What about this idea: every player gets `20+GCL*10` CPU without a cap, but can only assign a maximum of 300 CPU to one shard. Would that be incentive enough to go intershard? Then players wouldn't hit the memory limit as they strive for intershard empires to receive full CPU.

One last thing I would like to know from you, Totalschaden: how much code can be executed per tick? Is the first line of your `main.js` executed? Is the first line of your `loop` executed? How much memory is used before you initialize any modules at the very first lines of your `main.js`, and what's the memory consumption there? That first line should be executed just once per VM reset, and the memory consumption there, before you initialize anything, shows how much the engine really takes.
-
@totalschaden If there were a slider to allocate heap between shards, it would presumably cap out at the current heap limit, for obvious reasons.
-
@xenofix I think you miss the point that memory already is a limiting factor: when you keep claiming rooms with your 300 CPU, eventually a significant portion of it gets eaten by garbage collection. It's the `Game` object graphs the game throws up that no longer fit in the heap, or they stuff it so much that the garbage collector enters a more aggressive schedule, making it very costly. Partially migrating to another shard creates more heap slack and restores the garbage collector to an easier, more CPU-friendly schedule.
A user on 3 shards gets 1800MB of RAM for his code. I only get 600MB because I stay on one. This is where I agree with Totalschaden: it's not entirely fair. What I also don't understand is why someone with, say, 16 rooms gets the same 608MB, where in that case it's a massive over-allocation.
A cleaner alternative is perhaps a hard cap on claims, perhaps with a slider per shard, scaling heap according to what is chosen there.
Either way, let's wait and see what they come up with; I understand they are changing some things around in the architecture.
-
@xenofix When it's in the state where I took the screenshot, no code is executed at all; it's completely stuck.
-
The memory limit is there because of physical restrictions. That has nothing to do with fairness. Everyone can go intershard to prevent memory problems; that is fair. While I agree that it would be optimal to just have enough cheap memory (after all, nowadays 8 GB is nothing...), I assume we won't get it for economic reasons.
A new architecture could also solve these problems. But I am still, and again, very much against a stricter memory cap. The original topic was that memory is not enough for 50+ rooms, not that other people's memory is too much.
-
Much easier to go intershard than to deal with the current heap situation? I already hate the fact that there are multiple worlds/shards now. When I started, this was a one-shard-only game, and that's what I loved in the first place and what got me into the game.
My current codebase should deal with intersharding no problem, and if not, not much adjustment would be required. But I would greatly prefer to imagine there is only the shard I'm currently in and ignore the rest. I am pretty sure that by the time I'm forced into intersharding, I would quit the game shortly after.
-
@xenofix That's pretty much an "it doesn't affect me, so I am against it" argument, which is not really contributing anything to this thread, sorry. There is nothing economic about it, apart from maybe giving me 100MB meaning giving 300MB to some of those who are already multi-sharding. There is also nothing anywhere that states you should at some point go multi-shard if you want to scale out further. It's just an emergent constraint that was not directly intentional. We are looking to be enabled, not to disable anyone else, if that is your fear...
Additionally, since the game already limits CPU and `Memory`, why would it not also limit heap memory, now that heap has become such an attractive asset with IVM? Of course, currently my own heap use is just a drop in the ocean compared to the `Game` object graphs, which are ultimately the issue; the game doesn't support its own weight.
-
@tun9an0 said in Heap Problems are back with 50 Claimed Rooms:

> Additionally since the game does already limit CPU and Memory

FYI: `Memory` is the same as heap, per shard.
-
@gimmecookies `Memory` doesn't hit the limit from stuff the game puts in there automatically. I am fine with whatever; I just need more heap, that's the core point of it all.
-
@tun9an0 You get me wrong. It will affect me, so I support getting more memory, but I'm also against restricting memory more than necessary.
Your argument seems to be: you hit the memory limit with many rooms, so everyone else should also hit the memory limit with fewer rooms.
I'm against that for obvious reasons.