Optimizations roadmap


  • Dev Team

    Without going into too much detail, Cassandra achieves this because the partition key lets the driver calculate which node(s) are responsible for a given query, and the driver will only ask those nodes. As a simple example (without duplicating data across nodes for fault tolerance), if I have 5 nodes, each of them will contain one fifth of the data, so each node handles only one fifth of the queries. Thus, throughput load is spread evenly, and adding more nodes helps improve performance.
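
    A minimal sketch of the routing idea described above, assuming a hypothetical 5-node cluster and hypothetical `room-N` partition keys. Real Cassandra places nodes on a token ring and uses a different hash function; the MD5-plus-modulo mapping here is only a stand-in to show that each key has exactly one owner and that load spreads roughly evenly.

```python
import hashlib
from collections import Counter

# Hypothetical cluster: node names are illustrative only.
NODES = ["node-1", "node-2", "node-3", "node-4", "node-5"]


def node_for(partition_key: str, nodes=NODES) -> str:
    """Map a partition key to the single node that owns it (no replication)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")  # simplified "token"
    return nodes[token % len(nodes)]


def load_per_node(keys, nodes=NODES) -> Counter:
    """Count how many of the given keys each node would be asked about."""
    return Counter(node_for(key, nodes) for key in keys)


if __name__ == "__main__":
    keys = [f"room-{i}" for i in range(10_000)]
    for node, count in sorted(load_per_node(keys).items()):
        print(node, count)
```

    The same key always routes to the same node, so a driver that knows this mapping never has to ask the other four.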

    This looks pretty close to how Redis Cluster works. And as far as I understand, Cassandra doesn't support secondary indexes either. Does it provide any benefits over Redis then? 

    Effectively, you split the world into shards for the necessary performance benefit, but it’s still a single synchronized game world.

    The fact that it is not synchronized doesn’t make the world non-single, since they are connected through persistent portals.

    Synchronized ticks would mean all shards would have the tick rate of the slowest shard. I don’t think that would make moving attractive: starting in an empty world with a 5 sec tick.

    Exactly, since the current world will become the first (and the slowest) shard. The idea is to provide better tick rate experience, and making new shards as slow as the current world doesn’t make any sense.
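
    As a rough illustration of the point above: if ticks are synchronized, every shard has to wait for the slowest one, so the shared tick duration is the maximum over all shards. The shard names and durations below are made-up numbers, not measurements.

```python
def synchronized_tick_seconds(shard_tick_seconds: dict) -> float:
    """With synchronized ticks, every shard runs at the slowest shard's pace."""
    return max(shard_tick_seconds.values())


def independent_tick_seconds(shard_tick_seconds: dict) -> dict:
    """Without synchronization, each shard keeps its own tick duration."""
    return dict(shard_tick_seconds)


if __name__ == "__main__":
    # Hypothetical values: an old, crowded shard and a new, empty one.
    shards = {"old-shard": 5.0, "new-shard": 1.5}
    print(synchronized_tick_seconds(shards))
    print(independent_tick_seconds(shards))
```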


  • Dev Team

    That being said, if we manage to reduce the tick rate of the first shard to some acceptable value due to players moving to another shard, then synchronizing tick rates might become an option.



  • I hate to say it but it sounds more like you don't know how to fix the problem so you're going to throw solutions at it till one sticks. In other words, that the entire situation is fundamentally flawed and perhaps a rewrite is needed and not just a "muck with DB settings" approach.  It seems that "sharding" is your rewrite. 

    However I worry that you have not solved the fundamental problem. You hinted at it in one of your last posts. X reads/writes = suck. Sharding may make that less common ('cause you're doing fewer reads/writes per shard) but the limit still exists. Why not focus on fixing "X"?

    Actions don't need to be written to the database all that frequently. They can be stored in memory then flushed to the database. Maybe once every 1000 ticks you can flush to the database. Yes that means a crash is an auto roll back, but just don't crash (gotta love that one). 

    As to world interactions, what does it matter? Rarely in all those reads/writes are players interacting with more than 1-2 other players. Some maybe 10-20 players. But some kind of status propagation based on visibility could fix that. I mean, what really needs to be passed? Not much changes per tick "normally". When an attack happens (or when two players' creeps are in the same room) more information needs to be passed around, but that's not "often" compared to a creep in its own room.

    In other words I see

    • State info stored in memory most ticks
    • That state shared to people that have visibility to that room and otherwise ignored
    • State is flushed to the database infrequently.

    Now this bounds your hardware and database and what not to number of rooms and not number of players, or what those players are doing. It also removes the "slow part" from the "fast part". If processing isn't a problem and just database-ing is a problem then just don't database.
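
    The proposal above amounts to a write-behind buffer. Here is a minimal sketch, assuming a plain dict as a stand-in for the persistent database and a hypothetical `WriteBehindStore` class; it is not how the game server is actually built. It also shows the trade-off mentioned earlier: a crash between flushes rolls the state back to the last flush.

```python
class WriteBehindStore:
    """Keep writes in memory and flush them to the database every N ticks."""

    def __init__(self, db: dict, flush_every: int = 1000):
        self.db = db                  # stand-in for the persistent database
        self.flush_every = flush_every
        self.dirty = {}               # writes not yet persisted
        self.tick = 0

    def write(self, key, value):
        self.dirty[key] = value       # memory only, no database round trip

    def read(self, key):
        if key in self.dirty:         # newest value wins
            return self.dirty[key]
        return self.db.get(key)

    def end_tick(self):
        self.tick += 1
        if self.tick % self.flush_every == 0:
            self.flush()

    def flush(self):
        self.db.update(self.dirty)
        self.dirty.clear()

    def crash(self):
        """Simulate a process crash: everything since the last flush is lost."""
        self.dirty.clear()


if __name__ == "__main__":
    db = {}
    store = WriteBehindStore(db, flush_every=3)
    store.write("creep:1", {"x": 10, "y": 20})
    for _ in range(3):
        store.end_tick()
    print(db)
```

    Anything that reads the database directly (a web client, an external tool) would only see state as of the last flush, which is the objection raised in the replies.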

     

    ALL THAT SAID

    I am very happy that you guys are doing something. At the current rate, you won't have a game in 6 months. These changes are the first glimpse of light at the end of that tunnel. Tick speeds per shard are going to be a huge issue. They need to be "equal" somehow. The screeps world needs to be "one world" somehow. But the fact that you're making progress in some direction is a great thing. It needs to happen. It may cause a "burp" while the player base adjusts, but if you don't do "something" you won't have a player base. If you're not growing, you're shrinking. Staying still isn't really an option.


  • Dev Team

    @coteyr We do know how to fix the problem - the world sharding change is a much better solution than the optimizations you propose. If we just optimize things here and there, we can grow the world 10%, 50%, 100% more, and eventually get back to the same issues. With world sharding, we can grow infinitely with the desired performance.

    Regarding the visibility thing - don't forget about observers, globally synced operations like market and terminals, and external APIs fetching game state (we're going to introduce API keys in the future). Getting rid of the persistent database will complicate things to a nearly unmanageable state. You are basically proposing that we develop our own distributed database management system, and I don't think such a task can be managed by a team of 2 developers.



  • "we do know how to fix the problem - the world sharding change is the solution"

     

    That's kind of my point. Tweaking isn't going to do it. A big change is needed.

     

    As for:

     

    "Regarding the visibility thing - don't forget about observers,"

    I would think that it's rare compared to the number of "in room" read and writes

    "globally synced operations like market and terminals,"

    That can't be taking up that much DB time. Limit it to one market call per tick, Terminals are already limited in such a way with intents. (essentially)

    " external APIs fetching game state (we're going to introduce API keys in the future)."

    Turn it off for now. I know that makes our pretty graphs go away, but that's a fair trade for better tick times/stability. It can be brought back when you flesh out the API key stuff. Lots of games and other companies do that when a secondary part like the "Unofficial" API gets to be too burdensome. If the API really is the problem then ditch it. We all know it is "Unofficial" anyway. 

    "Getting rid of the persistent database will complicate things to nearly unmanageable state."

    Maybe, but maybe not. I can't see your MongoDB database engine, but if it's like the open source one, then you could come up with a way to just not write to the DB. Yes it would add some work, but I'm not sure that it would add that much work. A pure in-memory MongoDB that flushes to disk every 100 ticks or so could be an easy start. 

    "You basically propose to develop our own distributed database management system, I don't think such a task can be managed by a team of 2 developers." 

    IDK, two developers can do quite a bit 🙂  But yes, that is kind of my point. It seems like you're going "what can we do in budget" (meaning time and money, not just money), but what I am saying is that you might not be able to solve the problem "in budget". It may be much bigger than that. To me (not knowing anything internal about the team) it's a big enough problem that all future dev stops on anything that isn't this problem/solution. For example, stability, tick rates, and sharding are "Must Haves" while GUI, new clients, API keys, etc. all become "Nice to Haves".

     

    But as I said in my last post, it doesn't matter. The fact that you guys are going down a path, regardless of the path, is light at the end of a tunnel. It may be a long, twisty, narrow, tunnel, but look there's hope, and I think that's the important take away. "It's being worked on"



  • > This looks pretty close to how Redis Cluster works. And as far as I understand, Cassandra doesn't support secondary indexes either. Does it provide any benefits over Redis then?

    Cassandra does have support for secondary indexes, but using them has a drawback: as secondary indexes are local, queries always have to involve all nodes (see https://pantheon.io/blog/cassandra-scale-problem-secondary-indexes for background), whereas for regular tables (and even materialized views) queries are directed to only a part of your cluster nodes. To my developers and customers, as an alternative I usually recommend using materialized views and/or specialized "lookup tables" which redundantly store data with primary keys optimized for the respective queries. This approach yields best performance for load profiles where data is read more often than it is written (which I guess may be the case for you).
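
    To make the difference concrete, here is a toy model (not Cassandra itself) of the two query paths described above. The table names, the `alliance` attribute, and the MD5-plus-modulo placement are all assumptions made for illustration. A query through a node-local index has to visit every node, while a redundant lookup table keyed by the queried attribute is answered by one node, at the cost of an extra write on insert.

```python
import hashlib

NUM_NODES = 5  # hypothetical cluster size


def owner(key: str) -> int:
    """Simplified placement: hash the partition key to a node index."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Cluster:
    def __init__(self):
        # Main table, partitioned by user id.
        self.users_by_id = [dict() for _ in range(NUM_NODES)]
        # Redundant lookup table, partitioned by alliance.
        self.users_by_alliance = [dict() for _ in range(NUM_NODES)]

    def insert(self, user_id: str, alliance: str):
        self.users_by_id[owner(user_id)][user_id] = alliance
        # Second write: this is the price of the lookup table.
        node = self.users_by_alliance[owner(alliance)]
        node.setdefault(alliance, set()).add(user_id)

    def find_via_local_index(self, alliance: str):
        """Each node only knows its own rows, so every node is asked."""
        found, nodes_asked = set(), 0
        for rows in self.users_by_id:
            nodes_asked += 1
            found |= {uid for uid, a in rows.items() if a == alliance}
        return found, nodes_asked

    def find_via_lookup_table(self, alliance: str):
        """The alliance is the partition key, so one node has the answer."""
        rows = self.users_by_alliance[owner(alliance)]
        return set(rows.get(alliance, set())), 1


if __name__ == "__main__":
    cluster = Cluster()
    for uid, alliance in [("u1", "red"), ("u2", "blue"), ("u3", "red")]:
        cluster.insert(uid, alliance)
    print(cluster.find_via_local_index("red"))
    print(cluster.find_via_lookup_table("red"))
```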

    I'm not too familiar with Redis Cluster. But from what I read (http://bigdataconsultants.blogspot.de/2013/12/difference-between-cassandra-and-redis.html and https://www.quora.com/Which-is-better-Redis-cluster-or-Cassandra) Redis uses a master slave architecture, whereas Cassandra nodes are all equal. I find the latter approach superior as it allows spreading not only read loads, but also writes.


  • YP

    >> " external APIs fetching game state (we're going to introduce API keys in the future)."

    > Turn it off for now. I know that makes our pretty graphs go away, but that's a fair trade for better tick times/stability. It can be brought back when you flesh out the API key stuff. Lots of games and other companies do that when a secondary part like the "Unofficial" API gets to be too burdensome. If the API really is the problem then ditch it. We all know it is "Unofficial" anyway.

    @coteyr: yeah.. it's the "unofficial" api because it is the api the game client uses to communicate with the servers... so it's unofficial to use it for other purposes.

    do you really think it's a good idea to turn the api off that is used by the game client to communicate with the servers? how do you want to play the game?

    I think the game will get really boring if the code only runs inside the server and no one can see the gamestate anymore because they removed all external access.



  • @W4rl0ck: It's very easy to limit API access to clients only if you control the clients. Hell, it's a browser based game, with a great community. Set a cookie, or better yet a header, have the web server look for it and tell the player base to back off.  It doesn't need to be super secure, hell, if we were told to stop using it outside the clients most everyone would comply. 
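
    A minimal sketch of the header check suggested above, assuming a hypothetical header name and value (`X-Game-Client: official`) that the real client does not necessarily send. As the post says, this is not secure, since anyone can set the same header; it only works as a polite signal.

```python
# Hypothetical header name and value, chosen for illustration only.
CLIENT_HEADER = "X-Game-Client"
EXPECTED_VALUE = "official"


def is_official_client(headers: dict) -> bool:
    """Return True if the request carries the agreed client header."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get(CLIENT_HEADER.lower()) == EXPECTED_VALUE


if __name__ == "__main__":
    print(is_official_client({"x-game-client": "official"}))
    print(is_official_client({"User-Agent": "stats-script"}))
```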


  • YP

    @coteyr: that's not the point. the client is getting the gamestate out of the database. If you don't update the database there is nothing the client can show. you are suggesting to turn off the api that is needed by the client... you just suggest stuff without thinking or understanding how stuff works.

    It doesn't matter if it's the client that is accessing the data or a script that creates stats. web requests can get cached and are already rate limited to prevent problems ... web access is not the problem the server has.


  • Culture

    > @W4rl0ck: It's very easy to limit API access to clients only if you control the clients. Hell, it's a browser based game, with a great community. Set a cookie, or better yet a header, have the web server look for it and tell the player base to back off.  It doesn't need to be super secure, hell, if we were told to stop using it outside the clients most everyone would comply. 

    Honestly I'd prefer they just asked us to rate limit the requests. It would be easy enough to add that into the client API code we wrote so all projects could start using it.
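
    A client-side limiter of the kind mentioned above could look like this token bucket. It is a sketch under assumptions: the `RateLimiter` class and its parameters are hypothetical, it is not taken from any existing client library, and the numbers in the example are placeholders rather than limits anyone has agreed on.

```python
import time


class RateLimiter:
    """Token bucket: at most `rate` calls per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int = 1, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock            # injectable so it can be tested
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, False if it must wait."""
        now = self.clock()
        # Refill tokens for the time elapsed since the last call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = RateLimiter(rate=2, burst=2)  # placeholder numbers
    for i in range(4):
        print(i, limiter.try_acquire())
```

    A caller would check `try_acquire()` before each API request and sleep or queue the request when it returns False.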

    That being said I'm curious how much third party usage is really causing issues-

    * The League website only updates every 6 hours, and it already has rate limiting built in to prevent it from hammering the API (this is why the upgrade takes more than 20 minutes). All it's doing is reading- there are no writes occurring. 

    * The "screeps-stats" project is set to buffer and rate limit as well. Every 10 seconds or so it reads a point in memory, optionally reads a point from segments, and then it writes back a single console command. There's a secondary system which also reads market orders every few minutes, which in turn also does a room lookup (to see who owns the room- similar to the lookup that happens when you browse the map)- this happens once per transaction at most though.

    * The "screeps-console" project mostly just reads the websocket. I'm not sure how much load this puts on things but if it would help I could look into setting a timer to kill the socket when the console was idle for too long- this was something I was thinking about doing anyways.

    Outside of all that I can't imagine that the third party tools are putting a larger load on the system than the game client itself is. That being said I also can't see what intel tools each alliance has built for themselves, just the ones the culture has, so while I know we're being smart about ratelimiting and using external resources (like the league's "rooms.js" file) I'm not sure others are. So if it really is worthwhile to put in some rate limiting on the node and python clients (since i'm assuming they're the most used ones) just let us know what you think reasonable numbers would be.


  • Culture

    Another note: the API is already rate limited. We can only hit it up to a few times per second before it heavily rate limits. The websocket shouldn't be that much load; it's using redis pubsub in the background, so it shouldn't even be touching the database. Many things such as Memory are in redis already, so they don't touch the database either. There are still API calls that hit it, but overall our third party code shouldn't be hammering the database nearly enough to be an issue.


  • Culture

    On another point, I think you are all grossly underestimating the appeal of a second shard. I do not think that tick times should be the way to drive people to the new shard, as I think there are a ton of other reasons why players would be smart to use both shards.

    1. New players will use the less densely populated shard.

    Seriously, the new player experience sucks. If you spawn next to one of the "undiplomatic" alliances you are either going to eventually get absorbed or killed. If you spawn next to one of the alliances that's more friendly you're still going to either have to join up or limit your expansion space, and as a single player you're going to get squashed if you try and attack. Now imagine you've got five friends who want to join, possibly as an alliance - where are you going to plop down?

    As territory has solidified there's less room for new players. Expanding the edges works, but doesn't scale. I love the idea of scaling "up" (on the Z axis, with rooms stacked on top of each other) as a way to deal with this. Also, if we're totally honest with ourselves here, even if the novice and spawn zones keep showing up on the old shard, new players are more likely to go to the less densely packed area anyways.

    2. Established players will use the shard for defense.

    If I'm a player who has rooms on the "ground floor" (the existing shard), and I know that only a small number of players have code to move between shards, I would be absolutely stupid not to have at least one backup room positioned above my main world rooms. If someone attacks me and wipes out all of my rooms on the main world, but has no ability to go to the second world to take out my other rooms, then I can simply continue to send upgraders and room builders in to retake my lost rooms. It would be absolutely silly for established players not to take advantage of that.

    3. Established players will need to establish presence for strategic purposes.

    Why send troops through the main shard to attack another player when you can shove them up a level, have them walk over to the rooms and bypass creeps and observers in order to mount a surprise attack? To do this you're going to need pathfinding on both worlds, and to do that properly you'll want to have creeps and possibly observers in the other shard.

     

    4. Smaller alliances may just up and move.

    If you open up enough space some of the smaller alliances- who have shown themselves willing to mass respawn together in the past- might just move upwards to avoid having to fight for space in an already crowded world.

    5. Open territory is valuable enough to drive people.

    Finally, I really feel that simply having the territory available will make it get used. If your options are to engage in a two week fight over rooms, or expand upwards for basically free, then you're going to expand upwards.

     

    For these reasons I find the whole argument about needing a shorter tick time to motivate people premature. There are a ton of reasons why people would use the second shard and I think tick times are pretty far down the list. The tick time issue is also the largest issue that people seem to be upset about with this shard idea, so if you eliminate that difference I think this whole thing will go much more smoothly.


  • Dev Team

    Lower tick times aren't just the motivation for people to move. They are the motivation and the entire purpose for implementing world sharding at all. If we're fine with a 5-second tick rate, then the world sharding system is not needed, and we can switch to other tasks like power creeps.


  • Culture

    > Lower tick times aren't just the motivation for people to move. They are the motivation and the entire purpose for implementing world sharding at all. If we're fine with a 5-second tick rate, then the world sharding system is not needed, and we can switch to other tasks like power creeps.

    This statement makes me feel like you're not actually reading what people are saying here, or at least are not understanding it. Obviously we want lower tick times. But not at the expense of having two unequal worlds. 

    If you make a new shard, lock it to the same speed as the existing shard, AND PEOPLE STILL MOVE OVER TO THE NEW SHARD, then tick times for both shards should drop. What I am trying to say is you don't need to use tick times being faster on the new shard as motivation to move people over, as there are already a ton of things that would motivate people to do so (I even made a list above).

    This seems like the best of both worlds. You get multiple shards, and people migrate to the new ones, which reduces load on the first shard. That in turn reduces tick times for the whole system, but doesn't cause people to get upset that one shard is "better" than the other. It also doesn't introduce all the strategic garbage that two separate tick times would bring (i.e., I can use my spawns on the second shard to build troops twice as fast as the people on the first shard).
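
    The "twice as fast" concern is simple arithmetic. A small sketch with made-up numbers (a 5 second tick on one shard, a 2.5 second tick on the other, and a hypothetical creep that takes 150 ticks to spawn) shows that halving the tick duration doubles real-time spawn throughput.

```python
def ticks_per_hour(tick_seconds: float) -> float:
    """Number of game ticks that fit into one real-time hour."""
    return 3600 / tick_seconds


def creeps_per_hour(tick_seconds: float, spawn_ticks: int) -> float:
    """Creeps one spawn can produce per real-time hour, back to back."""
    return ticks_per_hour(tick_seconds) / spawn_ticks


if __name__ == "__main__":
    SPAWN_TICKS = 150  # hypothetical spawn time for one large creep
    slow = creeps_per_hour(5.0, SPAWN_TICKS)   # assumed slow shard
    fast = creeps_per_hour(2.5, SPAWN_TICKS)   # assumed fast shard
    print(slow, fast, fast / slow)
```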

    So, just to be clear-

    * People want faster ticks.

    * People do not want different shards to have different tick rates.

    * People will migrate over to the new shard regardless of its tick rate.

    * The primary shard will have its tick rate increase as people offload CPU to the second shard.


  • Dev Team

    @tedivm, I’m quoting your earlier post here:

    New players will naturally spawn on the new server but older players (the ones who have supported the game the most) have established themselves and many of them will likely not make the jump. I don’t think the shards are likely to balance out soon.

    It’s still an open question whether the number of established players willing to respawn on the new shard is enough to reduce the load of the old one. In order to significantly improve performance they need to not only create colonies, but to respawn completely. Your reasons are fine, but they may not convince many established players to do so.

    And in the worst case we get two shards at terrible game speed instead of one.


  • YP

    I'm not against shards with different tick times... I think there could also be seasonal shards with a predefined life span like 4 months with separate ranking pages, or shards with other changed constants or rules or stuff like that.

    Instead of just saying tick times must be the same everywhere, I think it's more important to think about what the problems with shorter tick times on other shards would be. It's hard to say without more details on how it would be implemented.


  • Culture

    I don't believe players will "jump" to the new server- I have no intentions of respawning- but with the right game play mechanics in place they would most certainly move over part of their code base.

    I will admit though that the more I thought about sharding as a game mechanic the cooler it sounded, so my earlier concerns have died down a bit compared to the tick rate concerns.

     

    > It’s still an open question whether the number of established players willing to respawn on the new shard is enough to reduce the load of the old one. In order to significantly improve performance they need to not only create colonies, but to respawn completely. Your reasons are fine, but they may not convince many established players to do so.

    So what you're saying is that you're hoping the tick times themselves will be enough to push people over. My point is I don't believe it's necessary. 

     

    > And in the worst case we get two shards at terrible game speed instead of one.

    The worst case scenario from the players' perspective is that all of the loyal players who have been supporting this game for years are going to get screwed over in favor of new players on the new shard, with their only main option being to respawn on the new server. To me that's a much worse scenario than having two shards with lower tick rates.

    You're also ignoring a big thing here, which is that even if the tick times don't increase immediately they should stop slowing down. If we're stuck living with 4.5 second ticks, well, we've been living with that for months now. At least we don't have to deal with 8 or 10 second ticks, and once the shards balance out more (which will happen due to the factors I brought up) the tick times should start going down.

     

    If you look through all the posts the biggest complaint about sharding is the different tick rates. If you had said the tick rates were the same from the start I don't think there would be many real issues at all here with this mechanic. Why not just start with this mechanic to begin with, and if people don't end up moving consider changing the tick rates after?

     


  • YP

    I have looked through the posts and only found 3 or 4 players who say tick rates have to be synchronized... some multiple times but I wouldn't call it "all the people" by now. Some have concerns about how the top lists would work for example, but that is a question that could be solved. 

    I don't see a reason why new players,  that want to start in a new shard, should have to live with 5 second tick rates.. the tick rates will probably synchronize when the shard population converges. I think the current tick rate is bad for "old" players.. but it is even worse for new players that only have a single room to watch and work.


  • Culture

    I personally am not concerned about the differing tick rates, since, as artem and others have pointed out, they will start converging at some point on their own. At this point I think sharding is a good solution; maybe in the future a way to recombine them may be developed. 

    Is sharding itself going to be available on private servers? I can completely see people creating a cluster of small low-cost servers and running a larger PS on them.

    Any idea what the max world size of the shards is going to be? I personally would like to see them smaller than the existing world, maybe 50x50 or 100x100 at most.


  • SUN

    I'm not worried about the different tick rates either; I feel every problem that has been identified can plausibly be solved satisfactorily without kneecapping the tick rates of new shards.

     

    Even the comparative increase in spawn rates could be solved by rate limiting the portals somehow.