Auth Tokens

W4rl0ck

100 KB/s would be 0.8 Mbit/s per user (or about 8.2 GB per day) just for stats... how do you think that would be tenable if you want to support that for every active user?
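For reference, those figures can be reproduced if "kb" is read as kibibytes (1,024 bytes) and the rate is sustained around the clock. This is just the arithmetic, not a measurement:

```python
# Sustained 100 KiB/s expressed as a bit rate and as a daily volume.
KIB = 1024                                   # bytes per kibibyte (assumed unit)
rate_bytes_per_s = 100 * KIB                 # 100 KiB/s
seconds_per_day = 24 * 60 * 60               # 86,400 s

mbit_per_s = rate_bytes_per_s * 8 / 1_000_000              # bits -> Mbit
gib_per_day = rate_bytes_per_s * seconds_per_day / 2**30   # bytes -> GiB

print(f"{mbit_per_s:.2f} Mbit/s")    # 0.82 Mbit/s
print(f"{gib_per_day:.2f} GiB/day")  # 8.24 GiB/day
```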

For an even more stark example, the code upload limit should likely be closer to 720 / hour or more, given that the "baseline" is users editing code in the online editor, who might save every 5 seconds during active development, and we know the infrastructure can support this.

I would really like to see someone coding for an hour with an average save frequency of 5 seconds. That's like saving and uploading code every second tick.

If you don't rate limit that authentication mechanism, then external tooling will just find ways to use it to bypass the rate limits.

That would only work if your script solves CAPTCHAs. And if you actively do that to circumvent limits set by the game, I would expect your account to get banned.

tedivm

100 KB/s would be 0.8 Mbit/s per user (or about 8.2 GB per day) just for stats... how do you think that would be tenable if you want to support that for every active user?

That's definitely not the typical case; it's a worst-case scenario that isn't likely. Assuming one segment per tick per shard, three-second ticks, and a player spread across three shards, the segment would have to be completely full of completely random data to hit that target. If the data isn't random, the compression used by the API would drop the number significantly.

Even without compression, the segments are not likely to be completely full: saving that much data (and thus paying for the JSON.stringify call) would use up a lot of CPU, so people have an incentive to only store what they are using. I'm a fairly high GCL player who collects a lot of stats, and my stats segments tend to average around 50 KB, which turns into 7 KB when compressed (I just tested this using real statistics segments).
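A quick way to see why the worst case needs random data: gzip (assumed here; the post doesn't name the API's actual compression scheme) barely touches random bytes but shrinks repetitive JSON a lot. The stats below are made-up placeholders shaped like a typical stats segment, not real data:

```python
import gzip
import json
import os

# Made-up per-tick stats: repeated keys and slowly changing numbers.
stats = {
    str(tick): {"cpu": 12.5 + tick % 7, "bucket": 10000 - tick % 50,
                "gcl": {"progress": 123456 + tick, "level": 20}}
    for tick in range(500)
}
structured = json.dumps(stats).encode("utf-8")

# Random bytes of the same length: the case compression cannot help with.
random_blob = os.urandom(len(structured))

for name, blob in [("stats JSON", structured), ("random bytes", random_blob)]:
    packed = gzip.compress(blob)
    print(f"{name}: {len(blob)} -> {len(packed)} bytes "
          f"({len(packed) / len(blob):.0%} of original)")
```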

artch

Alright, now after reading some of the comments here, I'd like to make another clarification.

Tokens' purpose is to regulate automated use of API endpoints. "Automated" here means human-less. Such use may involve automated stats gathering or automated actions during long (more than an hour) sessions. This explains the low limits for some endpoints, since they are not supposed to be automated in general.

However, if you use tokens in a third-party client or other software that involves a human being present, then rate limiting shouldn't apply at all, just like in the official client. For that purpose we should probably develop a way to reset all token timers at any time from the official client. It would look like a "Reset" button in the "Auth Tokens" section with reCAPTCHA attached to it. If you (not your automated software) hit a rate limit and it blocks you (not your automated software), you can simply press that button and continue. We could even develop a UI-less page containing that reCAPTCHA, which your client could embed in an <iframe> to handle this scenario easily.
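To make the proposal concrete, here is a minimal sketch of per-token fixed-window limits plus the reset, assuming limits are expressed as "N requests per window" per endpoint. The endpoint key and numbers are illustrative, and this is not the actual Screeps implementation:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class TokenLimiter:
    """Fixed-window limits for one token (hypothetical, for illustration).

    `limits` maps an endpoint key to (max_requests, window_seconds)."""
    limits: Dict[str, Tuple[int, float]]
    clock: Callable[[], float] = time.monotonic
    _windows: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def allow(self, endpoint: str) -> bool:
        if endpoint not in self.limits:
            return True                      # endpoint is not rate limited
        max_requests, window = self.limits[endpoint]
        now = self.clock()
        start, count = self._windows.get(endpoint, (now, 0))
        if now - start >= window:
            start, count = now, 0            # window expired: open a new one
        if count >= max_requests:
            self._windows[endpoint] = (start, count)
            return False                     # blocked until the window ends
        self._windows[endpoint] = (start, count + 1)
        return True

    def reset(self) -> None:
        """What the proposed "Reset" button would do: clear all timers."""
        self._windows.clear()
```

For example, TokenLimiter({"upload_code": (10, 3600)}) would allow 10 uploads per hour; a human who runs into the limit solves the CAPTCHA, reset() is called, and the next request opens a fresh window.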

Now to specific questions.

artch

@jbyoshi No, private servers won't include this system.

artch

@tedivm

The rate limit on reading segments will really hurt the screeps stats programs, which currently store stats once per tick. This will be even worse for people who are on multiple shards. Even at three-second ticks, users are only going to be able to get less than a third of their statistics with this system. Combine it with the rate limiting on reading memory (only once per minute) and statistics programs are effectively dead.

Not dead, but needing a refactor. You have to collect per-tick stats in a memory segment and flush it once per minute to third-party software. Collecting something every tick is the load profile that we'd like to eliminate with this new system.
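A minimal sketch of that refactor, written in Python for illustration (in-game code would be JavaScript, and the actual segment read/write calls are left out). The game keeps a rolling window of per-tick snapshots in one serialized segment, and the external tool pulls it once a minute, keeping only ticks it hasn't seen yet. MAX_TICKS, record_tick and pull_new are hypothetical names:

```python
import json
from collections import deque
from typing import List, Tuple

# Game side: keep the most recent snapshots in one serialized segment.
# Assumption: ~20 ticks per minute at ~3 s ticks, doubled for slack.
# The serialized result must stay under the 100 KB segment cap.
MAX_TICKS = 40
history = deque(maxlen=MAX_TICKS)


def record_tick(tick: int, stats: dict) -> str:
    """Append this tick's stats and return the segment contents to store."""
    history.append({"tick": tick, **stats})
    return json.dumps(list(history))


# External side: pull once per minute and keep only unseen ticks.
def pull_new(segment: str, last_seen: int) -> Tuple[List[dict], int]:
    entries = [e for e in json.loads(segment) if e["tick"] > last_seen]
    newest = max((e["tick"] for e in entries), default=last_seen)
    return entries, newest
```

Because the window holds more ticks than one polling interval produces, a slightly late pull still loses nothing, and the reader makes one request per minute instead of one per tick.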

I think the rate limit on uploading code should be 240 per day, rather than 10 per hour. This would result in the same effective rate limit but would make debugging a lot easier. I imagine there will be a lot of salt if people upload a bug but can't work around it due to the upload limit.

It's an option, but we have to consider the other side: with the new "Reset" button, per-day limits would be easily circumvented by clicking it once a day, rather than once an hour (which is impossible for most human beings).
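The trade-off can be put in numbers. Both policies average 240 uploads per day, but a daily window lets someone debugging spend far more uploads back to back, and, assuming each "Reset" click restores one full window's allowance (an assumption about how Reset would work), a single click is worth much more under the daily policy:

```python
# Two fixed-window policies with the same long-run average.
HOURLY = (10, 1)      # (uploads per window, window length in hours)
DAILY = (240, 24)


def uploads_per_day(policy, resets_per_day=0):
    """Uploads available in 24 hours; each reset adds one full window."""
    per_window, hours = policy
    return per_window * (24 // hours) + per_window * resets_per_day


def max_burst(policy):
    """In a fixed window the whole allowance can be spent at once."""
    return policy[0]


for name, policy in [("10/hour", HOURLY), ("240/day", DAILY)]:
    print(name, uploads_per_day(policy), uploads_per_day(policy, 1),
          max_burst(policy))
```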

A new endpoint that allowed us to pull multiple segments at once would alleviate a lot of the pain for the stats programs. With it we could grab all the statistics segments in one go, so each stats read would only cost 1 memory read, 1 segment read, and 1 console call regardless of how many ticks are being processed.

Makes sense, we'll consider.

It would be nice if we could request an exemption, or at least higher limits, for some third-party tools. Specifically, I would like to request a higher limit for the League of Automated Nations website and account (which is only used for completely public information). Otherwise it's going to take a pretty massive rewrite (which I will not have time for in January due to work and travel) to get it to fit within the limits.

We might disable CAPTCHA for the specific user, but we have to define a roadmap for when this rewrite is going to be done; we can't allow this exception to stay for good.

artch

@ags131

After looking at these rate limits, I'm not sure that's viable without spreading the requests over several IPs to counteract the rate limiting, which would be a headache to manage.

All rate limits are user-based, not IP-based.

Another impact: currently most users request stats every 15 seconds, and these limits effectively reduce that to once per minute when pulling from Memory, making stats useless for monitoring anything other than long averages.

Reducing the pulling frequency is our goal here, as explained above. Please consider aggregating the per-tick stats and pulling them all at once.

artch

@tedivm

Are there any plans to add additional endpoints to the token system? Specifically, I think it would be useful to add the "my orders" and "wallet" endpoints so that people can still collect stats about them without having to give out a full access token.

Yes, it's possible.

artch

@ags131

Any chance we can get an endpoint to query a token's access? Being able to determine what a token can access would be helpful, for example, in cases where a user selects an option to pull from segment 5 but has only granted access to segment 10.

Sure, makes sense.
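To illustrate ags131's case, here is a hypothetical shape for what such an endpoint could return, and how a client might check it before pulling. TokenScope, check_before_pull and the field names are invented for this sketch; no such endpoint exists yet:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenScope:
    """Hypothetical response of a "query token access" endpoint."""
    full_access: bool = False
    endpoints: frozenset = frozenset()   # illustrative endpoint names
    segments: frozenset = frozenset()    # segment ids the token may read

    def can_read_segment(self, segment_id: int) -> bool:
        return self.full_access or segment_id in self.segments


def check_before_pull(scope: TokenScope, wanted: int) -> Optional[str]:
    """Return a message for the user instead of failing mid-request."""
    if scope.can_read_segment(wanted):
        return None
    granted = ", ".join(str(s) for s in sorted(scope.segments)) or "none"
    return f"token cannot read segment {wanted}; granted segments: {granted}"
```

With TokenScope(segments=frozenset({10})), asking for segment 5 yields a readable error up front rather than a failed request later.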

artch

@tedivm Including all GET endpoints might mean a bit more than normal third-party software needs to know. It would allow reading, for example, the user's email, subscription details, and other sensitive data. Is that really different from giving out a full access token?

artch

@cgamesplay

The rate limits are intended to reduce demand on Screeps infrastructure.

This.

In this case the limits should very likely be set so high that only problematic scripts would ever trigger them. For example, requesting a memory segment (100 KB) from each shard (3) once per tick (~.3 Hz) would round out to about 100 KB/s of bandwidth. If supporting that is tenable, the limit should be .3Hz (or 1080 / hour).

We also should consider backend CPU overhead (e.g. for gzip compression) and internal LAN overhead for such operations.
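For reference, cgamesplay's figures work out as follows, assuming one full 100 KB segment read per tick on each of 3 shards. This is arithmetic only; it does not include the CPU or LAN overhead artch mentions:

```python
segment_kb = 100      # maximum size of one memory segment
shards = 3
rate_hz = 0.3         # one read per tick at ~3 s ticks (1/3 Hz exactly)

reads_per_hour = rate_hz * 3600                   # per shard
bandwidth_kb_s = segment_kb * shards * rate_hz    # all shards together

print(round(reads_per_hour), "reads/hour per shard")        # 1080
print(round(bandwidth_kb_s), "KB/s")                         # 90
print(round(segment_kb * shards / 3.0), "KB/s at 3 s ticks")  # 100
```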

artch

Another side project here is how we should rate limit the websockets. This is not implemented here yet, but we need to come up with some solution eventually. One option which is currently being debated is to limit the connection rate itself:

When you connect to a websocket using a token for the first time, you get a 1-hour timeout. After the timer expires, the websocket connection drops.
After that, all websocket sessions will drop after 15 seconds, with a 60-second reconnect timeout.
You can use the "Reset" button in your account settings to restore the 1-hour timeout again.
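A minimal model of the proposed rule for one token, assuming the 60-second reconnect timeout is counted from the moment a short session is dropped (the proposal doesn't say which). SocketPolicy and its fields are hypothetical names:

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 3600       # first hour: the connection stays up
SHORT_SESSION = 15         # afterwards: sessions drop after 15 s
RECONNECT_COOLDOWN = 60    # ...and reconnecting is refused for 60 s


@dataclass
class SocketPolicy:
    """Hypothetical per-token model of the proposed websocket limits."""
    grace_started: Optional[float] = None   # when the 1-hour timer began
    blocked_until: float = 0.0              # end of the reconnect timeout

    def connect(self, now: float) -> Optional[float]:
        """Return how long the session may last, or None if refused."""
        if now < self.blocked_until:
            return None
        if self.grace_started is None:
            self.grace_started = now
        remaining = self.grace_started + GRACE_SECONDS - now
        if remaining > 0:
            return remaining
        # Hour used up: short session, then cooldown after it drops.
        self.blocked_until = now + SHORT_SESSION + RECONNECT_COOLDOWN
        return SHORT_SESSION

    def reset(self) -> None:
        """The "Reset" button: start a fresh 1-hour timer."""
        self.grace_started = None
        self.blocked_until = 0.0
```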

tedivm

@tedivm Including all GET endpoints might mean a bit more than normal third-party software needs to know. It would allow reading, for example, the user's email, subscription details, and other sensitive data. Is that really different from giving out a full access token?

Take the quorum dashboard as an example. It's a completely read-only application; there's nothing in it that lets a user change game state. It would be a perfect use case for a "read-only" token: if someone were to hack the system, a read-only token means they wouldn't be able to affect the game. I do plan on adding messaging to the dashboard.

This obviously isn't critical, but would fall under the "nice to have" category.

Not dead, but needing a refactor. You have to collect per-tick stats in a memory segment and flush it once per minute to third-party software. Collecting something every tick is the load profile that we'd like to eliminate with this new system.

The other stats program (not the one made by me) already supports sending statistics through the console instead of segments. I'd worry that most people are going to bypass this limit by switching from segments to the console (at least until that gets rate limited as well).

Another side project here is how we should rate limit the websockets. This is not implemented here yet, but we need to come up with some solution eventually. One option which is currently being debated is to limit the connection rate itself:

There are really two use cases for the websocket in third-party applications (that aren't full-fledged clients):

Recording console data, which the proposed rate limits would make impossible to do as each websocket would only be able to be used for 15 seconds before being disconnected for a minute. Without the ability to record console data, it is very difficult for people to trace back older bugs that occur when they aren't at their computer, and it would make systems like quorum a bit more difficult, since quorum depends on an app that essentially mirrors the console data.

This could also be frustrating for people who use the standalone console, at least after the first hour. If they connect for the first time and it works for an hour, that's great, but if their reconnects later in the day are rate limited until they hit a button in the UI, that could be frustrating for them. Rather than having the token stay in rate-limited mode (15 seconds on, 60 seconds off) permanently after an hour of usage, you may want to reset it back to "full hour" mode after a few hours. This would still stop automated reading of the console, but would mean less frustration for people who are bouncing in and out of the custom console program.

Getting room objects. The "battle reporter" bot uses this to categorize a battle and then report it to Slack and Twitter. This bot rarely takes more than a minute to run its tests, and only looks at rooms that the battle API endpoints say are active. I think this should be able to stay within those limits, but it's going to be really tight.

Why not rate limit based on usage, rather than pure timeouts? Only allow X console websockets at a time, and rate limit room objects so only Y rooms can be locked up each minute?

To add to that if you created a new API endpoint for pulling in room object data I bet a ton of people would use that instead of using the websocket, which would allow you to rate limit it using the ratelimiting system from above. This may work better than a pure timeout based system, as players can queue their requests, dump out as many as possible during that 15 second window, and then pause until it can query again- resulting in roughly the same amount of load but condensing it to a spike instead of spreading it out.
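The usage-based alternative could look roughly like this: a cap of X concurrent console sockets and a rolling one-minute cap of Y room-object subscriptions. Reading "locked up" as "subscribed" is an interpretation, and UsageLimiter is a hypothetical name:

```python
from collections import deque


class UsageLimiter:
    """Hypothetical per-user caps: X console sockets, Y rooms per minute."""

    def __init__(self, max_console_sockets: int, max_rooms_per_minute: int):
        self.max_console_sockets = max_console_sockets
        self.max_rooms_per_minute = max_rooms_per_minute
        self.open_console_sockets = 0
        self.room_requests = deque()   # timestamps within the last 60 s

    def open_console(self) -> bool:
        if self.open_console_sockets >= self.max_console_sockets:
            return False
        self.open_console_sockets += 1
        return True

    def close_console(self) -> None:
        self.open_console_sockets = max(0, self.open_console_sockets - 1)

    def subscribe_room(self, now: float) -> bool:
        while self.room_requests and now - self.room_requests[0] >= 60:
            self.room_requests.popleft()   # forget requests older than 60 s
        if len(self.room_requests) >= self.max_rooms_per_minute:
            return False
        self.room_requests.append(now)
        return True
```

Unlike a 15-seconds-on, 60-seconds-off timeout, a rolling window doesn't reward saving requests up and firing them in a burst, which is the spike tedivm describes.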

JBYoshi

I agree with @tedivm. Personally, I don't see how limiting the time for web sockets would help much (as opposed to some other type of limit). An application that is always connected to the web socket (but is otherwise reasonable in its usage) would be no different than leaving your Screeps client on overnight.

hyramgraff

@artch A "read-only" token doesn't need to be a "read-everything" token. I don't know if there's a better description that's widely used for a token that's only authorized for read access to non-sensitive data.

tedivm

Another issue that's come up is that the rate limits are making development difficult. Would it be possible to have the rate limits lifted or removed for PTR?

ags131

For the moment I've reverted my code pushing to using username/password, I've hit the rate limiting 3 times in a row this morning trying to work on cross-shard code.

ags131

I have another minor request for the tokens: add an option to attach a short comment or label to a token in the UI, which would make it easy to tell which tokens are used where, for example 'local dev', 'screepsplus', 'stats', etc.

On that note, the ability to manually enter paths for a token would be nice too. It would allow a bit more flexibility than the current full-access or limited selections.

CGamesPlay

Thanks for the reply!

Do you have anything you can share about the impact these proposed limits would have on current usage patterns? Perhaps it would be good to turn on these rate limits in "warning mode" for a few weeks to gather feedback on what reasonable limits feel like?

For the websockets endpoint specifically, I wish it were rate limited on bandwidth rather than on connection duration. My external console uses effectively no bandwidth but stays connected for hours; even a 0.25 KB/s bandwidth limit would be completely acceptable to me.
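A bandwidth limit like that is usually done with a token bucket counted in bytes. A minimal sketch, treating 0.25 KB/s as 250 bytes per second and picking an arbitrary 4 KB burst allowance (both are assumptions, not proposed values):

```python
class ByteBucket:
    """Token bucket in bytes: refills at `rate` bytes/s up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst          # start full
        self.last = now

    def try_send(self, nbytes: int, now: float) -> bool:
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes > self.tokens:
            return False             # over budget: drop, delay or disconnect
        self.tokens -= nbytes
        return True
```

An idle console that only emits occasional short messages never drains the bucket, however long it stays connected, while a socket used to stream large amounts of data hits the limit quickly.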

As far as the UI-less reCAPTCHA page to clear the rate limits: I actually think this is pretty fine. I'm imagining the deploy script would catch an error and print out a link for the user to click, then just keep refreshing in the background until the user did it.
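A minimal sketch of that deploy flow, assuming the client can tell a rate-limit error apart from other failures. RateLimited, the reset URL and the polling interval are all hypothetical:

```python
import time
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error a deploy client raises on a rate-limit reply."""


def deploy_with_captcha_fallback(upload: Callable[[], None], reset_url: str,
                                 poll_seconds: float = 10.0,
                                 max_attempts: int = 60,
                                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to upload; when rate limited, print the CAPTCHA link once and
    keep retrying in the background until the human has solved it."""
    told_user = False
    for _ in range(max_attempts):
        try:
            upload()
            return True
        except RateLimited:
            if not told_user:
                print(f"Rate limited. Open {reset_url} and complete the CAPTCHA.")
                told_user = True
            sleep(poll_seconds)
    return False                     # gave up; the user never reset the limit
```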

Sounds like stats are a problem for you guys, so the rate limits on them make sense. I do wish they were more usage-based; for example, maybe these endpoints could have a CPU cost associated with them that drains directly from your bucket. This would create an incentive for users to optimize their stats collection, and you could adjust the CPU cost of each endpoint as required.
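One way the CPU-cost idea could look, as a sketch: each endpoint gets a price that is deducted from the player's bucket, and a call is refused when the bucket can't cover it. The endpoint names and prices are made up; only the bucket itself (capped at 10,000 CPU in the game) is a real concept:

```python
from typing import Optional

# Hypothetical per-endpoint prices in CPU; tunable by the developers.
ENDPOINT_CPU_COST = {"memory": 5.0, "memory-segment": 10.0, "room-objects": 20.0}


def charge(bucket: float, endpoint: str) -> Optional[float]:
    """Return the bucket after paying for one call, or None if refused."""
    cost = ENDPOINT_CPU_COST[endpoint]
    if bucket < cost:
        return None
    return bucket - cost


def affordable_calls(bucket: float, endpoint: str) -> int:
    """How many calls the current bucket could pay for, ignoring refill."""
    return int(bucket // ENDPOINT_CPU_COST[endpoint])
```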

bonzaiferroni

The screeps3D project was planning on an early release to showcase the work so far. Based on this discussion, it sounds like there are still some issues to be considered for 3rd-party clients (overall data use including websockets, resetting rate limits). In light of that, it seems best not to do a release while it is uncertain what kind of issues it might cause for the public server.

About the rate limits, I'd humbly request some other option than the manual reset. It would not be a very good user experience to be scrolling around the map and occasionally have to do a CAPTCHA. I'm not a web dev, so I looked up the invisible reCAPTCHA, and I'm not sure it will be possible to use in a non-web environment like Unity3D. Of course I understand that the dev team has limited resources to accommodate the needs of a 3rd-party client, so I'm not expecting it. It might be best to put the project on hold until there is something available.

artch

@tedivm

Recording console data, which the proposed rate limits would make impossible to do as each websocket would only be able to be used for 15 seconds before being disconnected for a minute.

We need to come up with a solution that allows legit console usage like tracking errors and short messages, but disallow abusing it to send large amounts of data.

To add to that if you created a new API endpoint for pulling in room object data I bet a ton of people would use that instead of using the websocket, which would allow you to rate limit it using the ratelimiting system from above.

Makes sense, we'll look into that.