3 Steps To Rate Limiting Success

On the face of it, rate limiting can almost seem counterintuitive. We spend all this time opening up our product or service via an API, only to implement an “ah ah ah” and a wagging finger — a la Jurassic Park’s Dennis Nedry — when users make too many requests.

If we fail to indicate what “too many” requests is, we risk frustrating users. However, when we make it clear, we risk opening up our APIs to bad actors. This is just one part of what the team from Zuplo, an API gateway and management platform, has called “the subtle art of rate limiting.” In fact, they define the protection rate limiting can offer as one of their three pillars of an API program.

Zuplo’s Nate Totten joined us at our Austin API Summit in 2024, where he spoke extensively about balancing the tradeoffs of consumer experience and performance. Below, we’ll cover the key takeaways of his talk, along with some major rate-limiting blunders…


Why Do We Need Rate Limiting?

We’ve written extensively about rate limiting in the past, covering everything from the basics of API rate limiting to a deep dive into different algorithms to implement rate limiting. On the off chance you’re not already using rate limiting, be sure to check those articles out.

At the beginning of his speech, Totten asks rhetorically, “Am I really going to be attacked?” The answer, he says, is “yes, but not by who you think.” A clumsy for loop, Totten suggests, or some other error on your part can result in you hammering your own API… and large bills.

Rate limiting is great for keeping resource consumption in check. But, with all of that said, there are bad actors out there looking to take advantage of APIs that lack rate limiting. Unfortunately, it seems like some companies didn’t get that memo.

At the beginning of 2024, a Trello breach saw more than 21GB of data leaked, including more than 15 million user email addresses. The hacker responsible, “emo,” told BleepingComputer that the information was acquired using an unsecured REST API.

Specifically, emo fed a list of 500 million email addresses into the API (used by developers to query public information about Trello profiles) to check if they were linked to a Trello account. Using the returned account information, they were able to create profiles for 15 million users.

Trello has since tightened up rate limiting around their APIs. Still, the fact that someone could make calls to the tune of 500 million email addresses — a number far larger than anyone making legitimate queries would ever realistically need — has raised more than a few eyebrows.

1. Make Rate Limiting Transparent and Observable

The canonical response to a rate-limited request is pretty straightforward. For instance:

Status – 429
Status Text – Too Many Requests
Header – Retry-After: 3600
Body – Use Problem Details format (IETF RFC 7807, since superseded by RFC 9457)
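
To make that response concrete, here is a minimal Python sketch that assembles the status, headers, and a Problem Details body. It is an illustration rather than anything shown in the talk: the function name, the "type" URI, and the detail message are all assumptions made up for this example.

```python
import json


def too_many_requests(retry_after_seconds: int, detail: str) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body) for a rate-limited response."""
    problem = {
        # Hypothetical problem type URI; use one that documents your own limits.
        "type": "https://example.com/problems/rate-limit-exceeded",
        "title": "Too Many Requests",
        "status": 429,
        "detail": detail,
    }
    headers = {
        # Media type registered for Problem Details bodies.
        "Content-Type": "application/problem+json",
        # Seconds the client should wait before retrying.
        "Retry-After": str(retry_after_seconds),
    }
    return 429, headers, json.dumps(problem)


status, headers, body = too_many_requests(3600, "Quota exhausted. Try again in an hour.")
print(status, headers["Retry-After"])
print(body)
```

Returning a plain tuple keeps the sketch framework-agnostic; in practice you would map these three values onto whatever response object your gateway or web framework uses.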

However, Totten acknowledges that disclosing limits (like the 3600-second retry window shown in the header above) makes it easier for malicious users to maximize resource consumption without hitting limits. Not disclosing them, on the other hand, makes it hard for legitimate consumers to stay within those limits.

He suggests that the formality of your relationship with users could dictate the amount of information you share with them. For instance, you might withhold rate limiting information from free tiers and disclose it only to contracted partners. However, he highlights that bad actors can probably figure this information out anyway.

Visibility is, Totten says, a key factor to consider throughout the entire rate limiting process: “When you add rate limiting to your API, you’ll get a lot of support requests…especially if you just add it where you didn’t have it before.” In an ideal world, he continues, “both you as the developer and your customers will have some visibility into what’s happening in the system.”

In this context, visibility means checking things like requests per second (RPS), a breakdown of requests by bucket (like IP or user), and rate-limited responses. You can consider sharing this information with your users, as long as you trust them with it, in the same way that you might share uptime or server status updates. The more clarity you give them about using your API, the more you can reduce incoming support requests regarding rate-limiting (and other) issues.
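
As a rough sketch of what tracking those numbers might look like, the snippet below counts requests and rate-limited responses per bucket in memory. The class and method names are hypothetical, and a real deployment would more likely export these counters to a metrics system than keep them in process.

```python
from collections import Counter


class RateLimitMetrics:
    """In-memory counters for the visibility signals described above."""

    def __init__(self) -> None:
        self.requests_by_bucket = Counter()  # all requests, keyed by IP or user
        self.limited_by_bucket = Counter()   # requests answered with a 429

    def record(self, bucket: str, limited: bool) -> None:
        self.requests_by_bucket[bucket] += 1
        if limited:
            self.limited_by_bucket[bucket] += 1

    def requests_per_second(self, window_seconds: float) -> float:
        """Average RPS, assuming the counters cover `window_seconds` of traffic."""
        return sum(self.requests_by_bucket.values()) / window_seconds


metrics = RateLimitMetrics()
metrics.record("user:alice", limited=False)
metrics.record("user:alice", limited=True)
metrics.record("ip:203.0.113.7", limited=False)

print(metrics.requests_by_bucket.most_common())  # busiest buckets first
print(dict(metrics.limited_by_bucket))           # who is being rate limited
print(metrics.requests_per_second(window_seconds=60))
```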

2. Balancing Optimization and Latency

Whether as developers or consumers, we all want APIs to have certain characteristics: speed, reliability (high uptime), good usability, and security. Sometimes, however, improving one of these characteristics can negatively impact another.

IP-based rate limiting, for example, risks punishing those who share addresses (like services operating from the same data center or even a WeWork facility). Rate limiting by user or app may be more intensive on the developer side but improves the overall user experience.
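
The difference comes down to which key each request is counted against. The sketch below prefers an authenticated user ID and only falls back to the IP address. It pairs that key with a simple fixed-window counter, which is an algorithm chosen here purely for brevity, not one the talk prescribes; all names are hypothetical.

```python
import time
from typing import Optional


def bucket_key(user_id: Optional[str], client_ip: str) -> str:
    """Prefer an authenticated identity; fall back to the caller's IP."""
    return f"user:{user_id}" if user_id else f"ip:{client_ip}"


class FixedWindowLimiter:
    """Allows up to `limit` requests per key in each `window_seconds` window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._windows = {}  # key -> (window start, requests counted so far)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the old window expired, so start a new one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
alice = bucket_key("alice", "203.0.113.7")
bob = bucket_key("bob", "203.0.113.7")  # same shared IP, different user

print([limiter.allow(alice, now=t) for t in (0.0, 1.0, 2.0)])  # [True, True, False]
print(limiter.allow(bob, now=2.0))  # True
```

Because the key is per user, Alice exhausting her quota does not affect Bob, even though both requests arrive from the same address.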

Likewise, Totten points out that completing a rate limit check before allowing a request to proceed results in a slower API for everyone because it incurs latency on every request. He suggests a more lenient approach, running checks asynchronously and caching blocked users:

Store known blocked consumers (and their retry times) in a local cache.
Check the rate limit in parallel with performing the primary request.
If the rate limit check comes back blocked, update the cache and (if the primary request is not complete) override the response.
Otherwise, if the primary request has already completed, let its response stand; the next request will be blocked by the cache anyway.
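
The steps above can be sketched with Python’s asyncio. This is one interpretation of the approach, not Zuplo’s implementation: the class name, the shape of the `check` callable (assumed to return the number of seconds a consumer must wait, or 0 if within quota), and the choice to cancel the in-flight primary request when overriding the response are all assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable

Response = tuple[int, str]  # (status code, body), simplified for this sketch


class OptimisticLimiter:
    """Runs the rate limit check alongside the request instead of before it."""

    def __init__(self, check: Callable[[str], Awaitable[float]]) -> None:
        self._check = check        # hypothetical remote rate limit service
        self._blocked_until = {}   # consumer -> monotonic time the block expires
        self._background = set()   # keeps late checks alive until they finish

    def _block(self, consumer: str, retry_after: float) -> None:
        self._blocked_until[consumer] = time.monotonic() + retry_after

    def _on_late_check(self, consumer: str, task: asyncio.Task) -> None:
        self._background.discard(task)
        if task.cancelled() or task.exception() is not None:
            return
        if task.result() > 0:
            self._block(consumer, task.result())

    async def handle(self, consumer: str,
                     primary: Callable[[], Awaitable[Response]]) -> Response:
        # Step 1: known blocked consumers are rejected from the local cache.
        if self._blocked_until.get(consumer, 0.0) > time.monotonic():
            return 429, "Too Many Requests"
        # Step 2: start the check and the primary request in parallel.
        check_task = asyncio.create_task(self._check(consumer))
        primary_task = asyncio.create_task(primary())
        done, _ = await asyncio.wait({check_task, primary_task},
                                     return_when=asyncio.FIRST_COMPLETED)
        if primary_task in done:
            # Step 4: respond now; the late check only updates the cache.
            self._background.add(check_task)
            check_task.add_done_callback(
                lambda t: self._on_late_check(consumer, t))
            return primary_task.result()
        # Step 3: the check finished first; override the response if blocked.
        retry_after = check_task.result()
        if retry_after > 0:
            self._block(consumer, retry_after)
            primary_task.cancel()
            return 429, "Too Many Requests"
        return await primary_task


async def demo() -> list:
    async def slow_check(consumer: str) -> float:
        await asyncio.sleep(0.02)   # the remote check is slower than the request
        return 60.0                 # consumer is over quota for 60 seconds

    async def primary() -> Response:
        await asyncio.sleep(0.001)
        return 200, "ok"

    limiter = OptimisticLimiter(slow_check)
    first = await limiter.handle("alice", primary)   # slips through over quota
    await asyncio.sleep(0.2)                         # the late check lands
    second = await limiter.handle("alice", primary)  # rejected from the cache
    return [first, second]


print(asyncio.run(demo()))
```

In the demo, the first request completes before the check returns, so it goes over quota, which is the tradeoff described below. Cancelling the primary request is only safe when that work can be abandoned midway; an API with side effects might instead let the request finish and discard its response.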

A downside? “A certain amount of requests will,” Totten acknowledges, “get over the quota.” But that tradeoff, he argues, is typically worth it. “Rather than making every single request pay that penalty,” he says, “we can increase performance and decrease latency.”

3. Rate Limiting Is Not the Be All and End All

In July 2024, hacking group NullBulge leaked more than 1.1TB of Disney’s Slack data (alleged to include unreleased projects, raw images and code, and links to internal APIs/web pages). In fact, most of the information leaked appears to be fairly benign — a terabyte of random dog pictures, memes, and screenshots, according to some commenters.

Although Disney has since pointed the finger at an inside man, an early theory circulating during the writing of this article suggested that — like in Trello’s case — the leak may have been due to an unsecured Slack API endpoint.

Actually, Slack already imposes rate limiting on its APIs and offers data loss prevention (DLP) features. However, it should go without saying that we aren’t privy to the extent of Disney’s engagement with those features or their broader security stack.

Even so, it’s interesting to note that a hacker could still make thousands of calls per hour within the confines of Slack’s rate limiting. In fact, they could grab all 16,735 files in the leaked torrent in less than three hours. That’s a good reminder that rate limiting is not, on its own, a silver bullet.
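
For a sense of scale, here is the back-of-the-envelope arithmetic behind that figure. It assumes one API call per file, which is a simplification for illustration, and it says nothing about Slack’s actual published limits.

```python
def sustained_rate(total_calls: int, hours: float) -> tuple[float, float]:
    """Return the (calls per hour, calls per minute) needed to finish in `hours`."""
    per_hour = total_calls / hours
    return per_hour, per_hour / 60


# Assumption: each of the 16,735 files costs exactly one API call.
per_hour, per_minute = sustained_rate(16_735, 3)
print(f"{per_hour:.0f} calls/hour, about {per_minute:.0f} calls/minute")
# prints "5578 calls/hour, about 93 calls/minute"
```

In other words, a steady rate of under a hundred calls per minute is enough to finish inside that three-hour window.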

As useful as rate limiting can be, it should only be considered one aspect of your approach to API security. For a truly secure API, you’ll want to follow other best practices, such as using a gateway, implementing OAuth, conducting regular audits and tests, and so on.

Nevertheless, as we’ve seen above, rate limiting remains a valuable way to limit access (or at least the rate of that access) to your APIs, and it’s not a bad tool to have in the toolbox.