Everything You Need To Know About API Rate Limiting

Posted in Design by J Simpson on April 18, 2019. Last updated: September 29, 2023.

Discover the benefits of API rate limiting and ways to implement it.

As API developers, we need to make sure our APIs are running as efficiently as possible. Otherwise, everyone using your API will suffer from slow performance.

Performance isn't the only reason to limit API requests, either. API limiting, also known as rate limiting, is an essential component of Internet security, as DoS attacks can tank a server with unlimited API requests. Rate limiting also helps make your API scalable. If your API blows up in popularity, unexpected spikes in traffic can cause severe lag.

So, how exactly do we rate limit our APIs? In this article, we'll delve into the main strategies and industry standards around rate limiting. We'll showcase effective rate limiting libraries and frameworks, and demonstrate sample code for implementing request queues, throttling, and algorithm-based rate limiting.

How To Limit API Requests And The Importance Of Rate Limiting

Let's start by looking at what rate limiting is and how it works. We'll also touch on why it matters.

What Is API Rate Limiting?

If you dole out unlimited access to your API, you're essentially handing over the keys to the kingdom. Anyone can use your API, as much as they want, at any time. While it's great that people want to use your API and find it useful, unbridled open access can decrease value and limit business success.

Rate limiting is a critical component of an API product's scalability. API owners typically measure processing limits in Transactions Per Second (TPS). Some systems may also have physical limits on data transference. Both are part of backend rate limiting.

To prevent an API from being overwhelmed, API owners often enforce a limit on the number of requests, or the quantity of data, that clients can consume. This is called application rate limiting.

If a user sends too many requests, API rate limiting can throttle client connections instead of disconnecting them immediately. Throttling lets clients still use your services while protecting your API. Keep in mind, however, that there is always the risk of requests timing out, and the open connections also raise the risk of DoS attacks.

Best Practices For API Rate Limiting

One approach to API rate limiting is to offer a free tier and a premium tier, with different limits for each. There are many things to consider when deciding what to charge for premium API access. For cost estimates, read our pieces on API pricing and API business models for ideas.

API providers will still need to consider the following when setting up their API rate limits:

- Are requests throttled when they exceed the limit?
- Do new calls and requests incur additional fees?
- Do new calls and requests receive a particular error code, and, if so, which one?

For a complete overview, let's see how a few API providers communicate rate limiting guidelines to their developer users.

GitHub

GitHub's developer documentation clearly outlines how they offer different rate limits depending on the type of API request. In general, they cap requests at 5,000 per hour per user access token or per OAuth-authorized application. GitHub also provides a unique API-driven way to communicate the status of rate limits for each account.
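For example, here's a minimal sketch of checking that status from Node.js. The token is a placeholder, and axios is our choice of HTTP client, not something the article prescribes:

```js
const axios = require('axios');

// YOUR_TOKEN is a placeholder; any authenticated request works here.
axios.get('https://api.github.com/rate_limit', {
  headers: { Authorization: 'Bearer YOUR_TOKEN' }
}).then(({ data }) => {
  // resources.core covers most REST endpoints; search, graphql, etc.
  // are reported separately in the same response.
  const { limit, remaining, reset } = data.resources.core;
  console.log(`${remaining}/${limit} left; resets at ${new Date(reset * 1000)}`);
}).catch(err => console.error('rate limit check failed:', err.message));
```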
Sending a GET request to /rate_limit, as above, retrieves the rate limit status for the authenticated user. The response showcases the user's limit, as well as their used and remaining requests, categorized by endpoint. Communicating this way is a nice, programmatic method of showcasing rate limit statuses, as it can help developers avoid exceeding their limits or inform them of when to upgrade to the next plan. This can be especially helpful for large API catalogs or variable pricing plans.

LinkedIn

The LinkedIn API segments things into application rate limiting (the total calls an application can make in a day) and member rate limiting (the total calls an individual application user can make in a day). Similar to GitHub, the exact limits vary depending on the endpoint in use. Interestingly, LinkedIn API rate limits aren't published in the general developer documentation. Instead, they can be found within the Analytics tab of the developer dashboard. This is a handy visual way to communicate usage and application limits to your API users.

Bitly

Bitly, the popular link-shortening service, has two overarching types of rate limits: platform limits and plan limits. The platform rate limits enforce per-hour, per-minute, and per-IP limits. Monthly rate limits, meanwhile, depend upon the user's plan, which ranges from free to enterprise-level. (A good example of how closely rate limiting is tied to API productization.) Similar to GitHub, you can programmatically retrieve Bitly rate limit information: send a GET request to /v4/user/platform_limits to see your platform limits, or a GET request to /v4/organizations/{organization_guid}/plan_limits to view plan-specific limits.

How To Throttle API Calls

Seeing as there are numerous ways for a program to connect with an API, there are also various ways to throttle API traffic. Calls made to third-party APIs are among the most challenging to limit. For example, if your clients call the Google Maps API directly, there's not much you can do to limit that. You're just going to have to pay for the appropriate level of data usage.

If the rate-limited API is accessed via some form of backend process, it's decidedly easier to limit the API queries in the backend code. Say the API only allows 20 requests per second. You can set up a process that only allows 20 requests per second to pass through. If all of those requests are happening synchronously, it might not make a difference, but the difference quickly shows with asynchronous tasks. If your process is implemented in Node.js, for example, you could use the bottleneck package:

```js
const Bottleneck = require("bottleneck");

// Never more than 5 requests running at a time.
// Wait at least 1000ms between each request.
const limiter = new Bottleneck({
  maxConcurrent: 5,
  minTime: 1000
});

// pokedex and id are placeholders standing in for your own API client.
const fetchPokemon = id => {
  return pokedex.getPokemon(id);
};

limiter.schedule(fetchPokemon, id).then(result => {
  /* ... */
});
```

If you're using Ruby and tools like Sidekiq, you could use plug-ins like sidekiq-throttled, or Sidekiq Enterprise's rate limiting, to the same end. Every programming language will have its own version of throttling or rate-limiting. Look into the libraries and packages available in the language you're working in to see what's out there.
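Whichever library you choose, it's worth handling the moment an upstream API rejects a call anyway. Here's a minimal sketch of backing off on HTTP 429 responses; it assumes axios, and the retry count and fallback delay are illustrative choices, not something from the article:

```js
const axios = require('axios');

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Retry a request when the upstream API answers 429 Too Many Requests.
async function requestWithBackoff(config, retries = 3) {
  try {
    return await axios(config);
  } catch (err) {
    const status = err.response && err.response.status;
    if (status !== 429 || retries === 0) throw err;
    // Honor Retry-After when present (assumed to be in seconds here);
    // otherwise fall back to a one-second wait.
    const retryAfter = Number(err.response.headers['retry-after']) || 1;
    await delay(retryAfter * 1000);
    return requestWithBackoff(config, retries - 1);
  }
}
```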
What You Need To Know About Rate Limiting

Many services that use REST APIs feature API limiting as a defense against DoS attacks and overloaded servers. Some APIs feature soft limits, which allow users to exceed the limits for a short period. Others take a more hardline approach, immediately returning an HTTP 429 error and forcing the user to send a brand-new query.

Setting a delay is the easiest way to limit API requests. Just define the delay interval and wait before each call:

```js
const axios = require('axios'); // HTTP client used to send the requests

const delay = interval => new Promise(resolve => setTimeout(resolve, interval));

const sendMessage = async params => {
  await delay(1000); // wait one second before each request
  return axios(params);
};
```

This method works like a charm for those looking to get something up and running quickly, and it's a good fit for those who've recently switched from PHP to Node.js. However, Node.js handles many asynchronous requests at once, so you might need a more permanent solution. Request queues are one way to achieve this.

Three Methods Of Implementing API Rate-Limiting

There are numerous ways you can rate-limit your API. Here are three of the most popular ways to go about it.

1. Request Queues

There are a lot of request queue libraries out there, and each programming language or development environment has its own commands. This means a lot of the hard work has already been done for you. There are also a few request-rate-limiter libraries available; one such library sets the rate limit at two requests per second and places the rest in a request queue. Libraries like these are about as close to plug-and-play as you can get in API development.

Android Volley

Volley is a particularly popular request queue library for Android developers. Not every Android library can take advantage of Volley, however, as some require more extensive networking capabilities, so check your Android library's documentation to make sure it's compatible.

Amazon Simple Queue Service (SQS)

Amazon's Simple Queue Service (SQS) is a ready-made, fully managed service that is perfect for request and messaging queues. The service is regularly maintained for you, so you won't have to constantly debug your own hardware or software to keep the queue working.

Setting Rules For Request Queues

To illustrate the best way to set rules for rate-limiting libraries, we'll be using npm, the package manager for JavaScript. npm hosts a lot of request queue libraries, so you don't have to code everything from scratch. It's also got a healthy development community around it, so there's a lot of support and guidance available should you run into any problems.

Here, we'll use the request-rate-limiter library, which makes HTTP requests via the request library and is easy to use and configure for a variety of purposes:

```js
const RateLimiter = require('request-rate-limiter');

const limiter = new RateLimiter(120); // 120 requests per minute

const sendMessage = params => limiter.request(params);

sendMessage('/sendMessage?text=hi')
  .then(response => {
    console.log('hello!', response);
  }).catch(err => {
    console.log('oh my', err);
  });
```

This rate-limiting library automatically limits the number of requests that can be sent to an API, and it sets up the request queue automatically. That means you don't have to worry about how many requests are sent to the API, as they'll simply be added to the queue.
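For a sense of what such a library does under the hood, here's a minimal hand-rolled sketch of the same idea: a FIFO queue drained at a fixed rate. It's illustrative only; a real library adds retries, backoff, and richer error handling, and the example URL is a placeholder:

```js
// Minimal FIFO request queue drained at a fixed rate (sketch).
class RequestQueue {
  constructor(requestsPerSecond) {
    this.queue = [];
    this.interval = 1000 / requestsPerSecond;
    this.timer = null;
  }

  // task is a function returning a promise (e.g. () => fetch(...)).
  enqueue(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      if (!this.timer) this.start();
    });
  }

  start() {
    this.timer = setInterval(() => {
      const next = this.queue.shift();
      if (!next) {
        // Nothing left to send; stop ticking until the next enqueue.
        clearInterval(this.timer);
        this.timer = null;
        return;
      }
      next.task().then(next.resolve, next.reject);
    }, this.interval);
  }
}

// Usage: at most 2 requests per second, the rest wait in the queue.
const queue = new RequestQueue(2);
queue
  .enqueue(() => fetch('https://api.example.com/items'))
  .then(res => console.log(res.status));
```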
2. Throttling

Throttling is another common way to practically implement rate-limiting. It lets API developers control how their API is used by setting up a temporary state within which the API can assess each request. When the throttle is triggered, a user may either be disconnected or simply have their bandwidth reduced.

Because throttling is possible at the application, API, or user level, it is a popular method for rate-limiting APIs, and there are several commercial products on the market ready-made for developers. Progress's Hybrid Data Pipeline, for instance, offers throttled API access for:

- IBM DB2
- Oracle
- SQL Server
- MySQL
- PostgreSQL
- SAP Sybase
- Hadoop Hive
- Salesforce
- Google Analytics

The utility also features built-in functions, such as $count, $top, and $skip, to filter the query results returned to the client. Progress also offers the OpenAccess SDK for proprietary APIs. OpenAccess SDK provides a standard SQL interface such as ODBC, JDBC, ADO.NET, or OLE-DB, and it integrates easily with most security and authorization systems, making it a useful firewall between APIs and backend systems.

3. Rate-Limiting Algorithms

Algorithms are another way to create scalable rate-limited APIs. As with request queue libraries and throttling services, there are many rate-limiting algorithms already available.

Leaky Bucket

The leaky bucket algorithm is a simple, easy-to-implement rate-limiting solution. It places requests into a First In First Out (FIFO) queue and processes the items on the queue at a regular rate. The leaky bucket smooths bursts of traffic and is easy to implement on a single server or load balancer. It's also small and memory-efficient, due to the limited queue size.

Fixed Window

Fixed window algorithms track the rate of requests using a simple incremental counter. The window is defined for a set number of seconds (3600 for one hour, for example). If the counter exceeds the limit for the set duration, additional requests are discarded. The fixed window algorithm is a simple way to ensure your API doesn't get bogged down with old requests. Your API can still be overloaded using this method, however: if a slew of requests arrives just as the window refreshes, your API could still be stampeded.

Sliding Log

A sliding log algorithm involves tracking each request via a time-stamped log. Logs with timestamps that fall outside the rate limit window are discarded. When a new request comes in, the sum of the remaining logs is calculated to determine the request rate, and if the rate exceeds the threshold, the request is simply queued. Sliding log algorithms don't suffer from the stampeding issues of fixed windows, but storing an unlimited number of log entries for every request can get quite expensive, as can calculating the rate across multiple servers. As a result, sliding logs aren't the best choice for scalable APIs or for preventing overload and DoS attacks.

Sliding Window

Sliding window algorithms combine the best of fixed window and sliding log algorithms. A cumulative counter for a set period is used, similar to the fixed window algorithm, but the previous window is also assessed to help smooth bursts of traffic. The small number of data points needed to assess each request makes the sliding window algorithm an ideal choice for processing large volumes of requests while remaining light and fast to run.
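To make that concrete, here's a minimal sliding window counter sketch. The one-minute window, limit of 100, and in-memory Map are illustrative assumptions; a production version would typically keep these counters in a shared store such as Redis:

```js
// Sliding window counter (sketch): weight the previous window's count
// by how much of it still overlaps the trailing window.
const WINDOW_MS = 60_000; // 1-minute window (illustrative)
const LIMIT = 100;        // max requests per window (illustrative)

const counters = new Map(); // clientId -> { windowStart, current, previous }

function allowRequest(clientId, now = Date.now()) {
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  let c = counters.get(clientId);
  if (!c || c.windowStart !== windowStart) {
    // Roll the window: the old "current" count becomes "previous",
    // unless the stored window is too stale to overlap at all.
    const previous =
      c && c.windowStart === windowStart - WINDOW_MS ? c.current : 0;
    c = { windowStart, current: 0, previous };
    counters.set(clientId, c);
  }
  // Fraction of the previous window still inside the sliding window.
  const overlap = 1 - (now - windowStart) / WINDOW_MS;
  const estimated = c.current + c.previous * overlap;
  if (estimated >= LIMIT) return false; // reject (or queue) the request
  c.current += 1;
  return true;
}

// Usage: gate each incoming request on the caller's identity.
console.log(allowRequest('client-42')); // true until the limit is reached
```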
Final Thoughts: The Effects Of Rate Limiting

Today, we've looked at some of the best ways to limit API requests, but what are the effects of rate limiting APIs? It's never been more essential to make digital creations as efficient as possible. According to a study from Dimensional Research, 80% of app users will only try an app that's giving them problems three times before uninstalling it, and 36% of app users report developing an unfavorable opinion of a brand due to app performance issues.

Unregulated API requests can also lead to slow page load times for websites. Not only can this leave an unfavorable impression on your customers, it can also tank your SEO rankings, as Google weighs page speed ever more heavily in its rankings given the prevalence of mobile Internet traffic.

Ensuring digital content is as fast and efficient as possible is vital in today's global economy. Mobile app speed and page load times can fluctuate wildly from country to country, and you don't want to alienate your international customers with bloated, slow-to-load apps and websites.

As we've seen, rate limiting is an essential skill for developers of all backgrounds. If you're using API requests in any regard, consider these rate limiting techniques to increase security, business impact, and efficiency across the board.