If an API is implemented correctly, the number of users utilizing a service can be staggering. Millions of users and devices connect to the internet every day, utilizing APIs to perform calculations, convert media, and even help cure cancer.

The API developer’s dream come true, however, can also be a nightmare. The dream is to be the most popular API, the best service possible, with huge adoption rates. The nightmare? As the number of legitimate users and calls rises in an API environment, so, more often than not, does the number of illegitimate users and calls.

How can a developer stem this flood of API requests? Putting rate limits on calls to the API is a wonderful thing to implement, and when done correctly, it addresses many of the issues inherent to large user bases.

What is Rate Limiting?

An easy way to contextualize the fundamentals of API architecture is to compare an API to the average large city. A city has certain functions, and these functions require connections to one another: roads. These roads have limitations both in the physical context (the size of the road) and in the safety context (how many cars can safely be on the road at the same time).

In the same way, API connections have these same limitations in their “roads”.

When developing an API, a developer needs to consider the physical limitations of a system — that is, the bandwidth, the number of concurrent connections that can be routed, and how much data can be transferred between the system and its users. There is only so much data that can be pushed through a system.

In the safety context, a developer needs to consider the limitations of a system, so as to prevent overflowing. Just like an over-limit road results in congestion and accidents, so too does an over-limit logical connection.

From a business context, API providers can implement rate limiting as a way to offset costs and generate profit. By requiring high-volume users to pay for premium plans, the increased operational expense can be offset and turned instead into a revenue stream.

Rate limiting is the API provider’s primary tool for addressing these issues. It is the process by which an API rejects requests for a variety of reasons, ranging from too many concurrent connections to a poorly formed request that asks for a large amount of data. By implementing rate limiting, the developer essentially installs a spigot which can be relaxed to allow for greater flow, or tightened to reduce the flow within the system.

Additionally, you can consider rate limiting a security feature in the greater scope of “CIA” (that is, Confidentiality, Integrity, and Availability), one of the fundamental concepts of network security. By ensuring that overflows of requests don’t occur and users are unable to overload the system, the API is kept available to all users, even if at a limited rate of accessibility.

Finally, one of the greatest benefits of rate limiting for businesses is the fact that limiting requires an implementation of analytics and metrics — a useful tool in the API provider’s arsenal.

3 Types of API Rate Limiting

Rate limiting is an art, not a science. A single blanket policy applied to every caller can be too strict for some users and too loose for others. There are a variety of ways to control rates that avoid blanket policies and instead opt for dynamic limits, which go a long way toward negating this drawback. There are three main types of rate limits used:

Developers can set user rate limits. These limits are tied directly to the user’s API key. After a certain amount of requests, further requests are denied, and after a set period of time, this counter resets, allowing for new requests.
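As a rough illustration of a user rate limit, the sketch below counts requests per API key and resets the counter once a fixed period has elapsed. The class name, its parameters, and the injectable clock are assumptions made for this example; it keeps state in memory for a single process and is not a production implementation.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per API key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        # api_key -> (window_start_time, requests_seen_in_window)
        self._counters = {}

    def allow(self, api_key):
        now = self.clock()
        start, count = self._counters.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            # The set period has passed, so the counter resets.
            start, count = now, 0
        if count >= self.limit:
            self._counters[api_key] = (start, count)
            return False  # further requests are denied until the reset
        self._counters[api_key] = (start, count + 1)
        return True
```

A caller would create one limiter, for example FixedWindowLimiter(limit=100, window_seconds=3600), and call allow(api_key) before serving each request.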

Server rate limits are a good choice as well. By setting rates on specific servers, developers can make sure that common use servers, such as those used to log in, can handle a lot more requests than specialized or seldom used servers, such as data conversion devices.

Finally, the API developer can implement regional data limits, which limit calls by region. This is especially useful when implementing behavior-based limiting; for instance, a developer would expect the number of requests around midnight in North America to be lower than the baseline daytime rate, and any behavior contrary to this without a good reason would suggest questionable activity. Limiting the region for a period of time keeps such a surge from overwhelming the system.
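One way to picture behavior-based regional limiting is a comparison of observed traffic against an expected baseline for that region and hour. The sketch below is illustrative only: the function name, the baseline figures, the region label, and the tolerance factor are all hypothetical values chosen for the example.

```python
# Hypothetical baseline: expected requests per minute, keyed by (region, local hour).
BASELINES = {
    ("north-america", 0): 200,    # around midnight
    ("north-america", 14): 5000,  # mid-afternoon
}


def is_suspicious(region, hour, observed, baselines=BASELINES, tolerance=2.0):
    """Return True when observed traffic exceeds `tolerance` times the baseline."""
    expected = baselines.get((region, hour))
    if expected is None:
        # No baseline is known for this region and hour, so no judgement is made.
        return False
    return observed > expected * tolerance
```

When the check returns True, the provider could apply a stricter limit to that region for a period of time.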

Business Considerations

While it could be argued that rate limiting has a dampening effect on commerce between users and systems by reducing the number of concurrent calls, this is rather short-sighted. Contrary to what might be expected, rate limiting can actually have a huge positive impact on commerce.

First and foremost, rate limiting establishes availability for a wider range of users by ensuring that single users or regions don’t dominate the concurrent connections to a server. This in effect ensures that more users with more diverse needs can connect to a wider collection of servers, increasing the userbase and potential revenue. In the long run, preserving functionality for a wider range of users has a net positive effect, even when balanced with the immediate reduction in potential requests.

Rate limiting also has the potential to create an additional stream of revenue by allowing for a freemium subscription model in which paid plans reduce or eliminate these limits. A free user utilizing thousands of connections is expensive. If a partner wishes to make that many connections, however, they can be offered a subscription that helps cover the cost of their API utilization and provides as many connections as they wish.

This subscription approach can be adopted for users as well. A great example of this type of monetization is Pusher, which offers both free and paid versions of their API.

As an alternative to enforcing limits in hardware, some API Gateway designs can provide a measure of rate limiting as well. By tying a system together behind a central gateway and limiting the number of connections, the number of concurrently logged-in users, and the requests passing through the server that hosts the gateway, a de facto limit can be implemented.

This is all a balancing act — limit rates too much with too high an entry price, and customers are driven away. Allow for no rate limiting, and reduced security and missed revenue streams make for poor performance over time. Balancing the user and developer experience of an API is perhaps one of the most important elements of API development. Finding the “sweet spot” for each API should be a personal and integral part of rate limiting implementation.

Implementing Rate Limiting

There are many ways you can implement rate limiting. One of the most common and easy ways to do so is to use internal caching on the server. Take this example in Alternative PHP Cache:

array apc_cache_info ([ string $cache_type = "" [, bool $limited = false ]] )

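Note that apc_cache_info does not itself check IPs or enforce limits: it returns information about what is currently held in the APC cache, and the $limited flag only controls whether the list of individual cache entries is left out of that result. A rate limiter built on APC would instead keep a counter per IP in the cache (for example with apc_add and apc_inc) and compare that counter against the allowed number of requests. A similar limit can be configured in nginx: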

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

    ...

    server {

        ...

        location /search/ {
            limit_req zone=one burst=5;
        }
    }
}

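This approach is far more flexible than the Alternative PHP Cache solution above. The configuration sets both a sustained limit (an average of 1 request per second per client IP address) and a burst allowance (up to 5 excess requests, which are queued and delayed rather than rejected). Requests beyond the burst allowance are rejected with an error, so connections can be limited dynamically to allow for both regular usage and short spikes of high-volume utilization.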
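To show how a sustained rate and a burst allowance interact, here is a minimal token bucket sketch. nginx documents limit_req as a leaky bucket, so this is a related technique used for illustration and not a description of nginx internals; the class name and the injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Permit `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = burst       # maximum tokens that can accumulate
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to the time elapsed, capped at the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With rate=1 and burst=5, a client can send five requests at once, after which further requests succeed only as tokens refill at one per second.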

As another implementation, the Redis documentation describes a rate limiter pattern built on the INCR and EXPIRE commands:

FUNCTION LIMIT_API_CALL(ip)
    ts = CURRENT_UNIX_TIME()
    keyname = ip+":"+ts
    current = GET(keyname)
    IF current != NULL AND current > 10 THEN
        ERROR "too many requests per second"
    ELSE
        MULTI
            INCR(keyname)
            EXPIRE(keyname,10)
        EXEC
        PERFORM_API_CALL()
    END

Simply put, this limits interactions with the server as in the other examples, but it does so with a separate counter for each IP address and each second: the key combines the caller’s IP with the current Unix timestamp. Once the counter for the current second exceeds 10, further calls from that IP are rejected with an error. Each key expires after 10 seconds, so old counters are cleaned up automatically.
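The pseudocode above could be expressed in Python roughly as follows. This sketch assumes that client is a redis-py redis.Redis instance (or any object exposing the same pipeline, incr, expire, and execute calls); the function name and the limit and now parameters are hypothetical. It is also a variation on the pseudocode, not a literal translation: it increments first and checks the returned count, which avoids the gap between the separate GET and INCR steps.

```python
import time


def limit_api_call(client, ip, limit=10, now=None):
    """Return True if the call may proceed, False if `ip` exceeded `limit` this second."""
    ts = int(time.time() if now is None else now)
    keyname = f"{ip}:{ts}"          # one counter per IP per second
    pipe = client.pipeline()        # redis-py pipelines are wrapped in MULTI/EXEC by default
    pipe.incr(keyname)              # creates the key at 1 if it does not exist
    pipe.expire(keyname, 10)        # old per-second counters clean themselves up
    current, _ = pipe.execute()     # results come back in the order they were queued
    return current <= limit
```

A request handler would call limit_api_call(client, request_ip) and return an error response when the result is False.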

Example Rate Limits

To see what these limits actually look like, let’s look at Twitter’s published rates. Below is a section of their official public Rate Limits chart. These rates are separated between user authenticated and app authenticated, as the limits per user and per app are slightly different under different circumstances.

Take a single call — GET friendships/show. This call returns information about two arbitrary users. This is a commonly used call, as it forms the basis of much of Twitter. Accordingly, users are allowed 180 requests for every 15 minutes, and the app is limited at 15 requests per 15 minutes.

Compare this to a call like GET followers/list. This call returns a cursored collection of sorted users, and is limited at 15 for both the user and the app.

With these two calls, we can see exactly why rate limiting is so powerful. Consider the computing power behind the first call. Two arbitrary users have several properties between them, and these properties are simply defined. This is a simple call with a simple return, and can easily be handled even in volume.

The second call is far trickier. The list of users for an arbitrary user not only varies in size from minuscule to gigantic; the results also need to be ordered, formatted, presented, and maintained in memory to allow for browsing.

We can see why variable rate limiting by usage is so helpful. The second call is clearly far more resource intensive, and accordingly, is limited to prevent the high-data volume calls from dwarfing the calls for simple relationships.
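Per-call limits like these can be represented as a simple lookup table. The sketch below uses the two figures quoted above; the table layout, the function name, and the "user" and "app" labels are assumptions for illustration and do not reflect how Twitter implements its limits.

```python
WINDOW_SECONDS = 15 * 60  # the 15-minute window the limits apply to

# Requests allowed per window, as quoted above: (per user, per app).
ENDPOINT_LIMITS = {
    "GET friendships/show": (180, 15),
    "GET followers/list": (15, 15),
}


def remaining(endpoint, auth, used):
    """Return how many calls are left in the current window for this endpoint."""
    user_limit, app_limit = ENDPOINT_LIMITS[endpoint]
    limit = user_limit if auth == "user" else app_limit
    return max(0, limit - used)
```

Keeping the limits in data rather than in code makes it easy to give cheap calls a generous allowance and expensive calls a tight one.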

Sometimes the limitation is not one of usage, but of purpose. Looking at Twilio’s rate limiting reveals quite an interesting limit:

Outgoing Calls
By default each Twilio account can make 1 outgoing call per-second. If your account needs more, please contact our sales team about options for higher limits.

Inbound Calls and SMS
Twilio places no limitation on the rate at which a number can receive inbound calls or SMS messages. Twilio will make an HTTP request to the request URL for each call and message received at your Twilio number. Therefore, please make sure your server is capable of handling the load if you are expecting a large amount of concurrent inbound traffic.

These limits make sense when you consider what Twilio does. As a voice and SMS handler, a typical account rarely needs to place more than one outgoing call per second, so a default limit there costs most users little, and those who need more can request it. A limit on incoming calls, however, would make little sense: with integrated answering machines and hold services, inbound traffic is handled automatically by the backend.

API developers should carefully consider the specific computations which should be limited. As in the case above, data can be limited by function, but this is only part of the picture. Limiting by access level (user or application), limiting by region, and even limiting by request type are all possible under variable rate limiting.

Conclusion

Rate limiting is an incredibly powerful tool to implement, but it must be implemented with savvy understanding of your userbase and their requirements.

An effective rate limiting system establishes availability for your API, and if balanced against the inherent caveats, can lead to increased security, dynamic control, and even new revenue streams. Think of it as the central highway to your API-city: do you want a five-lane modern highway, or a dusty dirt road? One leads to great success and accelerated growth, and the other to mediocrity.