How to Safely Throttle High Traffic APIs

Too much traffic can be a dangerous thing. To many application developers, this seems like a good problem to have – traffic is exactly what you want for your service, so accordingly, the more the better. The simple truth is, however, that too much of a good thing can be very dangerous – and in the API space, this can have dramatic effects.

Today, we’re going to look at exactly how high, untamed API traffic can be dangerous, and what the effects of untracked traffic actually are. We’ll discuss the first steps any provider can take towards rectifying this situation, and discuss a few methodologies that will help retain high volume traffic while maintaining security and impeccable functionality.

Why Too Much Traffic is Dangerous

While traffic, even high traffic, is generally a good thing, being unprepared for high traffic can result in massive threats to the concept of CIA.

CIA is a fundamental information security concept upon which most security solutions are built. It stands for “Confidentiality, Integrity, and Availability”. In a situation of high traffic, a variety of passive and active threats combine to damage the foundations of CIA:

  • Confidentiality: Under high traffic loads, especially malicious loads such as brute force attacks, systems may buckle under the weight of traffic and fail or behave in unexpected ways. A system in that state can expose data that would otherwise be protected, betraying the confidentiality of the data at hand.
  • Integrity: Extremely high traffic goes hand in hand with dropped packets, failed calls, and incomplete data transfers. Accordingly, the integrity of data can easily be damaged as clients and servers fall out of synchronization with one another. While this is a passive threat compared to the active threat to Confidentiality, it can nonetheless cause serious issues, especially when the mismatch affects heartbeat data, authentication and authorization tokens, or other sensitive systems.
  • Availability: This is the most obvious threat of high traffic, and the most common. Over-traffic can result in a situation in which too many people are asking for too much data from an already overtaxed server ecosystem; if this is not properly addressed, nobody can get what they need or want, regardless of the size or type of data requested.

While these are serious issues, they only scratch the surface of the wide range of problems that can arise in a system dealing with an excess of traffic. Accordingly, planning for the possibility of extreme traffic is incredibly important, and should be part of any proper data management plan.

First Steps Towards Managing Traffic

While there are some specific, powerful solutions that we will discuss shortly, there are also some basic first steps you can take to manage API traffic. The specifics of each step depend on the requirements of the system in question, but as a general rule of thumb these steps apply to almost any system that handles a significant amount of API traffic.

A big part of the traffic management puzzle is in the server architecture itself. While APIs can, and do, function simply on local resources on finite machines, the modern web space has on offer a wide range of virtualization systems that can be easily tied into by APIs of any size.

Adopting a cloud solution is a very good choice for traffic management, as cloud platforms, by their nature, allow for the creation of virtual machines and the release of physical systems to handle excess traffic when the situation arises. This makes the hardware that the API relies upon scalable, and is a huge first step towards load balancing.
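As a rough illustration of what scalable hardware means in practice, the sketch below works out how many instances a given load would call for. The function name, the per-instance capacity figure, and the minimum and maximum bounds are all assumptions made for this example; real cloud platforms expose their own autoscaling settings.

```python
import math


def instances_needed(requests_per_second: float,
                     capacity_per_instance: float,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Return how many instances the current load calls for.

    All parameters are hypothetical: capacity_per_instance is the
    number of requests per second one instance is assumed to handle.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that a partially loaded instance is still provisioned.
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    # Keep the result inside the configured bounds.
    return max(min_instances, min(max_instances, wanted))
```

The upper bound is there so that a traffic spike, malicious or otherwise, cannot scale the bill without limit.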

Additionally, adopting scalability within the design process as a conceptual basis is hugely important. Adopting a scalable language and framework can allow for traffic to be spread across multiple endpoints and systems, spreading the load across a wider structure.

While these are very general solutions, and in fact are wholly dependent upon the situation in which they are placed, they are still great first steps towards handling traffic management. That being said, there are some more specific, actionable solutions that can build upon these general approaches to magnify their effectiveness and lead to greater traffic management.

Rate Limiting

Rate limiting is perhaps the most common methodology for managing API traffic, and for good reason: not only do most frameworks and systems provide for some sort of rate limiting, but the implementation of these limiting systems is also very simple. The basic idea of rate limiting is perfectly represented by a traffic officer. When traffic is normal, the officer waves cars through the intersection, managing the system for a healthy flow. When traffic is abnormal, however, the traffic officer can refuse entry, redirect cars, and even close down streets.

In an API, rate limiting does exactly this. Rate limiting is a stop sign, telling the requester that they’ve requested enough data, and to either wait for the results or get off the network. This can be implemented in a wide variety of ways, but generally speaking, implementations fall under a few basic categories.

Dynamic Limits are rate limits that are predicated upon the API key assigned to the user staying within a range of requests. These requests, granted to the user for active use on the API, have an upper limit and some sort of time frame. When requests stay under the limit during the duration stated, the API key limit is considered unmet, and traffic is allowed to freely pass through. When the user exceeds the limit within the given amount of time, however, the traffic is rejected, and the user is notified that they must wait until their counter resets.
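A minimal sketch of this kind of per-key limit might look like the following. It assumes a fixed time window and an in-memory counter; the class name, the method name, and the injectable clock are inventions for this example, not part of any particular framework.

```python
import time
from typing import Callable, Dict, Tuple


class KeyRateLimiter:
    """Fixed-window request counter per API key (illustrative only)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # api_key -> (window start time, requests counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def check(self, api_key: str) -> Tuple[bool, float]:
        """Return (allowed, seconds until this key's counter resets)."""
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # the previous window has expired
        retry_after = start + self.window_seconds - now
        if count >= self.limit:
            return False, retry_after  # over the limit: reject
        self._windows[api_key] = (start, count + 1)
        return True, retry_after
```

The second value in the result is what lets the API tell a rejected user how long they must wait before their counter resets.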

Server Rate Limits function a little differently. Whereas Dynamic Limits place limits on users, Server Rate Limits place limits on servers, shielding servers that typically see low traffic from sudden high traffic. By setting a baseline on the server and rejecting or limiting traffic in excess of it, you can respond to high API traffic proactively rather than dealing with each surge by hand.
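One way to sketch a per-server limit is with a token bucket sized from each server's baseline. The token bucket is a common technique brought in for this example rather than something the categories above prescribe, and the server names, the baseline figures, and the headroom factor are all hypothetical.

```python
import math
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows `rate` requests per second on average, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def build_server_limits(baselines: Dict[str, float], headroom: float = 1.5,
                        clock: Callable[[], float] = time.monotonic
                        ) -> Dict[str, TokenBucket]:
    """Create one bucket per server, sized at baseline * headroom."""
    limits = {}
    for server, baseline_rps in baselines.items():
        rate = baseline_rps * headroom
        limits[server] = TokenBucket(rate, burst=max(1, math.ceil(rate)),
                                     clock=clock)
    return limits
```

A call such as build_server_limits({"reports": 2, "search": 100}) would give the quiet reporting server a far lower ceiling than the busy search server, so a flood aimed at the former is turned away early.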

Regional Data Limits, on the other hand, move out of the server realm and into the regional physical space. In the same way that Server Rate Limits restrict access to servers based on heuristics and baseline analysis, regional limits do much the same for actual physical regions.

This makes sense for many applications – for instance, if you provided a web-service to find nearby landmarks for a specific city’s board of tourism, it would be strange if your system were to then receive insane amounts of traffic from Egypt or China. While most situations are not as black and white, it would be strange to find an English-only application trending in Russia or China.
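A regional limit along these lines could be sketched as follows. The example assumes that each request has already been mapped to a region code by some lookup outside this snippet, and the class name, the region codes, and the quota are made up for illustration.

```python
from collections import Counter
from typing import Iterable


class RegionalLimiter:
    """Lets expected regions through and caps every other region."""

    def __init__(self, expected_regions: Iterable[str],
                 unexpected_quota: int) -> None:
        self.expected = set(expected_regions)
        # Requests allowed per window from any single unexpected region.
        self.unexpected_quota = unexpected_quota
        self.counts: Counter = Counter()

    def allow(self, region: str) -> bool:
        if region in self.expected:
            return True
        if self.counts[region] >= self.unexpected_quota:
            return False  # this region has used up its small allowance
        self.counts[region] += 1
        return True

    def reset_window(self) -> None:
        """Call on a schedule (for example hourly) to start a new window."""
        self.counts.clear()
```

Capping unexpected regions rather than blocking them outright leaves room for the occasional legitimate traveller while still refusing a flood.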

As said previously, rate limiting is very attractive due to the fact that it’s relatively easy to implement. As an example, we can look at how Redis utilizes a rate limit pattern to restrict traffic:

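ts = CURRENT_UNIX_TIME()
keyname = ip+":"+ts
current = GET(keyname)
IF current != NULL AND current > 10 THEN
    ERROR "too many requests per second"
ELSE
    MULTI; INCR(keyname); EXPIRE(keyname, 10); EXEC
    PERFORM_API_CALL()

This small chunk of pseudocode limits the rate of requests by keeping a counter for each IP address and each second: the key name combines the IP with the current Unix timestamp, every accepted request increments the counter, and each key expires after ten seconds so that stale counters clean themselves up. When the count for the current second exceeds the limit stated in the code, the user is told that they have issued too many requests per second, and their traffic is rejected. Something this simple, this small, can be used to great effect to handle larger flows of traffic, negating many of the issues that heavy traffic brings.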
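For readers who prefer something closer to real code, here is a hedged Python rendering of the same pattern. It assumes a client object with the get and pipeline methods that the redis-py library provides; the function name, the max_per_second parameter, and the now parameter (useful for testing) are inventions for this sketch.

```python
import time
from typing import Optional


def limit_api_call(client, ip: str, max_per_second: int = 10,
                   now: Optional[float] = None) -> bool:
    """Return True if the call may proceed, False if it is rate limited."""
    ts = int(time.time() if now is None else now)
    keyname = f"{ip}:{ts}"  # one counter per IP address per second
    current = client.get(keyname)
    # Same comparison as the pseudocode. Because it uses ">", up to
    # max_per_second + 1 calls pass each second; use ">=" for a strict cap.
    if current is not None and int(current) > max_per_second:
        return False
    # redis-py pipelines are transactional (MULTI/EXEC) by default.
    pipe = client.pipeline()
    pipe.incr(keyname)
    pipe.expire(keyname, 10)  # let stale per-second counters disappear
    pipe.execute()
    return True
```

With redis-py installed and a server running, this would be called as limit_api_call(redis.Redis(), ip), where ip is the caller's address. Note that the read and the increment are separate steps, so two simultaneous requests can both slip through just under the limit.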

But this is really just a small, local solution. What should be done for bigger problems? How does this concept scale up?

Building a Backend for Frontend

Another great option for managing API traffic is the concept of Backends for Frontends, or BFFs. The basic concept of a BFF is to create a collection of interlinked microservices that function as facades for different aspects of a greater system. In other words, each element of your API is broken into separate functions that are then accessed via facade APIs, turning your traffic from “single source” to “multiple source”.

By abstracting functionality out of a single monolithic system and into a web of interlinked microservices, traffic is spread broadly across multiple nodes rather than forced down a single path.

One way to think about backends for frontends is to think about a bed of nails. It’s a popular carnival attraction, and most people are familiar with how it works: weight spread over a great many sharp nails is carried safely, because the share of the weight resting on each nail is small. BFFs function in much the same way, preventing the single entrance into your API ecosystem from being overwhelmed with the sum total of traffic all at once.
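To make the facade idea concrete, here is a small sketch in which two facades sit in front of the same pair of microservices. The services are stubbed as plain functions, and every name and field in the example is hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical downstream microservices, stubbed as plain functions.
def profile_service(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada", "bio": "A long biography..."}


def orders_service(user_id: str) -> List[dict]:
    return [{"order": 1, "total": 30}, {"order": 2, "total": 12}]


def mobile_bff(user_id: str) -> dict:
    """Facade for a mobile client: a compact summary only."""
    profile = profile_service(user_id)
    orders = orders_service(user_id)
    return {"name": profile["name"], "order_count": len(orders)}


def web_bff(user_id: str) -> dict:
    """Facade for a web client: the full detail from both services."""
    return {"profile": profile_service(user_id),
            "orders": orders_service(user_id)}


# Each kind of frontend gets its own entrance into the system.
BFFS: Dict[str, Callable[[str], dict]] = {"mobile": mobile_bff,
                                          "web": web_bff}


def handle(frontend: str, user_id: str) -> dict:
    return BFFS[frontend](user_id)
```

Because each facade can be deployed and scaled separately, a surge from one kind of client need not crowd out the others.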

API Gateway

API Gateways are a proven, time-tested methodology for managing API traffic, and are most commonly used in the enterprise space. While a gateway is very much a limiting system, it’s much more a de facto one than a direct, actionable one. Gateways function by limiting the number of concurrent users, connections, edits, and more at the gateway itself. By acting as something of an arbiter for the network in general, they behave like a real-world gateway, limiting the manner in which data is accepted.

Gateways essentially function as another layer of abstraction between the API and the traffic flowing into the API, and as such, act to balance this traffic by spreading it out. This is hugely powerful, because no single internal endpoint has to absorb the full incoming load.

Interestingly, API Gateways can interact with microservices to bring many of the same functions and benefits of the BFF structure. By invoking a gateway as a manner of connecting to multiple internal endpoints and microservices, and then aggregating these results, you are essentially creating a micro-network in the same way that a BFF does, though in a much more structured and rigid way, without dependence on the concept of a facade.

For this reason, Gateways are considered a “de facto” limitation rather than a direct one, as the actual limitations on traffic are passive, arising from how the gateway is constructed and what the gateway will allow.
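The two gateway behaviors described in this section, capping concurrent connections and aggregating results from internal endpoints, can be sketched together. This is an assumption-laden toy rather than how any particular gateway product works: the class, the response shape, and the backend callables are all invented here.

```python
import threading
from typing import Callable, Dict


class Gateway:
    """Caps concurrent requests and aggregates internal results."""

    def __init__(self, backends: Dict[str, Callable[[str], object]],
                 max_concurrent: int) -> None:
        self.backends = backends
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def handle(self, request_id: str) -> dict:
        # Non-blocking acquire: when every slot is busy, refuse at once
        # instead of queueing more work onto the internal services.
        if not self._slots.acquire(blocking=False):
            return {"status": 503, "error": "gateway at capacity"}
        try:
            # Call each internal endpoint and merge the answers.
            body = {name: call(request_id)
                    for name, call in self.backends.items()}
            return {"status": 200, "body": body}
        finally:
            self._slots.release()
```

The limit here is passive in exactly the sense described above: nothing inspects or scores the traffic; the gateway simply has a fixed number of slots.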

Traffic Management Provider

Sometimes, internal solutions are not enough – in these cases, external solutions can be depended upon for even greater gains in management. A wide variety of management providers have sprung up in recent years, handling data in such a way as to create an artificial network in front of your servers, providing a balanced buffer. So-called Content Delivery Networks can allocate data more efficiently, providing caching systems and load balancing as a package for better management.

While some of these CDNs are indeed built into a given virtualization system, such as AWS, other CDNs operate independently of your virtualization product, allowing existing systems to be tied into such artificial networks.

A great example of such a system would be CloudFlare. CloudFlare essentially functions as a layer between clients and servers, intercepting traffic and managing it to balance load. By directing this traffic to geographically closer virtualized resources, stopping bad traffic from ever reaching the internal systems, and spreading heavy traffic across multiple CDN nodes, such a solution can almost transparently handle heavy traffic in a very clean, easy, efficient, and cheap manner.
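The caching half of this idea can be illustrated with a toy edge cache. This is only a sketch of the general principle and says nothing about how CloudFlare or any other provider is actually built; the class name, the time-to-live value, and the origin function are assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Serves repeat requests from memory so the origin sees fewer hits."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.origin_hits = 0
        # path -> (expiry time, cached body)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the origin is never contacted
        # Missing or expired: go to the origin once and remember the answer.
        self.origin_hits += 1
        body = self.fetch_from_origin(path)
        self._entries[path] = (now + self.ttl_seconds, body)
        return body
```

However many clients ask for the same path within the time-to-live, the origin server answers only once, which is where much of the relief comes from.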


Whatever your chosen solution, traffic management is a vital part of the modern API ecosystem, and it’s only going to become more important. As more and more users enter the world wide web and consume data over a wide variety of APIs, the ability to balance this traffic, reject malicious traffic, and still serve actual legitimate traffic is going to become increasingly important and increasingly difficult.

Thankfully, there’s a huge range of solutions that are proven effective and that, if well implemented, can help negate the greater threats posed by traffic mismanagement.