Back in 2006, Google discovered that an extra 500 milliseconds of latency could drop traffic by as much as 20%. That was twelve years ago; now web users expect faster responses than ever before, and as an API practitioner, it’s your duty to make that happen!

Building ultra-fast APIs isn’t easy, especially not when you’re trying to balance low latency with the conflicting design principles of openness and flexibility. So, to help you out on your quest to build speedy APIs, we’ve collected five of our favorite ideas for decreasing latency without making any barbaric sacrifices!

1) Design Practical Endpoints

The first and most design-oriented suggestion for improving API speed is to create practical, user-focused endpoints for your API. While this doesn’t reduce latency itself, it will minimize the number of calls developers need to make, thereby reducing the cumulative latency of those calls and making your API feel faster.

If developers need to make one call to find the user-id associated with a given email, and then another call to find the corresponding address, the whole process will take twice as long as if they could go directly from the information they have to the information they need, no matter how much you optimize the latency of each individual call.

Of course, designing practical endpoints isn’t always as easy as it sounds. It’s hard to predict exactly how developers will want to use your API; and even when you can, having dozens of disjointed, application-specific endpoints can get messy quickly, and it definitely doesn’t lend itself to one of those fashionable microservice architectures.

Now instead of making two separate calls we may just call a single endpoint and get all of the required information. But remember that there is a tradeoff. To get a more robust API we’re stepping away from a REST design pattern and binding together two related entities – user and address. –Wojtek Erbetowski, Polidea
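To make the tradeoff concrete, here is a minimal sketch in Python of a combined endpoint. The data stores, field names, and `get_user_profile` function are all hypothetical; the point is simply that bundling the related address into the user lookup spares the client a second round trip.

```python
# Hypothetical in-memory stores standing in for two separate backend lookups.
USERS = {"ada@example.com": {"user_id": 42, "name": "Ada"}}
ADDRESSES = {42: {"street": "1 Analytical Way", "city": "London"}}

def get_user_profile(email):
    """One practical endpoint: resolve email -> user -> address in a single call."""
    user = USERS.get(email)
    if user is None:
        return None
    # Bundle the related address so the client needn't make a follow-up request.
    return {**user, "address": ADDRESSES.get(user["user_id"])}
```

A client now pays one network round trip instead of two, at the cost of coupling the user and address entities in one response.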

2) Support Partial Resources

Our next suggestion for speeding up your API is to support partial resources. The best example of a partial resource is a partial response, where requests include a fields parameter so that developers only receive the portions of data they ask for.

By serving a partial response, you reduce the total size of the response, meaning there’s less data to send and the request can be completed faster. As an added bonus, excluding unnecessary data can make the response easier for developers to parse!
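A fields parameter can be as simple as a comma-separated allowlist applied server-side. This is a rough sketch under that assumption; the `partial_response` helper and the sample resource are made up for illustration.

```python
def partial_response(resource, fields_param):
    """Return only the comma-separated fields the caller asked for.

    An empty or missing fields parameter yields the full representation.
    """
    if not fields_param:
        return dict(resource)
    wanted = {f.strip() for f in fields_param.split(",")}
    return {k: v for k, v in resource.items() if k in wanted}

user = {"id": 7, "name": "Grace", "email": "grace@example.com", "bio": "..."}
trimmed = partial_response(user, "id,name")  # only two fields cross the wire
```

Here `?fields=id,name` would spare the client the bio and email it never needed.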

An alternative partial resource is the patch request. In this case, developers can “patch”, or update, select fields of data instead of rewriting the entire resource. Same story: smaller requests mean faster requests.
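On the server, applying a patch can work roughly like the merge-patch semantics of RFC 7386: supplied fields are updated, fields set to null are removed, and everything else is left alone. The sketch below assumes that convention; the function name and sample data are illustrative only.

```python
def apply_patch(resource, patch):
    """Merge-patch style update: only supplied fields change; None deletes a field."""
    updated = dict(resource)  # leave the original resource untouched
    for key, value in patch.items():
        if value is None:
            updated.pop(key, None)
        else:
            updated[key] = value
    return updated

profile = {"name": "Grace", "city": "NYC", "nickname": "G"}
patched = apply_patch(profile, {"city": "London", "nickname": None})
```

The client sent two fields rather than the whole profile, and only those two fields changed.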

3) Compress Data Responses

If you want to reduce latency without sacrificing any data, compression might be the answer. In this case, you can use a tool like gzip to compress larger responses before you serve them. Of course, this means your developers will have to decompress the responses on the client side.

This should make your API a tad faster in terms of latency, but the downsides are added load on the server (as it compresses the data) and added load on the client (as it decompresses the data). For some — especially those serving large payloads (such as high resolution images, audio files, or video clips) — the payoff is worth it, but you’ll have to determine this for yourself!

An easy and convenient way to reduce the bandwidth needed for each request is to enable gzip compression. Although this requires additional CPU time to uncompress the results, the trade-off with network costs usually makes it very worthwhile. – Blogger Developer Portal
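The round trip is easy to see with Python’s standard-library gzip module. The payload below is made up, but deliberately repetitive, as JSON API responses tend to be:

```python
import gzip
import json

# A deliberately repetitive payload, typical of list-style API responses.
payload = json.dumps(
    {"items": [{"id": i, "status": "active"} for i in range(500)]}
).encode("utf-8")

compressed = gzip.compress(payload)      # server side, before sending
restored = gzip.decompress(compressed)   # client side, after receiving

assert restored == payload               # lossless: no data sacrificed
print(len(payload), "bytes ->", len(compressed), "bytes on the wire")
```

Fewer bytes on the wire means less transfer time, paid for with CPU time on both ends.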

4) Negotiate for Effective Formats

Content negotiation is the mechanism that allows developers to choose the format that best suits them when multiple are available. While this sounds like an approach to supporting legacy file types — and it is — it’s also a fun and unique way to reduce latency for some requests.

Every so often, a new image file format comes along that does a better job of compressing images than JPEG or PNG. We’ve seen Google pushing its own WebP format (purportedly 26% smaller in size compared to PNGs), and new formats BPG and FLIF are contenders, packaging images in less and less space and time than their predecessors. – Bill Doerrfeld, Nordic APIs

See, if you don’t use content negotiation, you probably only support the older, bulkier file formats for images, video, and more. By implementing content negotiation, you allow clients that support newer, sleeker formats to receive smaller files and thus faster responses.

As we explain in our article on the topic of content negotiation, this is a surprisingly easy feature to implement:

Browsers can send information as part of each request about the representations they prefer, with q-factors to denote the usage preference relative to other languages, text formats, and image types. Then, the server responds to best fit these needs.
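Here is a simplified sketch of that server-side decision in Python. It parses an Accept-style header with q-factors and picks the highest-preference type the server supports. Real negotiation also handles wildcards like `image/*` and tie-breaking rules, which this deliberately skips; `pick_format` is a hypothetical helper name.

```python
def pick_format(accept_header, supported):
    """Choose the supported media type with the highest client q-factor."""
    prefs = []
    for part in accept_header.split(","):
        media, _, params = part.strip().partition(";")
        q = 1.0  # per HTTP, a missing q-factor defaults to 1.0
        for p in params.split(";"):
            name, _, val = p.strip().partition("=")
            if name == "q" and val:
                q = float(val)
        prefs.append((media.strip(), q))
    # Highest-q type the server can actually produce wins.
    candidates = [(q, m) for m, q in prefs if m in supported]
    return max(candidates)[1] if candidates else None

best = pick_format("image/webp,image/png;q=0.8", {"image/png", "image/webp"})
```

A client that advertises WebP support gets the smaller file; everyone else still gets PNG.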

5) Stream Where Applicable

As another more specific approach to building faster APIs, consider a streaming API. With streaming APIs, the developer makes an initial request, and the server continually sends responses back as new data is made available.

Compare that to the alternative: data becomes available, sits around until the developer makes a request, and only then gets sent from the server. Streaming eliminates the repeated requests from developer to server, cutting out both the request leg of each exchange and the waiting between polls!

Case Study: Twitter
In addition to regular request/response endpoints, developers can receive data from the Twitter API by means of a streaming API. This way, Twitter updates the developers every time their chosen users publish a Tweet or update their profile, instead of the developers sending repeated requests. This cuts out the time it takes for developers’ requests to reach Twitter servers, and, as an added bonus, eliminates unnecessary requests.

Rather than delivering data in batches through repeated requests by your client app, as might be expected from a REST API, a single connection is opened between your app and the API, with new results being sent through that connection whenever new matches occur. This results in a low-latency delivery mechanism that can support very high throughput. – Twitter Developer Portal
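In Python, the server side of such a push model is naturally expressed as a generator that yields each update the moment it exists. The sketch below uses Server-Sent Events framing over a hypothetical event source; a real server would flush each chunk down one long-lived HTTP connection.

```python
def event_stream(events):
    """Server side of a stream: frame each update as a Server-Sent Event
    and yield it immediately, rather than batching updates per request."""
    for event in events:
        yield f"data: {event}\n\n"  # SSE wire format: data line + blank line

# Stand-in for updates arriving over time (e.g. new Tweets from followed users).
chunks = list(event_stream(["tweet:1", "tweet:2"]))
```

Each update travels as soon as it is produced; the client never pays the latency of asking for it.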

Final Thoughts

Speeding up an API isn’t easy, and it’s almost impossible if you’re unwilling to make some compromises. However, we’ve seen that there are definitely options for reducing latency and otherwise improving the effective speed of your API without making too many sacrifices, especially if you’re willing to build with purpose and other key design principles in mind.


About Thomas Bush

Thomas Bush is an enthusiastic freelance writer from the United Kingdom, who loves breaking down tough topics into bite-sized articles. Covering everything from cryptocurrencies to medicine, and now APIs, you can find out more about Thomas on LinkedIn or on his website.