Don’t Underutilize These 5 Amazing HTTP Performance Features

So, you’ve published a web API? Well done! You’re serving it over HTTP(S), right? Most developers see no reason to distrust the protocol that’s been holding the web together for almost 30 years. HTTP is very performant, scalable and reliable – in fact, it has multiple nifty performance features to make sure developers can make the most out of the applications built upon it. In this article we explore a few of them, because even though most popular web servers support them, you may need to enable or configure them, and the first step is understanding them.

This article is not meant to give you step-by-step instructions on how to enable and configure these features in your particular web server, but rather to explain why they are needed, how they work, and in what situations you should use them. Here we go!

1: Caching

Caching might be the performance feature with the most impact, but it is also one of the most complex and error-prone ones. The basic idea is that the client should not need to re-download data that it has previously downloaded – the problem is that of deciding which data the client already has and whether it has changed since the client downloaded it.

There are a few ways of dealing with this. The most common approach is to include a response header that tells the client how to cache the data, if at all. But since HTTP is an old and versatile protocol, there are several different cache headers and different ways to use them. Here is a quick overview of the most commonly used cache headers:

  • Cache-Control is the most commonly used header for, well, cache control. It lets the server specify how long the client should cache the data, and which parties along the chain between server and client should or should not store it. For instance, it is possible to tell proxies and content delivery networks (CDNs) not to store a response, while still telling the client to keep it around. However, Cache-Control alone is limited in that it cannot tell the client to download only data that has actually changed, since the cache validation is based solely on a timeout.

  • ETag is a header that gives the client an identifier for the current version of the requested resource. Typical ETag implementations use a hash of the data, a timestamp, or a combination of both as the identifier. This saves the client from having to download data that has not changed. The server includes the document’s current identifier in the ETag response header.

The client can then, on subsequent requests, ask the server to return the resource only if it has changed, by sending the stored value back in the If-None-Match request header.

If the document has not changed, the server can simply return an HTTP 304 Not Modified status instead of sending the entire document. Of course, this means that the identifier has to be well chosen, so that different versions of a document never share an ETag and identical versions do not get different ones. It is also important to remember that while ETag solves a lot of cache problems, some requests are still made in vain: the client has to contact the server to validate its cached data, whether that data is still valid or not. But combined with Cache-Control and a smart selection of parameters, it can get you very close to the goal.

  • The Vary header is used to tell intermediate parties, such as proxies, CDNs, and cache engines, that different classes of clients should get different data. That way, clients that request compressed data (using the Accept-Encoding header) can get gzipped data, and clients that do not support gzip can get the raw data, all between the cache layer and the client, without having to go all the way to the server. It is also possible to vary the caching on other HTTP request headers, for instance User-Agent, which lets intermediate caches differentiate between users with different browsers.

  • Expires is an older header that tells the client the point in time after which the resource should no longer be served from the cache. It is not as versatile as Cache-Control, but is very easy to implement and understand, so it is still used extensively.
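
To make the ETag exchange from the list above a bit more concrete, here is a minimal Python sketch of the server-side decision. The function names make_etag and respond are made up for this illustration, the hash-based identifier is just one of the options mentioned, and the max-age value is arbitrary. Real If-None-Match handling (lists of values, the * wildcard, weak validators) is deliberately left out.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an identifier from the content itself (hypothetical helper)."""
    # ETag values are quoted strings; a truncated hash keeps the header short.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict):
    """Return (status, headers, payload) for a GET of a resource with this body."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Assumed policy: let the client reuse its copy for 60 s without asking.
        "Cache-Control": "max-age=60",
    }
    # Simplification: compare against a single value only.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""  # client copy is still valid, send no body
    return 200, headers, body


# First request: full response plus an ETag to remember.
status, headers, payload = respond(b"hello", {})
# Revalidation: the client echoes the ETag back and gets an empty 304.
status_again, _, payload_again = respond(b"hello", {"If-None-Match": headers["ETag"]})
```

The point of the sketch is only the branch: when the identifier matches, the server skips the body entirely, which is where the bandwidth saving comes from.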

Remember to test your app extensively (duh) when using caching, since it is very easy to get it wrong. The more complex your setup is, the more pitfalls you’ll potentially build for yourself. But there is also potential for a great performance and scalability boost.

2: Keep-Alive

HTTP almost always uses TCP as an underlying protocol, and there is one property of TCP that can cause severe performance problems if not handled correctly. Well, there are several, but there is one specifically worth highlighting.

If a TCP packet sent by one party is lost, that party will transmit the packet again after a period of time. Modern network stacks cleverly use round trip times of successful packets to figure out an ideal timeout – for instance, if packets normally take 1 millisecond to arrive, the party can safely assume that any packet that has been in transit for 10 milliseconds is probably lost, and so it can retransmit that packet and stop waiting for the original one to arrive.

But if your application is built in such a way that each HTTP request uses a new TCP connection (instead of reusing an existing one), that historical data is not available. In this case, the network stack falls back to a conservative initial timeout, traditionally 3 seconds (RFC 6298 later lowered the recommended value to 1 second). That means that if a packet is lost, it can take whole seconds before it is retried! This can be a huge problem for clients with a shaky connection, for instance mobile users.
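
How a stack might turn round trip measurements into a timeout can be sketched in a few lines of Python. This is a simplified illustration and not the code of any particular network stack: the class name RtoEstimator is made up, the smoothing constants follow the scheme described in RFC 6298, and the clamping, timer granularity and exponential backoff that real implementations add are omitted. The 3 second default mirrors the figure used in the text.

```python
class RtoEstimator:
    """Toy retransmission timeout estimator (hypothetical, heavily simplified)."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation

    def __init__(self, initial_rto: float = 3.0):
        self.srtt = None         # smoothed round trip time, in seconds
        self.rttvar = None       # round trip time variation, in seconds
        self.rto = initial_rto   # used until the first measurement arrives

    def observe(self, rtt: float) -> float:
        """Feed in one measured round trip time and return the new timeout."""
        if self.srtt is None:
            # First sample on this connection: no history to smooth against.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto


fresh = RtoEstimator()   # a brand new connection only has the initial timeout
warm = RtoEstimator()
for _ in range(50):      # a long-lived connection has seen many 1 ms round trips
    warm.observe(0.001)
```

In this sketch the fresh connection would wait the full initial timeout before retransmitting, while the warm one has converged to a timeout close to the measured round trip time.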

So what can be done? HTTP has a feature called Keep-Alive, which enables the client and server to maintain their TCP connection, even when the first HTTP request-response cycle has been completed. That way, subsequent requests will use the same TCP connection and any lost packets will be retransmitted much sooner. This is of course only useful if your application involves multiple HTTP requests from each client.
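
On the client side, reusing a connection usually just means holding on to the connection object between requests. The following Python sketch is one way to observe that, using only the standard library: a throwaway local server (the handler name EchoPortHandler and the helper names are invented for this example) replies with the client port it sees, so identical answers indicate that every request travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes http.server keep the connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's source port so connection reuse is visible.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_on_one_connection(host, port, paths):
    """Send several GET requests over a single persistent connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # The body must be read fully before the socket can be reused.
            bodies.append(response.read())
        return bodies
    finally:
        conn.close()


def demo_keep_alive():
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_on_one_connection("127.0.0.1", port, ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo_keep_alive() should return three identical port numbers. Creating a new HTTPConnection per request instead would typically yield three different ones, each paying for its own TCP handshake.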

3: Request Pipelining

You can further optimize the performance of applications where clients send multiple requests by pipelining them. When request pipelining is enabled, the client and server agree that the client does not need to wait for a response before it sends the next request. This way, you can achieve much higher throughput. However, the responses will still come in the same order as the corresponding requests, so a particularly slow response will still hold up all the responses coming after it. This is called head-of-line blocking and is addressed in the next version of HTTP, called HTTP/2. More on that below.

[Figure: Pipelining vs no pipelining comparison]

Note also that pipelining is only guaranteed to be safe for requests that do not change the state of the server, for instance GET or HEAD requests (shame on you if you change server state with these requests). Requests that do change the state, such as PUT or DELETE, can be pipelined, provided that the client is sure that subsequent requests do not depend on the outcome of previous ones. Otherwise, the client will see inconsistent server state between the responses, which may or may not break your app.

Non-idempotent requests (that cause a new unique action each time they are made), such as POST, are generally not safe to pipeline. Most often these are implemented as a block in the pipeline, to make sure not to screw up the state for any other requests that might depend on it.
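
To show what pipelining looks like on the wire, here is a small Python sketch. Python's http.client does not pipeline, so the example writes the requests to a raw socket instead, and it talks to a throwaway local server so that it stays self-contained. All names (PathHandler, read_response, pipelined_get, demo_pipelining) are invented for this illustration, and the response parser only understands bodies delimited by Content-Length.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connection is required

    def do_GET(self):
        body = self.path.encode()  # echo the path so responses are identifiable
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def read_response(stream):
    """Read one response (status line, headers, Content-Length body)."""
    status = int(stream.readline().decode("iso-8859-1").split()[1])
    length = 0
    while True:
        line = stream.readline().decode("iso-8859-1").strip()
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(":")
        if name.lower() == "content-length":
            length = int(value)
    return status, stream.read(length)


def pipelined_get(host, port, paths):
    """Send all GET requests first, then read the responses in order."""
    with socket.create_connection((host, port), timeout=5) as sock:
        requests = "".join(
            f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n" for path in paths
        )
        sock.sendall(requests.encode("ascii"))  # no waiting between requests
        with sock.makefile("rb") as stream:
            return [read_response(stream) for _ in paths]


def demo_pipelining():
    server = ThreadingHTTPServer(("127.0.0.1", 0), PathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return pipelined_get("127.0.0.1", port, ["/first", "/second", "/third"])
    finally:
        server.shutdown()
        server.server_close()
```

The responses come back in exactly the order the requests were sent, which is also why one slow response would delay everything queued behind it.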

4: Compression

One easy way to save time when transmitting data is to compress it. HTTP supports multiple compression formats, but the two most commonly used are GZIP and deflate. In theory they are similar (they use the same compression algorithm but with different headers and checksums), but in practice there has been a lot of confusion around deflate. Since many browsers have implemented it incorrectly, the general consensus is to avoid deflate, even though it can be faster than GZIP. As a result, GZIP is the default compression format for most server software. However, there might still be clients that only support deflate, so the best thing is to make the server support both.

So how does it work? Typically, the client tells the server (via Accept-Encoding header) that it supports some types of compression. The server then compresses the data payload (not the headers) of the HTTP response using that compression scheme and serves it to the client. Depending on the type of content, the data can be up to 90% smaller when compressed.

It is important to remember, though, that compression is not free. It will take some CPU resources on the server to compress the data, and some resources on the client to decompress it. That can lead to performance problems on the server because it uses up the CPU on compression of HTTP data, and if the data payload is small and the network is fast, the added compression delay might actually make the data take longer to reach the client than when no compression is used. As always, try it out with your scenario and see if it works for you.
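
The negotiation and the size trade-off can be sketched in Python as follows. This is an illustration under stated assumptions rather than a recipe: the function name encode_body and the 256 byte threshold are invented for the example, quality values such as gzip;q=0 are ignored, and only GZIP is offered.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str, min_size: int = 256):
    """Return (payload, extra_headers), compressing only when it may pay off."""
    # Parse "gzip, deflate;q=0.5" into {"gzip", "deflate"} (q-values ignored).
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    # Tell caches that the response depends on the Accept-Encoding header.
    headers = {"Vary": "Accept-Encoding"}
    # Skip tiny payloads: the CPU cost and gzip framing can outweigh the gain.
    if "gzip" in offered and len(body) >= min_size:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


repetitive = b"hello world " * 1000
compressed, compressed_headers = encode_body(repetitive, "gzip, deflate")
tiny, tiny_headers = encode_body(b"ok", "gzip")
```

How much smaller the compressed payload gets depends heavily on the content, so it is worth measuring with your own responses before settling on a threshold.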

5: Partial Content

You shouldn’t send data that has already been sent or should not be sent for other reasons (see the caching discussion above). HTTP gives you tools to optimize your app the way you want it. Here are some ways you can use HTTP to serve or accept partial content:

  • Sometimes the client does not need to read the body of the HTTP response to get the information it’s looking for. For instance, it might only be interested in the size of a resource, which can be seen in the headers, or it might want to find out which HTTP features the server supports. In that case, the client can make a HEAD request to the server. A HEAD request is just like a GET request, except that the server does not send a message body, only the headers.

  • When it comes to POST and PUT requests, the client might have a hefty payload for the server. But in some cases the server will reject the request based on the headers alone, for instance if some HTTP header is missing or invalid (authorization is often implemented using headers). By sending Expect: 100-continue, the client can transmit only the headers of the request first and ask the server to reply with a 100 Continue status code if the headers are A-OK. This way, the client does not need to send that big payload until it knows that the server is willing to accept it.

  • The server also has the option to not send a huge response body all in one go. It can use a 206 Partial Content response status to tell the client that there is more data coming, and split the body into multiple HTTP responses. This means that the initial HTTP response will only contain the first part of the data, and it’s then up to the client to request the following parts and stitch those responses together to form the whole document.

So how does the server know that the client expects this? Well, the client has to ask for it. One way for the client to do it is to use a HEAD request to find out the size of the resource, and in doing so, the server can tell the client that it supports partial responses by returning the Accept-Ranges header. Then the client can simply make multiple requests, each with a different byte range in the Range header.
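
The range mechanics can be illustrated with a small in-process Python sketch. No network is involved: serve_range stands in for the server and download_in_chunks for the client, both names are invented for this example, and only the simple forms bytes=start-end and bytes=start- are handled (suffix ranges and multiple ranges are left out).

```python
def serve_range(resource: bytes, range_header=None):
    """Return (status, headers, body) the way a range-aware server might."""
    total = len(resource)
    unit, _, spec = (range_header or "").partition("=")
    if unit.strip() != "bytes":
        # No (or unknown) Range header: send everything and advertise support.
        headers = {"Accept-Ranges": "bytes", "Content-Length": str(total)}
        return 200, headers, resource
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    # An open-ended range ("bytes=100-") runs to the last byte.
    end = min(int(end_text), total - 1) if end_text.strip() else total - 1
    if start > end:
        # Requested range lies outside the resource.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    body = resource[start:end + 1]  # byte ranges are inclusive
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


def download_in_chunks(resource: bytes, chunk_size: int) -> bytes:
    """Client side: learn the size first, then fetch and stitch the ranges."""
    # Stand-in for a HEAD request: only the headers are looked at here.
    _, headers, _ = serve_range(resource)
    total = int(headers["Content-Length"])
    parts = []
    for start in range(0, total, chunk_size):
        end = min(start + chunk_size, total) - 1
        status, _, body = serve_range(resource, f"bytes={start}-{end}")
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        parts.append(body)
    return b"".join(parts)
```

In a real client each call to serve_range would be an HTTP request carrying a Range header, and the chunks could be fetched in parallel or resumed after an interrupted download.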


So what does the future have in store? Well, the next version of HTTP (not so surprisingly called HTTP/2) was published in 2015, after having started its life as a Google-developed protocol called SPDY. One of the major differences is that HTTP/2 is based on binary frames that are transmitted in streams over a TCP connection, as opposed to ASCII-based text messages. This change enables a bunch of new features, some of which bring performance enhancements, including:

  • Compression of headers – this means that not only the body of a response can be compressed, but the headers too. In some cases, where there are lots of headers (for instance, lots of cookie data), it can provide a significant speed boost.
  • Better support for pipelining via true message multiplexing, which eliminates the HTTP-level head-of-line blocking mentioned in the section about pipelining. This also means that multiple parallel requests can be sent over the same TCP connection, instead of having to open multiple connections between the client and server.

Summing Up

So, what have we learned? Well, for one thing, HTTP is very complex and feature-rich, and configuring it correctly and optimizing your setup can significantly boost performance and reliability. Though the usage can vary, many of the optimizations that are tailor-made for web browsing scenarios also carry over nicely to API design. But these optimizations don’t come for free, so be sure to test under real-world conditions to confirm that they actually improve the robustness of your service. Happy optimizing!

Joel Kall

Joel has an M.Sc. in Media Technology from KTH, where he did his master’s thesis programming ad systems on set-top boxes for Videoplaza and Ericsson. Since then he has founded three companies, worked as a consultant in web and systems programming, and is currently co-founder and senior developer at Loop54, a company providing on-site search as a service for e-tailers via a REST API. At Loop54, Joel works closely with his pet mathematician (and good friend) to develop new algorithms to make sure the search engine can understand what users are looking for.