If anything in the IT world offers the most “bang for the buck,” it’s optimization. Optimization, the process of simplifying, offloading, or otherwise reducing the processing demand of a system, can bring significant benefits to almost any system, and when done properly, can take even the most monolithic system into the realm of high efficiency and extreme usability.

One of the best spots where optimization can see significant gains is in the API response package generated from client requests. These responses are often needlessly bloated by data that may or may not be needed for whatever reason, and simplifying these responses can lead to dramatic increases in efficiency.

Today, we’re going to discuss exactly that, diving into the concept and process of API response optimization. We’ll discuss why the gains seen in this space are so beneficial to both the developer and the end user, and what this optimization functionally looks like. We’ll also take this out of the theoretical and give some real-world examples of these techniques in action.

Data Handling Methods for Optimization

How do we go about optimizing an API response? While there is a wide range of third-party solutions, many of the methods they employ can be applied directly in the codebase itself. As such, in this piece we will discuss general approaches rather than advocate for any specific solution.


Pagination is the principle of separating responses into batches of content, browsable through selective response requests. In other words, think of pagination in an API in the same way you would with a book – if you only want to reference a set number of pages, you can do so, and navigating those pages via attached numbers allows for clear segmentation and more efficient browsing.

In the same way, pagination can serve to optimize responses while still preserving the larger amount of data available to the user. This can be done in a wide variety of ways, but at its most basic, pagination should allow:

  • Segmentation of responses into set block units (e.g. 10 responses per page, 20 responses per page, etc.);
  • Limitation of total responses for the developer (e.g. limit pagination to the first 1,000 entries, paginated into pages of 10 entries each);
  • Standardization: Consistent approaches to pagination require at least the use of some standard terms or synonyms (such as the use of “next”, “last”, etc. for cursor navigation).
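The three requirements above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `paginate`, its parameters, and the default cap of 1,000 entries are all assumptions made for the example.

```python
from math import ceil


def paginate(items, page, per_page=10, max_items=1000):
    """Return one page of `items` plus navigation metadata (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")

    # Limitation: never expose more than `max_items` entries in total.
    capped = items[:max_items]
    total = len(capped)
    last_page = max(1, ceil(total / per_page))

    # Segmentation: slice out the block for the requested page.
    start = (page - 1) * per_page
    block = capped[start:start + per_page]

    return {
        "items": block,
        "count": len(block),
        "total": total,
        # Standardization: the same navigation terms on every response.
        "first": 1,
        "prev": page - 1 if page > 1 else None,
        "next": page + 1 if page < last_page else None,
        "last": last_page,
    }
```

Calling `paginate(entries, page=3)` returns only the third block of ten entries, along with enough metadata for the client to move to a neighbouring page.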


A great example of pagination can be found in the HAL PhlyRestfully primer, and looks like this:

    {
        "_links": {
            "self": {
                "href": "http://example.org/api/user?page=3"
            },
            "first": {
                "href": "http://example.org/api/user"
            },
            "prev": {
                "href": "http://example.org/api/user?page=2"
            },
            "next": {
                "href": "http://example.org/api/user?page=4"
            },
            "last": {
                "href": "http://example.org/api/user?page=133"
            }
        },
        "count": 3,
        "total": 498,
        "_embedded": {
            "users": [
                {
                    "_links": {
                        "self": {
                            "href": "http://example.org/api/user/mwop"
                        }
                    },
                    "id": "mwop",
                    "name": "Matthew Weier O'Phinney"
                },
                {
                    "_links": {
                        "self": {
                            "href": "http://example.org/api/user/mac_nibblet"
                        }
                    },
                    "id": "mac_nibblet",
                    "name": "Antoine Hedgecock"
                },
                {
                    "_links": {
                        "self": {
                            "href": "http://example.org/api/user/spiffyjr"
                        }
                    },
                    "id": "spiffyjr",
                    "name": "Kyle Spraggs"
                }
            ]
        }
    }

In this example, what’s being done is pretty simple. All the results are broken down via pagination, and a specific method of navigating these entries is provided by “first”, “prev”, “next”, and “last.” The total number of entries and the number of entries on the current page are specified via “total” and “count.”

By breaking results out into these easy-to-navigate pages, you not only reduce the size and complexity of each response, and thereby the cost of transferring and processing it, you also give users a much better experience, as they are no longer inundated with 498 entries on a single page.
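To make the navigation links concrete, here is a small Python sketch that builds a `_links` object in the same shape as the HAL example. The function name `page_links` and the `page` query parameter handling are assumptions for illustration; this is not code from PhlyRestfully.

```python
from math import ceil
from urllib.parse import urlencode


def page_links(base_url, page, per_page, total):
    """Build HAL-style navigation links for one page (hypothetical helper)."""
    last = max(1, ceil(total / per_page))

    def href(n):
        # Page 1 is the bare collection URL, as in the example above.
        if n == 1:
            return base_url
        return base_url + "?" + urlencode({"page": n})

    links = {"self": {"href": href(page)}, "first": {"href": href(1)}}
    # Only offer "prev" and "next" when there is somewhere to go.
    if page > 1:
        links["prev"] = {"href": href(page - 1)}
    if page < last:
        links["next"] = {"href": href(page + 1)}
    links["last"] = {"href": href(last)}
    return links
```

Omitting “prev” on the first page and “next” on the last page lets a client detect the ends of the collection without doing any arithmetic of its own.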


Filtering is a very powerful tool, expressly allowing results to be limited by parameters supplied by the requester. This is very effective, as it not only reduces the total number of calls that are made and results that are displayed, it also determines very specifically which resources are fed to the user per their own requirements.

This has the added effect of giving real, tangible optimization, while also providing a better user experience that gives the illusion of greater efficiency. While on average, we’d prefer to go for actual optimization rather than an illusion, the end result is a system that works better, and gives only what is actually requested.

It should be noted here, however, that filtering, when overly complex, can actually have an adverse effect on optimization. Some filtering logic is so involved that the client requires feedback from the server before it can apply the next part of the filter. This harms the user experience, and while in theory the last call would be optimized, the calls leading up to the filtered content would not be.

Accordingly, filtering needs to be applied effectively.


An effective example of filtering can be found in CapitalOne’s SparkPay API. SparkPay handles its filtering by the following syntax:


What really makes these filters powerful are the comparison operators that are provided by the API. “eq”, “not”, “like”, and other operators like these allow for a huge amount of flexibility within the results themselves, and when paired with conjunction operators such as “AND”/“OR”, allow for an even greater amount of specificity in the results displayed.
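To show how comparison and conjunction operators combine, here is a minimal Python sketch. It is not SparkPay’s syntax or implementation: the `field:operator:value` expression format, the function names, and the case-insensitive meaning given to “like” are all assumptions made for this example.

```python
import operator

# Hypothetical operator table; "like" is a case-insensitive substring match.
OPERATORS = {
    "eq": operator.eq,
    "not": operator.ne,
    "like": lambda value, pattern: pattern.lower() in value.lower(),
}


def parse_filter(expr):
    """Turn an assumed 'field:op:value' expression into a predicate."""
    field, op, value = expr.split(":", 2)
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op}")
    compare = OPERATORS[op]
    # Values are compared as strings to keep the sketch simple.
    return lambda row: compare(str(row.get(field, "")), value)


def apply_filters(rows, exprs, mode="AND"):
    """Keep rows matching all (AND) or any (OR) of the expressions."""
    predicates = [parse_filter(e) for e in exprs]
    if not predicates:
        return list(rows)
    combine = all if mode == "AND" else any
    return [row for row in rows if combine(p(row) for p in predicates)]
```

With this in place, `apply_filters(products, ["name:like:blue", "status:eq:active"])` returns only active products whose name contains “blue”, and the filtering cost stays on the server instead of the client.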


Ranges are another way of restricting the results you receive, based on bounds supplied by the user. When the range header of an API request defines a specific start and end, only the elements within that range are considered applicable for the request.

For instance, if a content range is limited to only content containing a specific status code or specific handling ID, we can limit the actual size of the response package, and offload the processing of the data from the client side of the equation onto the server. While this does entail extra handling on the server, if the database is itself properly optimized, utilizing views and indexes for specific information sets, the added cost can be negligible.

In practice, this comes down to the user setting a range of data to be perused, the server querying an indexed database for that range, and then returning the data in a clean, pared-down format.

This has a large impact on response size: by limiting the data to only what is needed, we are culling the “chaff” and delivering the “wheat”.
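A range request can be sketched as follows. The header value format `items=<start>-<end>` with an inclusive end, and the `Content-Range`-style string returned alongside the data, are assumptions modelled loosely on HTTP byte ranges; real APIs differ in the units and syntax they accept.

```python
import re

# Assumed header value format: "items=<start>-<end>", both inclusive.
RANGE_RE = re.compile(r"^\s*items\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def apply_range(items, range_header):
    """Return the requested slice and a Content-Range style description."""
    match = RANGE_RE.match(range_header)
    if not match:
        raise ValueError(f"malformed range: {range_header!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError("range start must not exceed range end")

    total = len(items)
    selected = items[start:end + 1]  # the end index is inclusive
    if not selected:
        # Nothing falls inside the requested range.
        return [], f"items */{total}"

    # Report the range actually served, which may be shorter than requested.
    last = start + len(selected) - 1
    return selected, f"items {start}-{last}/{total}"
```

Reporting the range actually served, together with the total, tells the client whether it has reached the end of the data without a further call.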


IBM’s documentation for the IBM Decision Optimization Center describes the use of ranges for delivering content related to HTTP status codes. A request states a given range, defined under content-range, using the following structure:

Range: items =  - 

The API documentation notes a range of responses, referring to each method by its HTTP verb, but fundamentally speaking, the ranges are used as filters for each content response. When a 200 response is noted, the range given by the requester can result in delivering all 200 responses recorded for a specific server in a specific area. By delivering this data, and only this data, the response sends exactly the content that is needed for the process at hand.


Avoiding Underfetching and Overfetching

With all of this hinging on the concept of reducing the amount of data that is being transmitted, let’s look at the concepts of underfetching and overfetching, and the results of each.

These two concepts are very much “what it says on the tin” kind of concepts – simply put, over-fetching is delivering more data than is necessary or useful to the client, and under-fetching is not responding with enough data, often requiring a secondary call to another endpoint to complete the data set. These two can occur from the client-side, with poorly formed constraints to ranges and bad filtering, but they can often occur on the codebase as a symptom of poor scaling or design.

In terms of over-fetching, the issue is often the result of a poorly formed request meeting a default package response that is overly broad. When these two issues collide, with the user not specifying their requested data correctly and the API assuming they want literally everything it has, you get responses that are broad to the point of being useless to the end user.

On the other hand, under-fetching is almost always an issue on the server side. While incomplete requests can indeed come from the client submitting a request prematurely, under-fetching often comes from an API that has scaled out to add additional endpoints or nodes that handle greater data amounts. This is fine for handling more data, but without proper documentation or even implementation of collated endpoints, this can result in a client hitting a single endpoint, expecting everything they want, and then either getting incomplete results or error codes for unsupported parameters.

There are some ways this can be rectified. First of all, proper API planning and an architectural review can go a long way toward ensuring these problems don’t crop up. By planning for average data usage and also treating the edge case as something that must be supported, poorly formed requests can be met with advice on how to properly form a request, rather than just an empty delivery. Likewise, understanding how to scale while maintaining previous functionality is key to ensuring all expected data is actually delivered.

To take this a step further, something like GraphQL is very powerful against this type of issue, as the server does not have to guess what is wanted – the client specifically states their request, and what they get is only what they wanted.
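GraphQL itself requires a schema and a server library, but the underlying idea of letting the client name exactly the fields it wants can be sketched with a plain field list. The function name `select_fields` and the dotted-path notation below are assumptions for illustration and are not GraphQL syntax.

```python
_MISSING = object()


def _lookup(record, keys):
    """Follow `keys` through nested dicts, or return _MISSING."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def select_fields(record, fields):
    """Keep only the requested fields; 'a.b' selects key b inside a."""
    result = {}
    for path in fields:
        keys = path.split(".")
        value = _lookup(record, keys)
        if value is _MISSING:
            continue  # this sketch silently skips unknown fields
        # Rebuild the nesting for the selected value only.
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result
```

A client that asks for `["id", "vehicle.lat"]` receives just those two values, so nothing is over-fetched, and because nested data can be named in the same request, nothing has to be fetched with a second call.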

It should be noted that over-fetching and under-fetching are not necessarily problems with a specific language or framework, but are instead inherent in REST design. Accordingly, being aware of them as part of a greater consideration of API response size is important. Over-fetching obviously bloats responses, but under-fetching can turn what should take one or two calls into five or more, so that a single call’s worth of data takes five calls to assemble.

Balancing Experience

One of the great benefits of optimizing the response package is that the workload can be properly balanced between the parties involved. In other words, depending on what the request looks like, the bulk of processing and restriction can be shifted to the responsible party.

Let’s take a look at a theoretical API to see how this would work for the provider and for the requester. Let’s imagine an API that serves geolocational data for a shipping company. The API shares locations of vehicles, average delivery time, average packet weight, payment processing, and additional ancillary features.


In our first example, there is no optimization of the API response package at all. When managers call the API to see where vehicles are, they get the sum total of vehicle data, payment processing information, etc.

For managers, this is problematic, as many of them use lightweight mobile devices in order to move from site to site. As such, the amount of data that is pushed to the devices is extremely heavy, and often results in slow loading, as the data is being served on a mobile-centric authenticated website.

This is especially problematic for the payment department, as they are only concerned about payments, but must wade through additional data for each call. The problem doesn’t just stop there, either – for each department, additional data that is unneeded ends up clogging the processing systems, and results in rather slow updating of internal data.

While the issues internally are problematic, the situation is even worse for the customers who have ordered shipping. The API that serves vehicle data also serves the expected delivery date, and this date is often slow to update due to the overwhelming amount of data that is being pushed through the system. This results in slow updates for customers, leading to lower satisfaction for the service in general.


Now let’s look at an optimized solution. Learning from their mistakes, the API developers have taken the monolithic API and redesigned it. First of all, the API is broken into a series of microservices, as there’s no need for a single API to do so much and pass so much data. Straight off the bat, this results in increased efficiency: the data once pushed by a single API is now spread across five APIs, theoretically cutting the maximum traffic load on any one of them to roughly one-fifth.

Once the API was properly broken into a series of microservices, the problem of large response packages was looked at. First, pagination was implemented as a method for the general managers to view active deliveries in a navigable way. Because the mobile site relies on showing data relevant to the manager, pagination has allowed for easier browsing and a greater amount of control over specific groupings of this data.

Taking it a step further, filtering was applied so that the payment processing offices could view payment details for specific trucks and regions. By applying this filtering, huge amounts of data can be left out of the response package, meaning that the payment offices, who handle more calls than most departments in the organization, use less of the network while handling the needed data more efficiently.

Finally, ranges were specified as an option for the call itself. Now, the customer can choose to view the exact shipping information pertinent to their tracking ID number(s). This was put into place rather than customer ID lookups or other such solutions because, in this case, more than one shipment can be open at a time for a single consumer, yet the consumer may only care about a single shipment, and not the rest.
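The customer-facing lookup in this theoretical API might be sketched as follows. Every name here (`Shipment`, `CUSTOMER_FIELDS`, `shipments_for`) and every field is hypothetical, invented to illustrate the scenario described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shipment:
    tracking_id: str
    customer_id: str
    expected_delivery: str
    payment_status: str  # internal detail, not needed by the customer


# Hypothetical whitelist of fields a customer actually needs to see.
CUSTOMER_FIELDS = ("tracking_id", "expected_delivery")


def shipments_for(tracking_ids, shipments):
    """Return customer-facing data for the requested tracking IDs only."""
    wanted = set(tracking_ids)
    return [
        {field: getattr(shipment, field) for field in CUSTOMER_FIELDS}
        for shipment in shipments
        if shipment.tracking_id in wanted
    ]
```

A customer with three open shipments who asks about one tracking ID gets back a single small record, rather than every shipment tied to their account.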

While this has obvious impacts on the organization’s efficiency, a big part of the savings actually comes in the form of reduced need for physical infrastructure. By breaking the API into a series of microservices, and reducing the actual response load dramatically, less infrastructure is needed, which not only greatly benefits the offices in rural counties, but also lowers the overall cost of operation company-wide.

Final Thoughts: Why We Gain From Response Package Optimization

What is it about optimization that leads to such dramatic gains? Put simply, it creates a more direct line of communication between the resource provider and the resource requester. This makes the response package a prime candidate for optimization, with distinct and visible impacts on both sides of the equation.

At each stage of the API request and response encounter, the size and level of optimization has a direct relation to the experience of the entity in question, whether it’s the developer or the receiver. In this way, even a small decrease of 5% in the processing required and the size of the response package can have a compounding effect on not only user experience, but developer experience as well.

Because of this, optimization is hugely powerful. The actual effect on experience due to faster loading, less overall stress on the network, and optimized codebase compounds dramatically, especially over thousands of calls.

Optimizing the response package is one of the most important techniques a provider can use when trying to build an efficient system. The more efficient the system, the fewer problems you should expect. The fact that it improves developer and consumer experiences should be enough, but even disregarding that, the operational benefits gained by optimizing the response package are numerous.

What do you think is the best way to optimize the response packages for web APIs? Let us know in the comments below!