How to Avoid Overfetching and Underfetching

A couple of years ago, I was asked to review the AssemblyAI Speech-to-Text API, which had been voted the Best Public API of 2020 here at Nordic APIs. I put this marvelously innovative API through its paces with a reading of Edgar Allan Poe’s Ligeia. I pasted the lengthy results into a Google Doc as part of my review and sent them off, feeling satisfied, only to get a bemused response from my fearless, thorough, and endlessly patient editor. “You know we can’t publish a 300-page article, right?”

This humorously embarrassing incident, in addition to being a cautionary tale about double-checking your work before submitting an assignment, is also a useful illustration of overfetching, the technological equivalent of over-sharing.

Overfetching is always a risk when dealing with APIs. So is underfetching, which might be slightly less risky but no less irritating or deadly for productivity. Let’s take a little look into overfetching and underfetching, first by defining each and then looking at some strategies to help you avoid these common pitfalls in your own development projects.

What Is Overfetching?

Although it was a result of user error rather than the software itself, our Ligeia illustration at the beginning is one example of overfetching. To start with a brief definition, overfetching is simply when an API request returns more data than you’re going to use. The simplest practical scenario would be requesting a list of usernames from a database and having the entire archive returned.

First of all, this is highly inefficient to the point of being counterproductive. You can’t find the data you need, and you’re left sorting through the vast mountain of data you receive. This can be more time and labor-intensive than simply looking up the data manually.

Secondly, and even more importantly, the true risk of overfetching becomes apparent when dealing with data limitations. Imagine what could happen if you’re on a network with a pricey data limit. How much could terabytes of unwanted information cost when you’re charged for overages?
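To make the username scenario concrete, here’s a minimal sketch that compares the two payloads. The user records and both functions are hypothetical stand-ins for real endpoints, not part of any particular API.

```javascript
// Hypothetical user records, standing in for what a users endpoint might hold.
const users = [
  { id: 1, username: 'ada', email: 'ada@example.com', bio: 'x'.repeat(500) },
  { id: 2, username: 'grace', email: 'grace@example.com', bio: 'y'.repeat(500) },
];

// Overfetching: the response contains every field of every record.
function getAllUsers() {
  return JSON.stringify(users);
}

// A narrower response contains only the field the client actually needs.
function getUsernames() {
  return JSON.stringify(users.map((user) => user.username));
}

const fullPayloadSize = getAllUsers().length;
const usernamePayloadSize = getUsernames().length;

console.log(`Full records: ${fullPayloadSize} characters`);
console.log(`Usernames only: ${usernamePayloadSize} characters`);
```

With only two records the difference is already large, and it grows with every record and every unused field.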

What Is Underfetching?

Underfetching is less likely to bankrupt your organization, but it can be just as much of a problem for your program’s performance and efficiency. Again, to start with a definition, underfetching is when a single call to an endpoint doesn’t return enough data, necessitating one or more additional calls.

Every additional API call adds precious processing time to each interaction, lengthening your app or website’s response time. This can be the difference between life and death for an app, especially for mobile users, who will abandon an app if it takes too long to respond.
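Here’s a rough sketch of how those extra calls pile up. The in-memory objects and the getStudent and getCourse functions are hypothetical stand-ins for REST endpoints, and the counter simply tallies how many calls the client has to make.

```javascript
// Hypothetical in-memory data standing in for a REST API.
const students = {
  student1: { id: 'student1', name: 'karthik', courseIds: ['math101', 'geography201'] },
};
const courses = {
  math101: { id: 'math101', title: 'Intro to algebra' },
  geography201: { id: 'geography201', title: 'Intro to maps' },
};

let callCount = 0;

// Each function represents one round trip to the API.
function getStudent(id) {
  callCount += 1;
  return students[id];
}

function getCourse(id) {
  callCount += 1;
  return courses[id];
}

// Underfetching: the student response only holds course IDs,
// so the client needs one more call per course to get the titles.
function loadCourseTitles(studentId) {
  const student = getStudent(studentId);
  return student.courseIds.map((courseId) => getCourse(courseId).title);
}

const titles = loadCourseTitles('student1');
console.log(titles, `calls made: ${callCount}`);
```

One student with two courses already costs three calls, and the count rises with every course on the list.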

Both overfetching and underfetching are known to be some of the biggest pitfalls of using REST. Fortunately, there are numerous ways you can avoid overfetching and underfetching in your development projects.

How To Avoid Overfetching and Underfetching

Because these problems are so prevalent, some dedicated solutions have emerged. You can also include techniques in your own code if you want to stick with REST.

Let’s look at some popular methods for avoiding overfetching and underfetching in your apps.

GraphQL

Overfetching and underfetching have been plaguing developers using REST to query APIs for some time. These problems are one of the main reasons the query language GraphQL was created in the first place.

It’s notoriously complicated to structure return data to suit everybody who will ever use your API. Overfetching and underfetching are both common results of improperly formatted data.

GraphQL tackles this issue by acting as a gateway for every API call and interaction. Raw data is consumed and then converted into GraphQL’s strongly typed format. All requests are consolidated into a single call, and GraphQL queries will only return the data you request.

Getting the data you need from GraphQL requires understanding how GraphQL queries are structured. Understanding GraphQL resolvers is helpful, as well, to make constructing GraphQL queries as easy as possible.

Fortunately, GraphQL queries look a lot like JSON objects with the values left out, so anyone familiar with working with APIs shouldn’t have much difficulty grasping the concept.

To illustrate the point, here’s an example of what a GraphQL query looks like:

query {
  student(id: "student1") {
    name,
    courses {
      title
    }
  }
}
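A query like this is commonly sent to a GraphQL server as a single HTTP POST with a JSON body. The sketch below assumes that convention. The buildGraphQLRequest helper and the example.com URL are hypothetical, and the student ID is passed as a variable rather than written into the query.

```javascript
// The same student query, with the ID supplied as a variable.
const studentQuery = `
  query ($id: String!) {
    student(id: $id) {
      name
      courses {
        title
      }
    }
  }
`;

// Hypothetical helper that builds the options object for a fetch() call.
function buildGraphQLRequest(query, variables) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, variables }),
  };
}

const request = buildGraphQLRequest(studentQuery, { id: 'student1' });
console.log(request.body);

// With a real server (the URL here is an assumption), the call would be:
// fetch('https://example.com/graphql', request).then((res) => res.json())
```

Only the fields named in the query, name and the course titles, would come back in the response.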

Now take a look at how resolvers are structured in GraphQL:

const resolvers = {
  Query: {
    student: (root, args, context, info) => { return students[args['id']] }
  }
}

root specifies the result from the Parent, in this case, Query. args are the variables passed on to the resolver. context is additional data like common configurations, and info is field information such as fieldName, fieldNodes, returnType, and so on.

The necessary configuration can be stored inside a schema file. A hypothetical GraphQL schema file might look like this:

type Query {
  student(id: String!): Student
}

type Course {
  id: String!
  title: String
}

type Student {
  id: String!
  name: String
  courses: [Course]
}

Resolvers can be stored in their own file as well, to help keep things tidy. An example resolvers.js file could look like this:

var students = {
  'student1': {
    id: 'student1',
    name: 'karthik',
    courses: ['math101', 'geography201']
  },
  'student2': {
    id: 'student2',
    name: 'john',
    courses: ['physics201', 'chemistry103']
  },
};


var courses = {
  'math101': {
    id: 'math101',
    title: 'Intro to algebra',
  },
  'geography201': {
    id: 'geography201',
    title: 'Intro to maps',
  },
  'physics201': {
    id: 'physics201',
    title: 'Intro to physics',
  },
  'chemistry103': {
    id: 'chemistry103',
    title: 'Intro to organic chemistry',
  },
};

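const resolvers = {
  Query: {
    // Look up a student by the id argument passed to the query.
    student: (root, args, context, info) => {
      return students[args['id']]
    }
  },
  Student: {
    // Map the stored course IDs to Course objects so fields like title can resolve.
    courses: (student) => student.courses.map((courseId) => courses[courseId])
  }
}

module.exports = resolvers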

This resolver accepts an ID as an argument and returns the corresponding student object. GraphQL then picks out only the fields the query asked for, such as the student name.

But GraphQL isn’t the only way to avoid overfetching and underfetching. Let’s finish up with a few tips to avoid these pitfalls when you’re using REST.

Create Proper Endpoints

One common cause of both overfetching and underfetching is not having the right endpoints. Packing too much data into a single endpoint’s response can cause the data dumps that result in overfetching. Endpoints that are too narrow, on the other hand, can necessitate multiple calls to yield all the data you need.

Following a logical endpoint strategy is one way to avoid these problems. One such strategy might be to have an endpoint for every object in your JSON data. This can sometimes lead to too many endpoints, however, resulting in a disorganized and messy API.
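As a rough illustration of an endpoint strategy, here’s a tiny route table with one focused endpoint per need. The paths, the data, and the handleGet function are all hypothetical, and a real project would more likely use a web framework than hand-rolled pattern matching.

```javascript
// Hypothetical in-memory data.
const students = {
  student1: { id: 'student1', name: 'karthik', courseIds: ['math101', 'geography201'] },
};
const courses = {
  math101: { id: 'math101', title: 'Intro to algebra' },
  geography201: { id: 'geography201', title: 'Intro to maps' },
};

const routes = [
  {
    // Returns only the student's basic details, avoiding a full data dump.
    pattern: /^\/students\/([^/]+)$/,
    handler: (id) => {
      const student = students[id];
      return student ? { id: student.id, name: student.name } : null;
    },
  },
  {
    // Returns the student's full course objects in one call,
    // so the client doesn't need a follow-up request per course.
    pattern: /^\/students\/([^/]+)\/courses$/,
    handler: (id) => {
      const student = students[id];
      return student ? student.courseIds.map((courseId) => courses[courseId]) : null;
    },
  },
];

// Finds the first route whose pattern matches the path and runs its handler.
function handleGet(path) {
  for (const { pattern, handler } of routes) {
    const match = path.match(pattern);
    if (match) return handler(match[1]);
  }
  return null;
}

console.log(handleGet('/students/student1'));
console.log(handleGet('/students/student1/courses'));
```

Two endpoints are manageable. The trade-off described above kicks in when every new client need adds another route.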

Pagination

Last but not least, pagination is one of the simplest ways to avoid overfetching if you’re using REST. Pagination breaks down the data returned from an API into pages of a configurable size.

Pagination can be configured in nearly any way you can think of. Offset pagination tells the database how many objects to skip before it starts returning results. Keyset pagination filters on a key from the previous page, such as a timestamp, and returns pages of a predetermined length until the end of the results is reached. An example of a keyset pagination query could look like this:

GET /items?limit=20&created:lte:2019-01-20T00:00:00

Seek pagination returns the results that come after a given item, identified here by its ID.

GET /items?limit=20&after_id=20
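To see how these three styles differ, here’s a minimal sketch that applies each one to an in-memory array. The items dataset and the function names are hypothetical. A real API would push the same logic down into a database query.

```javascript
// Hypothetical dataset: 50 items with ascending IDs, one created per day from 2019-01-01.
const items = Array.from({ length: 50 }, (_, index) => ({
  id: index + 1,
  created: new Date(Date.UTC(2019, 0, index + 1)).toISOString(),
}));

// Offset pagination: skip `offset` items, then return up to `limit`.
function offsetPage(offset, limit) {
  return items.slice(offset, offset + limit);
}

// Keyset pagination on a timestamp: newest first, limited to items
// created at or before the given time.
function keysetPage(createdLte, limit) {
  const cutoff = Date.parse(createdLte);
  return items
    .filter((item) => Date.parse(item.created) <= cutoff)
    .sort((a, b) => Date.parse(b.created) - Date.parse(a.created))
    .slice(0, limit);
}

// Seek pagination: return up to `limit` items whose ID comes after `afterId`.
function seekPage(afterId, limit) {
  return items.filter((item) => item.id > afterId).slice(0, limit);
}

console.log(offsetPage(20, 20).map((item) => item.id));
console.log(keysetPage('2019-01-20T00:00:00Z', 20).map((item) => item.id));
console.log(seekPage(20, 20).map((item) => item.id));
```

Offset and seek return the same items here, but seek keeps working if rows are inserted or deleted ahead of the current position, which is one reason it’s often preferred for large or fast-changing datasets.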

Cursor pagination lets you specify how many results are returned per page, as well. Be advised, many APIs cap the page size at 100 results. An example of a cursor pagination response for a request with a page size of 100:

{
  "tickets": [
    "..."
  ],
  "meta": {
    "has_more": true,
    "after_cursor": "xxx",
    "before_cursor": "yyy"
  },
  "links": {
    "next": "https://example.zendesk.com/api/v2/tickets.json?page[size]=100&page[after]=xxx",
    "prev": "https://example.zendesk.com/api/v2/tickets.json?page[size]=100&page[before]=yyy"
  }
}
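A client typically keeps requesting pages until has_more comes back false, passing the latest after_cursor each time. Here’s a sketch of that loop. The getTicketsPage function is a hypothetical in-memory stand-in for the HTTP call, and its cursor is just an array index for illustration, whereas real cursors are usually opaque strings.

```javascript
// Hypothetical data: 250 tickets held in memory.
const allTickets = Array.from({ length: 250 }, (_, index) => ({ id: index + 1 }));

// Stand-in for an HTTP request to a cursor-paginated endpoint.
// Returns the same tickets/meta shape as the response above.
function getTicketsPage(afterCursor, pageSize) {
  const start = afterCursor === null ? 0 : Number(afterCursor);
  const end = Math.min(start + pageSize, allTickets.length);
  return {
    tickets: allTickets.slice(start, end),
    meta: { has_more: end < allTickets.length, after_cursor: String(end) },
  };
}

// Follows the cursor from page to page until the API reports no more results.
function fetchAllTickets(pageSize) {
  const collected = [];
  let cursor = null;
  let hasMore = true;
  let pageCount = 0;

  while (hasMore) {
    const page = getTicketsPage(cursor, pageSize);
    collected.push(...page.tickets);
    cursor = page.meta.after_cursor;
    hasMore = page.meta.has_more;
    pageCount += 1;
  }

  return { tickets: collected, pageCount };
}

const result = fetchAllTickets(100);
console.log(`${result.tickets.length} tickets in ${result.pageCount} pages`);
```

In practice you’d often stop early once you have what you need, since fetching every page brings you right back to overfetching.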

Overfetching and Underfetching: Final Thoughts

Overfetching and underfetching aren’t only a problem in REST. They’re also common in languages like Ruby. Whether you’re designing or consuming APIs, it’s something you should wrap your mind around to ensure your software, apps, and websites operate at peak performance.

As we have seen, using a query language like GraphQL can largely eliminate the problem by consuming raw data and converting it into a uniform format that clients query for exactly the fields they need. GraphQL has its drawbacks as well, of course, so it’s worth having some additional methods on hand to deal with these issues.

If you’re an API developer, simply spending some time thinking about API design will help you avoid overfetching and underfetching. If you’re an API consumer, formatting your queries correctly can also prevent these issues.

Learning how to deal with overfetching and underfetching is integral to using APIs properly. It ensures you won’t get buried beneath a landslide of unnecessary data and that your apps will be as responsive as possible.