Making Fast APIs: Lessons Learned From 40 Years of SQL

SQL is old — about 40 years old, in fact. In the tech world, where a new framework or language pops up seemingly every day, 40 years feels downright archaic. Why is such an old language relevant to building APIs in 2018?

Because SQL is so old, it has a long history from which to learn. Joakim Lundborg, data scientist and engineer, points out that “SQL is a 4th generation language. It gives the user of this ‘API’ (if you consider it an API) full control over what data they are getting. And if you look at what’s been happening in the API space the past couple of years, both with GraphQL but also with other things like a JSON API … you get more control over what fields you are fetching.”

To Joakim, the parallels between SQL and current API development trends are clear. APIs are giving more control to applications and API consumers. There is less reliance on strict API definitions, like REST APIs have historically provided. Instead, consumers can choose exactly what they want to fetch based on their particular needs. This is very similar to the access patterns used when hitting SQL databases directly. This means those 40 years of SQL have a lot to teach today’s API developers in terms of speed and performance, regardless of the actual databases in use.

This post was inspired by Joakim Lundborg’s presentation at the Platform Summit 2017:

What Makes Something Fast?

It’s important to be on the same page about what fast really means when it comes to API design. This is challenging to define, but ultimately it comes down to thinking about the user, and following a few simple rules.

Rule 1 – Never Let the Humans Wait

According to Lundborg, “The core thing to think about when you design for performance is to think about the persons who will be exposed to the results of your API.” These people will vary from project to project. They might be the end users, or the application developers themselves.

Regardless, API design must always consider the human. Remember that humans are impatient creatures. They want things to happen immediately, and will seek out alternative API services if the performance is too bad. Don’t make them wait.

Rule 2 – Latency > Throughput

In the case of humans, when it comes to speed, the important thing is not raw throughput. A typical, absolute measure of speed might be the number of GB of data which are pushed through the API per second. To the average human, though, that is a lot of data.

What really matters is how long it takes from a button being clicked to seeing what they want on screen. APIs should be designed with this in mind, and prioritize latency (time waiting for the data) over throughput (the amount of raw data).

Latency Numbers Every Programmer Should Know- 2018. (Source / Fork)

The above chart illustrates the order of magnitude difference from accessing data in various locations. Locally cached data is on the left, with servers in the bottom right. APIs start out in that final section with the most latency, giving them a disadvantage, particularly as perceived by humans. It is essential for good API design to consider speed improvements that help mitigate this latency.

The NoSQL Movement

Before considering what SQL can teach API developers directly, there is a lot to be gleaned from the weaknesses of SQL. A whole movement was started by those who were frustrated with SQL but weren’t content being unhappy. Rather than sitting idly, they decided to improve the status quo by taking new approaches to database technologies. There are several lessons from these methods that can help any application requiring access to data.

Lesson Learnt – Model Data After Your Access Patterns

When freed from the framework of SQL, NoSQL technologists discovered a new flexibility in data structuring. Rather than being tied to a traditional structure of tables, NoSQL advocates structure data in a way which reflects how the user is accessing it. If the user is getting key/value pairs, a key/value store makes sense. Likewise, if the user is retrieving a bunch of graph data, use a data store that is able to easily handle graph data.

Lesson Learnt – It Pays to Keep Things Simple

SQL can make it hard to understand what’s going on in the backend, obfuscating the methods by which data is retrieved. When the database causes errors or performance issues, debugging is rough and sometimes problems simply can’t be solved.

On the other hand, a simple database that only allows for key/value access doesn’t have a lot of room for mistakes. While SQL wouldn’t support this simple structure, NoSQL offers the ability to keep data minimally structured. This simplicity can pay off greatly in terms of speed, scalability, and debugging.

Lesson Learnt – Consistency > Minimum Speed

NoSQL databases brought a new consistency to access speed, something which is crucial for the end user experience. With SQL, response times are inconsistent and it’s challenging to understand the discrepancies without a very high level of expertise.

Consistency is more important for users than the fastest (or slowest) speeds possible. Users crave consistency, they don’t want outliers. They want things to work exactly the same way, every time, so that outcomes are reliable. This holds true for any method of data access.

How to Make a Fast API

At the end of the day, when it really comes down to it, how is a fast API made? Some themes have already emerged from the weaknesses of SQL, but there are many more concrete steps which can be taken to improve the speed of any API. These come from lessons gleaned from decades of data access, and have a high impact on perceived speed as well as measured throughput.

1. Give API Consumers Control of What They Fetch

SQL makes it possible to retrieve exactly the desired data in one go. With joins and other fancy tricks, data can be aggregated across tables and returned without any extra information. Unfortunately, APIs often take away this ability and require users to request pre-determined collections of data.

Giving the consumer full control will improve API performance by reducing the weight of the data returned, thanks to very specific requests for data fields. It may also minimize the number of requests, if the consumer or application is able to get all required data in one go. This is one thing that is already starting to take off in today’s development trends, powered by technologies like GraphQL.

Read More on GraphQL in our free eBook: GraphQL or Bust

2. Help API Consumers Understand What’s Slow

A danger of giving API consumers more control over the data they fetch is that they can create situations in which API performance is quite poor. This means that query planning can no longer be an afterthought – the end users must be able to understand what they’ve done and why it is slowing things down.

The solution proposed by Joakim in his talk is to steal the EXPLAIN method from SQL. In SQL, placing the EXPLAIN command in front of any query will return data explaining the database’s plan for executing the query. The data includes information like which tables will be consulted, which aggregations will work, which plans were considered but discarded, and so on.

At this time, a comprehensive API equivalent doesn’t really exist. API consumers, especially those using GraphQL APIs, could really use such a solution. Some tools are heading in the right direction, like pipe tracing, but this is an area quite ripe for improvement.

3. Know the Data Access Patterns

If it’s not possible to give API consumers full control of what they fetch, the API can certainly be optimized for common data access patterns. These are discovered by logging user actions, and actually taking the time to read the logs. From here helpful information can be aggregated about commonly accessed data sets, API consumer actions which currently require multiple API calls, and more.

After recognizing common access patterns, the first lesson from the NoSQL movement comes into play. As much as possible, the API should be modeled after the most common access patterns. This will allow the majority of users to get the most relevant data faster.

4. Avoid the N+1 Problem – Understand the Available Data

Consider an SQL query like this:

SELECT thing FROM huge_list
JOIN all_the_related_things

This query might make the DB admin a bit upset, but it is capable of requesting all necessary data in a single query. Next, consider a pretty typical REST API call:


const things = await fetch_huge_list();
const result = things.map(thing => {
fetch_all_the_related_things(thing);
}

In this example, the API consumer requests an array of data. Once they receive the data, they iterate over each item and make an additional API request to collect additional data for that specific item. This is known as the N+1 problem, where a single (the +1) request is made, and then for N items in the returned array N requests are made to the API. Not a huge deal when N is just a few items, but it can very quickly spiral out of control for large datasets.

Historically, this was referred to as an ORM (Object-Relational Mapping) problem. When placing the ORM layer on top of the SQL database, this removes the end user’s ability to freely access nice abstractions like SQL joins. Depending on their particular needs, this could cause massive performance issues.

Today, ORMs are not the problem – rather, it’s a case of a bad abstraction. The same problem will appear in any API which is designed to force data requests to be made in multiple steps. In many ways, according to Joakim, “puritan REST API design is the worst thing you can do for API performance because it forces you into this problem.”

The solution here is to have a complete understanding of the API and how it is being utilized. The design of the API must consider both the data available in this particular API, and how the data will be fetched. From there, good API design will make it easy for users to fetch the important stuff all in one go, rather than making a loop full of requests.

5. Design the API Experience for Low Latency

Thinking about the API experience through the eyes of the API consumer is very important. Remember that it’s crucial for the API to feel fast, rather than be measured as fast. The average human API consumer cares more about how quickly the data reaches them than raw throughput.

This is the key step, according to Joakim, who says “You don’t want to design your API to take several steps to get the data you want.” Consider ways to make the API feel fast and have low latency, even if objective speeds for large amounts of data aren’t quite the same.

6. Design the API to Cache Data

Another way to improve latency in an API is by providing the ability to cache data locally. In this graphical representation of latency, APIs are in the network category. This is the category with the highest latency, making it the most painful for the end user.

By designing an API capable of caching data, this moves access from the red (network) category, into the black and blue areas. Having this data available locally reduces latency, making a magnitude of difference. The cache itself can also be optimized by placing it as close to the end user as possible, reducing most local latency.

7. Design the API to Prefetch Data

One way to mitigate the slowness caused by users making multiple API requests is to design the API with the ability to prefetch relevant data. By doing things like studying access logs, it’s possible to determine actions that API consumers frequently do next. From there the API can start to request this data before the user does.

An example of API prefetching would be how Instagram begins to upload photos in the background of the application, even before the user hits the upload button. While this doesn’t improve raw throughput, it has a huge impact on perceived speed. Designing APIs to allow for this sort of thing is crucial, and like caching it makes a magnitude of difference.

8. Remember – It’s Not About the Tools, It’s How They’re Used

At the end of the day, the particular database or language used to power an API doesn’t really matter. It’s totally possible to design an API using any hot new technology, like GraphQL, that still performs terribly. On the other hand, when designing with performance in mind, an old-school SQL database can power a lightning fast API. It’s not about which tools are used, but rather about fully understanding them and utilizing them in the smartest way possible.

Conclusion

SQL could definitely be considered one of the first APIs. 40 years of database design, implementation, and access has a lot to teach about how consumers request and use data. These lessons point to clear methods and approaches which improve API performance. At the end of the day, it really boils down to thinking about the consumer, structuring data appropriately, decreasing latency, and considering perceived speed over raw throughput. In production, monitor logs for common data access patterns and continue to iterate and provide efficient ways to access frequently used data. Keep things fast, and more importantly consistent, and API consumers will keep coming back.