5 Things That Cause High Latency in Your APIs (and How to Fix Them)

Nothing frustrates a user more than a slow or non-responsive website or application. This is especially true in ecommerce, where slow-loading pages lead to high bounce rates and lower conversion rates. Often, the hidden culprit behind delays is a high-latency API.

Many app developers integrate multiple third-party APIs, and each one adds latency of its own. As more APIs enter the mix, developers look to those third-party API providers to make low latency and high performance a priority. In this article, we’ll highlight five common things that slow down APIs and ways API designers can fix them.

1. Poor Database Performance

One of the most common causes of high API latency is the database. API designers often run into database-related issues that slow down the API, such as:

  • Missing indexes: A database needs proper indexing. Otherwise, it will perform full table scans, which slows down queries.
  • Connection overhead: Without connection pooling, the API establishes a fresh database connection for every request, which adds latency to each request and consumes resources on both ends.
  • N+1 query problem: An API endpoint makes an initial database query to fetch a list of items, then executes one additional query per item (N+1 queries in total) to fetch its associated data. These extra round trips slow down both the database and the API.

How to Fix

Start by working with your backend developers or engineers to make sure frequently queried fields have proper indexes and add indexes where missing. Next, implement connection pooling to better manage connections and reduce connection overhead. Finally, try fixing N+1 queries using joins or batch queries so the API doesn’t make too many small queries.
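The N+1 fix can be sketched with an in-memory SQLite database. The schema and names below are illustrative, not from the article: `orders_per_user_n_plus_1` shows the anti-pattern (one query per user), while `orders_per_user_join` collapses all of those round trips into a single `JOIN`, and the index on the frequently queried `user_id` column keeps that join off a full table scan.

```python
import sqlite3

# Hypothetical schema: users and their orders (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    -- Index the frequently queried foreign key to avoid full table scans.
    CREATE INDEX idx_orders_user_id ON orders (user_id);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 9.99), (2, 1, 4.50), (3, 2, 12.00)])

def orders_per_user_n_plus_1() -> dict:
    # Anti-pattern: one query for the list, then one extra query per user.
    users = conn.execute("SELECT id, name FROM users").fetchall()
    return {name: conn.execute(
        "SELECT COUNT(*) FROM orders WHERE user_id = ?", (uid,)
    ).fetchone()[0] for uid, name in users}

def orders_per_user_join() -> dict:
    # Fix: a single JOIN replaces the N extra round trips.
    rows = conn.execute("""
        SELECT u.name, COUNT(o.id)
        FROM users u LEFT JOIN orders o ON o.user_id = u.id
        GROUP BY u.id
    """).fetchall()
    return dict(rows)

print(orders_per_user_join())  # {'Ada': 2, 'Lin': 1}
```

Both functions return the same result; the difference is that the JOIN version issues one query regardless of how many users exist.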

2. Blocking Operations

Some APIs perform long-running or compute-intensive tasks, such as processing large file uploads or rendering AI-generated images. In a blocking design, when the API sends a request to the server to perform a specific task, it waits until the server completes that task and responds before moving on to the next one. Waiting for each operation to finish sequentially significantly increases API latency and reduces API responsiveness.

How to Fix

If you’re just starting to plan and design your API and are running into blocking issues, you should consider designing it to be asynchronous instead of synchronous. An asynchronous API can handle multiple requests concurrently, preventing blocking operations. It uses mechanisms (callbacks, promises, and futures) to specify what happens once asynchronous operations are completed.

If you need an existing synchronous API to perform long-running or compute-intensive tasks, you’ll need to incorporate asynchronous patterns like status resources, callbacks, and webhooks. Incorporating asynchronous processing into your API allows it to handle long-running tasks efficiently and deliver faster response times.
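A minimal sketch of the status-resource pattern, using Python's `asyncio` (the endpoint names and job store here are assumptions for illustration): the API accepts a long-running job, immediately returns a job ID, as an HTTP API would with a `202 Accepted` response, and the client polls a status resource for the result instead of holding a connection open.

```python
import asyncio
import uuid

# In-memory job store standing in for a database or queue (illustrative).
jobs: dict[str, dict] = {}

async def process(job_id: str, n: int) -> None:
    await asyncio.sleep(0.01)                # stand-in for slow work
    jobs[job_id] = {"status": "done", "result": n * n}

async def submit(n: int) -> str:
    # Equivalent of returning 202 Accepted plus a status URL:
    # the work is scheduled, but the handler returns immediately.
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "pending"}
    asyncio.create_task(process(job_id, n))
    return job_id

async def main() -> int:
    job_id = await submit(5)
    # The client polls the status resource rather than blocking the API.
    while jobs[job_id]["status"] != "done":
        await asyncio.sleep(0.005)
    return jobs[job_id]["result"]

print(asyncio.run(main()))  # 25
```

In a real service the polling loop lives on the client side, and a webhook or callback can replace polling entirely when the client can receive inbound requests.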

3. Lack of Caching Techniques

API requests often involve accessing the same data or performing the same computations repeatedly, making server-side caching critical to building efficient APIs. Without a caching strategy, these redundancies create unnecessarily slow response times. In the context of APIs, caching means temporarily storing data, computation results, or database query responses so they are loaded or computed once rather than on every request.

How to Fix

There are several server-side caching mechanisms you could implement to improve API latency. For example, you could set HTTP caching headers, such as Cache-Control and ETag. These headers tell clients how long they may reuse a response and let them check whether a cached version of a resource is still current. If your API accesses a database and you’re using a shared cache, you’ll need to select a suitable caching approach, such as:

  • Cache aside: Before pulling data from a database, the API checks the cache to see if the data already exists there. If it does, the API uses that instead.
  • Read through: When a cache miss occurs, the cache itself fetches data from the database and stores it.
  • Write back (also known as write behind): Data changes are written to the cache first and then written to the database asynchronously after a set period of time. This approach has a risk of data loss if the cache crashes before data is written to the database.
  • Write through: When data is written, it’s sent to both the cache and the database simultaneously.
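The first of these approaches, cache aside, can be sketched with a plain dictionary and a TTL; this is a simplified stand-in for the shared cache, and the key names and "database" function are illustrative. A production API would typically put Redis or Memcached behind the same three steps.

```python
import time

# Cache-aside sketch: a dict with (timestamp, value) entries plus a TTL.
CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 60.0
db_reads = 0  # counts how often we actually hit the "database"

def fetch_from_db(key: str) -> str:
    global db_reads
    db_reads += 1
    return f"value-for-{key}"  # stand-in for a slow database query

def get(key: str) -> str:
    # 1. Check the cache before touching the database.
    entry = CACHE.get(key)
    if entry is not None and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]
    # 2. On a miss (or an expired entry), load from the database...
    value = fetch_from_db(key)
    # 3. ...and populate the cache for subsequent requests.
    CACHE[key] = (time.monotonic(), value)
    return value

get("user:42")
get("user:42")
print(db_reads)  # 1 -> the second call was served from the cache
```

Read-through caching moves steps 2 and 3 out of the application and into the cache layer itself; the access pattern the API sees is otherwise the same.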

An in-memory store like Redis or Memcached can significantly speed up responses to API requests, and you can use either to implement the shared cache approaches mentioned above.
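The HTTP caching headers mentioned above can also be sketched briefly. In this hypothetical handler, the server derives an ETag from the response body; when the client's If-None-Match header matches, the server returns 304 Not Modified with an empty body, and the client reuses its cached copy. The hashing scheme here is an assumption, not a requirement of the ETag spec.

```python
import hashlib

def make_etag(body: bytes) -> str:
    # Any stable fingerprint of the representation works as an ETag.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match: "str | None"):
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""      # client's cached copy is still fresh
    return 200, headers, body

# First request: full response. Revalidation: 304 with no body.
status, headers, _ = respond(b'{"name": "Ada"}', None)
status2, _, body2 = respond(b'{"name": "Ada"}', headers["ETag"])
print(status, status2)  # 200 304
```

The `max-age=60` directive lets the client skip even the revalidation round trip for a minute, so the two headers complement each other.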

4. Excessively Large Payloads

If your API returns massive payloads, serializing, transmitting, and parsing the data all take longer: the larger the payload, the longer it takes to reach the client, resulting in higher latency. Some APIs over-fetch and return far more data than the client needs. For example, an API might return a user’s entire profile and history when the client only requires the user’s name and profile picture.

How to Fix

Design your API so that it sends only the minimum data necessary to complete each request. You can do this by applying payload optimization techniques such as pagination and field filtering. You could also use data compression (such as gzip or Brotli) or a compact binary format like Protocol Buffers to reduce payload sizes. If your API returns responses in JSON, consider trimming unnecessary fields and flattening deeply nested objects. By applying these payload optimization approaches, you can improve the latency of your API.
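Pagination and field filtering can be sketched together for a hypothetical `/users` endpoint; the dataset and parameter names below are illustrative. The handler returns one page at a time and, when the client names the fields it wants, strips everything else from each record.

```python
# Fake dataset standing in for a users table (illustrative).
USERS = [{"id": i, "name": f"user{i}", "bio": "x" * 100,
          "history": list(range(50))} for i in range(1, 101)]

def list_users(page: int = 1, per_page: int = 10,
               fields: "list[str] | None" = None) -> dict:
    # Pagination: slice out just the requested page.
    start = (page - 1) * per_page
    items = USERS[start:start + per_page]
    # Field filtering: return only the fields the client asked for.
    if fields:
        items = [{k: u[k] for k in fields if k in u} for u in items]
    return {"page": page, "per_page": per_page,
            "total": len(USERS), "items": items}

resp = list_users(page=2, per_page=5, fields=["id", "name"])
print([u["id"] for u in resp["items"]])  # [6, 7, 8, 9, 10]
```

Here the heavy `bio` and `history` fields never leave the server unless a client explicitly requests them, and the `total` count lets clients fetch further pages on demand.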

5. High Network Latency

If the network has high latency, your API will slow down or become non-responsive. Factors that impact network latency include:

  • Distance between the client and the server
  • Number of network hops
  • The type of load balancer you choose

Some network performance issues are outside your control. However, you can take steps to avoid network-related high latency.

How to Fix

Start by hosting your servers and databases at locations closer to API consumers. Another way to bring your API closer to consumers is to use a content delivery network (CDN) for cacheable content. CDN servers are typically distributed across multiple locations.

You can also reduce the number of network hops for dynamic API calls, whether through a cloud service like AWS Global Accelerator or through techniques such as optimized transit routing and virtual private cloud (VPC) peering.

Finally, you may need to move from an application load balancer (ALB) to a network load balancer (NLB) if ultra-low latency is critical to your API or you anticipate unexpected and extreme traffic spikes. All these steps can help you mitigate network speed issues.

Design Your APIs with Speed in Mind

With most modern websites and applications depending on numerous internal and external services, the cumulative impact of latency cannot be ignored. When designing each API, treat low response times as a factor critical to product success. By focusing on these five areas, you’ll provide app developers with fast APIs that deliver a positive experience for end users and help increase conversion rates.
