Metrics are crucial for any product. So, what are some KPIs that API companies should monitor? Here are 13 useful infrastructure and product metrics.

The Triangle of API Observability and Analytics

When it comes to API observability and analytics, your metrics can be thought of as forming a triangle: infrastructure metrics for stability, application metrics for solving business operational problems, and product metrics for managing classical business issues.

Which metrics matter also depends on where you are in the product lifecycle. A team behind a recently launched API will focus more on improving design and driving usage, even at the cost of reliability and backward compatibility, whereas a team supporting a well-adopted enterprise API may concentrate more on driving additional feature adoption per account and give precedence to reliability and backward compatibility over design.

Who Cares About API Metrics?

There are typically a few teams that care about API metrics:

  • Product Management: API product managers are in charge of roadmapping API features and ensuring the right API endpoints are being built. They must also balance the needs of customers (whether internal or external) against engineering time and personnel constraints.
  • Business/Growth: Business-facing teams, such as marketing and sales, are not thinking in terms of API endpoints. Instead, they are mostly interested in customer adoption and ensuring customers are successfully using the APIs. They also appreciate knowing where users come from and which could be new sales opportunities.
  • Application Engineering/Platform: API developers are responsible for adding new features to APIs while debugging application-specific issues in the API business logic. These products could be API-as-a-Service, plugins, integrations for partners, APIs incorporated in a larger product, or other types.
  • Infrastructure/DevOps: Infrastructure and DevOps teams utilize metrics to ensure the servers are running and that limited resources are correctly allocated. This data could be useful for multiple engineering teams.

Infrastructure API Metrics

Here are some helpful infrastructure API metrics to consider tracking. Many of these metrics are the focus of Application Performance Monitoring (APM) tools and infrastructure monitoring companies like Datadog.

1: Uptime

While one of the most fundamental metrics, uptime is the gold standard for measuring the availability of a service. Many enterprise agreements include an SLA (Service Level Agreement), and uptime is usually rolled up into that. You’ll often hear phrases like “three nines” or “four nines”; these refer to the percentage of time per year that the service is available.

| Availability % | Downtime per year |
| --- | --- |
| 99% (“two nines”) | 3.65 days |
| 99.9% (“three nines”) | 8.77 hours |
| 99.99% (“four nines”) | 52.60 minutes |
| 99.999% (“five nines”) | 5.26 minutes |

Of course, going from four to five nines is far harder than going from two to three nines, which is why you won’t see five nines except with the most mission-critical (and expensive) of services.
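The downtime budgets in the table fall straight out of the availability percentage. A minimal sketch (using a 365-day year, so the results differ slightly from the table, which appears to use 365.25 days):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years

def downtime_per_year(availability_pct: float) -> float:
    """Return the allowed downtime in seconds per year for a given availability %."""
    return SECONDS_PER_YEAR * (1 - availability_pct / 100)

# 99.9% ("three nines") allows roughly 8.76 hours of downtime per year.
three_nines_hours = downtime_per_year(99.9) / 3600
```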

With that said, certain services can get away with lower uptime, as long as outages are handled gracefully without impacting your service. Uptime is most commonly measured via a ping service or synthetic testing, such as Pingdom or UptimeRobot. You can configure probes to run on a fixed interval, such as every minute, against a specific endpoint like /health or /status. This endpoint should perform basic connectivity tests, such as checking any backing data stores or other services. You can easily publish these metrics on your website using tools like Statuspage.io.
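A /health endpoint like the one described above can be sketched as a handler that runs each connectivity check and maps the results to an HTTP status code. This is a minimal illustration; the check names and the checks themselves are hypothetical:

```python
from typing import Callable

def health_status(checks: dict[str, Callable[[], bool]]) -> tuple[int, dict]:
    """Run basic connectivity checks (e.g. against backing data stores)
    and return an HTTP status code plus a per-check report."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = check()
        except Exception:
            # A failing or crashing check marks the service unhealthy.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results

# Hypothetical checks; real ones would ping your database, cache, etc.
status, report = health_status({
    "database": lambda: True,
    "cache": lambda: True,
})
```

A ping service probing this endpoint every minute then reduces uptime measurement to counting 200 vs. non-200 responses.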

More sophisticated ping services, known as synthetic testing, can perform more elaborate test setups, such as running a specific sequence of calls and asserting that the response payload has a particular value. Keep in mind, though, that synthetic testing may not be representative of real-world traffic from your customers. You can have a buggy API while maintaining high uptime.

What is Synthetic Monitoring?
As the name implies, synthetic monitoring is a predefined set of API calls that a server (usually a monitoring service) triggers against your service. While it doesn’t reflect real-world user experiences, it is useful for verifying that a sequence of API calls performs as expected.

2: CPU Usage

CPU usage is one of the most classic performance metrics and can serve as a proxy for application responsiveness. High server CPU usage can mean the server or virtual machine is oversubscribed and overloaded, or it can indicate a performance bug in your application, such as excessive spinlocking. Infrastructure engineers use CPU usage (along with its sister metric, memory percentage) for resource planning and measuring overall health. Certain types of applications, like high-bandwidth proxy services and API gateways, naturally have higher CPU usage, as do workloads that involve heavy floating-point math such as video encoding and machine learning.

When debugging APIs locally, you can easily see system and process CPU usage via Task Manager on Windows (or Activity Monitor on Mac). However, you probably don’t want to be SSH’ing into servers and running the top command. This is where various APM providers can be useful. APMs typically include an agent that you embed in your application or install on the server to capture CPU and memory usage metrics. These agents can also perform other application-specific monitoring, like thread profiling.

When looking at CPU usage, it’s essential to look at usage per virtual CPU (i.e., physical thread). Unbalanced usage can imply the application is not correctly threaded or that a thread pool is incorrectly sized.

Many APM providers enable you to tag an application with multiple names so you can perform rollups. For example, you may want a breakout of each VM’s metrics, like _my-api-westus-vm0_, _my-api-westus-vm1_, _my-api-eastus-vm0_, etc., while having these rolled up into a single app called _my-api_.
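A rollup like this can be approximated by stripping the region/VM suffix from each metric name and summing. This sketch assumes the hypothetical `<app>-<region>-vm<N>` naming scheme from the example:

```python
import re
from collections import defaultdict

def rollup(vm_metrics: dict[str, float]) -> dict[str, float]:
    """Sum per-VM metrics (e.g. request counts) into app-level rollups,
    assuming a hypothetical <app>-<region>-vm<N> naming scheme."""
    totals: dict[str, float] = defaultdict(float)
    for name, value in vm_metrics.items():
        # Strip the trailing "-<region>-vm<N>" to recover the app name.
        app = re.sub(r"-[a-z]+-vm\d+$", "", name)
        totals[app] += value
    return dict(totals)

# Per-VM request counts roll up into a single "my-api" series.
totals = rollup({
    "my-api-westus-vm0": 120.0,
    "my-api-westus-vm1": 80.0,
    "my-api-eastus-vm0": 100.0,
})
```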

3: Memory Usage

Like CPU usage, memory usage is also a good proxy for measuring resource utilization. CPU and memory capacity are physical resources, unlike other metrics, which may be more configuration-dependent. A VM with extremely low memory usage can either be downsized or have additional services allocated to that VM to consume additional memory. On the flip side, high memory usage can be an indicator of overloaded servers.

Traditionally, big data queries, stream processing, and production databases consume much more memory than CPU. In fact, the amount of memory per VM is a good indicator of how long a batch query will take, as more available memory can reduce checkpointing, network synchronization, and paging to disk. When looking at memory usage, you should also look at the number of page faults and I/O ops. A common mistake is configuring an application to allocate only a small fraction of the available physical memory, which can cause artificially high page faults and virtual memory thrashing.

Application API Metrics

4: Request Per Minute (RPM)

RPM (Requests per Minute) is a performance metric often used when comparing HTTP or database servers. Usually, your end-to-end RPM will be much lower than an advertised RPM, which serves more as an upper bound for a simple “Hello World” API. This is because such benchmarks don’t account for the latency incurred by I/O operations against databases, 3rd party services, etc.

While some like to brag about their high RPM, an engineering team’s goal should be efficiency, which often means driving this number down. Certain business functions requiring many API calls can be redesigned to use fewer calls. Common patterns like batching multiple requests into a single request can be very useful, along with ensuring you have a flexible pagination scheme.

Your RPM could also vary depending on the day of the week or even the hour of the day — especially if your consumers exhibit lower usage during nights and weekends. Some situations warrant tracking more fine-grained application metrics, such as RPS (Requests per Second) or QPS (Queries per Second).
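Computing RPM from raw request timestamps is a simple bucketing exercise; a minimal sketch:

```python
from collections import Counter

def requests_per_minute(timestamps: list[float]) -> Counter:
    """Bucket request timestamps (seconds since epoch) into per-minute counts."""
    return Counter(int(ts // 60) for ts in timestamps)

# Three requests land in minute 0, one in minute 1.
rpm = requests_per_minute([1.0, 30.0, 59.0, 61.0])
```

The same bucketing with a 1-second key gives RPS/QPS when finer granularity is warranted.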

5: Average and Max Latency

One of the most important metrics used to gauge customer experience is latency. While an increase in infrastructure-level metrics like CPU usage may not actually correspond to a drop in user-perceived responsiveness, API latency definitely will.

However, tracking latency by itself may not provide a full understanding of why an increase occurred. Thus, it’s important to follow how latency is affected by API changes, such as releasing new versions, adding endpoints, or changing the API schema. This can help reveal the root cause of latency increases.

Since problematic endpoints may be hidden when looking only at aggregate latency, it’s critical to look at latency breakdowns by route, geography, and other fields. For example, you may have a POST /checkout endpoint that’s slowly been increasing in latency over time, which could be due to an ever-increasing SQL table size that’s not correctly indexed. However, due to a low volume of calls to POST /checkout, this issue is masked by your GET /items endpoint, which is called far more than the checkout endpoint. Similarly, if you have a GraphQL API, you’ll want to look at the average latency per GraphQL operation.

We put latency under application/engineering even though many DevOps/Infrastructure teams will also look at latency. Usually, an infrastructure person looks at aggregate latency over a set of VMs to ensure the VMs are not overloaded, but they don’t drill down into application-specific metrics like per route.
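A per-route latency breakdown like the one described can be computed from (route, latency) samples; a minimal sketch:

```python
from collections import defaultdict

def latency_by_route(samples: list[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Return (average, max) latency per route from (route, latency_ms) pairs."""
    by_route: dict[str, list[float]] = defaultdict(list)
    for route, latency in samples:
        by_route[route].append(latency)
    return {route: (sum(v) / len(v), max(v)) for route, v in by_route.items()}

# A slow, low-volume endpoint stands out once broken out by route.
stats = latency_by_route([
    ("GET /items", 40.0), ("GET /items", 60.0),
    ("POST /checkout", 900.0),
])
```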

6: Errors Per Minute

Like RPM, Errors Per Minute (or error rate) is the number of API calls per minute whose status codes fall outside the 200 family. Tracking your error rate is critical for measuring how buggy and error-prone your API is.

It’s essential to understand what type of errors are occurring. 500 errors could imply code errors on your end, whereas many 400 errors could imply user errors from a poorly designed or documented API. This means when designing your API, it’s vital to use the appropriate HTTP status code.

You can further drill down to see where these errors come from. Many 401 Unauthorized errors from one specific geographic region could imply bots are attempting to hack your API.
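Both the overall error rate and the breakdown by status-code family can be derived from a stream of response codes; a sketch:

```python
from collections import Counter

def error_breakdown(status_codes: list[int]) -> dict[str, int]:
    """Count responses by status-code family (2xx, 4xx, 5xx, ...)."""
    return dict(Counter(f"{code // 100}xx" for code in status_codes))

def error_rate(status_codes: list[int]) -> float:
    """Fraction of calls whose status code falls outside the 200 family."""
    errors = sum(1 for code in status_codes if not 200 <= code < 300)
    return errors / len(status_codes)

codes = [200, 200, 404, 500, 401]
breakdown = error_breakdown(codes)  # many 4xx suggests user/design errors
rate = error_rate(codes)
```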

API Product Metrics

APIs are no longer just an engineering term associated with microservices and SOA. API-as-a-product is becoming far more common, especially among B2B companies that want to one-up their competition with new partners and revenue channels. API-driven companies need to look at more than just engineering metrics like errors and latency to understand how their APIs are used (or why they are not being adopted as fast as planned). The responsibility for ensuring the right features are built falls to the API product manager, a new role that many B2B companies are rushing to fill.

7: API Usage Growth

For many product managers, API usage (along with unique consumers) is the gold standard for measuring API adoption. An API should not just be error-free; it should demonstrate growth over time. Unlike requests per minute, API usage should be measured over longer intervals, like days or months, to reveal real trends. If measuring month-over-month API growth, we recommend using a 28-day window, as it removes bias from weekend vs. weekday usage and differences in the number of days per month. For example, February may have only 28 days, whereas the month before has a full 31, causing February to appear to have lower usage.
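The 28-day comparison can be sketched as follows, assuming you have a list of daily API call totals (synthetic numbers here):

```python
def growth_28d(daily_counts: list[int]) -> float:
    """Compare the most recent 28 days of API calls with the prior 28 days,
    removing weekday/month-length bias; expects at least 56 daily totals."""
    recent = sum(daily_counts[-28:])
    prior = sum(daily_counts[-56:-28])
    return (recent - prior) / prior

# 56 days of synthetic counts: 100 calls/day, then 110/day -> 10% growth.
growth = growth_28d([100] * 28 + [110] * 28)
```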

8: Unique API Consumers

Since a month’s increase in API usage could be attributed to just a single customer account, it’s important to measure the number of unique monthly consumers. Monitoring your Monthly Active Users (MAU) gives you a picture of the overall health of new customer acquisition and growth. Many platform teams correlate API MAU with their web MAU to get a view of full product health. If web MAU is growing far faster than API MAU, this could imply a leaky funnel during integration or implementation of a new solution. This is especially true when the company’s core product is an API, as is the case for many B2B and SaaS companies. On the other hand, API MAU can be correlated with API usage to understand where increased API usage came from (new vs. existing customers).
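Counting unique monthly consumers is a distinct-count over (month, consumer) events; a sketch with hypothetical account names:

```python
def monthly_active_users(events: list[tuple[str, str]]) -> dict[str, int]:
    """Count distinct API consumers per month from (month, user_id) events."""
    seen: dict[str, set] = {}
    for month, user in events:
        seen.setdefault(month, set()).add(user)
    return {month: len(users) for month, users in seen.items()}

# Repeat calls from the same account only count once per month.
mau = monthly_active_users([
    ("2024-01", "acme"), ("2024-01", "acme"), ("2024-01", "globex"),
    ("2024-02", "acme"),
])
```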

9: Top Customers by API Usage

For any company focusing on B2B, tracking the top API consumers can reveal how your API is used and where upsell opportunities exist. Many experienced product leaders know that products exhibit power-law dynamics, with a handful of power users having a disproportionate amount of usage compared to everyone else. Not surprisingly, these are the same power users that generally bring your company the most revenue and organic referrals.

This means it’s critical to track what your top ten customers are actually doing with your API. You can further break this down by what endpoints they are calling and how they’re calling them. Do they use a specific endpoint much more than your non-power users? Maybe they found an “ah-ha” moment with your API, whereas other consumers haven’t.
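Ranking your top consumers by call volume is a straightforward frequency count; a sketch with hypothetical account names:

```python
from collections import Counter

def top_consumers(calls: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Rank customers by API call volume, given one account id per call."""
    return Counter(calls).most_common(n)

# The same ranking per endpoint reveals what power users do differently.
top = top_consumers(["acme", "acme", "acme", "globex", "initech", "globex"], n=2)
```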

10: API Retention

Should you spend more money on your product and engineering or put more money into growth? Retention and churn (the opposite of retention) can tell you which path to take. A product with high product retention is closer to product-market fit than a product with a churn issue.

Unlike subscription retention, product retention tracks the actual usage of a product. While the two are correlated, they are not the same. In general, product churn is a leading indicator of subscription churn since customers who don’t find value in an API may be stuck with a yearly contract while not actively using the API. API retention should be higher than web retention. Whereas API retention looks at post-integrated customers, web retention will include customers who logged in but didn’t necessarily integrate with the platform yet.
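One simple way to measure month-over-month API retention is the fraction of last month’s active consumers that are still active this month; a sketch with hypothetical account names:

```python
def retention(prev_month_users: set[str], this_month_users: set[str]) -> float:
    """Fraction of last month's active API consumers still active this month."""
    if not prev_month_users:
        return 0.0
    return len(prev_month_users & this_month_users) / len(prev_month_users)

# Two of three previously active accounts are still calling the API.
r = retention({"acme", "globex", "initech"}, {"acme", "globex", "hooli"})
```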

11: Time to First Hello World (TTFHW)

TTFHW is an important KPI for tracking not just your API product health but your overall Developer Experience (DX). Especially if your API is an open platform attracting 3rd party developers and partners, you want to ensure they can get up and running as soon as possible. TTFHW measures how long it takes from the first visit to your landing page to the first transaction through your API platform. It is a cross-functional metric spanning everything from marketing, documentation, and tutorials to the API itself.
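One way to quantify TTFHW across a developer cohort is the median time from first landing-page visit to first successful API call, considering only developers who completed both steps; a sketch with hypothetical timestamps:

```python
from statistics import median

def ttfhw_hours(visits: dict[str, float], first_calls: dict[str, float]) -> float:
    """Median hours from first landing-page visit to first successful API
    transaction, across developers who completed both steps (epoch seconds)."""
    deltas = [(first_calls[u] - visits[u]) / 3600
              for u in visits if u in first_calls]
    return median(deltas)

m = ttfhw_hours(
    {"dev1": 0.0, "dev2": 0.0, "dev3": 0.0},
    {"dev1": 3600.0, "dev2": 7200.0},  # dev3 never integrated
)
```

Developers who drop out before their first call (like dev3 here) don't appear in the median, so it's worth tracking that drop-off rate alongside TTFHW.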

12: API Calls Per Business Transaction

While more is better for many product and business metrics, it’s important to keep the number of calls per business transaction as low as possible to reduce overhead. This metric directly reflects the design of the API. If a new customer has to make three different calls and piece the data together, the API does not have the correct endpoints. When designing an API, it’s essential to think in terms of a business transaction and what the customer is trying to achieve, rather than just features and endpoints. A high number of calls per business transaction may also mean your API is not flexible enough when it comes to filtering and pagination.
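Given a call log tagged with a (hypothetical) business-transaction id per call, the metric is just total calls divided by distinct transactions; a sketch:

```python
from collections import Counter

def calls_per_transaction(call_log: list[str]) -> float:
    """Average API calls per business transaction, given one
    transaction id per call (ids here are hypothetical)."""
    counts = Counter(call_log)
    return sum(counts.values()) / len(counts)

# One checkout took three calls, another took one: 2.0 calls on average.
avg = calls_per_transaction(["txn1", "txn1", "txn1", "txn2"])
```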

13: SDK and Version Adoption

Many API platform teams also maintain a number of SDKs and integrations. Unlike mobile, where you have just iOS and Android as the core operating systems, you may have tens or even hundreds of SDKs. This can become a maintenance nightmare when rolling out new features. You may selectively roll out critical features to your most popular SDKs first, whereas less critical features go only to less popular SDKs. Measuring API and SDK version adoption is also important when deprecating certain endpoints and features. You wouldn’t want to deprecate the endpoint that your highest-paying customer is using without some consultation on why they are using it.

Business and Growth

Business and growth metrics are similar to product metrics but focus on revenue, adoption, and customer success. For example, instead of looking at the top ten customers by API usage, you may want to look at the top ten customers by revenue, then by their endpoint usage. For tracking business growth, it would be beneficial to use analytics tools that support enriching user profiles with customer data from your CRM or other analytics services to better understand who your API users are.

Conclusion: Track the Right API Metrics

For anyone building and working with APIs, it’s critical to track the correct API metrics. Most companies would not launch a new web or mobile product without the proper engineering and product instrumentation. Similarly, you wouldn’t want to launch a new API without a way to track the right API metrics.

Sometimes, KPIs for one team can blend into another team, as we saw with the API usage metrics. Also, there can be different ways of looking at the same underlying metric. However, teams should stay focused on looking at the right metrics for their team. For example, product managers shouldn’t worry as much about CPU usage, just like infrastructure teams shouldn’t worry about API retention.