API metrics are perhaps the single most important factor in improving any API system. Metrics are inherently valuable – tracking data on API usage, availability, uptime, and other insights is pivotal to keeping a consistently healthy platform. That being said, it’s an unfortunate truth that many developers do not leverage API analytics to their full power, simply preferring to consider metrics a business tool, and little else.
This is a shame. While API metrics undoubtedly serve a very important business role in the modern API landscape, they can be leveraged to greater heights, serving to amplify business choices and technical solutions in a true approach towards platform empowerment.
In this piece, we’re going to discuss API metrics at large, citing their importance and specific value to API providers. With this in mind, we’re going to tackle a few metrics that are vitally important to track, and identify strong methodologies for deriving this data.
Data is extremely valuable as it can be used to make an intricate system more transparent. While this is obviously a benefit to web API hosts, it can be hard to pinpoint what specific indicators should be monitored. For this reason, here are a few KPIs, or Key Performance Indicators, to consider when generating and reviewing API metrics.
Availability and Uptime
The most important metrics for something as living and high-demand as an API are availability and uptime. Whereas uptime describes whether the service is “on” or “off,” availability is a bit more nuanced: it tracks how often the service has failed, for how long, and for what reason.
Let’s imagine a server serving media data via an encoded stream. When discussing uptime, we want to know the percentage of time the service was active. This metric might be 99.999% – an impressive figure, meaning that over an entire year the system was down for only about 5 minutes and 15 seconds in total.
That’s not a complete metric on its own, however – it only tells us the total amount of time in which the resource was technically unreachable. Availability is concerned not only with whether the resource could be reached, but whether access was full, unrestricted, and unhampered. An API could report 99.999% uptime simply because its endpoint answered calls; if the authentication server behind it was overwhelmed by connections and rejected 1 in 5 of them, that uptime figure is maintained while availability is terrible.
Accordingly, when working with these metrics, we should contextualize them and extract meaning from them. Uptime is not everything, but without it availability cannot be judged – the two must work in concert with one another.
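To make the distinction concrete, here is a minimal sketch of how uptime and availability can diverge. The `Probe` record and the probe data are hypothetical, standing in for whatever health checks or request logs a real system would collect.

```python
from dataclasses import dataclass

# Hypothetical health-check result: each probe records whether the
# endpoint answered at all (up) and whether the request was actually
# accepted and serviced in full (served).
@dataclass
class Probe:
    up: bool       # endpoint reachable
    served: bool   # request accepted and fully serviced

def uptime(probes):
    """Fraction of checks where the endpoint was reachable at all."""
    return sum(p.up for p in probes) / len(probes)

def availability(probes):
    """Fraction of checks where the request was actually serviced."""
    return sum(p.served for p in probes) / len(probes)

# An endpoint that always answers, but whose auth server rejects
# 1 in 5 connections -- the scenario described above:
probes = [Probe(up=True, served=(i % 5 != 0)) for i in range(1000)]
print(uptime(probes))        # 1.0 -- perfect uptime
print(availability(probes))  # 0.8 -- far worse availability
```

Tracking the two as separate counters, rather than deriving one from the other, is what lets the gap between them surface as a signal.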
Responsiveness and Latency
Similarly, tracking responsiveness and latency adds a secondary layer to the concept of availability, by considering whether data was actually delivered and whether the service responded to requests within a normal, specifically defined time frame.
In our example, we could have 99.999% uptime, and even 99.999% effective call service, but if each call took five hours to answer, the user experience would be nothing like what the availability and uptime metrics alone suggest. If the latency between a call being made, answered, and the data being sent is too high, and responsiveness accordingly low, the actual usability of the API is questionable.
Conversely, we could have great responsiveness – say, within milliseconds – and still have poor availability or incredibly high latency. This might occur when the server processing the API data is separated from the endpoint the call is made against: many calls fail, but those that succeed are serviced immediately, only to travel over a slow external line. That is low availability, fast response, and high latency all at once – a trifecta of bad situations.
Whereas availability and uptime describe the raw values of a data service, they don’t in themselves give those values context. Responsiveness does – especially when the data is broken out geographically. Identifying failures in data processing per geographic region, and weighing them against low and high responsiveness values, helps explain the absolute figures produced by analyzing availability and uptime.
In other words, availability and uptime tell you if there’s a problem – responsiveness and latency are cues into identifying why problems exist.
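When measuring latency, a common practice is to report tail percentiles rather than averages, since a mean can hide a small number of disastrously slow calls. The sketch below is illustrative only – the nearest-rank `percentile` helper and the synthetic latency samples are assumptions, not a production measurement pipeline.

```python
import random

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    index = max(0, round(pct / 100 * len(ranked)) - 1)
    return ranked[index]

# Synthetic response times: most calls are fast (5-50 ms),
# but 2% hang for five seconds.
random.seed(0)
latencies = [random.uniform(5, 50) for _ in range(980)] + [5000.0] * 20

mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)   # typical experience: under 50 ms
p99 = percentile(latencies, 99)   # tail experience: 5000.0 ms

# The mean (~127 ms) sits nowhere near either figure -- tail
# percentiles are what expose the slow calls users actually feel.
```

Breaking the same percentiles out per region or per endpoint is what turns this from a single health number into a diagnostic tool.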
Endpoint Usage
A hugely powerful metric is the KPI of endpoint valuation. When we pursue this KPI, what we’re really asking is a simple question: what’s going on with the endpoints? It’s a general question, and it entails a lot of hugely valuable data. For endpoint analytics, consider finding answers to questions like:
- Frequency: Do we have endpoints that are used more often than others? At what rate are they used?
- Utilization comparison: What are our least used endpoints, and do we have any data as to why they might be so under-utilized?
- Traffic: What endpoints are hit the most with malicious traffic?
- Vulnerabilities: When a data breach does occur, or when penetration testing has revealed a vulnerability, which endpoints are responsible?
The responses can inform providers about the health of an API in general, including platform insights such as:
- Bloat: If we have a great number of unused endpoints, it suggests we are supporting large portions of a codebase that are no longer needed – unnecessary bloat.
- Service Efficiency: If we have one or two endpoints that are constantly hit, we need to look at them. If they cover multiple services, those services should be broken into their own endpoints for a true microservice approach.
- Security: If we have vulnerable endpoints, we have an issue in our codebase allowing for vulnerabilities to be easily detected and exploited.
The list goes on and on, but simply looking at the endpoints rather than the codebase itself can inform much of our understanding of the system at large. Think of these metrics as symptoms presented to a doctor – while the underlying issue might lie at any step in the API system, identifying the symptoms at the endpoint level informs how we may solve it.
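As a rough illustration of endpoint-level analysis, the snippet below tallies hits and error responses per endpoint from a hypothetical access log. The log format and endpoint names are invented for the example; a real system would read these from its server logs or gateway.

```python
from collections import Counter

# Hypothetical access-log lines: "METHOD /endpoint STATUS"
log_lines = [
    "GET /media/stream 200",
    "GET /media/stream 200",
    "POST /auth/token 401",
    "GET /media/stream 503",
    "GET /legacy/export 200",
    "POST /auth/token 200",
]

hits = Counter()
errors = Counter()
for line in log_lines:
    method, endpoint, status = line.split()
    hits[endpoint] += 1
    if int(status) >= 400:
        errors[endpoint] += 1

# Heavily used endpoints are candidates for splitting into their own
# services; endpoints that never appear in hits are candidates for
# removal (bloat); endpoints dominating errors warrant a security look.
print(hits.most_common(1))   # [('/media/stream', 3)]
print(dict(errors))          # {'/auth/token': 1, '/media/stream': 1}
```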
Conversions
So far we’ve talked about technical concerns, but there are also business KPIs that speak to the API developer experience. One such metric is conversion – the rate at which a consumer takes a desired action. On a website this is typically clicking through an ad, but in the API space it can be anything from registering for a premium account to submitting social information.
The key concept in conversion is that if the user sees value in a product, they are more likely to proceed. Let’s go back to our media encoding example to see how this works.
In our case, suppose the conversion data shows that users are willing to register for the free account, but unwilling to convert to a premium one. Paired with additional data, this tells us a few things about our API:
- Our availability data suggests that premium accounts do not have high availability, due to the relatively low amount of server power dedicated to premium account authentication;
- Our uptime data suggests that uptime is satisfactory for all account tiers, as the rate exceeds 99.99%;
- Our latency data shows that, while there is some slowing due to non-premium delivery methods, overall latency is acceptable.
Looking at this data, we can extrapolate what our conversion data suggests. While our latency is acceptable regardless of account type, it seems that we’ve not allocated enough space to premium account authentication, resulting in many users not seeing the actual value of the premium system.
This can easily be rectified by separating the authentication server for premium accounts from the typical server stack. This improvement in user experience could result not only in greater premium account adoption (and thereby an increase in revenue), but, in theory, a better user experience in general.
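The conversion figures themselves are simple ratios. A small sketch, with entirely hypothetical funnel numbers for the media-encoding example:

```python
def conversion_rate(converted, total):
    """Share of users who took the desired action."""
    return converted / total if total else 0.0

# Invented funnel numbers for illustration only.
visitors = 10_000
free_signups = 2_400
premium_upgrades = 60

free_rate = conversion_rate(free_signups, visitors)             # 0.24
premium_rate = conversion_rate(premium_upgrades, free_signups)  # 0.025
```

The interesting work is not the arithmetic but the pairing: a healthy free-signup rate next to a collapsed premium rate is what points the investigation at the premium tier specifically.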
Generating KPI Metrics
Now that we’ve identified some KPIs that providers should start looking for, how exactly does one derive this data? While some of it could in theory be generated by in-house solutions, there are a great number of third-party solutions that can easily be incorporated into already-existent APIs with minimal effort.
While not every solution is going to be appropriate for your given use case, at least one of these should in theory be applicable with the language or architecture currently employed.
Galileo
Galileo hits all of our major KPIs. While the solution is definitely aimed at the enterprise, each of the KPIs the service tracks is of great value to smaller APIs too.
One of Galileo’s big offerings is real-time logging of requests and responses, along with a system that allows those requests to be replayed. While this is obviously valuable in a debugging situation, it also sheds light on the user experience.
As part of these data sets, consumer usage is a key metric tracked by Galileo as well. The ability to track the popularity of services and the most used endpoints is valuable, and can play a huge role in identifying issues in the overall architecture long before they become a problem.
It should be noted that Galileo offers an enterprise solution in the form of Mashape Enterprise, a service that uses the same fundamental structures as Galileo, but is designed from the ground up to work with larger, more complex architectures with affordable integration options. That being said, Galileo is still usable by small APIs, and its real time applications make it rather attractive.
Mashery API Analytics
The inclusion of Spotfire is a good idea for visualization, and it ties into the general approach Mashery has taken with its “Executive Summary” featureset – a metrics solution designed specifically for business purposes, for presenting metrics to stakeholders, executives, and other business-level staff.
This doesn’t necessarily mean that Mashery is restricted to enterprise spaces – as with any choice on this list, adapting Mashery to your environment is easily done with the wide range of documentation and community support enjoyed by its adopters. That being said, Mashery is expressly meant for enterprise and business-focused use.
3scale
Going in a slightly different direction, 3scale offers metrics as a tool integrated into a larger infrastructure: its metrics are available for APIs built upon that infrastructure. That being said, 3scale’s offerings make it an alluring consideration. The platform is designed to be highly scalable, and its integration of API solutions like rate limiting and billing directly into the system makes the proposition a valuable one.
The only caveat with 3scale is that it continues the proclivity of API metrics providers to be enterprise-centric. There is a price barrier that may be hard for smaller teams to clear, and integrating an API into a larger infrastructure might not be feasible for some providers – especially those that, by contract (typically as part of a B2B agreement), simply cannot route through external platforms and infrastructure stacks.
That being said, it’s still a very viable solution, and with so much power and resource at the hands of the implementation, it’s a hard solution to argue against.
API Umbrella
API Umbrella is different in that it’s a completely open source, free implementation. It also differs in how it handles analytics generation itself. Many providers, including those discussed so far, require data to be sent from your API to their endpoint – you’re essentially shipping traffic from one API network to an external metrics service, and deriving statistics from that interaction.
While this is not a fundamentally bad approach, it can add latency. That, of course, skews some of the analytics generated, and can result in measuring system-affected data rather than the raw data itself.
This is not how API Umbrella works, specifically because of how the system is designed. It’s not an external system – rather, it’s a layer between your API and its external connections. API Umbrella calls this a “proxy that sits in front of your APIs.” The benefit is a less affected data stream, and thereby more seamless analytics generation.
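To illustrate the shape of that design, here is a minimal in-process sketch of a metrics-collecting layer that wraps a handler, records counts and timings, and forwards calls unchanged. This is only an analogy for the proxy idea – API Umbrella itself operates at the HTTP level, and the `metrics_proxy` wrapper here is purely hypothetical.

```python
import time

def metrics_proxy(handler, metrics):
    """Wrap a request handler so every call is counted and timed
    before being forwarded unchanged -- the 'layer in front' idea."""
    def wrapped(request):
        start = time.perf_counter()
        try:
            response = handler(request)
            metrics["served"] += 1
            return response
        except Exception:
            metrics["failed"] += 1
            raise
        finally:
            metrics["latency_ms"].append((time.perf_counter() - start) * 1000)
    return wrapped

metrics = {"served": 0, "failed": 0, "latency_ms": []}

# A stand-in backend handler; any callable works.
backend = metrics_proxy(lambda req: {"status": 200, "body": req.upper()}, metrics)

response = backend("hello")
print(response["body"])     # HELLO -- the call itself is untouched
print(metrics["served"])    # 1
```

Because measurement happens on the same path as the request rather than on a copy shipped elsewhere, the timings reflect the raw interaction rather than the system-affected one.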
That being said, it’s a relatively less-used solution, and as such doesn’t have as much data, experience, or resources backing it as the other solutions in this list do.
When business owners discuss analytics, they often talk about it as a feature, or something that would be good to implement but is not entirely required. This couldn’t be further from the truth – choosing to forgo metrics is choosing not to be educated as to the function, interaction, use, and nature of your API.
This has obvious implications in technical terms, but has even clearer implications as to your business success and chances of adoption. Accordingly, metrics should be viewed as important as they truly are – a key aspect of a healthy API system, and an absolute requirement for any provider.
We hope this has helped explain that, and frame the conversation within some key KPIs that are vitally important to an API’s health and success. Let us know in the comments below what your preferred KPIs are, and what solutions you’d love to see discussed in this conversation.