API.expert Report Finds Increasing Overall API Quality

Performance is everything in the cloud — especially for integrations. Quality APIs with excellent availability and low latency can truly streamline the customer experience. On the other hand, high downtime and inconsistencies across regions can cause significant pains for software development teams and end-users alike. Thus, software owners with globally distributed APIs need to take special care when crafting their networks, choosing a Cloud Service Provider (CSP), and optimizing for delivery across locations.

So, what determines a quality integration in 2021? To mark baseline API performances across the industry, API.expert and APImetrics recently released a report that collates an impressive amount of data. By analyzing millions of calls to hundreds of APIs over two years, the report depicts average performances across the cloud industry while highlighting star API performers. While the overall performance trend is a net positive, the findings did reflect a drop in cloud performance in mid–2020, likely due to the rise of many new digital innovations ushered in by the pandemic.

Below, I’ll review the key takeaways from the API Cloud Performance Analysis Report. I also met with David O’Neill, CEO and co-founder of APImetrics, to see how API owners should react to the results. According to O’Neill, transparency in API performance and outages will be critical to both mature internal DevOps and advance the industry at large. “Ecosystems survive or die based on the information people have about the ecosystem,” said O’Neill.

The Report Process

To gather data for this report, API.expert called 300 APIs throughout 2019 and 2020, hitting popular APIs within categories like corporate infrastructure providers, financial services institutions, social networks, and search engines. Making calls from 85 data centers worldwide, roughly every five minutes, the group consistently tracked failures like 5xx errors, network errors, content errors, slow response times, and redirects. O’Neill estimates that, in total, the team collected about 60 TB of API call data.
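To make the failure categories above concrete, here is a minimal Python sketch of how a single monitoring call might be bucketed. It is not API.expert's actual method: the `ProbeResult` type, the `classify` function, and the 2,000 ms slow-response threshold are all assumptions made for illustration, since the report does not publish its thresholds.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the article does not say what counts as "slow".
SLOW_THRESHOLD_MS = 2000.0


@dataclass
class ProbeResult:
    """Outcome of one API call (illustrative structure, not from the report)."""
    status: Optional[int]   # HTTP status code, or None if no response arrived
    total_ms: float         # total time for the call, in milliseconds
    body_ok: bool = True    # whether the response body passed a content check


def classify(result: ProbeResult) -> str:
    """Map one probe result to one of the failure types named in the article."""
    if result.status is None:
        return "network_error"
    if 500 <= result.status <= 599:
        return "server_error_5xx"
    if 300 <= result.status <= 399:
        return "redirect"
    # Other statuses (for example 4xx) are not in the article's list;
    # this sketch leaves them to the content check below.
    if not result.body_ok:
        return "content_error"
    if result.total_ms > SLOW_THRESHOLD_MS:
        return "slow_response"
    return "pass"
```

For example, `classify(ProbeResult(status=503, total_ms=120.0))` falls into the 5xx bucket, while a 200 response that takes longer than the assumed threshold is counted as a slow response rather than a pass.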

The report found that pass rates can vary tremendously. Overall, the top API performer was DocuSign, with pristine availability and no measurable downtime. The second-best API had 18.5 minutes of downtime. On the other end of the spectrum, API.expert tracked five and a half days of downtime for the lowest performer. Impressively, the report found that Slack stayed relatively resilient while other APIs slowed due to increased traffic in mid-2020.

Overall Availability and Quality

When it comes to measuring service availability, the golden target is 99.999%, or “five nines.” This equates to about five minutes of downtime per year. However, out of the 32 major corporate infrastructure services tracked, only one service met this goal (DocuSign). The majority of APIs sat in the 99.9% category (105–1053 minutes of outage per year).
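The arithmetic behind these availability tiers is easy to check. The short Python sketch below converts an availability percentage into minutes of downtime per year, assuming a 365-day year; the function name is just an illustrative choice.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of allowed downtime per year at a given availability level."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes per year, matching the "about five minutes" figure, while exactly 99.9% allows about 525.6 minutes, which sits inside the band quoted above for that category.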

Though general availability shows room for improvement, overall performance quality is markedly improving. API.expert gauges an API’s quality with an aggregate Cloud API Service Consistency (CASC) score. As we’ve previously covered, CASC is like a credit rating for API performance. A CASC score of 9.00 or higher indicates a very healthy, functional API. Over the last few years, more and more APIs have moved into this elite group: 11 in 2018, 21 in 2019, and 28 in 2020.

The report also found major improvements in other areas. Over the past two years, DNS lookup times have improved compared to previous years. In 2019 and 2020, there was a median DNS lookup time of 12 milliseconds (ms) for all clouds and regions. At AWS, this figure has dropped to 4 ms since March 2020.

Multicloud, Regions, and COVID-19

Most clouds followed roughly the same trend throughout the last two years — latency continued to decrease between the beginning of 2019 and the middle of 2020. Interestingly, a spike in latency didn’t occur until well into the pandemic, in July 2020. “Our working assumption is that actual digital transformation around APIs and cloud services didn’t happen immediately,” said O’Neill. The data seems to show that significant time was required to transition to working from home and introduce other major digital shifts.

AWS had the best ranking in terms of mean total time, at 499 ms across 2019 and 2020. Azure has consistently been about 90 ms slower than AWS and Google since late 2019. IBM Cloud charted significantly more connect time compared to other clouds. In terms of regions, European and North American locations are on average about 400 ms faster than their counterparts in South America, East Asia, South Asia, and Oceania.

Bolstering API Quality

Amid increasing digital demands, it’s reassuring to see improvements being made throughout the industry that affect API performance. However, a crucial component of an API’s reliability is its availability. Since most APIs only achieved 99.9% or below, an increased effort will be required to improve availability and match rising API performance expectations.

Also, it’s good to note that all clouds aren’t created equal — different regional zones can still deliver vastly disproportionate outcomes. “There is a perception that cloud is homogeneous; clearly that’s not true,” said O’Neill. Smart multi-region support will thus be critical to support consistent user experiences across zones.

Understanding the entire cloud landscape helps to set benchmarks, but it’s not as helpful for improving day-to-day operations. To bolster API quality, the report recommends introducing continual API monitoring, understanding the differences across geographical zones, reducing errors affecting DNS lookups and latencies, and comprehending the impacts of API failures on user experience.
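As one way to act on the monitoring and regional recommendations above, the following Python sketch rolls individual call results up into a per-region pass rate and median latency. The tuple layout and the `summarize_by_region` name are assumptions for illustration; the report does not prescribe a data format.

```python
from collections import defaultdict
from statistics import median


def summarize_by_region(samples):
    """Aggregate call results per region.

    `samples` is an iterable of (region, passed, total_ms) tuples,
    a hypothetical record layout chosen for this sketch.
    """
    grouped = defaultdict(list)
    for region, passed, total_ms in samples:
        grouped[region].append((passed, total_ms))

    summary = {}
    for region, rows in grouped.items():
        summary[region] = {
            "calls": len(rows),
            # Share of calls in this region that passed
            "pass_rate": sum(1 for passed, _ in rows if passed) / len(rows),
            # Median total time across all calls, passed or failed
            "median_ms": median(ms for _, ms in rows),
        }
    return summary
```

Fed with results collected from several locations, a summary like this makes it easier to spot a region whose pass rate or median latency drifts away from the others, which is the kind of geographic difference the report highlights.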

Another avenue for change is more cultural. “There’s a disconnect between front of house people and people eating in the kitchen,” described O’Neill. “DevOps needs to come out of Ops and come into customer success.” Part of advancing this agenda, according to O’Neill, will be increased transparency into API statuses and outages that developers, architects, and even business analysts can comprehend.

Final Thoughts

I don’t think a report of this magnitude on API and cloud performances has ever been conducted. This is possibly due to the sheer effort and cost burden required to integrate (and maintain integrations) with so many disparate APIs, let alone call thousands of endpoints nearly 300 times a day. (O’Neill confided that the team had to literally open bank accounts with some of these financial providers to successfully imitate real-world API calls!)

Kudos to APImetrics for digging into the weeds and offering up this data to the community. It will be interesting to see how API.expert continues to monitor global API statistics. I’ll be especially curious to check in with the site to track the state of performances as we enter the new normal business conditions post-pandemic.