API Monitoring: A False Sense of Security
Patrick Poulin
July 13, 2020

Why uptime and performance monitors fail to catch so many API errors

Large companies with Testing Centers of Excellence (TCOE) have tended to divide API testing and API monitoring between two separate teams that operate in silos. For years, this siloed approach worked fine. It was acceptable for QA teams to focus on UI testing, following the testing pyramid: 80% UI testing, 20% API testing. However, with 83% of web traffic now powered by APIs (Akamai), a new reality has inverted the testing pyramid. Meanwhile, many TCOEs have been slow to evolve their monitoring strategies.

In this article, we explore the dangers to Centers of Excellence (and DevSecOps) when organizations stick with siloed API testing and monitoring strategies. Siloed API testing and monitoring is a root cause of the growing number of costly bugs and vulnerabilities affecting large organizations today.

The Problem with “Synthetic Monitoring”

Many of the popular monitoring platforms you use every day now offer a form of API monitoring under labels such as “synthetic testing,” “synthetic monitoring,” or “proactive API testing.” While “synthetic” may suggest a combination of components, it should not be confused with integration testing. In reality, the synthetic tests from these platforms are basic API acceptance tests: they let you monitor API performance (ping response times) and check for HTTP error codes and for content errors in headers and bodies.

Before entrusting your entire API infrastructure to these synthetics, consider that most “Synthetic Monitoring” products can check only single endpoints, not full API consumer flows. This severely limits their ability to detect and diagnose functional errors caused by issues with integrations, databases, and other areas that are in a state of constant change.
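To make the difference between a single-endpoint check and a consumer-flow check concrete, here is a minimal Python sketch. It does not use any real monitoring platform; `FakeApi`, `synthetic_check`, and `flow_check` are hypothetical names, and the in-memory API stands in for a real service. In this toy setup the search index has drifted out of sync with inventory, so every endpoint answers with HTTP 200, yet a consumer who searches and then checks out the first result hits an error.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    body: dict


@dataclass
class FakeApi:
    """Hypothetical in-memory API; its search index is stale relative to inventory."""
    inventory: set = field(default_factory=lambda: {"evt-2"})
    search_index: list = field(default_factory=lambda: ["evt-1", "evt-2"])

    def login(self, user: str) -> Response:
        return Response(200, {"token": f"tok-{user}"})

    def search(self, token: str, query: str) -> Response:
        # Token and query are ignored in this toy; every search returns the index.
        return Response(200, {"results": list(self.search_index)})

    def checkout(self, token: str, event_id: str) -> Response:
        if event_id not in self.inventory:
            # Integration bug: search returned an event inventory no longer holds.
            return Response(409, {"error": "event unavailable"})
        return Response(200, {"order": f"ord-{event_id}"})


def synthetic_check(api: FakeApi, max_latency_s: float = 0.5) -> bool:
    """Single-endpoint check: status code, latency, and one body field."""
    start = time.perf_counter()
    resp = api.search("static-token", "concert")
    elapsed = time.perf_counter() - start
    return resp.status == 200 and elapsed < max_latency_s and "results" in resp.body


def flow_check(api: FakeApi) -> bool:
    """Consumer-flow check: each step consumes the previous step's output."""
    login = api.login("monitor-user")
    if login.status != 200:
        return False
    token = login.body["token"]
    search = api.search(token, "concert")
    if search.status != 200 or not search.body["results"]:
        return False
    order = api.checkout(token, search.body["results"][0])
    return order.status == 200 and "order" in order.body


if __name__ == "__main__":
    api = FakeApi()
    print("synthetic check passed:", synthetic_check(api))
    print("flow check passed:", flow_check(api))
```

Running it prints `True` for the synthetic check and `False` for the flow check: the single endpoint looks healthy, while the chained flow exposes the integration fault.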
To maintain continuous quality throughout the API lifecycle, Centers of Excellence increasingly prescribe continuous API testing that runs within a data-driven framework (to apply dynamic, constantly updated assertions) in any environment, both before and after APIs go live. These API tests can be automated through a CI/CD platform to run on every build. So the question remains: why not also schedule them to run continuously (every 5 minutes) against production and staging environments? You could simply turn the API test into an API monitor.

In my experience, three costly problems arise when organizations do not evolve their API monitoring strategy, and synthetics that cannot check entire API consumer flows contribute to all three:

Poor Monitors: In siloed API testing and monitoring, monitors are often written outside the QA team by stakeholders who lack the domain knowledge to design tests that accurately check uptime under real-world conditions. Synthetics do nothing to relieve this bottleneck; for example, they do not let technical and non-technical stakeholders work in parallel to support the business case. QA specialists with deep domain knowledge should be able to easily design, build, and automate API tests that cover the entire API consumer flow, and then reuse those tests as API monitors: one silo, one workflow, one pane of glass.

Security Vulnerability: Many APIs handle sensitive data, and both that data and the test results should stay in-house. Synthetic monitors, however, often require a tunnel that exposes your APIs beyond the firewall so that they can be monitored. The best solution is to deploy API testing (and monitoring) that stays behind the firewall.

Lack of Global API Perspective: You have a team in charge of building and testing your critical new (and old) APIs.
By handing monitoring duties to another department, you create a new silo, with a new team that uses different tools and has less domain knowledge. The result is a critical loss: no one has a single, overall perspective on API quality.

Advantages of Functional Tests as Monitors

Consider the following story of how a large ticketing and live events software company detected and solved a substantial API bug by using functional uptime monitors (Source: AudienceView Case Study):

API reliability is top of mind for a live events software company that depends on a vibrant API ecosystem to ensure its live events app meets its customers’ every need. The company built a partner API that was due to launch on January 1st. During the month before launch, the team created a series of end-to-end API tests that ran as part of their automated builds. The tests passed, and at the end of December, they soft-launched the API before informing their partners.

Then they turned on functional uptime monitoring and were surprised to find a problem in their infrastructure: the servers were performing poorly, and they didn’t understand why. The functional uptime monitor ran the full user flow (logging in, searching, and checking out) and revealed that each run created a session on the web server. Each session would eventually expire, but a reasonable number of flows per hour could still clog the web server for good, degrading performance or even bringing it to a complete halt. A simple login alone could not have triggered this memory leak; it surfaced only because each session accumulated excessive data collected over the course of the user flow.

Testing in production is critical for finding many problems that automated tests run as part of a build will not. For example, I have seen misconfigured API managers, database categorization problems, caching issues, and many more.
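Why would sessions that eventually expire still clog a server? A standard queueing result, Little’s law, says the average number of live sessions equals the arrival rate times the average session lifetime (L = λW), so memory held by sessions scales with flows per hour × session TTL × data per session. The Python sketch below simulates a session store minute by minute to illustrate that relationship. The function name, the rates, the TTL, and the per-session sizes are hypothetical values chosen for illustration; they are not figures from the case study.

```python
from collections import deque


def peak_session_memory(flows_per_hour: int, ttl_minutes: int,
                        bytes_per_session: int, hours: int = 4) -> int:
    """Simulate a server-side session store minute by minute.

    Each monitored flow creates one session that lives for ttl_minutes.
    Returns the peak number of bytes held by live sessions.
    """
    live = deque()                 # expiry minute of each live session, oldest first
    peak = 0
    flows_per_minute = flows_per_hour / 60
    pending = 0.0                  # fractional flows carried over between minutes
    for minute in range(hours * 60):
        # Drop sessions whose TTL has elapsed.
        while live and live[0] <= minute:
            live.popleft()
        # Create one session per flow that ran during this minute.
        pending += flows_per_minute
        while pending >= 1:
            live.append(minute + ttl_minutes)
            pending -= 1
        peak = max(peak, len(live) * bytes_per_session)
    return peak


if __name__ == "__main__":
    # Hypothetical numbers: 600 flows/hour, 30-minute TTL -> about 300 live sessions.
    small = peak_session_memory(600, 30, bytes_per_session=2_000)      # login-only session
    bloated = peak_session_memory(600, 30, bytes_per_session=500_000)  # full-flow session
    print(f"login-only sessions: {small:,} bytes; full-flow sessions: {bloated:,} bytes")
```

The steady-state session count is the same in both runs (10 flows per minute × 30 minutes = 300 sessions); only the data per session differs, which is why a lightweight ping or login check would not have pushed memory anywhere near the level a full flow does.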
The story shows what happens when uptime is replaced with functional uptime: a far more useful metric that more accurately reflects real-world scenarios and surfaces problems such as the live events company’s memory leak. Nothing seemed wrong with uptime or performance while the automated tests ran only occasionally. Once the same tests ran continuously as monitors, they exposed the leak. Deploying such monitors is straightforward when they can be built from existing automated end-to-end tests. Simple ping monitors would not have loaded the servers enough to reveal the leak until well after the API was live and the problem was affecting customers. Functional uptime monitoring is the best way to uphold a competitive SLA.

Use the API Experts

In summary, strong API monitoring for competitive internal and partner SLAs starts with a great API test, written by stakeholders with the expertise and domain knowledge to truly validate the logic and function of the API. With the right start, you can eliminate many costly and risky bottlenecks by reusing holistic API tests as functional uptime monitors. You should be able to run these modern monitors every 5 minutes against pre-production and production environments, and then measure the success of your testing and monitoring solution with a better uptime metric: functional uptime.
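As a rough illustration, here is a minimal Python sketch of reusing an existing end-to-end test as a scheduled monitor and computing functional uptime as the share of runs in which the whole flow passed. The names `run_as_monitor` and `functional_uptime`, and the definition of functional uptime as passed runs divided by total runs, are assumptions made for this example; the article does not prescribe a formula. A real deployment would rely on a scheduler (cron, a CI/CD schedule, or a monitoring platform) rather than a sleep loop.

```python
import time
from typing import Callable, List

FIVE_MINUTES_S = 5 * 60


def run_as_monitor(test: Callable[[], bool], runs: int,
                   interval_s: float = FIVE_MINUTES_S,
                   sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Run an existing end-to-end test repeatedly and record pass/fail per run.

    An exception raised by the test counts as a failed run, as it would for a monitor.
    """
    results: List[bool] = []
    for i in range(runs):
        try:
            results.append(bool(test()))
        except Exception:
            results.append(False)
        if i < runs - 1:           # no need to wait after the final run
            sleep(interval_s)
    return results


def functional_uptime(results: List[bool]) -> float:
    """Fraction of monitor runs in which the whole consumer flow passed."""
    if not results:
        raise ValueError("no monitor runs recorded")
    return sum(results) / len(results)


if __name__ == "__main__":
    runs_per_day = 24 * 60 * 60 // FIVE_MINUTES_S   # 288 checks per day at a 5-minute interval
    run_number = iter(range(runs_per_day))

    def flaky_flow() -> bool:
        # Hypothetical flow that fails on every 48th run.
        return next(run_number) % 48 != 47

    # sleep is stubbed out so the demo finishes instantly.
    results = run_as_monitor(flaky_flow, runs_per_day, sleep=lambda s: None)
    print(f"{runs_per_day} runs, functional uptime = {functional_uptime(results):.2%}")
```

Injecting `sleep` keeps the monitor loop testable; the same test function the CI/CD pipeline already runs on every build is passed in unchanged, which is the “one workflow” idea in code form.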