API Discovery: How Do You Count APIs?

When we discuss APIs, it’s easy to forget that the topic is not as well-defined or strictly understood as it might seem. As with any topic that deals with complex systems, issues begin to pop up when you ask simple questions. A good example is the Ship of Theseus. When an object’s parts are replaced, at what point does it become something significantly different? Is it still the same ship when 50% of its parts have been changed? What about 75%?

This might seem an odd reference to make, but at the 2024 Austin API Summit, Rob Dickinson, VP of Engineering at Graylog, asked a philosophical question of the same type, one with significant real-world implications: what exactly is an API, and how do we count it? The answer is more convoluted than you might think.

Below, we’ll look at Dickinson’s answer to see how to count an API. We’ll also consider why this is an important distinction to make for discoverability purposes and for producing API inventories that can manage security and risk at scale.

Watch Rob Dickinson, VP of Engineering at Graylog, present API Discovery from Crawl to Run at the Austin API Summit.

Why Does API Discovery Matter?

Before diving into this topic, we should ask why it even matters. API discovery is the process of detecting and categorizing APIs within a service. This discovery begins to form the backbone of our understanding of the attack surface and the relative risk of decisions made within that reality.

Without an understanding of what APIs make up a service, it can be difficult to even understand the surface area that is under threat with any degree of reliability. This gap in understanding further makes it difficult to address issues that may exist or to plan for recovery in the event of a significant threat or successful attack.

A lack of understanding of this type also means we lack fundamental metrics to quantify risk, both as static threats and as changing ones. Without metrics on what’s being tracked, there can be no security, as there is no “goal” that indicates a successful security posture.

In this talk, Dickinson states that the goal is simple: the ability to say, “We currently have X APIs, where Y are new, and Z need immediate attention.” But therein lies the main philosophical problem — what even is an API?

What Actually Is An API?

In the API world, being able to define what constitutes an API is absolutely vital. There’s just one issue with that — this definition may change from person to person, let alone organization to organization.

Dickinson notes several reasons why this topic is not as simple as it might first seem. First, API best practices aren’t terribly well understood, and for many teams, the number of endpoints, the nature of the services, or even whether each one is internal or external may change the count of APIs on hand. Ideally, the metrics we are searching for should be universally applicable, so how we count should also be relatively universal and portable.

APIs are also “dark” compared to websites or emails. Whereas it’s obvious to most what a website is, APIs are harder to connect to and use, requiring substantial knowledge to even begin using a simple API. This, paired with the rapid change inherent to APIs and the different cultures underpinning their technologies, has resulted in the reality that APIs are somewhat loosely defined, even to experts, and are almost entirely opaque to the layman.

API Discovery Example

Let’s look at an example of this problem in real code. Below are three hypothetical requests made within a system.

REQUEST A

POST coinbroker.io/user

{
"first_name": "Rob",
"last_name": "Dickinson",
"email": "rob@resurface.io"
}

REQUEST B

GET coinbroker.io/quote

{
"account_token": "4b86cd3f-ccaf-445b-b099",
"amount_usd": "6",
"coin_type": "BTC"
}

REQUEST C

POST coinbroker.io/order

{
"account_token": "4b86cd3f-ccaf-445b-b099",
"quote_token": "552cd9da-2ff4-4dfe-b2eb"
}

How many APIs are shown here? Many would answer “three,” and those people may be correct. However, the answers “one” and “two” could also be correct. Is each request its own API? Each has its own endpoint, but they share similar data, so is it the path that decides?

What about the rest of the service? Is this a collection of microservices? In that case, is every microservice its own API, or does the entire service constitute a single API?

This is the fundamental problem that Dickinson is trying to showcase. None of these answers is clearly the right one, yet each of them is defensible. So how do we actually count an API?

Reasonable Ways to Count APIs

There are many ways to solve this problem, and many of them are quite sensible. We could count the fully qualified domain names (FQDNs), which could give us an idea of the number of specific APIs. But what about routes? Okay, maybe we can combine FQDN and routes to get something else.
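To make the difference between these counting rules concrete, here is a minimal Python sketch that applies two of them to the three sample requests above. The `https://` scheme, the `count_by` helper, and the two key functions are assumptions made for illustration; they are not taken from Dickinson’s talk.

```python
from urllib.parse import urlsplit

# The three sample requests from the example, as (method, URL) pairs.
# The https:// scheme is an assumption; the original requests show no scheme.
REQUESTS = [
    ("POST", "https://coinbroker.io/user"),
    ("GET", "https://coinbroker.io/quote"),
    ("POST", "https://coinbroker.io/order"),
]


def by_fqdn(method, parts):
    """Definition 1: one API per fully qualified domain name."""
    return parts.hostname


def by_fqdn_and_route(method, parts):
    """Definition 2: one API per FQDN and route combination."""
    return (parts.hostname, parts.path)


def count_by(requests, key):
    """Count distinct APIs under the definition expressed by `key`."""
    return len({key(method, urlsplit(url)) for method, url in requests})


fqdn_count = count_by(REQUESTS, by_fqdn)             # 1
route_count = count_by(REQUESTS, by_fqdn_and_route)  # 3
```

Under these assumptions, the same traffic yields one API when counted by FQDN and three when counted by FQDN plus route, which is the ambiguity described above.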

But what about the difference between internal and external APIs? What about deprecated APIs that are still usable but whose functionality now lives in a different core API? Each of these questions raises more problems than it solves. The answer here may be deceptively simple: count by specification. In this approach, the specifications dictate the number of APIs, and the exact definition is less important.

A New Problem: State of Change

This approach, while effective, does introduce a new problem. How do we count APIs by lifecycle state? Rogue or unmanaged APIs are still APIs, even if no specification describes them, and should be counted. Dickinson’s solution is to consider these issues in terms of categories:

Rogue or unmanaged APIs are new and need review;
Monitored and supported APIs are actively maintained; and
Deprecated, or zombie, APIs are dead but still relevant to the overall count.
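The categories above can be sketched as a small classification step. This is only an illustration under stated assumptions: the `SpecEntry` type, the `classify` function, the API names, and the rule that anything observed in traffic but absent from the specifications is rogue are all hypothetical, not part of the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecEntry:
    """Hypothetical record for one API described by a specification."""
    name: str
    deprecated: bool = False


def classify(spec_entries, observed_names):
    """Map each API name to a lifecycle state.

    Assumed rule: observed but unspecified means "rogue"; specified and
    flagged deprecated means "deprecated"; otherwise "monitored".
    """
    specified = {entry.name: entry for entry in spec_entries}
    states = {}
    for name in observed_names:
        if name not in specified:
            states[name] = "rogue"
    for name, entry in specified.items():
        states[name] = "deprecated" if entry.deprecated else "monitored"
    return states


# Hypothetical inventory: three specified APIs, one unspecified in traffic.
specs = [SpecEntry("user"), SpecEntry("quote"), SpecEntry("order-v1", deprecated=True)]
observed = {"user", "quote", "order-v1", "transfer"}

states = classify(specs, observed)
counts = Counter(states.values())
```

With this sample data, the inventory holds four APIs in total: two monitored, one deprecated, and one rogue that needs review.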

This introduces a more contextual count. Now, instead of just having a pure number to track, we have numbers tied to each API’s state in the lifecycle, allowing us to build an inventory of APIs that can be tracked over time. This, paired with self-description through introspection, allows API specifications to determine their definition and surface usable data.

What About the Risk?

To level up this usable data, we should also assign risk. Knowing these categories of APIs is helpful, but the ability to quantify risk with specific metrics was the main benefit of discovery discussed at the outset.

With our specification in hand, we need to start looking at what metrics could be useful. User activity, specific context, network traffic, and even recent changes to the APIs should all be tracked, allowing us to contextualize recent changes to the API and, thus, the API attack surface.

Now, we’re talking about more than just a specification — we’re including developer intention and real-life use. Threats are both internal and external, and intention is as important as practical use, so runtime monitoring and logging are vital to this process.

While there is currently no standard risk metric, tracking this data can begin to paint a picture that can be used to assign these scores internally with some validity. Dickinson specifically notes that Graylog views this process as a calculation based on both request and response data, noting that the response is often where data leaks and unwanted behavior occur.
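As a rough illustration of scoring from both request and response data, the sketch below weights sensitive-looking fields in the response more heavily than those in the request. This is not Graylog’s calculation, which the talk does not spell out; the key pattern, the weights, the `risk_score` function, and the sample response body are all assumptions.

```python
import re

# Illustrative pattern only; a real deployment would need a vetted list.
SENSITIVE_KEY = re.compile(r"token|email|password", re.IGNORECASE)


def sensitive_keys(payload):
    """Return the sorted keys of a flat dict that look sensitive."""
    return sorted(key for key in payload if SENSITIVE_KEY.search(key))


def risk_score(request_body, response_body, recently_changed=False):
    """Toy score: response findings count double (assumed weights)."""
    score = len(sensitive_keys(request_body))
    score += 2 * len(sensitive_keys(response_body))
    if recently_changed:
        score += 1  # assumed bump for APIs that changed recently
    return score


# Request B from the example, plus a hypothetical response body.
request_b = {"account_token": "4b86cd3f-ccaf-445b-b099", "amount_usd": "6", "coin_type": "BTC"}
response_b = {"quote_token": "552cd9da-2ff4-4dfe-b2eb", "email": "rob@resurface.io"}

baseline = risk_score(request_b, response_b)
after_change = risk_score(request_b, response_b, recently_changed=True)
```

The absolute numbers mean little on their own. The point is that the same inputs always produce the same score, so changes in the score over time can be tracked.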

Stages of API Discovery

From this, we can finally make a model for API discovery that meets all of our needs:

Walk: Create an API inventory. Use an API specification to define your API collection and determine what is considered an API internally. Document and track these APIs and assign runtime tracking to generate data internally for use in security posture evaluation.
Run: Track changes to your API inventory. Add additional context by categorizing these APIs. Track their changes over time through a consideration of lifecycle and end user purpose to enrich data and provide substantial context to the API collection.
Rocket ship: Track changes in risk metrics for your API inventory. As your APIs evolve and are used in the real world, follow how their risk metrics change. This will allow for the calculation of relative risk and a metric-based approach to security posture that is informed by reality rather than philosophical musings.
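Tracking changes to an inventory, as in the stages above, can be as simple as comparing two snapshots. The following is a minimal sketch, assuming each snapshot is a dictionary that maps a hypothetical API name to its lifecycle state; the `diff_inventory` function and the sample data are illustrative only.

```python
def diff_inventory(previous, current):
    """Compare two snapshots that map API name -> lifecycle state."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        name
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots taken a week apart.
last_week = {"user": "monitored", "quote": "monitored", "order": "monitored"}
this_week = {
    "user": "monitored",
    "quote": "deprecated",
    "order": "monitored",
    "transfer": "rogue",
}

changes = diff_inventory(last_week, this_week)
```

Here the diff reports one new API (`transfer`) and one state change (`quote`). The same comparison could be applied to per-API risk scores instead of lifecycle states.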

Conclusion

For many, this may seem obvious, especially for advocates of API specifications, but for others, it is a revelation. Recognizing that the exact definition of an API matters less than having a shared way to express that definition is vital to understanding your security posture. Security is metrics-based: you need some measure of how successful and secure your posture is, and this approach enables that for the broadest array of businesses, API designs, and paradigms.

What do you think of this talk? Are there other ways APIs should be counted? Let us know in the comments below!