Beyond MCP: Scaling AI Enablement for API Landscapes

The Model Context Protocol (MCP) was published in November 2024 and has seen tremendous success and adoption. That is not a big surprise: AI is still very much trending, and MCP has turned into the glue that allows LLM-based applications to interact with resources and tools. Without MCP, LLMs remain in their closed worlds or can only rely on custom integrations. With MCP, providers can expose resources and tools, and consumers can use them, in a standardized and interoperable way.

Is MCP all we need, then? We see clear signs that it is not. MCP is starting to run into scaling issues, because you cannot simply expose a large environment of resources and tools to your LLM and expect that to work well. That is not MCP’s fault; it was never intended to solve that class of problems. But it does mean we need to think carefully about how to manage and evolve our environments if we want to build AI applications at scale.

This article presents a five-step model based on our conversations with many medium to large-sized organizations. On your AI journey, you will sooner or later run into the issues each of these steps addresses, so it’s better to be prepared: think about the problems you may hit tomorrow and start planning for them today.

1. More Tools

Step one is for organizations to expose an increasing number of tools to AI applications. They may do this by exposing existing APIs through MCP, they may build new APIs as required and expose those through MCP, or they may custom-build MCP servers. Regardless of how they do it, the result is more tools for AI applications to work with.
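
To make this first step concrete, here is a minimal sketch of exposing one operation of an existing API as an MCP tool, using the FastMCP helper from the MCP Python SDK. The endpoint URL, field names, and tool name are placeholders, not a real service.

```python
# Minimal sketch: expose an existing REST API operation as an MCP tool.
# Assumes the official "mcp" Python SDK and "requests" are installed;
# the https://api.example.com URL is a placeholder.
import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-tools")

@mcp.tool()
def get_customer(customer_id: str) -> dict:
    """Return the customer record for the given customer ID."""
    response = requests.get(
        f"https://api.example.com/customers/{customer_id}",  # placeholder endpoint
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

The docstring and type hints turn into the tool description and schema that an LLM sees, which is exactly where the next two steps pick up.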

For experimentation and prototyping, this is a good approach, but it will quickly exhaust LLM context, sometimes even with a single complex API, and certainly with larger API landscapes when these are exposed in an unfiltered way.

The moment when this model breaks down can be delayed, however, by making sure APIs are described well, giving LLMs the best possible chance to understand and select tools.

2. Better Tool Description

If the main problem is not (yet) the complexity and size of your APIs and API landscape, one tactical step is to improve the quality of API descriptions, making it easier for LLMs to understand, select, and use tools.

To some extent, this is quite a standard activity that many organizations already have in place as part of their API governance. They use tools such as linters, and the goal is to improve API quality for developers, maximizing developer experience (DX). This second step does very similar things, but with a focus on agent experience (AX).

API scoring tools can automate quite a bit of this process, and when scoring is automated (for example, by running it in CI/CD pipelines), quality gains can be achieved without sacrificing too much velocity.
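
To illustrate what AX-oriented scoring might check, here is a hedged sketch that scores an OpenAPI document on description coverage and fails a pipeline run below a threshold. The file name, the checks, and the 80% gate are assumptions made for the example, not features of any particular scoring product.

```python
# Hedged sketch: score an OpenAPI document for agent-facing description quality.
# The checks, weighting, and failure threshold are illustrative assumptions.
import sys
import yaml  # PyYAML

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head"}

def score_openapi(path: str) -> float:
    with open(path) as f:
        spec = yaml.safe_load(f)
    checks, passed = 0, 0
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue
            # Every operation should explain itself well enough for an agent.
            checks += 2
            passed += bool(operation.get("summary")) + bool(operation.get("description"))
            for param in operation.get("parameters", []):
                checks += 1
                passed += bool(param.get("description"))
    return passed / checks if checks else 1.0

if __name__ == "__main__":
    spec_file = sys.argv[1] if len(sys.argv) > 1 else "openapi.yaml"  # placeholder file name
    score = score_openapi(spec_file)
    print(f"AX description score: {score:.0%}")
    sys.exit(0 if score >= 0.8 else 1)  # illustrative 80% gate for the pipeline
```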

However, even with very high-quality descriptions, the set of tools eventually becomes too large for LLMs to deal with, and it becomes necessary to invest in a new method for tool selection.

3. Dynamic Tool Discovery

One of the key differences between traditional API use and AI is that, in the traditional model, developers choose APIs, write applications, and hard-code those API dependencies into their software. In this scenario, tool selection happens at design time, when the application is designed and developed.

AI scenarios are different. Agents receive tasks, plan how to accomplish them, and look for the tools they need along the way. Tool selection happens at runtime, not when the agent is developed, which makes the scenario much more dynamic.

Dynamic tool selection is therefore a good fit for how agents work. Conveniently, it also allows us to manage the size of the tool landscape: if tools are well described and there is a search mechanism with good precision and recall, we can handle much larger tool landscapes.

Interestingly, if API scoring also aims to improve tool discoverability, the benefit becomes two-fold: good tool descriptions help not only with using individual tools, but also with discovering those tools in the first place.

The exact architecture of search and ranking depends on what AI consumers are expecting. Currently, that may very well be the model of resources and tools that MCP is using, but a year or two from now, there may be a new standard in this space. However, if you invest in expressive tool descriptions and good coverage, it’s likely that adjusting searching and ranking to a new tool model won’t be too much effort.
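
To make the search-and-ranking layer a bit more tangible, here is a minimal sketch of runtime tool discovery over a hypothetical tool catalog. The bag-of-words similarity is a deliberately simple stand-in for whatever embedding model and vector index you would actually use; the point is the overall shape: describe, index, rank, and hand only the top matches to the LLM.

```python
# Hedged sketch of runtime tool discovery: rank tool descriptions against a task.
# The bag-of-words "embedding" stands in for a real embedding model and vector
# index; the tool catalog below is hypothetical.
import math
import re
from collections import Counter

TOOL_CATALOG = {
    "get_customer": "Return the customer record for a given customer ID.",
    "check_eligibility": "Check whether a customer is eligible for an upgrade.",
    "create_invoice": "Create an invoice for an order and return its number.",
}

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def discover_tools(task: str, top_k: int = 2) -> list[str]:
    """Return the names of the tools whose descriptions best match the task."""
    task_vec = embed(task)
    ranked = sorted(
        TOOL_CATALOG,
        key=lambda name: cosine(task_vec, embed(TOOL_CATALOG[name])),
        reverse=True,
    )
    return ranked[:top_k]

if __name__ == "__main__":
    # Only the best-matching tools are handed to the LLM, keeping context small.
    print(discover_tools("Is this customer eligible for a plan upgrade?"))
```

Because the catalog and the ranking function sit behind one small interface, adapting to a different tool model later is mostly a matter of re-indexing the descriptions.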

So far, however, we are still accepting that tools are based on APIs, and that these APIs are the way they are. Those APIs may be too fine-grained, so that even relatively simple tasks require a fairly complex set of interactions across one or possibly even multiple APIs. If that’s the case, now may be the time to invest in better tools.

4. Workflow-Aligned Tools

There are things agents have to do that add little value toward their end goals. For example, assume an agent needs to figure out whether a customer is eligible for a certain action. That may require interactions across a variety of systems. Instead of making the agent work out that process every time, it is much more efficient to offer a workflow the agent can invoke that returns a simple yes or no answer, hiding the exact sequence of API calls that produces it.
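
As a sketch of what such a workflow-aligned tool could look like, the function below wraps three hypothetical backend lookups behind a single deterministic yes/no check. The system names, stubbed responses, and eligibility rules are invented for illustration; the point is that the logic is fixed and auditable rather than re-derived by an LLM on every request.

```python
# Hedged sketch of a workflow-aligned tool: one deterministic eligibility check
# that hides several lower-level API calls. The fetch_* functions are stubs
# standing in for calls to hypothetical backend APIs.

def fetch_account_status(customer_id: str) -> str:
    # Stand-in for e.g. GET /accounts/{id}/status on an internal API.
    return "active"

def fetch_open_invoice_count(customer_id: str) -> int:
    # Stand-in for e.g. GET /billing/{id}/invoices?state=open.
    return 0

def fetch_risk_score(customer_id: str) -> float:
    # Stand-in for e.g. a call to a risk-scoring service.
    return 0.2

def is_eligible_for_upgrade(customer_id: str) -> bool:
    """The single workflow-level tool exposed to agents instead of three APIs."""
    # Illustrative rules; in practice they come from business requirements and
    # from interaction patterns observed in logs.
    return (
        fetch_account_status(customer_id) == "active"
        and fetch_open_invoice_count(customer_id) == 0
        and fetch_risk_score(customer_id) < 0.5
    )

if __name__ == "__main__":
    print(is_eligible_for_upgrade("customer-123"))  # placeholder customer ID
```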

A good way to design these workflows is to observe interaction patterns, and those observations can be obtained from logs. Routing agent traffic through a single access point also adds a valuable way to secure agent interactions, so that agents don’t have to implement their own security but can rely on well-designed infrastructure. The pattern is very much analogous to the API gateways that gained popularity around 15 years ago, when API usage started proliferating.

Workflows are deterministic, which means they can be trusted. They are also much more energy-efficient than letting LLMs figure things out every single time. Last but not least, they let us hide the APIs they wrap, reducing the overall number of tools and thereby simplifying and improving tool search and selection.

There is one more step we can take, and it concerns risk appetite with AI. Some companies may like AI applications as “creative agents,” but they don’t like the idea of using them in production. Even then, these applications can play a very useful role.

5. Safe Tool Usage

Sandboxes that can be individually configured and deployed make it safe to experiment with tools that are backed by mocked APIs instead of production APIs. That means we can start testing earlier and with more confidence, certain that nothing bad can happen to our production APIs.
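
As a hedged sketch of this idea, the snippet below runs the same tool code against either mocked or production endpoints, selected per deployment. The URLs and the SANDBOX environment variable are placeholders; a real setup would more likely put this switch in a mocking service or at the gateway level.

```python
# Hedged sketch: point the same tool implementation at a mock or production
# backend depending on the deployment environment. The URLs and the SANDBOX
# environment variable are illustrative placeholders.
import os
import requests

BASE_URLS = {
    "sandbox": "https://mocks.internal.example.com",  # mocked APIs, safe to experiment with
    "production": "https://api.example.com",          # real APIs, guarded
}

def api_base() -> str:
    # Default to the sandbox so nothing reaches production by accident.
    env = "sandbox" if os.getenv("SANDBOX", "1") == "1" else "production"
    return BASE_URLS[env]

def get_customer(customer_id: str) -> dict:
    """The same tool code runs in both environments; only the target differs."""
    response = requests.get(f"{api_base()}/customers/{customer_id}", timeout=10)
    response.raise_for_status()
    return response.json()
```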

How an organization uses this sandbox model is a matter of its risk appetite. Some may choose to run AI applications only in sandboxes. That is good for testing, provides insights into what AI applications are doing, and also helps surface the workflows that improve the overall tool quality of the landscape.

Other organizations may use the sandbox model as a way to improve the tool quality of their landscape, removing some of the lower-level APIs from the visibility of AI applications. They may then run AI applications in production, confident that important workflows are now solidified as deterministic implementations, while still allowing AI applications to combine them as needed.

MCP: Just the Beginning of AI Enablement

In this article, we presented a five-step model for scaling your API landscape toward AI readiness. The takeaway is that MCP is a great starting point and necessary as the current standard for how LLMs access their environment. However, scaled AI enablement involves many more aspects than just exposing APIs through MCP.

We see quite a bit of evidence that the “scaling MCP isn’t trivial” story is surfacing in various places and ways, which means organizations will have to prepare for this journey sooner or later. By starting to think about it now, and by taking simple first steps (such as adding AI-aware API scoring to your API governance), you can get ahead of the curve and prepare to scale up your organization’s API initiatives.

AI Summary

This article outlines a five-step model for scaling environments that rely on the Model Context Protocol (MCP), explaining why simply exposing more APIs through MCP is insufficient for building reliable, large-scale AI applications.

  • Organizations initially expand tool availability by exposing existing or new APIs through MCP, but this quickly strains LLM context limits.
  • Improving API descriptions and adopting API scoring enhances agent experience and supports more accurate tool selection.
  • Dynamic tool discovery shifts API selection to runtime, enabling scalability when paired with high-quality descriptions and effective search mechanisms.
  • Workflow-aligned tools abstract complex multi-API interactions into deterministic, energy-efficient workflows that improve security and simplify discovery.
  • Sandboxed environments and mocked APIs allow safe experimentation and provide a pathway for controlled production use based on organizational risk appetite.

Intended for API architects, platform engineers, and technology leaders preparing their organizations for AI-enabled environments at scale.