The Role of APIs in Retrieval-Augmented Generation (RAG)

AI has been rapidly evolving since ChatGPT became widely available in November 2022. Countless business owners, developers, and AI evangelists have jumped on the bandwagon, thrilled at the prospect of unlocking their productivity and freeing themselves from busy work.

However, anyone who’s used AI-based tools like LLMs knows they’re not without their limitations. LLMs tend to make things up when they don’t know the answer, for one thing. For another, there’s little quality control when it comes to fact-checking or rating the quality of sources without human intervention. These issues undermine the usefulness of AI for many users.

Retrieval-augmented generation (RAG) is one solution that has emerged in response to these limitations. With RAG, an LLM consults an authoritative source outside of its training data and grounds its answer in that material before returning a result. When appropriately implemented, RAG promises to solve many of the issues currently preventing LLMs from being as useful as possible. As is often the case when a digital tool needs to interact with external sources, APIs are a vital component of RAG.

Here’s what you need to know about APIs and RAG to give you some ideas of how you might use this emerging technology.

How Retrieval-Augmented Generation Uses APIs

Both RAG and LLM applications rely heavily on vector databases. They’re the cornerstone of the entire system, acting as an essential layer between raw data and real-world applications. Rather than storing each document as-is, a vector database stores it as a numerical vector (an embedding), which allows for faster retrieval, advanced processing, and multidimensional data representation. Multidimensional representation is vital for attaching relevant context to a query. It also reduces the need for prohibitively large training datasets, since most of the retrieval work happens at query time via an API.
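To make the retrieval step concrete, here’s a minimal sketch of how a vector store matches a query against stored documents. The three-dimensional vectors and the document texts are toy examples; real systems use embeddings with hundreds of dimensions produced by an embedding model behind an API, and a purpose-built database rather than a Python dict.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, store):
    """Return the document whose stored vector is most similar to the query.
    `store` maps document text to its pre-computed embedding vector."""
    return max(store, key=lambda doc: cosine_similarity(query_vec, store[doc]))

# Toy 3-dimensional "embeddings" standing in for real ones.
store = {
    "return policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
}

# A query embedding close to "return policy" retrieves that document.
print(nearest([0.85, 0.2, 0.05], store))  # → return policy
```

The key design point is that similarity is computed over vectors, not raw text, which is what makes retrieval fast and meaning-aware rather than keyword-bound.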

A RAG API query can be broken down into three components. Context tells the system where to look for the necessary data. Role defines the system’s purpose, letting the model know how to format responses. Finally, there’s the query itself, which initiates the entire process.
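Those three components can be assembled into a prompt before the API call is made. The sketch below uses the chat-style message structure common to most LLM APIs; the exact field names vary by provider, and `build_rag_prompt` is a hypothetical helper, not part of any specific library.

```python
def build_rag_prompt(role, context_docs, query):
    """Combine the three RAG components (role, context, query)
    into a chat-style message list for an LLM API."""
    context = "\n\n".join(context_docs)
    return [
        # Role: defines the system's purpose and response format.
        {"role": "system", "content": role},
        # Context + query: retrieved data plus the user's question.
        {"role": "user", "content": (
            "Answer using only the context below.\n\n"
            f"Context:\n{context}\n\n"
            f"Question: {query}"
        )},
    ]

messages = build_rag_prompt(
    role="You are a concise support agent for an online store.",
    context_docs=["Orders ship within 2 business days."],
    query="How long does shipping take?",
)
```

Instructing the model to answer only from the supplied context is what keeps the generation grounded in the retrieved data rather than the model’s training set.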

4 Examples of RAG in Practice

Now that you’ve got a better idea of how RAG functions, let’s look at some real-world scenarios that implement RAG effectively and what APIs they interact with under the hood.

1. Virtual Assistants

Virtual assistants are a popular application of AI and LLMs. They eliminate the need for human customer service representatives for some queries, offering customer service around the clock when configured properly. In some cases, virtual assistants are even superior to human customer service representatives, as humans are fallible, and even the most skilled customer service rep can only know so much. Machines struggle in certain areas, though, which is where RAG comes in.

Connecting a virtual assistant to a RAG system gives it access to real-time information like current weather conditions or inventory status. Integrating RAG into a virtual assistant workflow isn’t that dissimilar from other API integrations; you just need a generative API layer between the assistant and the retrieval backend. In this example, a user enters a query as normal into the virtual assistant prompt. The query runs through the RAG model via the API using a custom-built retriever, which pulls in results based on the context of the query, and the response is returned in an appropriate format.
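That flow can be sketched as a single function that chains the two API calls together. The `fake_retriever` and `fake_generate` stubs below stand in for real API calls (a vector search or inventory endpoint, and an LLM completion endpoint, respectively); they’re illustrative assumptions, not actual services.

```python
def handle_query(query, retriever, generate):
    """One pass through the virtual-assistant RAG loop:
    retrieve context for the query, then generate a grounded answer.
    `retriever` and `generate` are callables wrapping external APIs."""
    docs = retriever(query)  # 1. fetch real-time context via the retrieval API
    prompt = (
        "Answer the customer's question using this context:\n"
        + "\n".join(docs)
        + f"\n\nQuestion: {query}"
    )
    return generate(prompt)  # 2. call the LLM API to format the response

# Stubs simulating the external APIs for demonstration.
def fake_retriever(query):
    return ["Item #42 is in stock: 17 units."]

def fake_generate(prompt):
    return "Yes, item #42 is in stock (17 units available)."

answer = handle_query("Is item #42 in stock?", fake_retriever, fake_generate)
print(answer)
```

In a production system, swapping the stubs for real clients is the only change needed, which is why RAG integration resembles any other API integration.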

2. Personalized Lead Recommendations

Applications could utilize RAG to assist with lead generation and recommendations. Telescope, for example, is a sales automation platform that provides customers with real-time leads. This involves integrating with the customer’s existing CRM software to ensure the data is accurate and useful.

In this scenario, an API feeds accurate data into a machine learning model. Users can customize these models with additional sales data, like whether or not a pitch was successful, which yields even more useful leads. Even better, RAG can also be used to generate better sales material, analyzing sales materials for spelling and grammar using online dictionaries and offering suggestions for improvements.

3. Recruiting

Another area is recruiting. For example, Assembly is an HR platform for answering real-time questions. It acts as a generative layer between the person asking the question and a company’s internal data. This is useful for accepting queries from users in natural language and then returning the best answer. This makes it a much more useful platform for attracting the best candidates, as they’ll be given accurate information in real time.

Using Assembly, interested applicants can also be directed toward relevant job openings, again using natural language. This can broaden the pool of potential candidates significantly, as you’re not limited to applicants using very specific keywords or phrases. RAG systems can even scrape specific sites to create a more robust model of useful employment data.

4. Medical Consultation

AI is increasingly being used for medical diagnoses and consultation, which brings its own complexities and opportunities. Properly diagnosing a patient involves accessing that patient’s medical history as well as the latest medical research, which has legal ramifications due to HIPAA in the United States and the GDPR in Europe. RAG systems could allow an LLM to access pertinent medical data and the latest research, summarize it in an easily understandable way, and then return it in an appropriate format.

When NOT To Use RAG APIs

Although RAG systems employing APIs have an impressive range of applications, they’re not the perfect fit for every situation. They don’t always use APIs, either, meaning you’ll still need to deploy a hybrid approach in some circumstances.

Consider the medical example we just mentioned. Sharing patient information internationally creates legal problems and complexities, meaning that data needs to be stored locally to be HIPAA- and GDPR-compliant. In that circumstance, the RAG system wouldn’t need to call an API, as that data would behave like training data.

Others find that RAG cuts off too much context in favor of efficiency. Machine learning expert Phoebe Klett warns that RAG still can’t summarize which texts were used to generate a response. She also warns that the referenced documents can sometimes be larger than the training model itself.

Rather than using RAG for every occasion, Klett advises using what she calls “extended mind transformers,” in which a system regularly interacts with an external storage system that serves as a form of memory. This is another area where RAG falters, according to Klett, as RAG doesn’t include an internal memory, which is essential for truly establishing context. This leaves RAG vulnerable to one of LLMs’ most glaring shortcomings: the tendency to hallucinate when the model doesn’t know the answer.

RAG has other limitations, too, many of which stem from the inherent limitations of LLMs. Many LLMs charge a fee per transaction, for one thing, so including RAG in every API call could quickly become very expensive, to say nothing of slow and unwieldy. It’s not the most secure solution, either, as many LLMs can execute code and retrieve data, and offering anyone the ability to run AI programs inside your network is a recipe for disaster. In short, RAG is incredible in many circumstances but needs to be deployed thoughtfully.

Final Thoughts on RAG APIs

Retrieval-augmented generation has the potential to eliminate many of the limitations inherent to LLMs, but it’s still not a catch-all solution for every problem. RAG APIs still need to be connected to the right external data to be helpful, for one thing, making them not that dissimilar from other machine learning models. Otherwise, they’ll hallucinate an answer, just like any other LLM. RAG also raises questions and concerns about data compliance across international borders, as we saw in the medical example above. Developers and data architects still need to decide how to deploy data for their particular needs. Finally, RAG APIs can become inefficient and expensive if you’re not careful.

Setting up RAG properly offers the best of all worlds: the ability to interact with users in real time using natural, conversational language while still drawing on the best available data. If you’re creating customer-facing solutions of any kind, you should at least consider RAG as a possibility.