Securing GraphQL APIs With Identity Control

GraphQL has become a popular query language for retrieving or mutating data in APIs and microservices. The popularity is in part thanks to the power it provides clients to ask for exactly the data needed and nothing more. This is possible with strong typing that allows the clients to know the exact structure of the returned data. GraphQL also allows for nested queries that can return data from multiple aggregated APIs using only a single query. This reduces the need to call multiple APIs and makes for a more performant approach.

Some of these aspects have made GraphQL popular, but they also pose challenges for safeguarding GraphQL APIs and the data they serve. For example, the GraphQL specification does not define authentication or authorization, so both are left to the implementer. A particular challenge is that a GraphQL API may aggregate information from multiple APIs and microservices, so authentication and authorization must be considered for all of the aggregated data.


Authentication

Authentication comes first and is the process of determining that the user or system is, in fact, who they say they are. It’s a prerequisite for authorization since you cannot determine what a user can access unless you know who the user is. There are many different ways of authenticating a user that we won’t cover in this post. But, the best practice in this area is to use OAuth and OpenID Connect and make use of tokens. This way, user credentials and authentication details don’t have to be passed around to APIs and microservices, and instead, an access token is used. This applies to GraphQL APIs and other types of APIs, microservices, or applications.

As mentioned, authentication is not defined by the GraphQL specification, and the implementer must decide how to handle it. One option could be to handle it in the web server or proxy in front of the API, such as NGINX. Many middleware libraries can handle the ins and outs of OAuth and OpenID Connect here, depending on the technology used.

On its own, this approach can suffice in very simple scenarios where the only questions are who the user is and whether they should have access to the API. It handles access to data from a very coarse-grained perspective: the user either gets access to the API or not. It is also probably only workable when dealing with a single API. But what if you also need to authorize exactly what data the user should access and under what circumstances? What if the API that is called returns data from many subsequent APIs? That gets more complicated.

More complex use cases where several APIs might be called will require each API to handle authentication (and authorization, more on that later). In reality, these APIs might be called individually for other use cases. In a token-based architecture, a token can be passed on to each API called, and that same approach applies here. The user identity, the token, can be passed in context to any subsequent API that is invoked. This way, each API has a method for verifying the identity of the original requester without having to explicitly authenticate the user.
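As a minimal sketch of passing the token along in context, the snippet below builds the headers for a call to a subsequent API from a per-request context object. The names RequestContext, downstream_headers, and resolve_orders are hypothetical and not tied to any particular GraphQL server library; the only assumption is that the token is forwarded as a bearer token in the Authorization header.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RequestContext:
    """Hypothetical per-request context holding the caller's token."""
    access_token: str


def downstream_headers(ctx: RequestContext) -> Dict[str, str]:
    """Build headers for a subsequent API call, forwarding the caller's token."""
    return {
        "Authorization": f"Bearer {ctx.access_token}",
        "Accept": "application/json",
    }


def resolve_orders(ctx: RequestContext) -> Dict[str, str]:
    # A resolver would hand these headers to whatever HTTP client it uses
    # when calling the (hypothetical) orders API.
    return downstream_headers(ctx)
```

Because every resolver reads the token from the same context object, the schema itself never has to mention authentication.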

A best-practice approach is to issue opaque tokens to public applications and clients, simply because an opaque token can’t directly leak any sensitive information. With a proxy or API gateway involved in the architecture, the opaque token can be exchanged for a JWT that’s passed on to the upstream APIs, an approach known as the Phantom Token Pattern. A JWT can hold more information, and because it is signed, upstream APIs can verify that its contents have not been tampered with.

This pattern allows for a very flexible architecture. Authentication and its subsequent access token can be propagated through multiple APIs without affecting the GraphQL schema. The API and the data served by the API can change, but the mechanism for authenticating the user stays the same. There might be some tweaks to be made in exactly how the token is issued and what scopes and claims the token contains, but these are easy tweaks to make in the authorization server that is responsible for administering the token.

Authorization

Knowing who is accessing the API is a prerequisite for authorizing what data should be released by the API. That is taken care of by authentication. A very simple form of authorization is to decide whether the user can access the API or not. This is straightforward and can easily be handled using scopes and claims in the issued token.

Even more complex authorization use cases can be handled using scopes and claims. Examples include different rules for read and mutation access, access based on user attributes, access to a list or subset of particular resources, or even access to given fields within the payload.
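A rough sketch of what such checks might look like is shown below. It assumes the token claims have already been validated and parsed into a dictionary, that scopes arrive as a space-separated string in a "scope" claim, and that a "role" claim exists. The scope names, the role value, and the field name are all made up for illustration.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from GraphQL operation type to the scope it requires.
REQUIRED_SCOPE = {"query": "records:read", "mutation": "records:write"}

# Hypothetical fields that only some users may see.
SENSITIVE_FIELDS = {"salary"}


def granted_scopes(claims: Dict[str, Any]) -> Set[str]:
    """Split the space-separated scope claim into a set."""
    return set(str(claims.get("scope", "")).split())


def can_perform(claims: Dict[str, Any], operation: str) -> bool:
    """Separate rules for read (query) and mutation access."""
    needed = REQUIRED_SCOPE.get(operation)
    return needed is not None and needed in granted_scopes(claims)


def visible_fields(claims: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Field-level access driven by a claim: hide sensitive fields unless role is 'hr'."""
    if claims.get("role") == "hr":
        return dict(record)
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
```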

However, in many cases, this is not going to be enough. Some level of finer-grained authorization is needed. Depending on the use case, it can get more complex where there are relational aspects of authorization rules, such as where a “user can only read their own records” or “a user that is a doctor can only view medical records of patients they are assigned to.” These can be very complex authorization rules that are difficult to implement and enforce, especially when dealing with multiple aggregated APIs.
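The two relational rules quoted above could be sketched as follows. The assignment table is a hypothetical in-memory stand-in for data that, in practice, might live in another API, which is part of what makes such rules hard to enforce. The claim names "sub" and "role" are assumptions as well.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass(frozen=True)
class MedicalRecord:
    record_id: str
    patient_id: str


# Hypothetical relationship data: which doctor is assigned to which patients.
ASSIGNMENTS: Dict[str, Set[str]] = {"dr-lee": {"patient-1", "patient-2"}}


def can_view_record(claims: Dict[str, Any], record: MedicalRecord) -> bool:
    user_id = claims.get("sub")
    if user_id is None:
        return False
    # Rule 1: a user can read their own records.
    if record.patient_id == user_id:
        return True
    # Rule 2: a doctor can view records only for patients assigned to them.
    if claims.get("role") == "doctor":
        return record.patient_id in ASSIGNMENTS.get(user_id, set())
    return False
```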

Each API needs to control its own authorization. If the user information is propagated throughout the nested API calls, the user is known to each API that is called, and each API therefore has enough user context to at least begin to determine how to authorize access to the data it serves. This matters a lot because the data one API releases can be the input to another API. This authorization should be handled in the business layer as opposed to directly in the GraphQL layer or data layer.

With this strategy, a complex GraphQL query that runs through the business layer, and that may involve several API calls and multiple data sources, is best positioned to enforce authorization on the data before it is returned to the GraphQL layer and, in the end, to the client requesting the information. It also makes the authorization agnostic to the way the API is exposed (REST or GraphQL).
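One way to picture this is a single business-layer function that owns the rule, with thin GraphQL and REST adapters in front of it. The sketch below uses hypothetical names and an in-memory list as the data source; neither adapter is tied to a real framework.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory data source owned by the business layer.
ORDERS: List[Dict[str, str]] = [
    {"id": "o1", "owner": "alice"},
    {"id": "o2", "owner": "bob"},
]


class Forbidden(Exception):
    """Raised by the business layer when the caller may not proceed."""


def list_orders(claims: Dict[str, Any]) -> List[Dict[str, str]]:
    """Business layer: the only place where the authorization rule lives."""
    if "orders:read" not in str(claims.get("scope", "")).split():
        raise Forbidden("missing scope orders:read")
    return [order for order in ORDERS if order["owner"] == claims.get("sub")]


def graphql_orders_resolver(parent: Any, context: Dict[str, Any]) -> List[Dict[str, str]]:
    """GraphQL adapter: passes the claims from the request context through."""
    return list_orders(context["claims"])


def rest_get_orders(claims: Dict[str, Any]) -> Tuple[int, List[Dict[str, str]]]:
    """REST adapter: maps the business-layer outcome to a status code and body."""
    try:
        return 200, list_orders(claims)
    except Forbidden:
        return 403, []
```

Both entry points return the same data for the same caller because neither of them contains any authorization logic of its own.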

Figure: The high-level authorization stack.

Authorization rules fit well into the business layer but are sometimes complex to implement, especially if they involve fine-grained or relational rules. A rule might need data from another API to make the appropriate decision, which can be very difficult to handle.

Entitlement Management System

Instead of implementing the logic of all authorization directly within the business layer, a powerful option is to externalize the decision-making to an Entitlement Management System (EMS). The previously mentioned relational type authorization rules can be very complex to implement but are typically easy to express in an attribute-based access control (ABAC) system like an EMS. The approach here is that all authorization logic is expressed in some type of format that the EMS understands and can be managed completely separately from the API, the data, and the business layer.

In this case, the business layer could be where the decision is enforced, making it the policy enforcement point (PEP). This also allows the same policy to be reused by many different systems and allows for much better auditing of the authorization rules in place. The EMS can be configured to have access to other APIs and data sources where it can look up additional pieces of data (attributes) it needs to make an appropriate decision. Naturally, many of the attributes come from the user information, such as department, role, age, and clearance. All of these user attributes can be derived from the authentication context, i.e., the token.
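The split between decision and enforcement might look roughly like the sketch below. EntitlementService is a hypothetical in-process stand-in for an external EMS, and the two policies are invented examples of attribute-based rules; a real EMS would have its own policy language and would be called over the network.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class DecisionRequest:
    subject: Dict[str, Any]   # user attributes taken from the token claims
    action: str
    resource: Dict[str, Any]  # attributes of the data being requested


Policy = Callable[[DecisionRequest], bool]


def same_department(req: DecisionRequest) -> bool:
    return req.subject.get("department") == req.resource.get("department")


def sufficient_clearance(req: DecisionRequest) -> bool:
    return int(req.subject.get("clearance", 0)) >= int(req.resource.get("classification", 0))


class EntitlementService:
    """Stand-in for an external EMS: permits only if every policy agrees."""

    def __init__(self, policies: Iterable[Policy]) -> None:
        self._policies = list(policies)

    def is_permitted(self, req: DecisionRequest) -> bool:
        return all(policy(req) for policy in self._policies)


def read_document(ems: EntitlementService, claims: Dict[str, Any],
                  document: Dict[str, Any]) -> Dict[str, Any]:
    """Business layer acting as the policy enforcement point (PEP)."""
    req = DecisionRequest(subject=claims, action="read", resource=document)
    if not ems.is_permitted(req):
        raise PermissionError("access denied by policy")
    return document
```

The business layer only asks a yes-or-no question and enforces the answer, so the policies can change without touching the API code.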

Conclusion

Authentication and authorization can get complex quickly when dealing with GraphQL APIs. This is, in part, a result of the advantages that GraphQL brings. With the ability to craft a very specific query that gets just the correct data, the GraphQL API might aggregate several APIs in the background. This means that the requester’s identity needs to traverse several APIs, and all of those APIs need to take authorization into account.

In summary, it’s a good idea to leverage the phantom token pattern to avoid leaking sensitive data in JWTs to public clients. Use opaque tokens instead and have the API gateway or proxy exchange the opaque token for a signed JWT that the APIs themselves can consume. Additionally, leverage scopes and claims for authorization within the APIs, and for more complex use cases, look at externalizing the authorization rules to an entitlement management system.