Centralize Data Access Control with GraphQL

Posted in Security. Andrew Carlson, December 18, 2023

GraphQL is more than just a way to let client teams ship features faster, or backend teams efficiently reuse their existing services. When used as a layer to aggregate and orchestrate existing APIs, it’s an ideal location in our architecture to centralize access control and authorization down to the field level, providing field-level observability into which clients request what data.

When people think of GraphQL, they often see it as just another option for their APIs. For example, they could deploy a GraphQL API instead of shipping a REST service. Historically, this makes sense, as GraphQL is often positioned this way among analysts and influencers. However, over a decade after Facebook created it in 2012, API platform teams at companies like Netflix and Wayfair now use GraphQL to abstract service complexity and deliver better developer velocity and composability, not just as an advanced data transport solution.

Federated GraphQL enables teams to deliver GraphQL’s benefits at a greater scale, transforming GraphQL from just another API to a layer in a stack that sits on top of existing services. This graph of graphs provides access to any number of services with a single endpoint. It also enables teams to share entities and domain models across those subgraphs. Rather than exposing a sprawl of backends-for-frontends (BFFs) or experience APIs, service teams gain a central platform to contribute any services to the graph.

By the nature of being a centralized platform for clients to access data, a federated GraphQL layer is also an ideal place to centralize the control of that data access down to the field level in a declarative and observable way.
It enables API teams to implement broad rules (for example, across entire services or groups of services) as well as granular rules inside a query, using authorization and policy-as-code tools that many organizations have already adopted, like OPA, Sentinel, and Casbin. By enforcing authorization with a GraphQL layer, API platform teams can enable better developer velocity and self-service without sacrificing security and compliance.

Data Governance Through Policy-as-Code

Data is an organization’s most valuable digital asset, and companies spend over 5.2 billion dollars each year protecting and securing it. Companies designing their data governance strategies are increasingly turning to policy-as-code and declarative authorization tools as a powerful way to define who has access to what in a way that is auditable, transparent, and easy to iterate on.

Policies are rules or conditions that gate access to an underlying service or data. When defined as code, they are usually written in a policy language such as Rego or Sentinel, or in a configuration format such as YAML. Compared with managing these rules and conditionals manually or on a service-by-service basis, a policy-as-code pattern helps teams increase the precision of the configurations they apply across services, improves collaboration across teams by establishing a systematic way of defining access, elevates testability, and more.

Software architecture is about making principled decisions about things that are hard to change later. The more challenging something is to change, whether in hard costs, people hours, or downstream effects, the more rigor we put into evaluating the tradeoffs. In many ways, evaluating and deciding on the policy-as-code tool is the easy part of the process. Swapping out tools is rarely trivial, but once the infrastructure is in place and teams are familiar with declarative security and authorization, that migration process becomes an exercise in mapping capabilities between tools.
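
As a rough illustration of the policy-as-code idea in this section, the sketch below keeps access rules as plain data and evaluates them with a single deny-by-default function. It is written in TypeScript only to stay self-contained; the `Policy` shape, the `isAllowed` helper, and the resource and role names are all hypothetical and do not correspond to Rego, Sentinel, or Casbin syntax.

```typescript
// Hypothetical policy shape: every name here is illustrative and is not
// taken from OPA, Sentinel, or Casbin.
type Policy = {
  name: string;
  resource: string; // what the policy guards, e.g. "orders.total"
  allowedRoles: string[]; // roles permitted to read the resource
};

// Policies live as reviewable, versionable data rather than as conditionals
// scattered across individual services.
const policies: Policy[] = [
  { name: "read-order-total", resource: "orders.total", allowedRoles: ["finance", "admin"] },
  { name: "read-order-status", resource: "orders.status", allowedRoles: ["support", "finance", "admin"] },
];

// Deny by default: access is granted only when some policy names the
// resource and lists at least one of the caller's roles.
function isAllowed(resource: string, roles: string[]): boolean {
  return policies.some(
    (policy) =>
      policy.resource === resource &&
      policy.allowedRoles.some((role) => roles.indexOf(role) !== -1),
  );
}
```

Because the rules are data, changing who can read `orders.total` becomes a small, reviewable diff rather than new conditional logic in each service.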

The much harder architectural question, one that broadly affects our API strategy and security footing, is this: where is the best place to apply these policies?

Security at Every Layer

Identifying the best place to apply our security policies starts with understanding our architecture and the access boundaries that our organization draws around data. For example, what database management system (DBMS) is our persistent storage layer using, and do we have an API management tool (APIM) such as MuleSoft or Kong in place? How do we authenticate users, manage who is authorized to access information today, and decide how granular that authorization needs to be?

Most organizations will benefit from applying a Swiss cheese or defense-in-depth model to their security approach, applying policies at multiple or all layers in the stack. Each layer comes with its own tradeoffs, namely in the granularity of the data we can guard and our flexibility for adjusting those policies. We can look at this as a matrix, with one axis measuring flexibility and the other measuring granularity.

Applying a security policy in the persistent storage layer can be more demanding to change and is usually very coarse-grained. A policy at the service level is more granular but typically still gates access to an entire service at once. What if we want to apply fine-grained policies down to the column level or wrap multiple services simultaneously?

By evaluating a typical architecture topology, we can quickly identify different opportunities to inject authorization, whether declarative or imperative. Historically, the most common areas to apply authorization are at either polar end of our architectures: closest to the data or closest to the user.

Applying Policies Closest to the Data

The first place to look is in our persistent storage layer.
We’ll consider a database for brevity, but it can be any type of persistent storage, whether S3, PostgreSQL, or a data warehouse wrapping a database, like Databricks wrapping MySQL. As the system of record for the data, the storage layer is a logical place to start applying data access policies.

The tradeoff of only using security policies at the data storage layer is that they trend towards general roles, whether that’s through a username and password, certificate, LDAP, or other authentication protocol. These are enforced directly at the storage layer, such as through PostgreSQL authentication, authorizing table-level access at best. This type of low-level access rarely represents the permissions and scopes we need for a consumer-facing application.

Applying Policies Closest to the User

The services are the next layer in the stack to interrogate as an option for applying authorization and access control policies. We can do this service-by-service, or more commonly, through an API gateway like Kong or AWS API Gateway. Using policies at the service level can be a boon because we can fine-tune them to business requirements more comprehensively than at the database level.

However, policies and authorization implemented with a gateway tool like Kong are still enforced broadly, at the level of a whole service or the gateway itself: the entire request is rejected if the policy agent finds that a rule is not satisfied. On the other hand, if we apply policies on a service-by-service basis, we can increase the granularity of the data we guard. Still, we must write unique and bespoke logic in each service, decreasing the flexibility of adjustments because every change will require new code, tests, and deployment.

Policies that we apply at the persistent storage layer aren’t particularly granular or flexible. They limit access at the table or database level or restrict access by IP, among other things, but rarely have end-user-related business logic associated with the rules.
Policies that we apply at an API gateway or service level are flexible but not very granular. They operate as guards in front of all upstream services, allowing or restricting access to an entire service or set of services at once. We can look in between these two polar ends of our architecture for a space to apply policies that is both flexible and granular: at the schema.

Applying Policies in the Middle

GraphQL, especially when implemented as a federated GraphQL architecture, offers a unique opportunity to apply query-level and even field-level authorization and access policies within a single request.

In a federated GraphQL architecture, teams maintain individual GraphQL APIs, each written in whichever framework and language that team prefers. This architecture provides the simplicity of a GraphQL monolith for client teams but the modularity of a more decoupled approach for service teams. These individual GraphQL APIs, or subgraphs, sit behind a single router that serves as an access point for clients. A composition process takes all subgraph schemas and intelligently combines them into a single schema, ensuring a consistent and performant runtime. This supergraph architecture (a graph of graphs) orchestrates these services to provide a central access point for data while retaining field-level granularity.

We can apply policies in GraphQL even without a federated architecture, but federation provides a boundary in an architecture that amplifies the benefits of declarative authorization. A GraphQL layer can be a Goldilocks zone in our architecture because we can apply broad rules (for example, across entire services or groups of services) as well as granular rules inside a query.
By applying these policies declaratively at this level, we can define granular and flexible authorization and even design for more complex patterns like returning partial responses (returning data that a user can access, and an error for requested data they don’t have permission to retrieve).

How to Centralize Data Access in GraphQL

Applying declarative policies in GraphQL is a nascent space, but it has tremendous potential upside thanks to the flexibility and granularity we can gain in our security posture. There are generally two ways this can be done today: manually in each resolver or centralized in the schema with a custom directive.

Applying Policies in GraphQL Resolvers

The first way to apply policies is by handling the agent request/response lifecycle in our GraphQL resolvers. One example of this is the Open Policy Agent (OPA) and GraphQL tutorial, which outlines how to create the policy bundles needed for fine-grained, context-aware GraphQL query authorization.

Applying Policies in GraphQL Schemas

Another option for applying policies is by customizing our schemas directly. Declaring policies in our schemas requires custom directives, an advanced GraphQL feature, but it is the cleanest and most declarative way of applying these rules. Some emerging products on the market offer pre-built custom directives that reduce some of the complexity of building and maintaining them.

From the GraphQL spec: “Directives provide a way to describe alternate runtime execution and type validation behavior in a GraphQL document.”

Policy application is an excellent example of where a custom directive can shine. For instance, we could create a custom runtime directive such as @authorization that accepts a policy name and resolves it through our agent of choice. Though it involves a bit more custom development through a new directive, this approach is more reusable than directly handling policy resolution in each resolver.
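
To make the directive approach more concrete, here is a minimal sketch in plain TypeScript that deliberately avoids any particular GraphQL server library. The `@authorization` directive definition, the `read_pii` policy, the `evaluatePolicy` stand-in for a policy agent, and the `Customer` type are all assumptions made for illustration. The SDL string shows what a schema author might write, `withAuthorization` shows the kind of resolver wrapping a directive implementation would typically perform, and `resolveCustomer` imitates how GraphQL returns partial responses.

```typescript
// SDL a schema author might write. `@authorization` is the hypothetical
// custom directive discussed above, not a built-in GraphQL directive.
const typeDefs = `
  directive @authorization(policy: String!) on FIELD_DEFINITION

  type Customer {
    name: String
    email: String @authorization(policy: "read_pii")
  }
`;

type Context = { roles: string[] };
type Resolver<T> = (parent: Record<string, unknown>, context: Context) => T;

// Stand-in for a policy agent such as OPA. A real agent call would usually be
// an asynchronous network request; an in-memory lookup keeps this runnable.
const policyRoles: Record<string, string[]> = { read_pii: ["support", "admin"] };

function evaluatePolicy(policy: string, context: Context): boolean {
  const allowed = policyRoles[policy] ?? []; // unknown policies deny by default
  return context.roles.some((role) => allowed.indexOf(role) !== -1);
}

// The core of a directive implementation: wrap the field's resolver so the
// policy is checked before any data is returned.
function withAuthorization<T>(policy: string, resolve: Resolver<T>): Resolver<T> {
  return (parent, context) => {
    if (!evaluatePolicy(policy, context)) {
      throw new Error(`Not authorized by policy "${policy}"`);
    }
    return resolve(parent, context);
  };
}

// Resolvers as they would look after the directive has been applied.
const customerResolvers: Record<string, Resolver<unknown>> = {
  name: (parent) => parent.name,
  email: withAuthorization("read_pii", (parent) => parent.email),
};

// Imitates GraphQL's partial-response behaviour for nullable fields: a field
// that fails becomes null and its error is reported next to the data.
function resolveCustomer(
  parent: Record<string, unknown>,
  context: Context,
  fields: string[],
): { data: Record<string, unknown>; errors: string[] } {
  const data: Record<string, unknown> = {};
  const errors: string[] = [];
  for (const field of fields) {
    try {
      data[field] = customerResolvers[field](parent, context);
    } catch (error) {
      data[field] = null;
      errors.push((error as Error).message);
    }
  }
  return { data, errors };
}
```

In this sketch, a caller holding only a `guest` role still receives `name`, while `email` comes back as `null` with an accompanying error. A production version would register the wrapper through the server's own directive or schema-transform mechanism and call the policy agent asynchronously.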

Conclusion

A strong security strategy requires a plan for every layer of our stack, and applying policies in GraphQL can give us flexibility and granularity that we haven’t seen before. By building on the rules we’ve already applied at our persistence layers and API gateways to include authorization policies in GraphQL, we can use it as a centralized hub for implementing nuanced, field-level access control and authorization. This shift streamlines data governance and elevates it by integrating policy-as-code tools throughout our stack.