GraphQL Microservices (GQLMS) as a Backend: A Netflix Case Study

Rapid iteration is the ultimate goal for many developers. Being able to overcome the barriers of development and collaboration with very little friction, and deliver a brand new product, is a dream come true. In many situations, however, the question is less whether rapid deployment is wanted and more whether it’s even possible.

One excellent technology for enabling this iteration is GraphQL. GraphQL can power outstanding agility, but it does come with drawbacks in certain situations. Today, we’re going to look at the general value proposition of GraphQL, as well as a specific implementation undertaken by Netflix. We’ll look at some use cases that GraphQL enables and hopefully surface some of the benefits that adoption can bring.

What is GraphQL?

GraphQL is an application layer query language specification. A GraphQL server interprets a query string sent by a client (which may itself be another server) and returns data in the form requested by that client. In other words, as the GraphQL website explains, GraphQL allows you to “describe your data, ask for what you want, [and] get predictable results”.

This malleability of response form and function makes sense, given that GraphQL originated at Facebook. At the time, Facebook was transitioning away from its HTML5 applications and solutions to find a more robust, native way to display data and promote user engagement. Facebook had the additional significant need to combine data sets from completely disparate sources.

Accordingly, their solution was to allow requests to be molded by whoever was making them. Instead of serving up a data point and all of its related fields, GraphQL was designed to let a client request a specific element, in a specific form, with an output suited to the underlying use case of that specific end-user. This ability to shape the request itself unlocked further benefits that quickly made GraphQL a powerhouse solution. By letting the client define the exact form of the response and combine what would otherwise be multiple calls into a single request, GraphQL made data retrieval far more efficient.

We’ve covered GraphQL before in greater detail, but the main takeaway is that GraphQL is simple, intuitive, and allows for precise retrieval and aggregation of data. Multiple data points can be pulled with a single query. GraphQL allows mutations, queries, and subscriptions to establish custom connections to the server and the requested data for a wide range of outcomes.
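
As a minimal sketch of what "multiple data points with a single query" can look like on the wire, the Python below builds the JSON body a GraphQL client would typically POST to a server. The schema here is purely hypothetical: the `title` and `reviews` fields, their arguments, and the `titleId` variable are invented for illustration and do not come from Netflix or any real API.

```python
import json

# Hypothetical schema: `title` and `reviews` are illustrative field names only.
# One operation asks for two separate pieces of data, each trimmed to the
# fields the client actually wants.
QUERY = """
query TitlePage($titleId: ID!) {
  title(id: $titleId) {
    name
    releaseYear
  }
  reviews(titleId: $titleId, first: 3) {
    rating
    summary
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    """Serialize a GraphQL operation into the conventional JSON POST body."""
    return json.dumps({"query": query, "variables": variables})


body = build_request_body(QUERY, {"titleId": "tt-001"})
print(body)
```

Nothing is sent over the network here; the point is only that both selections travel in one request body, and that the response would mirror the shape of the query.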

How Does GraphQL Enable Rapid Deployment?

The major benefit of adopting GraphQL in a rapid deployment environment is the efficiency of its data retrieval. Data transfer costs are significantly reduced for both the server and the client: since each response can be restricted to only what the client requests, no bandwidth is wasted on unused fields. Transit costs fall further because what would otherwise be several calls can be combined into a single GraphQL request.
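
To make the transfer saving concrete, here is a toy comparison in plain Python. It is not a GraphQL implementation: `select_fields` is a hypothetical helper that merely mimics what a selection set does, and the record contents are made up.

```python
import json

# A hypothetical full record, as a fixed-shape endpoint might return it.
FULL_RECORD = {
    "id": 42,
    "name": "Example Show",
    "synopsis": "A long description " * 20,
    "cast": ["Actor A", "Actor B", "Actor C"],
    "releaseYear": 2020,
}


def select_fields(record: dict, wanted: list) -> dict:
    """Keep only the requested keys, mimicking a GraphQL selection set."""
    return {key: record[key] for key in wanted if key in record}


trimmed = select_fields(FULL_RECORD, ["name", "releaseYear"])

# Compare the serialized size of the full record and the trimmed response.
full_bytes = len(json.dumps(FULL_RECORD).encode("utf-8"))
trimmed_bytes = len(json.dumps(trimmed).encode("utf-8"))
print(trimmed)
print(f"full: {full_bytes} bytes, trimmed: {trimmed_bytes} bytes")
```

The exact numbers depend entirely on the made-up record; the takeaway is simply that the client pays only for the fields it names.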

This also means that request size and method can be controlled, which allows multiple data sources to exist and be collated, compared, or combined, unlocking many possibilities. For one, this makes the backend much more stable. As GraphQL doesn’t require specific languages or architectures to deliver on its promise, the backend can be built in pieces, each using whatever is most appropriate for that component rather than whatever best suits the ultimate endpoint.
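
The idea of one request drawing on several differently built backends can be sketched with a simple resolver map. Everything below is an assumption made for illustration: `CATALOG_DB`, `ratings_service`, and the field names are hypothetical, and a real GraphQL server would use a proper library rather than a bare dictionary of functions.

```python
# Two hypothetical backends with different shapes: an in-memory "database"
# and a function standing in for a separate ratings service.
CATALOG_DB = {"t1": {"name": "Example Show", "year": 2020}}


def ratings_service(title_id: str) -> float:
    """Stand-in for a call to an independent service."""
    return {"t1": 4.5}.get(title_id, 0.0)


# Each exposed field maps to whichever backend can answer it.
RESOLVERS = {
    "name": lambda title_id: CATALOG_DB[title_id]["name"],
    "releaseYear": lambda title_id: CATALOG_DB[title_id]["year"],
    "averageRating": ratings_service,
}


def resolve(title_id: str, fields: list) -> dict:
    """Answer only the requested fields, each from its own source."""
    return {field: RESOLVERS[field](title_id) for field in fields}


result = resolve("t1", ["name", "averageRating"])
print(result)
```

The client never learns that `name` and `averageRating` came from different places, which is what lets each backend piece evolve on its own terms.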

This decoupling of the backend and frontend development cycle is hugely important, and the ability to mix and match is what makes GraphQL so effective for rapid development. Consider for a moment what is needed in a rapid deployment environment: you must rapidly prototype and quickly create flows and request patterns that touch resources in ways that those resources were possibly not initially designed for.

More importantly, you need to build with what you have, not what you need, allowing disparate data sets to exist as they currently do rather than in the way you wish they did. You don’t need to create new systems, servers, and services for each small thing you want to test and deploy; you just want to be able to develop and test. That exact situation is where GraphQL shines the brightest: you don’t need to do anything other than point to your resources, define what you want, and transform the data once it arrives where you’re sending it.

The reduction in costs for data transfer also helps in this rapid deployment, as prototyping can be kept efficient, directed, and purposeful. This means any troubleshooting isn’t done within the uncontrolled request landscape but instead within the formed, known, and limited request. Leaks and inefficiencies are exposed by default as they stick out like a sore thumb. Where those leaks and inefficiencies are harder to find, you can rapidly test different types and forms of calls to find where the failure is coming from and what specific approach is causing it.

One of the biggest benefits of adopting GraphQL in this development mindset is the improvement of organizational understanding. Most of the time, APIs described with a GraphQL schema end up better structured, easier to understand, and more comprehensively documented. Understanding the core data flow throughout the system and how its constituent parts work can lead to significant gains.

This might seem like a minor benefit, but the reality is that traditional API development can often lead to APIs that do what they’re meant to do with unseen added complexity that’s hard to code out. You can have something do its job, but you have nothing to compare it to in terms of efficiency. With GraphQL, you can iterate, develop, and test, and your more comprehensive understanding will result in increased efficiency and efficacy of the underlying system.

Netflix Case Study

Luckily, there’s an excellent example of where GraphQL has been leveraged to deploy an effective backend for rapid deployment. Writing on The Netflix Technology Blog, Dane Avilla, a senior software engineer, discussed precisely how GraphQL unlocked these benefits.

Avilla notes that the team started with two core theories around what they considered to be the “advertised benefits of GraphQL.” First, they believed that using a GraphQL IDE in the form of GraphiQL would allow them to pair documentation with the schema, and that this would improve developer ergonomics through a better understanding of the system itself. Such backing would provide component information that was referenceable, iterable, and deployable.

Their second theory was that the strong typing, polyglot client support, and solution agnosticism that underpins GraphQL’s transformation abilities would allow for agnostic client generation. This would allow for the rapid iteration of clients that were most appropriate for each situation.

Part of the stated goal of the Netflix team was essentially to create an upgraded RESTful experience. They refer to this as “better REST than REST” or “REST++”. The simple idea was to bundle the Graphile library underpinning the environment into a Docker base image, so that any team could pull the image and rapidly create an environment for testing and deployment.

In pursuing this goal, they made a few design choices that, while specific to their use case, show the potential of this sort of system. Firstly, the Netflix team decided to keep the data tables in one PostgreSQL schema and expose them through views defined in another schema. By doing so, they were able to:

  • Create secure database views (by leveraging the default explicit grant system in the flow between the PostgreSQL user and the web application);
  • Create data tables that existed independently of the exposed GraphQL schema views, allowing for rapid iteration and mutation on distinct data sets;
  • Provide formatting systems through the database views; and
  • Create a data flow where tables and views could be changed, mutated, and transformed “such that the changes to the exposed GraphQL schema happened atomically”.

In other words, the solution at hand created a system where the database view could be worked upon and transformed in new, exciting ways without affecting the efforts of other teams or core business functions.
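
The table-versus-view split described above can be sketched in a few lines. This is a loose illustration, not Netflix's setup: SQLite stands in for PostgreSQL so the example runs anywhere, which means there are no real schemas or grants here, and the `private_titles` and `api_titles` names are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL; it has no schemas or GRANTs, so the
# "private table / exposed view" split is shown by naming convention only.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.executescript("""
    CREATE TABLE private_titles (
        id INTEGER PRIMARY KEY, name TEXT, budget_cents INTEGER
    );
    INSERT INTO private_titles VALUES (1, 'Example Show', 1250000);
    -- The view is the only object an API layer would be pointed at.
    -- It also handles formatting (cents to dollars).
    CREATE VIEW api_titles AS
        SELECT id, name, budget_cents / 100.0 AS budget_dollars
        FROM private_titles;
""")


def exposed_rows(connection):
    """Return the rows of the exposed view as dictionaries."""
    cur = connection.execute("SELECT * FROM api_titles ORDER BY id")
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


before = exposed_rows(conn)

# Change the table and the view inside one transaction, so the exposed
# shape switches from old to new in a single step.
conn.execute("BEGIN")
conn.execute("ALTER TABLE private_titles ADD COLUMN region TEXT DEFAULT 'global'")
conn.execute("DROP VIEW api_titles")
conn.execute("""
    CREATE VIEW api_titles AS
        SELECT id, name, budget_cents / 100.0 AS budget_dollars, region
        FROM private_titles
""")
conn.execute("COMMIT")

after = exposed_rows(conn)
print(before)
print(after)
```

In a PostgreSQL deployment, the view would live in its own schema with explicit grants, and a tool such as Graphile would be pointed at that schema only.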

This solution was not without its negatives, of course. The team noted that nested types were hard to describe in Graphile due to how those types are described. While they could work around those issues, other concerns arose that remain in their implementation. Chief among these are that, with the default behavior as adopted, there were (and remain) concerns about denial of service through unlimited database calls, as well as the lack of a strong identity management solution.

Netflix notes that, while these issues still exist, the concern is somewhat muted in their use case:

“However, in the context of GQLMS for rapid development of internal apps by small teams, having the default Graphile behavior of making all columns available for filtering allowed the UI team to rapidly iterate through a number of new features without needing to involve the backend team. This is in contrast to other development models where the UI and backend teams first agree on an initial API contract, the backend team implements the API, the UI team consumes the API and then the API contract evolves as the needs of the UI change during the development life cycle.”

Nevertheless, this example is an extreme one and perhaps not appropriate for most organizations. Adopting GraphQL can allow for more rapid development and prototyping and, as the Netflix case shows, when taken to the extreme it can enable rapid collaboration above and beyond the flexibility of other current solutions. The Netflix team is fully aware of this and notes the following in the original tech blog:

"That said, the successful implementation of an internal app over 4–6 weeks with limited initial requirements and an ad hoc distributed team (with no previous history of collaboration) raised a large amount of interest throughout the Netflix Studio. Other teams within Netflix are finding the GQLMS approach of:

1) using standard GraphQL constructs and utilities to expose the database-as-API
2) leveraging custom PostgreSQL types to craft a GraphQL schema
3) increasing flexibility by auto-generating a large API from a database
4) and exposing additional custom business logic and data types alongside those generated by Graphile

to be a viable solution for internal CRUD tools that would historically have used REST. Having a standardized Docker container hosting Graphile provides teams the necessary infrastructure by which they can quickly iterate on the prototyping and rapid application development of new tools to solve the ever-changing needs of a global media studio during these challenging times."

Conclusion

GraphQL is a powerful solution that enables a good deal of freedom in deployment and iteration. As we can see from the Netflix story, it can come with its own drawbacks in specific deployments. Accordingly, while GraphQL is a great technology and can unlock great potential, it should be used in the appropriate circumstances. With an understanding of its limitations as well as its strengths, GraphQL can be a great value multiplier.

What do you think about the Netflix experiment with GraphQL? Are there any other similar experiments that you’re aware of that you would like us to cover? Let us know in the comments below!