Data has shape. Shape has meaning. Data is a representation of objective reality. In its ultimate form, data is perfectly normalized, with zero redundancy and full integrity. It is an ideal, beautiful world created by mathematicians and computer scientists.

And then reality kicks in. Within your average stack, many competing data models exist for areas like:

  • Storage (relational or not)
  • Backend services (Protobuf/gRPC or REST)
  • GraphQL (because everyone wants it)
  • Front End

Each layer of the stack has a unique language to define objects and relationships. We are talking DB schemas, Protobufs, and GraphQL SDL. Each layer follows its own optimization functions, has different access patterns, and is maintained by different teams.
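To make the overlap concrete, here is a sketch of how one and the same entity might be described three times in a hypothetical stack. The schemas are invented for illustration; the point is that three languages, owned by three teams, describe one concept and are kept in sync by hand.

```typescript
// The same "User" entity as three hypothetical layer definitions,
// carried as strings so the sketch is self-contained.

// Storage layer: relational DDL.
const sqlSchema = `
CREATE TABLE users (
  id    BIGINT PRIMARY KEY,
  name  TEXT NOT NULL,
  email TEXT NOT NULL UNIQUE
);`;

// Backend services layer: a proto3 message.
const protoSchema = `
syntax = "proto3";
message User {
  int64  id    = 1;
  string name  = 2;
  string email = 3;
}`;

// API layer: GraphQL SDL.
const graphqlSchema = `
type User {
  id: ID!
  name: String!
  email: String!
}`;
```

None of these definitions knows about the others; nothing in the stack enforces that adding a field to one propagates to the rest.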

Probably nobody “actively decided” to have all of those layers in a single stack. Technically, there is no reason to maintain multiple disjointed data models. It is most likely a result of natural software evolution — adding each of the layers made sense at some point in time.

In particular, the duplication and overlap become apparent if you happen to have separate back-end-to-back-end (microservices) protocols such as Protobufs and then back-end-to-front-end protocols such as GraphQL.

GraphQL was created for server-client communication, to serve consumer-centric API designs. Compared with Protobuf, the GraphQL spec provides native support for building connected data models at scale, with features such as type extensions and schema delegation. With a better serialization format and support for modern transport protocols, GraphQL could take over server-server communication too.

Disconnected Data Model for Microservices

Protobuf over gRPC: A popular backend setup.

Microservices architecture evolved to solve backend development velocity problems by allowing for separate deployments, autonomous execution, and clear ownership. Protobuf over gRPC is perhaps the most popular protocol to implement this solution. Because of that, best practices for microservices architecture center around decoupling and independence.

Data models often become very “denormalized,” as the problem of connecting types and fetching actual objects by ID was pushed up the stack. For example, the fact that the list of UserIDs in a Team message can be resolved to actual User objects may not be built into the data model, but instead sprinkled everywhere it’s used.
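A minimal sketch of that situation, with hypothetical Team/User shapes: the model stores only IDs, so every consumer ends up writing its own version of the join.

```typescript
// Hypothetical denormalized messages: Team holds only user IDs, and the
// data model itself says nothing about how to turn them into Users.
interface User { id: string; name: string; }
interface Team { id: string; memberIds: string[]; }

// Stand-in for a call to a separate user service.
const usersById = new Map<string, User>([
  ["u1", { id: "u1", name: "Ada" }],
  ["u2", { id: "u2", name: "Grace" }],
]);

// Because the Team -> User link is not part of the model, this join is
// the logic that gets "sprinkled everywhere it's used".
function resolveMembers(team: Team): User[] {
  return team.memberIds
    .map((id) => usersById.get(id))
    .filter((u): u is User => u !== undefined);
}

const team: Team = { id: "t1", memberIds: ["u1", "u2"] };
const members = resolveMembers(team);
```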

Single Data Model at Scale


GraphQL: A more usable protocol for the frontend… and possibly the whole stack?

GraphQL became mainstream a bit later on. This protocol is all about catering to consumer needs. The recommended approach to defining the data model is to look at it from the client perspective. So, the types get reconnected again at this level.

The community around GraphQL is an enthusiastic crowd (myself included) and seems to have the ambition to take over the lower parts of the stack as well. As a matter of fact, projects such as Prisma, Hasura, and Postgraphile aim to provide a GraphQL API directly for database access, eliminating the need for a traditional backend layer in between.

Prisma v2 has pivoted to be a generic JavaScript database-access tool without tight coupling to GraphQL. Hasura and Postgraphile take a PostgreSQL database schema and serve it over a GraphQL interface. They have very different approaches to how deployments are structured and offer different tradeoffs, but both are built around the same idea: reusing a single data model through the whole stack.

Of course, for a larger organization, using a single database to store the entirety of its data is not feasible. So, the main question concerns strategies for merging and delegating parts of queries between different parts of the system. And that is where GraphQL has the most developed ecosystem, since it natively allows for API composition and type extensions.

In proto3, there is no native support for extended types; the common workaround is to include an opaque bytes blob, which is up to the consumer to encode and decode. Extended types are not first-class citizens of the spec.
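The contrast can be sketched side by side. The schemas below are hypothetical, carried as strings: GraphQL’s `extend type` lets a second service add fields to a type it does not own, while in proto3 the closest idiom is an opaque payload the consumer decodes itself.

```typescript
// GraphQL SDL: a users service owns the User type...
const usersService = `
type User {
  id: ID!
  name: String!
}`;

// ...and a reviews service extends it without touching the original
// definition. Extension is part of the GraphQL spec.
const reviewsService = `
extend type User {
  reviews: [Review!]!
}
type Review {
  id: ID!
  body: String!
}`;

// proto3 has no equivalent: the closest idiom is shipping an opaque
// bytes field that each consumer must encode and decode on its own.
const protoWorkaround = `
message User {
  int64  id   = 1;
  string name = 2;
  bytes  vendor_extension = 3; // consumer-defined payload
}`;
```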

So the trend for managing larger data models is that organizations that have adopted microservices start adding a GraphQL layer on top to normalize the schema and reconnect the types.

Disadvantages of Using GraphQL for Server-Server

So why would you bother to keep both (Protobuf and GraphQL) around? There are some disadvantages of standardizing on a single solution (GraphQL) for both server-to-server and server-to-front-end communication.

  • Lack of standard tools: Since GraphQL is not the first choice for a microservices architecture, traditional tooling and infrastructure such as logging, auth, and caching have yet to be standardized around it.
  • Performance: The other question is whether GraphQL can be as performant as Protobuf over gRPC. I think it can be eventually, but it’s not there yet. The default serialization for most GraphQL server implementations is still JSON, which is considerably slower than binary Protobuf. Additionally, gRPC is based on HTTP/2, which is significantly faster than HTTP/1.1.

If one compares the two purely as technologies, JSON over HTTP (GraphQL) vs. a binary format over HTTP/2 (Protobuf/gRPC), the latter would always seem like the obvious choice.
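A back-of-the-envelope sketch makes the size part of that comparison tangible. The record and the binary field layout below are invented (real Protobuf uses varint encoding, which differs slightly), but the shape of the result holds: JSON repeats field names and punctuation in every message, while a binary format ships the schema out of band.

```typescript
// One hypothetical User record.
const user = { id: 123456789, name: "Ada Lovelace" };

// JSON wire size: keys and punctuation travel with every message.
const jsonBytes = new TextEncoder().encode(JSON.stringify(user)).length;

// A Protobuf-like layout: only tags, lengths, and values on the wire.
// (Hand-counted for the sketch, not real Protobuf encoding.)
const nameBytes = new TextEncoder().encode(user.name).length;
const binaryBytes =
  1 + 4 +            // field tag + fixed32 id
  1 + 1 + nameBytes; // field tag + length byte + name payload

console.log({ jsonBytes, binaryBytes });
```

The gap widens as messages repeat (field names are paid for per record in JSON), which is one reason binary formats dominate high-volume server-server traffic.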

That being said, the GraphQL spec does not prescribe a transfer protocol or serialization format per se. So far, the community around GraphQL has been mostly oriented toward front-end clients, so JSON over HTTP is a very logical implementation, as all browsers support it. However, the adoption of more advanced standards (such as HTTP/2, HTTP/3, and perhaps a binary serialization like BSON) for GraphQL is just a matter of time. Such standards could significantly improve GraphQL performance.

The Future Data Model Is More Normalized

GraphQL has correctly isolated the most error-prone part of the process: as an organization scales, communication and coordination between teams become the bottleneck. GraphQL offers better ground for team collaboration by asking data providers to declare their capabilities (as a publishable schema) and by taking a data-model-first approach to design.

The GraphQL community also offers a few solutions for the problem of managing connected data models at scale. Options include Apollo Federation and schema stitching via graphql-tools, both of which let us decentralize the logic yet provide a single connected API.
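In the federated style, each team publishes its own subgraph and a gateway composes them into one API. The subgraph snippets below are hypothetical, carried as strings; `@key` and `@external` are the Apollo Federation directives that mark which fields identify an entity across service boundaries.

```typescript
// Users team: owns the User entity, keyed by id.
const usersSubgraph = `
type User @key(fields: "id") {
  id: ID!
  name: String!
}`;

// Teams team: references User by its key and links it into Team,
// without owning the User definition.
const teamsSubgraph = `
type Team {
  id: ID!
  members: [User!]!
}
extend type User @key(fields: "id") {
  id: ID! @external
}`;
```

A gateway (e.g. Apollo Gateway) would merge these into a single schema, routing each field to the service that declared it.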

Organizations that want to reduce overall system complexity while maintaining the flexibility associated with microservice architecture would need to connect and normalize their underlying data model.

So, either Protobuf-based solutions will adopt a curated schema design approach and thus be more friendly to consumers — or GraphQL will become friendlier to backend development needs by adopting faster, more advanced data exchange protocols.

Diana Suvorova

I am a front-end engineer at Uber, originally from Moscow, Russia, with an MS in CS from BMSTU. I moved to Silicon Valley 10 years ago and worked on the back end for machine learning and data analytics companies while gradually progressing to front-end development. I am an active member of the JS open-source community, the author of the eslint-plugin-react-redux package (7k weekly downloads), and a contributor to Chromium.