How Major League Baseball is Using GraphQL

Don’t steal bases if your team is winning by a landslide. Avoid bunting to break up a no-hitter. Never swing on a 3-0 count when your team is winning comfortably. Major League Baseball (MLB) has all sorts of unwritten rules, many of them rooted in superstition and an outsized reverence for the game’s history.

There are some ways, however, in which MLB has proven itself to be innovative and forward-thinking. No, we’re not talking here about Billy Beane’s adoption of the sabermetrics method during his tenure with the Oakland Athletics. First and last Moneyball reference, I promise.

Instead, we’ll be looking at how Major League Baseball implemented a federated GraphQL architecture to solve the issue of service discovery and sprawl.

This article follows a talk given by Olessya Medvedeva, Software Engineer at MLB, and Matt Oliver, Senior Engineering Manager at MLB, at a virtual QCon event. You can watch the presentation here.

MLB’s Big Problem

At MLB, Web Platform handles all core architecture, infrastructure, and DevOps. That spans various websites, such as MLB.com, MiLB.com, USA Baseball, and Play Ball. But it also means enabling many different teams (both development and baseball) to look after their own sites.

Historically, their main challenge has been serving fully rendered pages composed of data coordinated from many disparate services, a recipe for slow load times and page failures. Beyond that, they had little visibility into who was making calls, which made it hard to know who was pulling data from which platforms.

Oliver also describes a system so complex that the team struggled to determine where redundant calls were coming from, joking that they were prone to DDoS-ing themselves. The same complexity brought inconsistent caching and poor handling of upstream failures.

He also states that they had previously been burned by third-party integrations, and wanted to find a way to isolate clients from backend churn. Like a pitcher trying to close things out in the bottom of the ninth, Oliver and his team had their work cut out for them.

Quick Fixes, Slow Decay

Olessya Medvedeva, a software engineer on Oliver’s team, talks about how MLB attempted to use REST services where the client does the heavy lifting. She goes on, however, to highlight some of the problems with this as a solution:

  • Clients are chatty.
  • Response payloads are “all or nothing.”
  • Backend data model is tied to frontend implementation.
  • Backend service exposure.
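
To make the first two problems concrete, here is a minimal sketch in Python. It is not MLB's code: the data, the RestBackend class, and both view functions are hypothetical, and the "backend" is just an in-memory dictionary. It contrasts a client that stitches a view together from two full REST-style payloads with a single request that returns only the requested fields.

```python
from dataclasses import dataclass

# Hypothetical in-memory data; not MLB's actual models or endpoints.
PLAYERS = {1: {"id": 1, "name": "Player One", "team_id": 10, "bio": "long text"}}
TEAMS = {10: {"id": 10, "name": "Team Ten", "city": "Sample City", "venue": "Sample Park"}}


@dataclass
class RestBackend:
    """Counts calls so the cost of client-side composition is visible."""
    calls: int = 0

    def get_player(self, player_id):
        self.calls += 1
        return dict(PLAYERS[player_id])  # whole payload, wanted or not

    def get_team(self, team_id):
        self.calls += 1
        return dict(TEAMS[team_id])


def rest_client_view(backend, player_id):
    """Chatty client: one round trip per resource, then stitch the results."""
    player = backend.get_player(player_id)
    team = backend.get_team(player["team_id"])
    view = {"name": player["name"], "team": team["name"]}
    return view, player, team


def single_query_view(player_id, fields):
    """Server-side composition: one request, only the requested fields."""
    player = PLAYERS[player_id]
    team = TEAMS[player["team_id"]]
    merged = {"name": player["name"], "team": team["name"], "city": team["city"]}
    return {f: merged[f] for f in fields}
```

In this toy version the REST-style client makes two calls and receives fields such as bio and venue that it never uses, while the single-query function answers the same question in one request.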

The team made attempts to address some of these issues using what they call mashups, but this wasn’t an ideal solution either. The mashups increased complexity, including adding a middle layer that was hard to keep track of. And there was the issue of which team owns these mashups…

A lack of ownership sometimes led to the duplication of data access, resulting in feedback loops when services called each other. The overarching problem was that stakeholders lacked a holistic view of MLB’s API service. That’s when the company began considering using GraphQL to address some of these issues.

Anatomy of MLB’s GQL Service

Initially, MLB started by implementing the GraphQL trinity:

  • Well-defined models describe the output.
  • Resolvers translate a request into a response.
  • Services provide internal data to resolvers.
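
The three parts above can be sketched without any GraphQL library at all. The following Python is an illustration only: Player, PlayerService, and resolve_player are hypothetical names, the data is a stub, and a real GraphQL server would parse a query document rather than take a list of field names.

```python
from dataclasses import dataclass


# Model: describes the shape of the output.
@dataclass
class Player:
    id: int
    name: str
    team: str


# Service: provides internal data to resolvers (stubbed with one row here).
class PlayerService:
    _rows = {7: {"player_id": 7, "full_name": "Sample Player", "club": "Sample Club"}}

    def fetch(self, player_id):
        return self._rows.get(player_id)


# Resolver: translates a request into a response.
def resolve_player(service, player_id, requested_fields):
    row = service.fetch(player_id)
    if row is None:
        return None
    # Map the backend's field names onto the public model, so clients
    # never see the internal naming.
    model = Player(id=row["player_id"], name=row["full_name"], team=row["club"])
    return {f: getattr(model, f) for f in requested_fields}
```

Because the resolver maps backend names such as full_name onto the model's name, the backend can change its internal schema without breaking clients, which is the isolation from backend churn the team was after.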

And using a GQL server as a proxy between clients and services resulted in some positive outcomes for them: it made deployment easier, kept updates to models clear, and created a single source of truth because all of their code was in one place.

They found, however, that it didn’t solve every issue they were having. For example, a single source of truth also means a single point of failure. Beyond that, they found it cumbersome for a large number of teams to contribute to one codebase, and the tightly coupled models proved restrictive.

They later enlisted the services of Apollo, a platform that extends GraphQL with a focus on federation, which also recently launched what they call the Supergraph. In Oliver’s words, the aim was to create “a topology where you have multiple independent services unified by one gateway.”

MLB’s Federated Graph

Apollo CEO and co-founder Geoff Schmidt argues that “as enterprises broke up their monolithic application architectures and moved to microservices, everything became so atomized that it now puts the burden on developers to piece everything back together when they want to build a new application on top of these systems.”

Embracing federation meant the creation of sub-services with their own individual models and resolvers, separating pieces of the graph in a more modular way that still allowed inter-service communication. This means that:

  • Each service is only responsible for its part of the larger graph.
  • It’s easier for large teams to contribute or own parts of the graph.
  • Independent services can be versioned and deployed irrespective of other services.
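
As a rough illustration of that topology, the sketch below models a gateway that routes each field to the one sub-service that owns it. This is a simplification written for this article, not Apollo's implementation or API: Subgraph, Gateway, the field names, and the version strings are all hypothetical.

```python
class Subgraph:
    """One independently versioned service that owns part of the graph."""

    def __init__(self, name, version, resolvers):
        self.name = name
        self.version = version
        self.resolvers = resolvers  # field name -> function(user_id)

    def resolve(self, field, user_id):
        return self.resolvers[field](user_id)


class Gateway:
    """Single entry point that routes each field to the subgraph owning it."""

    def __init__(self, subgraphs):
        self.owners = {}
        for sg in subgraphs:
            for field in sg.resolvers:
                if field in self.owners:
                    # Two services claiming one field would break the graph.
                    raise ValueError(f"field {field!r} has two owners")
                self.owners[field] = sg

    def query(self, user_id, fields):
        return {f: self.owners[f].resolve(f, user_id) for f in fields}


# Hypothetical subgraphs, loosely named after services mentioned in the talk.
user_service = Subgraph("user", "1.4.0", {
    "displayName": lambda user_id: f"fan-{user_id}",
})
baseball_service = Subgraph("baseball", "2.0.1", {
    "favoriteTeam": lambda user_id: "Sample Club",
})

gateway = Gateway([user_service, baseball_service])
result = gateway.query(1, ["displayName", "favoriteTeam"])
```

Each sub-service carries its own version and resolvers, so either one could be redeployed without touching the other. The ownership check in the gateway's constructor hints at why composition needs guarding: two teams claiming the same field is one way a federated graph can break.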

It’s worth bearing in mind, of course, that GraphQL federation comes with its own set of things to watch out for:

  • More complex CI/CD
  • Potential for graph breakage
  • Connections between parts of the graph can be less clear
  • Governance requires near-constant monitoring

Oliver and Medvedeva go on to talk about some use cases specific to MLB. For example, they explain how User Service and Baseball Service communicate with each other, the advantages of federated caching, upstream circuit breaking, and so on. These examples showcase how the process has decreased the number of calls between services, increased visibility upstream, and centralized data access. In other words, the adoption of federated GraphQL is doing (so far) precisely what they hoped that it would.
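
The talk does not show how MLB implements upstream circuit breaking, so the following is only a generic sketch of the pattern in Python. The CircuitBreaker class, its thresholds, and the fallback value are assumptions made for illustration.

```python
import time


class CircuitBreaker:
    """Stops calling a failing upstream until a cool-down has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback  # circuit open: skip the upstream entirely
            # Cool-down over: close the circuit and allow calls again.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            return fallback
        self.failures = 0
        return result
```

A resolver could wrap each upstream call in breaker.call(fetch, cached_value), so that a struggling service gets room to recover and clients receive a stale or empty value instead of a failed page.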

Microservices vs. Federated GraphQL Sub-services

In some ways, Federated GraphQL might feel like a step away from microservices and back towards a monolith. Some might ask: why not just use a service mesh? On that topic, Oliver makes the following comment:

“It might be simpler to go a service mesh-esque route, but you’re not going to get a lot of the things you get with Apollo (and strictly Federation) for free.” As useful as a service mesh might be, it can get costly. And that’s something even large organizations like MLB need to think about.

Anatomy of Federated GraphQL

The case study above is intriguing because it highlights a very specific problem with microservices. Although microservices offer advantages, they can also be problematic for larger organizations trying to break down monolithic services.

This appears to be the issue MLB faces: at what point do the benefits of breaking down a monolith into hundreds, or even thousands, of microservices stop outweighing the additional complexity and ownership issues generated by doing so?

In this case, MLB addressed that issue by employing federated GraphQL to create sub-services that are independent but capable of communicating with each other. But, because they’re all part of the larger graph, they feel more closely aligned (and are accessible in a similar format) than comparable decoupled microservices might be.

Federated GraphQL isn’t a silver bullet for everyone, but it might be worth investigating for organizations struggling with service sprawl or grappling with seemingly endless microservices.