How Netflix Scales Using Data Abstraction

Art Anthony
October 22, 2024

With over 275 million subscribers worldwide, Netflix continues to be a streaming juggernaut. If you still need convincing, just compare that figure with Prime Video’s 200 million subscribers, or Disney+, Hulu, and ESPN+, which combine for a total of 229 million. And even though it wasn’t the first video-on-demand service to launch (Amazon’s Prime Video launched in September 2006, and the first version of YouTube appeared at the end of 2005) Netflix is still the king. Around 60% of UK households have at least one Netflix viewer.

So how did the company that Blockbuster once turned down an offer to buy for $50 million become a $150 billion company? Comfort shows, like that one with those friends in New York, and original content, like that other group of chums who keep encountering strange things in small-town Indiana, are certainly factors. But, as Netflix’s Vidhya Arvind demonstrated when she joined us to speak at our 2024 Austin API Summit, they’re far from the only reason Netflix’s growth has been so meteoric.

A big part of the brand’s success lies in its data abstraction and API-first approach. Below, we’ll dive into the company’s stack to find ways you can mimic their agility and ability to scale.

This post was inspired by Vidhya Arvind’s session at the 2024 Austin API Summit.

Netflix’s API-First Approach

We’ve written a lot about the rise of API-first companies and even highlighted Netflix as one such product in a recent piece about API-first success stories. Here’s the short version of that story: by breaking down its monolithic systems, Netflix created a decoupled and extensible API that can be accessed from a huge range of devices without the need for DVDs or custom-built applications.

Back in 2021, we wrote about how Netflix deployed GraphQL Microservices (GQLMS) as a backend to unlock a range of benefits, including increased organizational understanding and faster deployment.
Bear in mind that GraphQL was only released as open source in 2015 and then moved to the newly established GraphQL Foundation in 2018. These experiments speak to the fact that Netflix has never shied away from embracing new technology to get an edge over its competitors.

There is, of course, a downside to all of this: using hundreds of microservices (Netflix currently uses more than 1,000 of them) can cause complexity, data translation issues, and cascade failures. Data abstraction is one of the ways Netflix sought to mitigate these risks.

More Use-Cases, More Problems

If you should feel inclined to do so, you can watch Netflix on everything from aging Roku devices to smart refrigerators. While that flexibility is one of Netflix’s greatest strengths, it also poses a significant problem for the company itself: hugely varied use cases.

Arvind specifically highlights how, when we don’t have abstraction measures in place, every application needs to be able to understand various data formats from a range of APIs. That means extra legwork for client teams, who may need to work with all these different APIs and, as Arvind continues, familiarize themselves with “different languages, rough edges, tuning parameters,” and so on.

In seeking to address this problem, she asks two questions:

“Can we take the common patterns and provide a common solution?”
“Can the solution be generic and storage agnostic?”

Components of Abstraction and Virtualization

One solution is introducing data abstraction layers. With abstraction, you break a complex system into smaller pieces with clearly defined boundaries. Arvind outlines how abstraction and virtualization can help in distributed systems by enabling you to switch between implementations and providing the ability to layer systems to solve bigger problems.
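To make the idea of switching between implementations and layering systems more concrete, here is a minimal Python sketch. It is not Netflix’s code: the `KeyValueStore` interface, the `InMemoryStore` and `NamespacedStore` classes, and the `save_profile` helper are all hypothetical names invented for illustration. The point is only that client code depends on one small contract, while the storage behind it can be swapped or stacked.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueStore(ABC):
    """Hypothetical storage-agnostic contract that client code depends on."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore(KeyValueStore):
    """One possible backend; a real one might wrap a database driver."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class NamespacedStore(KeyValueStore):
    """A layer stacked on top of any other store: it prefixes every key."""

    def __init__(self, inner: KeyValueStore, namespace: str) -> None:
        self._inner = inner
        self._namespace = namespace

    def put(self, key: str, value: bytes) -> None:
        self._inner.put(f"{self._namespace}:{key}", value)

    def get(self, key: str) -> Optional[bytes]:
        return self._inner.get(f"{self._namespace}:{key}")


def save_profile(store: KeyValueStore, user_id: str, profile: bytes) -> None:
    """Client code: knows the contract, not the storage engine behind it."""
    store.put(f"profile/{user_id}", profile)


if __name__ == "__main__":
    base = InMemoryStore()
    layered = NamespacedStore(base, "ui")
    save_profile(layered, "42", b"dark-mode")
    print(layered.get("profile/42"))
```

Because `save_profile` only sees `KeyValueStore`, replacing `InMemoryStore` with another backend, or adding another layer, would not require touching the client function.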
Abstraction of this kind can be achieved by using a data gateway as an abstraction server that hides what’s going on between a client application and the storage engines/APIs mentioned above. According to Arvind, the three main components of abstraction are an abstraction client that sits in your client application, a gateway (abstraction server), and a control plane.

Arvind comments, however, “When many applications are connected to a single abstraction layer, that can be a single point of failure. So we need to think about how we can deploy these in a way that offers isolation and avoids noisy neighbor problems.”

We can accomplish that by using sharding, one of the key components of virtualization. Here, we connect different sets of applications to their own abstraction layers with a control plane that knows how to talk to different databases, thus giving us the isolation we’re seeking.

Arvind also highlights that we shouldn’t think of abstraction as one simple server. In fact, we can use different processes, like key-value (KV) abstraction, tree abstraction (in conjunction with KV abstraction), or UI personalization (in conjunction with KV abstraction), to add extra layers.

Be sure to check out the video of Arvind’s talk for some in-depth examples (including runtime and persistence) of how we can use configuration to deploy such compositions.

Ease the Pain of Migration With Abstraction

Arvind goes on to explain how abstractions can be used for shadow writing, which can help with migration. “Migrations are painful,” she says. “It took almost a year and a half for us to migrate 250 Cassandra clusters from Cassandra 2.0 to Cassandra 3.0.”

With shadow writing, she continues, “We start with DB1, your only implementation. Then we add DB2, and now we talk to both the databases in parallel. Underneath, we move the data from [DB1 to DB2] to backfill the data. All of which happens without the client touching anything on their [side].
We can then promote DB2 as a primary, at which point we can decommission DB1.”

As many interesting possibilities as abstraction opens up, such as easier data migration, it’s important to remember that “your API is your contract with a client,” says Arvind. “No matter how many abstractions you have, you need to have a clean and simple API on the client’s side so that they can come and use it without being aware of what’s going on behind the scenes.”

In Netflix’s case, all of that behind-the-scenes action is packaged into a slick UI that’s surprisingly consistent across many different devices.

Starting Down the Road to Abstraction

Arvind rounds out her talk by discussing some of the building blocks for KV abstraction…

Chunking
Compression
Caching and Nearline Caching
Adaptive Pagination
Signaling and SLO Signaling
Summarization
Dictionary Compression

…before closing with a quote from Marc Andreessen: “Every new layer of abstraction is a new chance for a clean-slate redesign of everything, making everything a little faster, less power hungry, more elegant, easier to use, and cheaper.”

Most API developers are unlikely to need to scale to the same degree that Netflix has (although you never know!), but abstraction can still be a valuable tool in the toolbox because it offers API providers a greater degree of flexibility in how they provide their service. When an endpoint decouples a consuming application from the infrastructure providing that service, it allows you to make changes behind the scenes without negatively impacting the API.

And, because it shifts away from code and very specific implementation models towards common metadata frameworks and language-agnostic modeling, abstraction has also been touted by some as a match made in heaven with the API-first mindset. If you’re already thinking API-first, it could make sense to start thinking about abstraction, too.
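As a closing illustration of making changes behind the scenes, the shadow-writing migration described earlier could be sketched roughly as follows. This is a toy Python model under stated assumptions, not Netflix’s implementation: `DictStore` stands in for a real database, `MigratingGateway` and its method names are hypothetical, and the sketch assumes the backfill copies existing rows from the old database (DB1) into the new one (DB2) so that DB2 can later be promoted.

```python
from typing import Dict, Optional


class DictStore:
    """In-memory stand-in for a real database (assumption for the sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value

    def get(self, key: str) -> Optional[str]:
        return self.rows.get(key)


class MigratingGateway:
    """Hypothetical gateway: clients call put/get and never see the migration."""

    def __init__(self, primary: DictStore) -> None:
        self.primary = primary
        self.shadow: Optional[DictStore] = None

    def add_shadow(self, shadow: DictStore) -> None:
        # From now on, every write also goes to the new database.
        self.shadow = shadow

    def put(self, key: str, value: str) -> None:
        self.primary.put(key, value)
        if self.shadow is not None:
            self.shadow.put(key, value)

    def get(self, key: str) -> Optional[str]:
        # Reads are served by the primary until promotion.
        return self.primary.get(key)

    def backfill(self) -> None:
        # Copy rows written before shadowing began; skip keys the shadow
        # already holds, since dual writes made those current.
        if self.shadow is None:
            raise RuntimeError("no shadow database configured")
        for key, value in list(self.primary.rows.items()):
            if self.shadow.get(key) is None:
                self.shadow.put(key, value)

    def promote(self) -> DictStore:
        # Make the shadow the primary and return the old one for decommissioning.
        if self.shadow is None:
            raise RuntimeError("no shadow database configured")
        old = self.primary
        self.primary, self.shadow = self.shadow, None
        return old


if __name__ == "__main__":
    db1, db2 = DictStore("DB1"), DictStore("DB2")
    gateway = MigratingGateway(db1)
    gateway.put("a", "1")      # written before the migration starts
    gateway.add_shadow(db2)
    gateway.put("b", "2")      # written to both databases
    gateway.backfill()
    retired = gateway.promote()
    print(retired.name, gateway.get("a"), gateway.get("b"))
```

Throughout the sequence the client only ever calls `put` and `get` on the gateway, which is the property the talk emphasizes. A production system would also need to handle deletes, write failures on one side, and verification before promotion, none of which this sketch attempts.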