How to Build an Open Data Ecosystem

Today’s most valuable data is locked away in silos. The people who produce it have most likely never been paid for their contributions, and they will never share in the value their accumulated data could create. However, a new approach aims to free data from these silos and build open data ecosystems.

Such open data ecosystems include four core components, which are a mix of technical and legal solutions: Data Unions, Data Portability, One-Click Rights, and API Neutrality. Let’s dive into them to see how to build an open data ecosystem.

Data Unions: Turning the Power Dynamics of Data Collection Upside Down

Data Unions are the first ingredient in getting data out of silos, and they work by bypassing those silos altogether. Consider an average internet user browsing the web today: they use Google Chrome and are tracked by Facebook (and by many other apps, but these two will do for the sake of argument). Google and Facebook collect extremely valuable data from the millions of people they track. We all know that.

So far, the response has been either to protect our privacy with blockers and alternative browsers, or simply not to care, which is the path most people seem to have chosen. Yet this data is valuable and could help startups and businesses enter new markets.

There is a third way out of the current privacy and tracking debacle. Data Unions turn the power dynamics of data collection upside down. Internet users can join a Data Union to pool their data with thousands of other Data Union members, thereby creating alternative data sets. Data Unions split the profits from data equally, and users can choose which data points they wish to share and when to browse truly incognito.
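To make the idea of choosing data points concrete, here is a minimal sketch of member-side filtering, assuming each member keeps a set of field names they agree to share plus an incognito switch. The `MemberPreferences` class, `filter_event` function, and field names are hypothetical and do not come from any real Data Union framework.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemberPreferences:
    """Per-member sharing settings (hypothetical structure for illustration)."""
    shared_fields: set[str] = field(default_factory=set)
    incognito: bool = False


def filter_event(event: dict, prefs: MemberPreferences) -> Optional[dict]:
    """Return only the fields the member opted to share, or None while incognito."""
    if prefs.incognito:
        return None  # nothing leaves the device while the member is incognito
    return {key: value for key, value in event.items() if key in prefs.shared_fields}


prefs = MemberPreferences(shared_fields={"domain", "timestamp"})
event = {"domain": "example.com", "timestamp": 1700000000, "search_query": "running shoes"}
print(filter_event(event, prefs))  # {'domain': 'example.com', 'timestamp': 1700000000}
```

Filtering before anything is sent, rather than after, keeps the member in control: fields they never opted into simply never reach the network.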

A Data Union framework could be integrated into any existing app, whether it’s a fitness tracker, a meditation app, a travel planner, or a ride-sharing service, to monetize the data created in that app ethically and transparently. It also makes room for entirely new business models. One example is Swash, a browser plugin. With Swash, users can monetize their browsing data while creating alternative data sets from their Amazon shopping habits, Google page ranks, and ads clicked on Facebook.

The data is transported, fully encrypted, over a real-time peer-to-peer network and is never stored by the Data Union. Users decide which data points to sell, and they can end their Data Union membership at any time. Data buyers pay for the data instantly in a cryptocurrency, which can easily be exchanged for fiat currency or other cryptocurrencies. The fiat banking system cannot handle micropayments in real time, nor can it pay unbanked internet users for their data. A tax-based scheme, such as the Data Dividend proposed by Andrew Yang, could not achieve this either.
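Splitting a payment equally sounds trivial, but with micropayments the rounding matters. The sketch below assumes the payment arrives as an integer count of the token's smallest indivisible unit (as cryptocurrencies typically denominate amounts) and hands any leftover units to the first members in the list. The `split_equally` function and member names are hypothetical; a real Data Union may also deduct operator fees or settle on-chain in ways not shown here.

```python
def split_equally(total_units: int, members: list[str]) -> dict[str, int]:
    """Split a payment, given in the token's smallest unit, equally among members.

    Integer arithmetic avoids floating-point rounding errors. Units that cannot
    be divided evenly go, one each, to the first members in the list, so the
    payouts always add up exactly to total_units.
    """
    if not members:
        raise ValueError("a Data Union needs at least one member")
    if len(set(members)) != len(members):
        raise ValueError("member list contains duplicates")
    if total_units < 0:
        raise ValueError("payment must be non-negative")
    share, remainder = divmod(total_units, len(members))
    return {member: share + (1 if i < remainder else 0) for i, member in enumerate(members)}


payouts = split_equally(1_000_003, ["alice", "bob", "carol"])
print(payouts)  # {'alice': 333335, 'bob': 333334, 'carol': 333334}
```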

Data Portability: Theoretically a Good Thing

Of course, big tech may not want its users to monetize their own data. However, under Article 20 of Europe’s GDPR, internet users formally have the right to port their data from one service to another. The EU’s data portability provision was not written to let people sell their data, but rather to give them the option of changing services seamlessly. Say you’re listening to music on Apple Music but would like to switch to Spotify. In such a case, data portability was intended to make changing services easier. Ideally, it would take one click to port data from one service to another. Obviously, the reality looks different.

Platform providers have up to one month to respond to a portability request. And while Article 20 requires a “structured, commonly used and machine-readable format”, it does not prescribe any particular format. Effectively, that means a service provider can hand an Excel sheet to a user who has been waiting patiently for four weeks. After that, if the goal is to move to a new platform, the data must be entered manually. The current reality of data portability is far from a seamless, interoperable experience.
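To illustrate why a format-agnostic export still leaves work for the user, here is a minimal sketch that converts a spreadsheet-style CSV export into JSON with the field names a receiving service might expect. The CSV columns, the `FIELD_MAP`, and the target field names are all hypothetical; every pair of services would need its own mapping, which is exactly the friction described above.

```python
import csv
import io
import json

# Hypothetical CSV export, roughly what a service might hand over today.
EXPORT_CSV = """Track Name,Artist Name,Date Added
Song A,Artist X,2020-01-15
Song B,Artist Y,2020-02-03
"""

# Hypothetical mapping from the exporting service's column names to the
# field names a receiving service might expect.
FIELD_MAP = {"Track Name": "title", "Artist Name": "artist", "Date Added": "added_at"}


def convert_export(csv_text: str, field_map: dict[str, str]) -> str:
    """Rename mapped CSV columns and serialize the rows as a JSON document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        {field_map[column]: value for column, value in row.items() if column in field_map}
        for row in reader
    ]
    return json.dumps({"library": rows}, indent=2)


print(convert_export(EXPORT_CSV, FIELD_MAP))
```

Columns without a mapping are dropped rather than guessed at, since a wrong mapping would silently corrupt the imported library.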

Beyond switching services, Article 20 also offers an inroad to data monetization. Suppose a user wants to take their data from one service to another, whether to have the same music library in two places or to display their musical taste on one platform while monetizing it on another. Either way, they are relying on the same legal principle of data portability.

Monetizing data gets more complicated in some domains. For highly sensitive data such as health data, GDPR lets each EU member state add its own conditions, including limitations, on how that data may be processed. This means that some European countries may allow their citizens to monetize their personal health data, and others won’t. However, this is a niche case and less relevant to the goal of opening up data silos by channeling our online data away from the tech giants.

One-Click Rights: What We Need for Better Data Portability

The EU has recognized that data portability, as it works today, does not keep pace with how data usually travels: often in real time, not after a month-long wait. The upcoming EU Data Act, expected in 2021, is meant to extend current data portability rights and bring them closer to our online reality.

Regulators are also aware of the so-called vision of a “One-Click Right” where users could, with the touch of a button, exercise their GDPR Article 20 rights and take their data from A to B in an instant. Yet this futuristic vision needs another vital ingredient: API Neutrality.
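A “One-Click Right” only works if the old service exposes a machine-readable export and the new one accepts an import. The sketch below models that requirement with two hypothetical interfaces, `Exporter` and `Importer`, plus an in-memory stand-in service; none of these correspond to a real platform’s API. The port copies rather than deletes, since Article 20 is about receiving and transmitting data, not removing it from the original service.

```python
from typing import Protocol


class Exporter(Protocol):
    """Hypothetical interface: a service that can hand over a user's data."""
    def export_data(self, user_id: str) -> list[dict]: ...


class Importer(Protocol):
    """Hypothetical interface: a service that can accept a user's data."""
    def import_data(self, user_id: str, records: list[dict]) -> int: ...


class InMemoryService:
    """Stand-in for a platform exposing both an export and an import API."""

    def __init__(self) -> None:
        self.store: dict[str, list[dict]] = {}

    def export_data(self, user_id: str) -> list[dict]:
        return list(self.store.get(user_id, []))

    def import_data(self, user_id: str, records: list[dict]) -> int:
        self.store.setdefault(user_id, []).extend(records)
        return len(records)


def one_click_port(source: Exporter, target: Importer, user_id: str) -> int:
    """Copy a user's records from source to target in one call; return the count imported."""
    return target.import_data(user_id, source.export_data(user_id))


old_service, new_service = InMemoryService(), InMemoryService()
old_service.store["user-1"] = [{"title": "Song A"}, {"title": "Song B"}]
print(one_click_port(old_service, new_service, "user-1"))  # 2
```

If either side lacks its half of the interface, there is nothing for the one click to call, which is why API Neutrality matters.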

API Neutrality

If we look back at the concept of Data Unions, some current models work today, and others rely on API Neutrality to become a reality. If a user wants to monetize what they are listening to on Spotify, then a Data Union can already get easy access to their music taste by tapping into Spotify’s open API. However, if a user wants to do the same with Netflix, it would be much harder to create a Netflix Data Union since there is no open API.
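For the Spotify case, a connector could call the Web API directly. This sketch builds (but does not send) a request to Spotify’s “Get Recently Played Tracks” endpoint and trims a response down to the fields a member agreed to share. The endpoint path, the `limit` cap of 50, and the response shape (`items`, each with `track` and `played_at`) reflect Spotify’s public Web API as we understand it and should be checked against the current documentation; the function names and the sample response are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Spotify Web API endpoint; requires an OAuth token with the
# user-read-recently-played scope (verify against the current docs).
RECENTLY_PLAYED_URL = "https://api.spotify.com/v1/me/player/recently-played"


def build_recently_played_request(access_token: str, limit: int = 50) -> urllib.request.Request:
    """Build, but do not send, the request a Data Union connector could make."""
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{RECENTLY_PLAYED_URL}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def extract_listening_history(response_body: str) -> list[dict]:
    """Keep only track name, artist names, and play time from each history item."""
    payload = json.loads(response_body)
    return [
        {
            "track": item["track"]["name"],
            "artists": [artist["name"] for artist in item["track"]["artists"]],
            "played_at": item["played_at"],
        }
        for item in payload.get("items", [])
    ]


sample = '{"items": [{"track": {"name": "Song A", "artists": [{"name": "Artist X"}]}, "played_at": "2020-06-01T12:00:00.000Z"}]}'
print(extract_listening_history(sample))
```

No equivalent connector can be written for a platform without a public API; the only fallback there is scraping.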

To automate data portability, Mehdi Medjaoui has begun lobbying for open APIs. In one blog post he writes about Airbnb: “They scraped Craigslist to bootstrap their platform at the beginning, but which now no longer allows third parties to scrape their website to find renters. This is unfair if we want fair competition and a free market.”

Without API Neutrality, or if platforms shut their APIs to Data Unions, screen scraping would be the only workaround. That approach is neither scalable nor accurate enough to produce reliable data sets. Therefore, support from regulators is needed to create a truly open data ecosystem.

The Business Case

For new businesses, Data Unions offer an ethical way to monetize their users’ data. By sharing revenues with users, they gain a competitive advantage over other players. Such an arrangement could become quite profitable for startups and users alike.

Earlier this year, an anti-virus browser plugin made news by secretly collecting user data. The company behind it, Avast, and its subsidiary, Jumpshot, made millions by selling their users’ browsing habits to enterprises such as Pepsi and McKinsey. The same model could work openly, with users joining the business and receiving a percentage of the revenue.

Combined with the right legal backbone, Data Unions offer a transparent alternative for dealing with online privacy and for how we approach data collection and monetization at large. When everyone becomes a stakeholder with their own rights and ability to profit, data silos can be opened up to inspire new innovations, research, and ultimately better business.