Lean and Mean Open Data Machines

Open data folks have a lot to learn from the API community, and vice versa. As the open data field is about exposing information, there is much overlap between open data and APIs, or Application Programming Interfaces, which expose an application’s data or functionality for third-party integration.

Open data is commonly used by API practitioners, app developers, and end users, perhaps unknowingly. Have you ever used a weather tracking feature in a web app? Have you ever tracked the location of public transit from your smartphone? If so, then you have used open data.

Backed by government bureaucracy, however, the open data space evolves at a slow pace. With outdated consumption modes still the norm, many open data platforms have a lot of catching up to do to match the usability of developer-pleasing services like Twilio.

Nevertheless, there is a great case for changing the tides and creating open data platforms that are API-first. As we feel both sides can benefit from the exchange of knowledge, in this article we introduce open data, and evangelize the value open data already has and can have if positioned with the right technological makeup and marketing facade.

What is Open Data?

Open data is by definition data without license, limitation, or price. There are many kinds of open data with awesome potential use cases, in areas such as culture, science, education, statistics, economy, finance, weather, environment, transportation, agriculture, and more. These are typically large public data sets, published by governments, businesses, or research groups.

What can you do with open data? Well, with an open data set of student test scores under an Open Data Commons license, for example, you would be free to do the following:

  • Share: Copy test scores, and distribute the data freely throughout any device, app, or platform.
  • Create: Produce a commercial product that accesses the database, such as an app that visualizes average student behavior through the lens of varying demographics.
  • Adapt: Modify, transform and build upon the database, such as through publishing new findings from user or teacher input.

Minor caveats may exist, mainly intended to preserve the database. In the case of the ODbL 1.0 license, you are free to use the data as long as you:

  • Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL.
  • Share-Alike: If you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL.
  • Keep open: Redistributed databases with DRM restrictions must also redistribute a version without those measures.

What is not open data? Proprietary data offered by a business like Facebook is not considered open data. Most commonly, open data exists in the public sector, hosted by federal government entities like Data.gov or by cities, as with Open.stockholm.se. The amount of government-backed open data in the US is impressive: over 188,000 data sets encompass a wide array of subjects. Open government data initiatives in the UK and Canada are worth mentioning as well. Currently, over 52 countries support open data platforms, according to an international list published by Data.gov.

Sweden touts its own government data portal, Öppnadata.se, and also offers OpenAID.se, an interactive visualization of aid given to developing nations and refugee support. Portugal, Belgium, France, the UK, the Netherlands, Italy, Estonia, and Spain are among the other European countries providing open data platforms.

Why Open Data?

Opening data is doing a public good. In a government setting, open data increases transparency, and allows citizens access to valuable information. In our digital age, this information is crucial for spurring innovation in business to deliver end social value. A business may also choose to open datasets to increase engagement with their community, and for government, it increases social participation in governance.

Open Data is Different than Open Source

It should be noted that open data is different than open source software. Open source refers to software licensed under the MIT License, Apache License 2.0, GNU GPL, or other license agreements, that developers can freely use to create commercial applications. CKAN, for example, is open source software upon which many data portal platforms are built.

Licensing software is different than licensing data. Data is the raw material, as opposed to software, which operates on that material. Open source code published on GitHub can be forked, modified, and redistributed. Open data may similarly allow for reuse and redistribution, but when it comes down to it, open source and open data have differing business, maintenance, and licensing approaches.

Defining an Open Data Platform

An open data platform shares many similarities with an API platform:

  • Both automate relationships: No face-to-face interaction is required, as access goes through a web portal.
  • Both are hubs for innovation: Both open data platforms and API platforms are created with the intent of fostering new ideas and new applications.
  • Both are made to fulfill organizational goals: For a business, an API may be a monetization strategy, whereas for government, open data may be required by law.

Watch Andreas Krohn present on this topic at a Nordic APIs event

The Benefit of Open Data API Platforms

An API developer’s first impression of open data may not be positive. If you scroll through these data sets, you’ll see the accessibility options: often HTML, .zip, .text, Excel, or PDF. For developers used to the polished design of APIs built by Facebook and Twitter, an FTP directory with an Excel file in it seems stuck in another age. Marketing many of these data platforms to developers as they are can be a challenge.

Open data publishers often come from a very different background, a combination of academia and bureaucracy, which causes them to think very differently than web devs. According to Krohn, they have likely been working in a slow-moving bureaucratic organization for a long time, which shapes the way they think, work, and present data.

Though exposing data in the aforementioned formats is the norm, it doesn’t have to be the future of open data exchange. Many open data providers do embrace API distribution, using XML/SOAP web services or RESTful, JSON-formatted APIs. The platforms that haven’t embraced API accessibility yet can greatly benefit from doing so.
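To make the contrast concrete, here is a minimal Python sketch of the gap between a file download and a JSON API response. It assumes a small CSV of student test scores; the sample data, the dataset name, and the function names `csv_to_records` and `to_json_response` are all hypothetical and not taken from any real portal. It only shows the kind of transformation a developer otherwise has to do by hand.

```python
import csv
import io
import json

# Hypothetical sample data standing in for a CSV file published on a portal.
SAMPLE_CSV = """school,grade,average_score
North High,9,71.5
South High,9,68.0
"""


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def to_json_response(records, dataset_name):
    """Wrap the records in a small JSON envelope, as an API might return."""
    body = {
        "dataset": dataset_name,  # hypothetical dataset identifier
        "count": len(records),
        "results": records,
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    records = csv_to_records(SAMPLE_CSV)
    print(to_json_response(records, "student-test-scores"))
```

Note that `csv.DictReader` returns every value as a string, so a real service would still need to decide on types, units, and field names. That design work is exactly what a raw file download leaves to each consumer.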

How Do We Improve Open Data?

Though often overlooked, open data is ripe with value. For example, a 2014 Commerce.gov survey estimated that 301 billion weather forecasts were consumed each year, putting the aggregate annual value of those forecasts at about $31.5 billion.

As entire ecosystems of companies tap into weather data, similar value must inherently rest in other sectors too: climate, finance, education, culture, science, statistics, economy, environment, transportation, agriculture, and more. But how do we improve open data platforms in a way that enables intuitive third-party application development? Here are three general areas that are important for any API project, and critical to improving an open data platform:

Technology

Unfortunately, getting locked into certain technologies is where many open data projects begin. Often, the provider considers what technical platform should be used, such as CKAN, instead of first considering:

a) what data should be offered,
b) the value of the data, and
c) how it should be exposed.

Technology should be viewed as a tool, not as an end solution. Too commonly, adoption of an API management solution or framework arises out of current industry trends, rather than what makes the most sense for longevity in a particular use case. When publishing an open data API, you encounter many of the same API management hurdles as when publishing any other type of API. For open data, though, authentication and rate limits can often be loosened, as the data is meant to be freely available.
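As a rough illustration of what loosened rate limits could look like, the Python sketch below implements a basic token-bucket limiter and compares two policies. The `TokenBucket` class, the `make_limiter` helper, and the numbers in `STRICT_POLICY` and `OPEN_DATA_POLICY` are assumptions made up for this example, not recommendations or the behavior of any particular API management product.

```python
import time


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Return True and spend one token if the request may proceed."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policies: a tight limit typical of a commercial API tier,
# and a far more generous one for anonymous access to open data.
STRICT_POLICY = {"capacity": 10, "rate": 1.0}
OPEN_DATA_POLICY = {"capacity": 1000, "rate": 100.0}


def make_limiter(policy, clock=time.monotonic):
    """Build a limiter from a policy dict."""
    return TokenBucket(policy["capacity"], policy["rate"], clock)


if __name__ == "__main__":
    strict = make_limiter(STRICT_POLICY)
    generous = make_limiter(OPEN_DATA_POLICY)
    burst = 50  # simulate a burst of 50 requests from one client
    print("strict allowed:", sum(strict.allow() for _ in range(burst)))
    print("open data allowed:", sum(generous.allow() for _ in range(burst)))
```

The point of keeping some limit at all is to protect the service from accidental overload, not to gate access to the data.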

Organization

In the realm of politics, open data is founded on beautiful, egalitarian principles. Increase transparency, decrease corruption, promote business innovation, and revel in increased tax revenue from economic expansion.

These ideals, however noble, won’t ever come to fruition if people don’t ever use the data. The truth is that there is a lot of wasted money in open data right now, funding thousands of dormant data sets.

In the future, politicians are going to vote for open data projects that have real, physical outcomes. So what are the real arguments for open data?

  • Enabling new projects: Open data lets people do things that have never been done before.
  • Savings: If you can save money in an internal project and release open data as a side project, it’s worth it. Everything else is just a bonus.
  • PR: You look great doing this.

Open data sets are not monetized, and typically, the funding for the collection and maintenance of the data in such situations comes from public sources. However, some have suggested a premium option as a monetization scheme, which would allow access to more real-time data, at faster speeds, or access to datasets that contain both open and closed data.

Marketing

Like any platform, open data initiatives need to acquire users to remain viable. But many open data programs are stuck in an “open data bubble.” Many aren’t positioning themselves in a way that targets developers searching for API integrations.

Don’t forget who the potential users of your open data API are (app developers, entrepreneurs, designers, hackathon attendees, and so on), and make sure they are aware of it. This doesn’t mean immediately hiring an open data evangelist before your documentation is complete (an Excel file is not documentation).

Conclusion: Construct Open Data Platforms Around an API

Open data portals can learn a lot from API-first culture, in terms of design, usability, product ownership, and ongoing maintenance. As Jason Hare, writing for Opensource.com, puts it:

“Data online should be API [First], and data portals need to be replaced with something more useful and less annoying.”

The open data field is an exciting frontier, with limitless potential for innovation if open data platforms host APIs to maximize usage. Having an API-first mindset can spur open data usage by increasing the ease of innovation, but remember:

  • Opening up a data platform with an API involves much more than simply technology.
  • There are definitely selfish motivations at work: motivations will lie in potential savings, PR, and actual usage of the data.
  • Reach outside the open data community to market your open data platform or API to new faces.

That’s all for now. Thank you for reading, and please comment below if you have thoughts on the subject. We’ll leave you with the power of open data to change the world, as eloquently phrased by Open Knowledge:

A world where knowledge creates power for the many, not the few.
A world where data frees us — to make informed choices about how we live, what we buy and who gets our vote.
A world where information and insights are accessible — and apparent — to everyone.
This is the world we choose.
