Using JSON-LD To Establish Semantic Linked Data

The World Wide Web is a powerful tool, and much of this power derives from the relationships between web pages, the data these pages host, and their further relations to other web pages and data sets. In fact, it is these relationships that drive the conceptual model and approach of the World Wide Web – without them, there would be no Reddit, no BBC News, and certainly no Wikipedia.

Despite the success of this framework, there are still some issues within the greater context of the World Wide Web. Chief among these issues is bridging the divide between human interaction and machine interaction through linked data. Today, we’re going to look at a solution to this issue known as JSON-LD. We’ll look at where the aforementioned issue comes from, what it fundamentally means for web providers, and how JSON-LD is poised as a solution in this space.

The Fundamental Issue

To understand the value of JSON-LD, we need to take a look at the fundamental problem it’s trying to solve. When average users browse the World Wide Web, they tend to take many of its relationships and design choices for granted – the links between websites and the data those sites host are considered a “settled” technology, one that is fully planned out and complete. This couldn’t be further from the truth.

Data on the web falls into a wide variety of content types, ranging from video to audio, from plaintext to dynamic rendering. While this certainly gives users a media experience, it is the linking from one website to another, providing data related to that which you previously consumed, that takes this interaction from a media experience into a rich multimedia experience. This linking of resources is where the “world wide web” gets its name – content is formed from simple single-service pages into complex webs of data.

The problem is that this system is designed for humans. While this might seem to be a strange problem to have, one must remember all of this data is served to us by machines, and as such, relational linking and discoverable content geared towards human usage results in a system wherein users experience media that cannot be properly contextualized by machines. In other words, a machine knows that there is content, and a machine knows that there are links between the content, but exactly how a video of “John F. Kennedy” and an article on “The Cold War” are linked is almost entirely lost to a machine.

Some strides have been made to rectify this, but they are all essentially decompositions of the web content into chunks that are more easily contextualized using additional systems. The tagging system in WordPress or the categories found on MediaWiki are semi-effective in solving these problems, but again, they require heavy efforts from the user and are essentially a human-centric solution to a machine-centric problem.

Linked Data

With all of this in mind, a solution can be found in the concept of Linked Data. Coined by Tim Berners-Lee, the inventor of the World Wide Web itself, the concept is simple – link data in such a way as to ensure the machines have the ability to not only recognize links between data, but to contextualize and understand them. In his 2006 note on Linked Data, Berners-Lee presents four basic principles for the establishment of such Linked Data:

  • Use URIs to name and identify content;
  • Use HTTP URIs so that these named and identified content entities can be looked up;
  • Utilize open standards such as RDF or SPARQL to provide useful information about what a name identifies to those who query the content; and
  • When publishing data to the web, refer to these things by their HTTP URI-based names.

The entire concept is to create a network of data that is named in such a way that the network can be queried – and that, when queried, this network can serve additional data in a standardized format. This data can then be used to infer further relationships, get more data, and create a more powerful semantic network.
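As a minimal sketch of this idea, the Python below models a tiny network in memory. The example.org URIs, the property names, and the WEB dictionary are hypothetical stand-ins for real HTTP lookups; nothing here performs a network request.

```python
# Hypothetical in-memory stand-in for the web: each HTTP URI names a thing,
# and "looking it up" returns data that refers to other things by URI.
WEB = {
    "https://example.org/people/michael": {
        "name": "Michael",
        "subscribesTo": "https://example.org/sites/nordic-apis",
    },
    "https://example.org/sites/nordic-apis": {
        "name": "Nordic APIs",
        "publishes": "https://example.org/articles/json-ld",
    },
    "https://example.org/articles/json-ld": {
        "name": "Using JSON-LD",
    },
}


def look_up(uri):
    """Return the data published under a URI (empty if nothing is known)."""
    return WEB.get(uri, {})


def discover(start_uri):
    """Follow every URI-valued property outward from start_uri, breadth first."""
    seen, queue = [], [start_uri]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        for value in look_up(uri).values():
            if isinstance(value, str) and value.startswith("https://"):
                queue.append(value)
    return seen


# Starting from one name, the links lead to everything related to it.
names = [look_up(uri)["name"] for uri in discover("https://example.org/people/michael")]
```

Because every value that names another thing is itself a URI that can be looked up, a single starting point is enough to reach the rest of the network.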

As an example of what Berners-Lee is advocating for, let’s imagine a theoretical visit to the Nordic APIs website from a manager, “Michael”. Michael is a subscriber to Nordic APIs, and, finding an article useful, decides to email it to a co-worker at another office, Jens. Jens sees the article, and comments on it.

According to the classical design paradigm, all that has occurred here is a transfer of data – while there’s a detailed relationship between Michael and Jens, nothing leverages it, and as far as the entities are concerned, this relationship might as well not exist.

Now let’s look at it from a Linked Data perspective. In this situation, Michael and Jens utilize a social platform for their internal efforts. Michael uses the social platform button to share this article, tagging Jens in the comment. Jens receives the notification that they’ve been linked, and comments on the piece.

On the base level, this seems to be the same exact exchange, but in reality, there’s a lot of hidden value underneath. When properly represented by URI paths (generated through sharing buttons, profile systems for commenting, etc.), every entity in this exchange exposes an impressive amount of relational data. In a Linked Data scenario, the exchange carries explicit relationships between each of its participants.

These relationships are important. By defining Michael as a subscriber to Nordic APIs, Jens as a commenter on this article, and Michael and Jens as coworkers via a social link, we create a web of interrelated content with actual, direct context that can be understood by the machines handling that data.
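One way to picture these relationships is as a list of subject, predicate, object statements. The sketch below is only an illustration: the URIs and predicate names such as "subscribesTo" and "coworkerOf" are hypothetical and do not come from any published vocabulary.

```python
# Hypothetical identifiers for the entities in the exchange.
MICHAEL = "https://example.org/people/michael"
JENS = "https://example.org/people/jens"
SITE = "https://example.org/sites/nordic-apis"
ARTICLE = "https://example.org/articles/json-ld"

# Each relationship is a (subject, predicate, object) statement.
TRIPLES = [
    (MICHAEL, "subscribesTo", SITE),
    (SITE, "publishes", ARTICLE),
    (MICHAEL, "shared", ARTICLE),
    (MICHAEL, "coworkerOf", JENS),
    (JENS, "commentedOn", ARTICLE),
]


def related(subject, predicate):
    """Objects linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def coworkers_who_commented(sharer, article):
    """Coworkers of `sharer` who commented on an article the sharer shared."""
    if article not in related(sharer, "shared"):
        return []
    return [
        person
        for person in related(sharer, "coworkerOf")
        if article in related(person, "commentedOn")
    ]


result = coworkers_who_commented(MICHAEL, ARTICLE)
```

With the relationships stated explicitly, a machine can answer a question like "which of Michael's coworkers commented on the article he shared?" without any human interpretation.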

An embedded link on Michael’s social platform that is then commented upon by Jens creates a fundamental difference in how a machine understands this behavior – previously, a machine only knew that an article exists, that there is a subscriber, and that there is a comment. With Linked Data, the system knows all of this, but also knows the exact path this data took, how the comment is linked to Michael’s shared link, the relationship between Jens and Michael, and the intricacies of said relationship between all three entities.

Berners-Lee summarizes this type of network best during a TED talk on the concept:

“If I take one of these HTTP names and I look it up…I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event […] When I get back that information it’s not just got somebody’s height and weight and when they were born, it’s got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it’s related to is given one of those names that starts with HTTP.”

Value Proposition for Web APIs

This is all incredibly important for web APIs for a variety of reasons. First and foremost, moving to Linked Data forces providers to make data more “usable” – by giving more information to other providers, web APIs can form a much more powerful network than they can ever expect by themselves.

Issues with Linked Data

Unfortunately, implementing Linked Data brings with it its own issues. These can be broadly summarized by two points.

First, the web supports a wide range of formats and systems. In order to support meaningful relationships, these formats need to be either standardized or widely supported, and either choice creates additional issues. If you mandate a specific format, you exclude possibly powerful uses of the other formats, and if you allow any format to be used, you fragment support.

Second, the fundamental nature in which this data is linked needs to be considered in order to establish these relationships. Simply linking content directly does not create a relationship – having Jens exist at https://internalwiki.com/Jens tells the Nordic APIs server nothing about the relationship, and simply serves as a one-way road. Thus, systems must use Linked Data as a default approach in order to participate in and leverage the power of such a semantic web of relationships, and this requirement in turn ties back into the question of formats.

JSON-LD

Enter JSON-LD. Based upon the concepts laid out above, JSON-LD is a Linked Data format built on the hugely successful and nearly ubiquitous JSON. By utilizing JSON data, JSON-LD seeks to establish the very relational semantic data we’ve been discussing so far.

The way JSON-LD establishes these relationships is really quite simple. Let’s go back to our example, where Michael shares a site with Jens. In a traditional design, we would simply have a small range of data entries, typically the site in which a user originated, some metric data for their browser, IP, etc., and possibly a referral for when the article is shared. While this is certainly helpful for humans, it does nothing for machines.

What we can instead do is assign these users some basic data in JSON-LD that establishes a relationship that can then be inferred or established by servers exchanging this data. Let’s take a look at a JSON-LD entry for Michael:

{
  "@context": "https://organization.org/context/userjsonld",
  "@id": "https://organization.org/managers/denver/michael",
  "name": "Michael",
  "position": "Manager",
  "department": "https://organization.org/departments/design",
  "location": "Denver"
}

So what does this entry specifically tell us? First off, by defining @context and @id in our JSON-LD entry, we establish a framework by which this data should be interpreted. Michael is part of a corporation known as “Organization”, and both his URI and his “location” entry tell us that he is based out of Denver. His URI further tells us that he is a manager, and this position is later explicitly stated by the “position” value. Additionally, we define the department that he is a manager for.

This data may not seem useful as is, but let’s create a JSON-LD entry for Jens.

{
  "@context": "https://organization.org/context/userjsonld",
  "@id": "https://organization.org/employees/remote/jens",
  "name": "Jens",
  "position": "Designer",
  "department": "https://organization.org/departments/design",
  "location": "Remote"
}

Now we’ve established an entry for Jens. Note that both the @id URI and the “location” value indicate where Jens works – as a remote employee. More importantly, Jens is established to be working in the same department that Michael manages, and thus we can infer that Jens is likely Michael’s subordinate.
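The inference described here can be sketched in a few lines of Python. The rule itself is an assumption made for illustration (a shared department URI plus a "Manager" position implies a reporting line), and the "reportsTo" and "colleagueOf" labels are hypothetical; JSON-LD only supplies the data, not the rule.

```python
import json

# The two entries from the text, written as strict JSON.
MICHAEL = json.loads("""
{
  "@context": "https://organization.org/context/userjsonld",
  "@id": "https://organization.org/managers/denver/michael",
  "name": "Michael",
  "position": "Manager",
  "department": "https://organization.org/departments/design",
  "location": "Denver"
}
""")

JENS = json.loads("""
{
  "@context": "https://organization.org/context/userjsonld",
  "@id": "https://organization.org/employees/remote/jens",
  "name": "Jens",
  "position": "Designer",
  "department": "https://organization.org/departments/design",
  "location": "Remote"
}
""")


def infer_relationship(a, b):
    """Hypothetical rule: people in the same department are colleagues, and
    a non-manager in that department reports to the manager."""
    if a["department"] != b["department"]:
        return None  # no shared reference point, so nothing to infer
    if a["position"] == "Manager" and b["position"] != "Manager":
        return (b["@id"], "reportsTo", a["@id"])
    if b["position"] == "Manager" and a["position"] != "Manager":
        return (a["@id"], "reportsTo", b["@id"])
    return (a["@id"], "colleagueOf", b["@id"])


relationship = infer_relationship(MICHAEL, JENS)
```

The comparison works because "department" holds a URI rather than a free-text label: two entries that point at the same URI are unambiguously talking about the same department.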

When this data is sent on to the Nordic APIs website, a few things happen that did not before. Because Nordic APIs already has an entry for Michael that points to this very entry, we have a reference point from which to interpret all other data. Without the entry from Michael, were we to receive Jens’ entry, we would have no information whatsoever on how they are related – but because we do have the data, when we receive Jens’ information, we have reference points for the organization, for the position, and for the relationship between these entries.

In this way, we can not only document a relationship between these two otherwise unrelated entries, we can begin to explore more extensive relationships. Let’s say for a moment that Nordic APIs has a knowledgebase of organizations that they are partnered with, and that Michael is a manager that is featured.

When an article is created on Michael and his work, a Linked Data entry can help contextualize projects, such as automatically linking social profiles, tying in GitHub accounts, and even building internal databases for how effective an ad campaign requested by Michael is amongst registered subscribers.

In this way, Linked Data, and more specifically JSON-LD, is a very powerful tool to create relationships and define context.

Strengths of JSON-LD

JSON-LD is a very strong solution for many users because of one simple fact – it removes ambiguity. Consider our JSON-LD entries above – we specifically include a @context keyword. What is that? Simply put, it points to a schema that defines what our terms mean.

We have to remember that this data communication is being done between different sites. Because of this, not every term is going to be defined the same exact way – while “name” might mean first name on Facebook, “name” on LiveJournal could be your username or your alias.

As such, being able to define the schema under which we are operating and then compare those terms to another schema during data interchange removes a huge amount of ambiguity. This in turn makes the data more useful, more complete, and better related.
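To make this concrete, here is a simplified sketch in which two hypothetical sites both use the term “name” but map it to different identifiers in their contexts. The site URLs and user records are invented for illustration, and the expand function is a deliberately reduced stand-in for real JSON-LD expansion, which handles far more cases and produces a more detailed output shape.

```python
# Hypothetical contexts: each site maps its own short terms to full IRIs.
SITE_A_CONTEXT = {"name": "http://schema.org/givenName"}
SITE_B_CONTEXT = {"name": "http://schema.org/alternateName"}


def expand(document):
    """Replace each term with the IRI its @context assigns to it.

    Simplified: handles only a flat, inline @context of term -> IRI strings.
    Terms without a mapping are dropped.
    """
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key.startswith("@"):
            if key != "@context":
                expanded[key] = value  # keep keywords such as @id untouched
        elif key in context:
            expanded[context[key]] = value
    return expanded


site_a = {
    "@context": SITE_A_CONTEXT,
    "@id": "https://site-a.example/users/42",
    "name": "Michael",
}
site_b = {
    "@context": SITE_B_CONTEXT,
    "@id": "https://site-b.example/users/mk",
    "name": "mike_designs",
}

a = expand(site_a)
b = expand(site_b)

# After expansion the two "name" values no longer collide.
shared_properties = (set(a) & set(b)) - {"@id"}
```

Once the terms are expanded, a consumer can see that one “name” is a given name and the other is an alias, even though both sites used the same short key.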

Additionally, JSON-LD uses keywords such as @id, @vocab, and @type to great effect – being able to describe specific elements of your data means that you can state what that data means without forcing every party to accept a single common standard, which is what standardization, while effective, tends to do. Being able to sidestep this “forced standardization” approach makes it a very powerful solution indeed.
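As a rough illustration of @vocab and @type, the sketch below resolves plain keys and the @type value against a default vocabulary. The choice of schema.org terms for Michael’s entry is an assumption made for this example, and the function covers only this one narrow case rather than the full JSON-LD processing rules.

```python
def expand_with_vocab(document):
    """Simplified expansion for a context that only sets @vocab.

    Plain keys and the @type value are resolved against the @vocab IRI;
    other keywords such as @id are kept as-is.
    """
    vocab = document["@context"]["@vocab"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        if key == "@type":
            expanded[key] = vocab + value
        elif key.startswith("@"):
            expanded[key] = value
        else:
            expanded[vocab + key] = value
    return expanded


# Hypothetical re-description of Michael using schema.org terms.
person = {
    "@context": {"@vocab": "http://schema.org/"},
    "@id": "https://organization.org/managers/denver/michael",
    "@type": "Person",
    "name": "Michael",
    "jobTitle": "Manager",
}

expanded = expand_with_vocab(person)
```

The publisher picks the vocabulary in its own context, so a consumer learns what “Person” and “jobTitle” mean from the data itself rather than from a format agreed in advance.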

Weaknesses of JSON-LD

There is an argument to be made that JSON-LD is not the perfect solution. First, JSON can be hard to parse for humans, and at times harder to write. JSON-LD has a very powerful typing system, but this also means that it’s more complex to utilize than something like HAL. If a single competitor had to be identified, in fact, it would absolutely be HAL, and many of its selling points seem set to exploit the issues inherent in JSON-LD.

HAL is simple, and utilizes a very simple and easy to understand system – both things that JSON-LD, at its most complex, is not. As part of this, many have accused JSON-LD of “reinventing the wheel”, especially when it comes to the evolving use of defined types. For many people, it seems to be simple re-use of already existent verbiage in a more complex setting.

That’s not to say that JSON-LD being more complex and somewhat derivative is necessarily a bad thing, however – as Manu Sporny, one of the primary editors and creators of the JSON-LD specification, said so eloquently:

“Simpler isn’t always a good thing. In the cases listed above, simpler meant that HAL was not capable of addressing the use cases above. You should use the simplest solution that addresses your use case today and into the future. If HAL is that, use HAL. If JSON-LD is that solution, use it.”

Conclusion

The strongest argument for JSON-LD is actually one of authority – the W3C has published JSON-LD as a Recommendation, its term for a standard specification. The simple fact is that JSON-LD, though it may have its issues, has been designed from the ground up to solve a fundamental problem, and is based upon a concept defined by the creator of the World Wide Web itself.

While we can argue about efficiency and specific approach for the rest of time, having a solution based upon a concept from the creator of the World Wide Web and adopted by the W3C after heavy review from some of the leading experts in the field certainly makes non-adoption a hard argument to accept.

What are your thoughts on JSON-LD and its standardization by the W3C? Do you think there are better solutions? Let us know in the comment section below.