Federated Data With HyperGraphQL

Is HyperGraphQL the linked data extension for GraphQL we’ve been waiting for?

Data is, in many ways, the driving force behind much of the innovation in the information technology industry. Each choice is informed largely by one’s ability to find data, secure that data, and then form relationships within a web of unconnected points.

Linked data is thus about connecting data in a meaningful way to uncover these additional relationships, boosting both our understanding of otherwise disparate sources and the value we can draw from them. GraphQL is promising in this regard, but in its basic form it is limited to a single discrete or collated data set.

Enter HyperGraphQL. Designed to allow disparate data sets to be combined, compared, and analyzed, HyperGraphQL promises to be the next evolution of GraphQL, with combined, linked data at the forefront. But does it deliver on this promise? Today, we’re going to look at HyperGraphQL and see exactly what it offers.

What is HyperGraphQL?

HyperGraphQL is essentially an evolution of GraphQL proper: it functions as an interface used to query and serve linked data. GraphQL was originally developed by Facebook for internal use and publicly released in 2015, and that origin in serving Facebook’s highly interconnected data explains much about its purpose and function. As we’ve explained previously, GraphQL allows clients to state exactly what they expect from a data source and to shape the response they receive for further processing. This allows for expressive querying and linking between objects and their relationships, which drives the kind of relational network that has come to dominate modern data.

HyperGraphQL can then be seen as a further development of this concept. By offering a basic GraphQL experience alongside a more robust method of federating data (a topic we will discuss shortly), HyperGraphQL provides a more powerful way to link related objects and query them, even when those objects and relationships are not contained in a single database. That is the very concept of Linked Data.

Linked Data, briefly, is a concept backed by a set of W3C specifications (such as RDF and SPARQL) for connecting data on the web and linking that data in relational form. The goal of linked data is to treat data not as disparate, isolated sets but as discrete subsets of a kind of “global” data store. Establishing relationships, linking those resources together, and treating them as a single virtualized data store is exactly what HyperGraphQL attempts to do.

As HyperGraphQL.org puts it, HyperGraphQL is a GraphQL interface for querying and serving linked data that supports “federated querying and exposing data from multiple linked data services using GraphQL query language and schemas.”

Benefits and Pitfalls

Since HyperGraphQL is based on GraphQL, it is similarly flexible and expressive. When paired with expressive query languages and linked data standards, combined data can yield more usable, actionable, and valuable information than any single source could on its own. Requests and responses are flexible as well: only what is requested is delivered, in the shape the client asks for, which keeps payloads lean and helps the system scale.
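To make the “only what is requested is delivered” point concrete, here is a minimal Python sketch of how a GraphQL-style selection set prunes a stored record. It illustrates the principle and is not HyperGraphQL’s implementation: the nested-dict encoding of the selection set, the `select` helper, and the extra `internalNotes` field are all hypothetical.

```python
from typing import Any

# Hypothetical encoding of a GraphQL selection set as a nested dict:
# a field mapped to None requests that field as-is; a field mapped to a
# dict requests that nested object with its own sub-selection.
Selection = dict[str, Any]


def select(record: dict[str, Any], selection: Selection) -> dict[str, Any]:
    """Return only the fields named in `selection`, preserving nesting."""
    result: dict[str, Any] = {}
    for field, sub in selection.items():
        if field not in record:
            continue  # this sketch silently omits fields the record lacks
        value = record[field]
        if sub is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select(item, sub) for item in value]
        else:
            result[field] = select(value, sub)
    return result


# A stored record holding more data than any single client needs.
stored = {
    "name": "Mirza Abolhassan Khan Ghaffari",
    "birthDate": "1814-1-1",
    "internalNotes": "hypothetical field the client never asked for",
    "birthPlace": {"_id": "http://dbpedia.org/resource/Kashan", "label": ["Kashan"]},
}

print(select(stored, {"name": None, "birthPlace": {"label": None}}))
# {'name': 'Mirza Abolhassan Khan Ghaffari', 'birthPlace': {'label': ['Kashan']}}
```

Everything the client did not name, including the nested `_id`, stays on the server, which is where the bandwidth and scalability benefit comes from.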

Using HyperGraphQL also removes much of the traditional cost of generating data from combined resources. Typically, a single physical database must be built from the combined data, essentially turning many into one. While this works, it creates significant data security issues and leads to server bloat, as more and more resources are needed on a permanent basis. By creating a virtualized database that exists only for the duration of the operation being performed, data is made more secure and the resources required are reduced.

Not all is perfect, though. HyperGraphQL does introduce complexity, especially when the disparate sources are numerous. A clean, easily understood interface mitigates this somewhat, but the process is still complex and resource-intensive, and in situations with a great many sources, the gains from querying an ephemeral virtual database can quickly be lost.

There is, of course, a security issue inherent in big data collection in general. When working with so many data sets, ensuring that data is encrypted in transit is hugely important. This is less of a concern when you control the single combined database; when the database is virtual and ephemeral, and data travels from many services on every query, it becomes more complicated.
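As a small, practical guard for the encryption-in-transit concern, a federating service could refuse to query any source that is not reached over HTTPS. The Python sketch below assumes a plain list of endpoint URLs; the `insecure_services` helper and the example.org endpoints are hypothetical, and treating an HTTPS scheme as “encrypted” is a simplification that says nothing about certificate validation or how data is handled at rest.

```python
from urllib.parse import urlparse


def insecure_services(urls: list[str]) -> list[str]:
    """Return every service URL whose scheme is not HTTPS.

    Assumption: only an https:// scheme counts as encrypted in transit here;
    TLS terminated elsewhere or other secure transports are not modeled.
    """
    return [url for url in urls if urlparse(url).scheme.lower() != "https"]


# Hypothetical endpoints used only for illustration.
endpoints = [
    "https://people.example.org/sparql",
    "http://places.example.org/sparql",
]

print(insecure_services(endpoints))  # ['http://places.example.org/sparql']
```

Running a check like this when the instance starts, rather than per query, would surface an unencrypted source before any data flows through it.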

Additionally, the terms of service of data sources, as well as local legal limitations, might dictate that the data in question cannot be collected, combined, or distributed in the first place. While many APIs offer public data for consumption, other data sources might be governed by legislation such as the GDPR in the European Union. This is not an issue for your own local data, provided you are already compliant, but importing data from non-GDPR-compliant sources for analytical comparison and data federation can pass liability on to your own database and your own organization. This must be considered as both a technical and an operational issue.

What is Data Federation?

While we discuss the value of HyperGraphQL, we should also take a moment to talk about data federation. The concept is deceptively simple: data federation is the process of joining data from disparate sources into a single virtualized database, which can then be acted upon as if it were a single data source. Data processors need this because data often exists in a variety of forms and locations, and acting on those data sets requires a single system to draw from and refer to.

Data federation is facilitated in HyperGraphQL through joinable queries. Because GraphQL by design returns data in a specified, requested shape, data points can be pulled from multiple data stores, matched on shared identifiers, and joined with data from other services to create an ad hoc virtual database. Once this is done, the content is linked, and value can be drawn from the linked data.
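The joining step can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not HyperGraphQL’s internal algorithm: the two in-memory “services” stand in for remote linked data endpoints, records are assumed to share URIs in their `_id` fields, and `build_virtual_view` and `join` are hypothetical helpers.

```python
from typing import Any

Record = dict[str, Any]

# Two hypothetical services, held in memory so the sketch is self-contained.
# In a real deployment each would be a remote linked data endpoint.
PEOPLE_SERVICE: list[Record] = [
    {
        "_id": "http://dbpedia.org/resource/Sani_ol_molk",
        "name": "Mirza Abolhassan Khan Ghaffari",
        "birthPlace": "http://dbpedia.org/resource/Kashan",  # a URI, not an object
    },
]
PLACES_SERVICE: list[Record] = [
    {"_id": "http://dbpedia.org/resource/Kashan", "label": ["Kashan"]},
]


def build_virtual_view(*services: list[Record]) -> dict[str, Record]:
    """Index every record from every service by its URI (the `_id` field).

    Records describing the same URI are merged; when two services supply the
    same field, the first service's value is kept (a simple assumed policy).
    """
    view: dict[str, Record] = {}
    for service in services:
        for record in service:
            entity = view.setdefault(record["_id"], {})
            for field, value in record.items():
                entity.setdefault(field, value)
    return view


def join(record: Record, field: str, view: dict[str, Record]) -> Record:
    """Return a copy of `record` with a URI-valued `field` replaced by its target."""
    uri = record.get(field)
    target = view.get(uri) if isinstance(uri, str) else None
    return {**record, field: target} if target is not None else dict(record)


view = build_virtual_view(PEOPLE_SERVICE, PLACES_SERVICE)
person = join(view["http://dbpedia.org/resource/Sani_ol_molk"], "birthPlace", view)
print(person["birthPlace"]["label"])  # ['Kashan']
```

Because the view is an ordinary dictionary built per request, it disappears once the query completes, which mirrors the ephemeral virtual database discussed in the benefits section.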

This is done not via the query language itself but through the instance configuration. When the instance is configured, services must be identified, associated with URIs, and given relationship definitions. From there, a client query looks just like any other GraphQL query but can return much more linked information. In essence, the federation is invisible at the interaction layer while the internal layer federates across services, a clean and easy method for linking data.
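To illustrate what “identified, associated with URIs, and given relationship definitions” might look like, here is a Python sketch of an instance configuration and a router that decides which service answers each hop of a query. The structure is hypothetical and is not HyperGraphQL’s actual configuration format; the service IDs and example.org URLs are placeholders, while the type IRIs are DBpedia ontology classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    id: str   # identifier that type bindings refer to
    url: str  # endpoint URL of the linked data service


@dataclass(frozen=True)
class TypeBinding:
    iri: str         # linked data class the GraphQL type stands for
    service_id: str  # configured service that answers for this type


# Hypothetical configuration: every ID and URL here is a placeholder.
SERVICES = {
    s.id: s
    for s in (
        Service("people-sparql", "https://people.example.org/sparql"),
        Service("places-sparql", "https://places.example.org/sparql"),
    )
}
TYPES = {
    "Person": TypeBinding("http://dbpedia.org/ontology/Person", "people-sparql"),
    "Place": TypeBinding("http://dbpedia.org/ontology/Place", "places-sparql"),
}
# Relationship definitions: (type, field) -> type that the field links to.
RELATIONS = {("Person", "birthPlace"): "Place"}


def route(root: str, path: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """List the (type, service URL) pairs needed to resolve `path` from `root`."""
    current = root
    hops = [(current, SERVICES[TYPES[current].service_id].url)]
    for field in path:
        current = RELATIONS[(current, field)]
        hops.append((current, SERVICES[TYPES[current].service_id].url))
    return hops


print(route("Person", ("birthPlace",)))
# [('Person', 'https://people.example.org/sparql'), ('Place', 'https://places.example.org/sparql')]
```

The client only ever names `Person` and `birthPlace`; which endpoint serves each part is decided entirely by the configuration, which is what keeps the federation hidden from the query author.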

Example HyperGraphQL Query and Response

Let’s view a sample HyperGraphQL request and response taken from the HyperGraphQL documentation for further context.

An example HyperGraphQL query looks like this:

{
  Person_GET(limit: 1, offset: 6) {
    _id
    _type
    name
    birthDate
    birthPlace {
      _id
      label(lang: "en")
      country {
        _id
        label(lang: "en")
      }
    }
  }
}
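For readers who want to try a query like this from code, the sketch below wraps a trimmed version of it in the conventional GraphQL-over-HTTP POST body (a JSON object with a `query` key) using only the Python standard library. The endpoint URL is an assumption (a local instance on port 8080 at `/graphql`); check the port and path in your own instance configuration.

```python
import json
import urllib.request

# Assumption: a HyperGraphQL instance running locally on port 8080 at /graphql.
ENDPOINT = "http://localhost:8080/graphql"

# A trimmed version of the example query above.
QUERY = """
{
  Person_GET(limit: 1, offset: 6) {
    _id
    name
    birthPlace {
      label(lang: "en")
    }
  }
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Wrap a GraphQL query in the conventional JSON POST body {"query": ...}."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run(query: str) -> dict:
    """Send the query and decode the JSON response (needs a live endpoint)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)


request = build_request(QUERY)
print(request.get_method(), request.full_url)  # POST http://localhost:8080/graphql
```

`build_request` can be inspected without a running server; only `run` actually performs network I/O.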

The response is a typical GraphQL JSON object, but augmented with a JSON-LD context:

{
  "extensions": {},
  "data": {
    "Person_GET": [
      {
        "_id": "http://dbpedia.org/resource/Sani_ol_molk",
        "_type": "http://dbpedia.org/ontology/Person",
        "name": "Mirza Abolhassan Khan Ghaffari",
        "birthDate": "1814-1-1",
        "birthPlace": {
          "_id": "http://dbpedia.org/resource/Kashan",
          "label": [
            "Kashan"
          ],
          "country": {
            "_id": "http://dbpedia.org/resource/Iran",
            "label": [
              "Iran"
            ]
          }
        }
      }
    ],
    "@context": {
      "birthPlace": "http://dbpedia.org/ontology/birthPlace",
      "country": "http://dbpedia.org/ontology/country",
      "_type": "@type",
      "name": "http://xmlns.com/foaf/0.1/name",
      "_id": "@id",
      "label": "http://www.w3.org/2000/01/rdf-schema#label",
      "people": "http://hypergraphql.org/query/Person_GET",
      "birthDate": "http://dbpedia.org/ontology/birthDate"
    }
  },
  "errors": []
}
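The `@context` block is what turns the plain GraphQL keys into linked data: each short key maps to a full IRI or a JSON-LD keyword. The Python sketch below performs only that key-to-IRI substitution on an abbreviated version of the response; the `expand` helper is hypothetical and deliberately ignores the rest of the JSON-LD expansion algorithm, which a full processor such as the pyld library implements.

```python
from typing import Any

# Abbreviated version of the "data" object from the response above.
RESPONSE_DATA: dict[str, Any] = {
    "Person_GET": [
        {
            "_id": "http://dbpedia.org/resource/Sani_ol_molk",
            "_type": "http://dbpedia.org/ontology/Person",
            "name": "Mirza Abolhassan Khan Ghaffari",
        }
    ],
    "@context": {
        "_id": "@id",
        "_type": "@type",
        "name": "http://xmlns.com/foaf/0.1/name",
    },
}


def expand(node: Any, context: dict[str, str]) -> Any:
    """Recursively rename keys to the IRIs (or JSON-LD keywords) in `context`.

    Simplified on purpose: only term-to-IRI substitution is performed; value
    objects, @vocab, language tags, and other JSON-LD rules are not handled.
    """
    if isinstance(node, list):
        return [expand(item, context) for item in node]
    if isinstance(node, dict):
        return {context.get(key, key): expand(value, context) for key, value in node.items()}
    return node


context = RESPONSE_DATA["@context"]
people = expand(RESPONSE_DATA["Person_GET"], context)
print(people[0]["http://xmlns.com/foaf/0.1/name"])  # Mirza Abolhassan Khan Ghaffari
```

After this substitution, `name` is no longer an arbitrary GraphQL field name but the FOAF `name` property, which is what lets the result be merged with other linked data that uses the same vocabulary.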

Conclusion

In conclusion, HyperGraphQL is a powerful resource that promises to join disparate data sources. While this adds significant value, you should consider where the data comes from: for local data, the ramifications are smaller, but for outside data, legal issues and privacy concerns abound. HyperGraphQL should therefore be viewed for what it is, a powerful tool to consider on a case-by-case basis.

That being said, HyperGraphQL is not a one-size-fits-all solution. Linked data is valuable, but it can introduce significant noise when datasets grow too large. If the need is present, however, HyperGraphQL offers a great way to leverage federated data and should be considered an important part of the data processor’s toolset.