Review of Supermodel

Perhaps one of the most important elements of any implementation leveraging data is the actual modeling of the data itself. Data is only useful when it’s understood and contextualized – when it’s ultimately either shared or made into information. Failing to actually model this data in an understandable way is damaging, and can seriously hamper any effort to develop a product. Additionally, it makes whatever product is developed harder to share and iterate upon, essentially stymieing development on a macro scale.

Supermodel aims to fix this problem. Today, we’re going to take a look at Supermodel to see what it is, and what problem it aims to solve. We’ll look at a few examples of the user experience in Supermodel, and posit a use case in which Supermodel would make sense. We also sat down with Supermodel creator and past Nordic APIs speaker, Zdenek “Z” Nemec, and will intersperse some of his thoughts!

What is Supermodel?

Supermodel, a Good API project, is actually two distinct parts that make up a greater whole: Supermodel.io is a schema registry for data models, and Supermodel CLI is an open source tool designed for working with these schema models. Taken together, Supermodel can be roughly defined as a collaborative tool for modeling data and sharing those models via a trusted registry. Supermodel uses JSON Schema in YAML format for its data model definitions, which are then shared through the registry. Models can be authored directly in the Supermodel.io registry, or stored locally and then pushed to the registry using Supermodel CLI.

In essence, Supermodel is chiefly concerned with providing a method to model data, and then presenting that data in a variety of digestible and commonly supported formats. It works with a wide variety of formats, including JSON, GraphQL, JSON-LD, and the OpenAPI Specification, as well as others such as Kafka. Once these models are generated, they can be published to the registry, at which point they are searchable and reusable by those within the Supermodel.io ecosystem. The Supermodel CLI is distinct from the registry itself, but it is a key tool in that process, converting between various formats and allowing schemas to be either referenced or consumed directly.
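To make the idea of presenting one model in several formats more concrete, here is a minimal sketch in Python that renders a flat JSON Schema object as a GraphQL type definition. This is not Supermodel's converter: the Article model, its properties, and the to_graphql_type function are all hypothetical, and only primitive property types are handled.

```python
# Hypothetical, simplified model; not taken from the Supermodel.io registry.
ARTICLE_MODEL = {
    "title": "Article",
    "type": "object",
    "required": ["issn"],
    "properties": {
        "issn": {"type": "string"},
        "year": {"type": "integer"},
        "openAccess": {"type": "boolean"},
    },
}

# Mapping from JSON Schema primitive types to GraphQL built-in scalars.
SCALARS = {
    "string": "String",
    "integer": "Int",
    "number": "Float",
    "boolean": "Boolean",
}


def to_graphql_type(model: dict) -> str:
    """Render a flat JSON Schema object as a GraphQL SDL type definition."""
    required = set(model.get("required", []))
    lines = [f"type {model['title']} {{"]
    for name, prop in model.get("properties", {}).items():
        scalar = SCALARS[prop["type"]]  # only primitive types are handled here
        suffix = "!" if name in required else ""  # "!" marks a non-null field
        lines.append(f"  {name}: {scalar}{suffix}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_graphql_type(ARTICLE_MODEL))
```

Even in this toy form, the point stands: once the model is written down formally, other representations can be derived from it rather than maintained by hand.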

Perhaps the best summary of this offering can be found in the announcement from Z, its creator:

“The key objectives of Supermodel are to promote modern-day data modeling, manage complex domains, improve communication, and enable discovery and reuse of data models.”

What Problem is Supermodel trying to solve?

Before we can look in detail at what Supermodel does (and how it does it), we should first look at the problem it is attempting to solve. Throughout all of its documentation and its initial announcement, the chief issue that Supermodel raises is the nature of data structure documentation and semantic information availability. In essence, the argument Supermodel makes is that formal data models are rare, and if they exist, are often incomplete. Smaller organizations are more likely to not bother with formal data model definitions at all, and with larger organizations, these models are often indelibly affected by the ancient monoliths and “canonical data models” that arise from traditional enterprise development.

“There are more and more APIs out there, but the problem right now is the data models themselves, and the understanding of data in general.”

In both cases, the data models, if present, do not match the actual production data models, and this results in issues that propagate down the path of implementation. Without formal data models, documentation suffers. With poor documentation comes poor knowledge of the chief data models that drive the underlying system and, by extension, poor API development and poor understanding in production integrations. In fact, there is an argument that tasking the API with formalizing the data representations of the underlying data model actually exposes the internal frameworks and database models in a damaging way. In essence, without a formal data model, you are asking consumers to make their own model and to know what they can’t know. This has the effect of chaining the API client directly to internal libraries and databases, resulting in less effective, less agile, and less extensible code.

Another huge issue this creates is the so-called “tribalization” of knowledge. Without formal data documentation, only certain groups truly understand the data model at hand. While this makes for a small expert team in the best of cases, it almost always results in a situation where knowledge is invisible to everyone else, and existing knowledge is ignored in favor of duplicated efforts, often to the detriment of the final product.

This is not simply an internal problem, either – when these models are hidden from collaborative view, this issue continues between teams, products, even separate implementations, and results in a less healthy ecosystem. There are no discussions outside of the siloed knowledgeable tribe, and thus, any attempt to use the data is complicated and requires guesswork or outright manipulation outside of the stated use case.

There’s also the very real issue of testing under these conditions. When testers don’t really understand the data at hand, testing that data model is both more complex and less complete: it’s harder to test what you know and impossible to test what you don’t know. This has a negative effect on the data model as a whole and can result in issues propagating forward when they should have been caught early on.

Example Use Case – The DataFeed

Let’s look at a possible use case in which Supermodel would be extremely beneficial. A developer is attempting to create an API that uses a Dataset schema in order to provide structured information about an internal database. In this case, the database collects a variety of documents and publications, primarily sorting them by the ISSN of the associated document.

The developer sits down to code their API and begins to realize how intense the associated effort of modeling this data is truly going to be. While they have the database already designed on the backend, each element of the data representation – the catalog, the ISSN, the year of publication – represents something that will have to be processed and somehow structured in the data model.

The developer remembers Supermodel, however, from a workshop they attended. After a quick search on Supermodel.io for “dataset”, a schema is found titled – interestingly enough – Dataset. This schema is publicly usable and utilizes several additional inherited data models. Dataset represents all of the elements they would need in order to model their own data internally – “ISSN” for the ISSN number, “temporal” for the specific time limitations for each data point, and so forth – and this schema can be utilized for their own implementation.

Since the developer is using this new schema, they’ve gained some major benefits. First, they have a data model that is proven and usable without the additional cost incurred in development hours. Second, they have an open source schema that lets them integrate with others using this data model and lets others integrate with their own. Third, and perhaps most importantly, the developer has implemented a testable schema and avoided the so-called “reinventing of the wheel”, which so often results in novel issues and intricate errors that replicate throughout the nascent system.

In a few simple steps, the developer has done much more than they ever could on their own – and with greater effectiveness.

What Supermodel Looks Like

Now that we understand the use case and the value behind Supermodel, let’s look at the actual schema we’ve discussed, “Dataset”. One of the great things about Supermodel is that everything builds upon everything else – each model can inherit from another model, adding context while making the current “level” specific to the given functionality.

In this case, Dataset inherits several models. First, it inherits CreativeWork, which is a generic model for creative works, such as movies, music, or books. CreativeWork itself also inherits a model known as Thing, which is an even more generic object, representing literally any “thing”. In this way, while Dataset technically only inherits CreativeWork, it actually inherits two separate data models.
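The inheritance chain described above can be sketched in a few lines of Python. This sketch assumes inheritance is expressed with JSON Schema's allOf and $ref keywords; the REGISTRY dictionary, the short model names used as references, and the properties shown are hypothetical stand-ins, not the actual registry content.

```python
# Hypothetical registry keyed by model name; properties are illustrative only.
REGISTRY = {
    "Thing": {
        "properties": {"name": {"type": "string"}},
    },
    "CreativeWork": {
        "allOf": [{"$ref": "Thing"}],
        "properties": {"author": {"type": "string"}},
    },
    "Dataset": {
        "allOf": [{"$ref": "CreativeWork"}],
        "properties": {
            "issn": {"type": "string"},
            "temporal": {"type": "string"},
        },
    },
}


def ancestors(name, registry):
    """Return the inheritance chain above a model, nearest parent first."""
    chain = []
    for entry in registry[name].get("allOf", []):
        parent = entry["$ref"]
        chain.append(parent)
        chain.extend(ancestors(parent, registry))
    return chain


def effective_properties(name, registry):
    """Merge inherited properties, letting more specific models override."""
    merged = {}
    # Start from the most generic ancestor and work toward the model itself.
    for model in reversed(ancestors(name, registry)):
        merged.update(registry[model].get("properties", {}))
    merged.update(registry[name].get("properties", {}))
    return merged


if __name__ == "__main__":
    print(ancestors("Dataset", REGISTRY))
    print(sorted(effective_properties("Dataset", REGISTRY)))
```

Here Dataset declares only one parent, yet its effective property set also includes whatever Thing contributes by way of CreativeWork, which is the behavior the text describes.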

The end result of this model is that Dataset – which is a very complex set of possible properties – is actually made up of several smaller sets, allowing for more modular understanding and knowledge around each property and its tested components.

Interestingly, the Schema representation and the Graph representation paint separate, though both complete, stories about the actual implementation we’re looking at. If we look at the Schema representation, we see specifically what properties are imported, and where those schemas exist.

This schema is short because it’s not meant to spell out everything that’s being imported. Instead, it gives an overview of the specific properties that Dataset itself defines, while simply stating a complete import of CreativeWork. If we take a look at the Graph representation, however, we see much greater detail.

In the Graph representation, what we’re seeing is a graph of outbound references. Supermodel defines outbound references as references from the current model to an external schema. While it shows internal properties, this graph is much more concerned with the externally referenced elements. Here, we can see the true complexity behind the implementation.
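As a rough illustration of what an outbound reference is, the following Python sketch walks a schema and collects every $ref that points outside the current document. It assumes that local references begin with "#" and that anything else is external; the example.com URLs and the DATASET_MODEL content are hypothetical, and this is not how Supermodel itself builds its graph.

```python
# Hypothetical model; the example.com identifiers are placeholders.
DATASET_MODEL = {
    "$id": "https://example.com/schemas/Dataset",
    "allOf": [{"$ref": "https://example.com/schemas/CreativeWork"}],
    "properties": {
        "issn": {"type": "string"},
        "distribution": {"$ref": "https://example.com/schemas/DataDownload"},
        "keywords": {
            "type": "array",
            "items": {"$ref": "#/definitions/keyword"},  # local reference
        },
    },
    "definitions": {"keyword": {"type": "string"}},
}


def outbound_refs(node, found=None):
    """Collect all $ref targets that point outside the current document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                # References starting with "#" stay inside this document.
                if not value.startswith("#"):
                    found.add(value)
            else:
                outbound_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            outbound_refs(item, found)
    return found


if __name__ == "__main__":
    for ref in sorted(outbound_refs(DATASET_MODEL)):
        print(ref)
```

Each collected reference corresponds to one edge leaving the model, which is the kind of relationship the Graph representation draws.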

From a distant view, we can see just how much is involved in this particular schema. On the left side of this picture, the green box represents Dataset itself, and to the left of it, its stated properties. To the right, however, is the referenced CreativeWork schema, which demonstrates the large number of additional schema properties that are being included in this particular implementation. We can see this in greater detail with a closer look at the CreativeWork part of this graph.

With this closer view, we can see the actual detail within the schema itself. This view cuts out a lot of the content that is being pulled in as properties (in fact, there is almost no way one could see the entire schema while keeping the information readable), but it does demonstrate one of the biggest values of the Supermodel system: data model visualization. Visualizing data in this format gives a greater understanding of the model and its underlying secondary schemas, which helps both with sharing knowledge and with testing. The value of that cannot be overstated, and it is clearly something lost amongst simple import statements and schema references.

Conclusion

While it’s technically true that a developer can get away with not providing a data model, that’s about as true as saying that a mechanic can get along without a wrench. Yes, it’s possible, but in reality any work would be monumentally more difficult than it needs to be. Accordingly, providing proper data models and contextualizing data into information is supremely valuable.

Supermodel offers this in spades. There are some caveats, of course, as there are with anything. Supermodel’s offer of a sharable, collaborative schema works for many applications, but there are some spaces in the API industry in which sharing such data models might be considered sharing intellectual property or proprietary trade secrets. There are even some cases where governmental APIs might purposely withhold these schemas in pursuit of security through obscurity, whether or not it’s effective.

There is also the very real fact that, for many microservices, the data model might be so simple that a stated graph in the documentation is all that is needed. In these cases, Supermodel can be useful, but it may not be an absolute requirement. This is also true of “single service” APIs, in which the data model is less important since the API is meant to serve only a single purpose, and is not meant to iterate or integrate.

All of this said, Supermodel is still a highly effective solution for the stated problem, and it delivers that solution with a very beautiful GUI and a highly efficient graphical representation system.

What do you think? Is Supermodel the best tool in this category? Do you have any other tools you’d like to share with the community? Let us know below!