Structure and Meaning Must Be Right

GraphQL is indeed an attractive data API for applications (and people). However, even if you know the basics of GraphQL, you may struggle to get the API data structures right and to prettify the data content so that it is business friendly.

What does “pretty” mean, anyway? Most people can answer that for designer dresses, flowers, pets, babies and so forth. But pretty data? It turns out that there are two dimensions to this: structure (of the data) and meaning (the business semantics). If you get both right, your data is good to go.

There is an approach to information-driven analysis and design, which is “New Nordic” in the sense that it implements the traditional Nordic brand values such as superior quality, functionality, reliability, and innovation. Within this context, here follows a proposal of new ways of communicating both structure and meaning of the business context in the GraphQL API. Visualizations of the schema and of the queries are important parts of this.

As you know, GraphQL is not a database. It is a data API, which is described in (and produced from) a GraphQL schema. Since so much relies on the schema, you need to ensure that it is of high quality. That applies to both the structure and the meaning of the data.

You are also looking at servers producing data from many different sources, legacy or new. Whatever kind of source, there may be quality issues. Garbage in equals garbage out. For that reason, GraphQL API design may take you into having to resolve data discovery and unification issues, such as quality, metadata, and business acceptance. This is the “meaning part” and we will come back to that later in this post.

Visualize Structure with Property Graphs

The schema is a data graph containing related concepts in a network organized as a directed graph. In a way you could say that the GraphQL approach makes everything look like a graph! (Which is actually the case, anyway).

This makes the property graph approach to graph visualization a powerful opportunity for the GraphQL schema designer. Here is the Internet Message Format represented as a property graph:


Internet Message Format represented as a property graph

The circles are concepts, which are the nodes of the graph. A Message, for example, is a concept, and it is part of several relationships, such as “originator” (who sent the message) or “in reply to” (which other message is this in reply to). The properties can be attached to concepts (nodes) and/or relationships (edges of the graph).

The notation uses arrowheads for cardinalities. The diagram contains one-to-one, one-to-many, and many-to-many relationships. Here is a brief explanation of property graphs.

In the GraphQL context, the property graph is useful for representing the structure of the schema:

  • Nodes are types (object types, interface types and union types)
  • Relationships represent the connections between types
  • Properties are the fields of the types (scalars or lists)

Here is the Message type with its properties and relationships in GraphQL:

Note that in property graphs, the relationships are named. This is important, because those names are part of the business semantics, and by visualizing them, it is much easier to review and discuss the meaning imposed by the structure. In the example, Graphcool’s @relation directive is used to get the names of the relationships into the schema.
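To make the mapping from schema to property graph concrete, here is a minimal sketch in Python. The schema fragment is hypothetical: the type names, field names, and relation names are assumptions chosen for illustration, not the article's original listing. The sketch also assumes that every relationship field carries an `@relation(name: "...")` directive, and it uses a small regular expression instead of a real GraphQL parser, so it only handles this simple layout.

```python
import re

# Hypothetical schema fragment: all type, field, and relation names are
# illustrative assumptions.
SDL = '''
type Message {
  id: ID!
  subject: String
  origDate: String!
  originator: Originator! @relation(name: "MessageOriginator")
  inReplyTo: [Message!]! @relation(name: "InReplyTo")
}

type Originator {
  id: ID!
  mailbox: String!
  messages: [Message!]! @relation(name: "MessageOriginator")
}
'''

# Matches "type Name { ... }" blocks (no nested braces expected).
TYPE_RE = re.compile(r"type\s+(\w+)\s*\{(.*?)\}", re.S)
# Matches "field: Type" or "field: [Type]" with an optional @relation directive.
FIELD_RE = re.compile(
    r'(\w+)\s*:\s*(\[?)(\w+)[!\]]*\s*(?:@relation\(name:\s*"(\w+)"\))?'
)


def to_property_graph(sdl):
    """Return (nodes, edges) for a simple schema string.

    nodes maps each type name to its property (non-relation field) names.
    edges holds (source type, relation name, target type, is_list) tuples.
    """
    nodes = {}
    edges = []
    for type_name, body in TYPE_RE.findall(sdl):
        nodes[type_name] = []
        for field, bracket, target, relation in FIELD_RE.findall(body):
            if relation:
                # A named relationship becomes an edge of the graph.
                edges.append((type_name, relation, target, bool(bracket)))
            else:
                # Everything else is treated as a property of the node.
                nodes[type_name].append(field)
    return nodes, edges


nodes, edges = to_property_graph(SDL)
for source, name, target, is_list in edges:
    print(f"({source}) -[{name}]-> ({target}){' *' if is_list else ''}")
```

Listing the edges by name like this gives reviewers the same vocabulary that the diagram shows, which is the point of naming the relationships in the first place.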

The property graph representation is considerably more compact than the “boxes and arrows” found in most diagramming approaches. In the GraphQL space, there is a tool called GraphQL Voyager. Voyager is based on a standard diagramming library and renders the schema in the “boxes and arrows” style. With that style, getting a solid grip on the structure across, say, 5 to 8 object types is not easy.

The property graph representation has been successful over the last 15 years, and it comes out of the Nordics. Neo4j, which was founded in Malmö, pioneered the property graph as a data model and is now a leading player in the graph database segment worldwide.

At this point, we have dealt with schema visualization, which helps make GraphQL pretty and good in two ways:

  • Alignment with business terminology and definitions (structure and content fields)
  • Understanding complex schemas, structured as graphs

A Deeper Look at Property Graphs Applied to GraphQL

The queries and their results use the same names as the schema, and the resulting data are structured as hierarchical trees, which can also be represented as property graphs. Consider a scenario: there is a need for a query result that has its root in the Originator. The first thing to do is to morph the property graph of the schema a little so that it has Originator on top:


Edited property graph with Originator on top.

From that it is easy to grasp that the result set could look like this:

Resulting set

Note that the graph shown above is also a property graph, now just a “bonsai” version of a (tree-structured) part of the schema graph, and a twisted one, at that.
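The re-rooting step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the edge list is a hypothetical excerpt of the message example (the `Recipient` type and the edge names are assumptions), relationships may be walked in either direction, and a type that is already on the current path is not expanded again, so self-references such as “in reply to” are cut off.

```python
from collections import defaultdict

# Hypothetical relationships from the message example: (source, name, target).
EDGES = [
    ("Message", "originator", "Originator"),
    ("Message", "recipient", "Recipient"),
    ("Message", "in reply to", "Message"),
]


def tree_from(root, edges):
    """Build a nested tree of types, starting from the chosen root type."""
    neighbours = defaultdict(list)
    for source, name, target in edges:
        neighbours[source].append((name, target))
        if source != target:
            # Allow traversal against the direction of the arrow.
            neighbours[target].append((name, source))

    def expand(node, visited):
        children = {}
        for name, other in neighbours[node]:
            if other in visited:
                continue  # never revisit a type on the path, so the result is a tree
            children[name] = expand(other, visited | {other})
        return {"type": node, "children": children}

    return expand(root, {root})


tree = tree_from("Originator", EDGES)
print(tree)
```

With Originator as the root, Message hangs below it and Recipient hangs below Message, which mirrors the nesting that a query rooted in the Originator would request.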

Property graphs are clearly highly relevant for GraphQL developers. They ease the analysis of the data at hand significantly, and they help to organize the resulting API schema and query structures. Having done this, we have also dealt with:

  • Correct exposure of the structure of the relationships inherent in the exposed data (query result)
  • Handling traversals of many-to-many relationships in order to produce a result tree (both schema and query result).

The visualization is an intuitively understandable (pretty good) representation of business and application terminology that can be discussed with business folks.

Dealing with Meaning and Content

Getting the Right Semantics

The quality of the content of the API results depends on the business semantics and on the actual data delivered by the API. We already dealt with the structure and the terminology in the property graphs above, so now we need to handle the actual data content properly.

However, remember that meaning and content go together. If you change the semantics, then you may have to refactor the data.

10 Tips for Prettifying Graph Data Content

As for the data discovery and unification issues, there is (too) much information about them on the Internet and in books (including two of mine). In the GraphQL context you should pay attention to these 10 most important issues:

  • Include business names in the API
  • Identity and uniqueness
  • Presenting the keys
  • Presenting state changes
  • Presenting versions of data
  • Presenting dates and times
  • Presenting relationships and missing references
  • Which objects and which relationships
  • Presenting the right level of detail
  • Good relationships

How much work is necessary on the resolver side depends on issues that are partly out of your control:

  • The quality of the data sources by themselves (structure, meaning and content)
  • Conflicts arising from unification of data from multiple sources (both upstream and downstream)

Let us have a brief look at the 10 most important issues in making data content pretty and good to go:

  • Include business names in the API: Potential naming conflicts. Use the property graphs to get business alignment.
  • Identity and uniqueness: Uniqueness is the set of business-level rules that determine whether an instance of data is unique. Frequently this is a combination of business-level “keys” such as social security number, employee number, postal code, product number and so forth.
    Identity is the combined result of the uniqueness of the participating types. An order line, for example, is unique for the combination of order number (from the Order type) and order line number (from the Order Line type). In most IT systems identity is ensured by way of a unique ID field (the primary key in relational databases) or other kinds of surrogate keys. Obviously, ID conflicts across multiple source databases must be resolved. Also note that downstream consumers of the GraphQL API data may have their own requirements for how the API delivers identity and uniqueness.
  • Presenting the keys: If proper ID identity fields are not available, then maybe the resolver layer should take care of that. In most cases they are needed in the API.
  • Presenting state changes: Many types of business objects go through a sequence of states across their lifecycles. Care should be taken in planning whether a state change is just a change of a property, or whether it should generate a new identity.
  • Presenting versions of data: Versioning of data content is also worth considering. Should data be versioned (may be a business requirement, not least in the financial sector)? How should the versioning be presented? At least a date and maybe also a “Last version” flag for convenience?
  • Presenting dates and times: Date and time are currently not built-in scalar data types in GraphQL, so it is up to you to define custom scalar types. Have a look at the GraphQL ISO Date project on GitHub.
  • Presenting relationships and missing references: This is the classic consideration about “NULL” or not. GraphQL supports null values (fields are nullable unless declared non-null), so you can use that. You could also use default values such as “Unknown” or similar.
  • Which objects and which relationships: It is not always the case that you are offered the right object types and the right relationships in the data sources. You may need to be selective and you may need to transform and/or generalize some of the exposed data.
  • Presenting the right level of detail: Related to the above – be sure to select the right level of abstraction all over the interface. Who does the aggregations? The resolver or the application?
  • Good relationships: Related to the above, be sure to represent the relationships well. Be careful with information residing on a relationship. GraphQL cannot attach fields to a relationship, so you may have to introduce a new object type for that purpose. The same goes for many-to-many relationships: be careful when you traverse them in queries. If the schema is not correct, or if the data has redundancies, you risk creating Cartesian products and “queries from hell”.
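A few of these tips (identity, keys, dates, and missing references) can be illustrated with a small resolver-side helper. The following Python sketch is only one possible approach under stated assumptions: the source row layout (`order_no`, `line_no`, `created_epoch`, `product_no`), the exposed field names, and the `order:line` key format are all hypothetical, and it is not tied to any particular GraphQL server library.

```python
from datetime import datetime, timezone


def order_line_identity(order_number, line_number):
    """Composite identity: unique only for the combination of both keys."""
    return f"{order_number}:{line_number}"


def iso_timestamp(epoch_seconds):
    """Serialize a source timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def resolve_order_line(row):
    """Shape one source row (a dict) into the record exposed by the API.

    The input keys are assumptions about a hypothetical source table.
    """
    return {
        # Present a stable key even if the source has no single ID column.
        "id": order_line_identity(row["order_no"], row["line_no"]),
        "orderNumber": row["order_no"],
        "lineNumber": row["line_no"],
        # Dates travel as ISO 8601 strings, since there is no built-in scalar.
        "createdAt": iso_timestamp(row["created_epoch"]),
        # Missing reference: expose null (None) rather than an invented value.
        "product": row.get("product_no"),
    }


record = resolve_order_line({"order_no": 1001, "line_no": 2, "created_epoch": 0})
print(record)
```

Whether a missing product should surface as null or as a default such as “Unknown” is a business decision; the sketch picks null only to keep the example short.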

Final Thoughts

The GraphQL approach has many benefits that seasoned data professionals will admire. It has good potential to be long-lasting; self-describing, structured result sets are good for everybody. The legacy technologies for interfacing with data were as good as they could be at the time they came about, but that is not good enough today. GraphQL is still young, but maturing, and everyone could benefit from having graph visualizations in there. The same goes for a visual, interactive version of GraphiQL, for end-users!

Oh, and remember: Information is based on trust, and if business people do not trust or understand the data presented to them, they will stop using it!

Be prepared to do the additional work, if necessary, based on circumstances. Make sure what you deliver is visual and pretty. Then you are good to go.

Thomas Frisendal

About Thomas Frisendal

Thomas Frisendal is an experienced data guy with more than 30 years on the IT vendor side and as an independent consultant. He has worked with databases and data modeling since the late 70s; since 1995 primarily on data warehouse projects. He has a strong urge to visualize everything as graphs! He excels in the art of turning data into information and knowledge in a "New Nordic" style. Thomas is an active writer (3 books) and speaker.