Using Apache Kafka to Build Flexible APIs (Case Study)


Building big platforms has always been difficult. Old-fashioned monolithic applications are plagued by slow delivery cycles, questionable reliability, and serious complexity. On the other hand, the microservices approach comes with complexities of its own, such as service overlap or unintended dependencies. But what if you could create loosely coupled services with a clear business purpose and no limits on scaling?

In this article, we look at how Norwegian public transportation company Entur is using Kafka to build a scalable services architecture with minimal dependencies and maximal flexibility.

This article is based on a talk given by Henrik Stene of Aboveit at the 2019 Nordic APIs Platform Summit.


The Goal: A Loosely Coupled Architecture

A core part of Entur’s business model is its unified sales and ticketing system, which allows customers to make ticket reservations for almost all major modes of transport. The company was keen to build this system as a collection of independent data and function modules, offering the flexibility of a fine-grained architecture with clear business logic.

Here are just some of the data modules they settled on:
payment
customer
order
product

By minimizing data overlap (and instead using references between modules), each data module can be treated as a single source of truth for the data at hand, Henrik explains. This means that all customer data is kept in one place, simplifying compliance with data protection regulations like GDPR.

Function modules, like order history or reserve, combine multiple data modules in order to fulfill a specific business function. For example, the order history function module combines data from the customer and order data modules to generate a list of a given customer’s orders. The main purpose of this is simply to make it easier for customers to interact with the API.
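To make the distinction concrete, a function module might look something like the sketch below: it owns no data of its own and simply composes calls to the two data modules. The interfaces and method names here are purely illustrative and not taken from Entur’s actual API.

// Purely illustrative: a function module that composes two data modules
// to answer a single business question.
import java.util.List;

public class OrderHistoryService {

    // Hypothetical clients for the customer and order data modules.
    interface CustomerClient { String findCustomerName(String customerId); }
    interface OrderClient { List<String> findOrderIdsByCustomer(String customerId); }

    private final CustomerClient customers;
    private final OrderClient orders;

    public OrderHistoryService(CustomerClient customers, OrderClient orders) {
        this.customers = customers;
        this.orders = orders;
    }

    // Combines customer data and order data into one response, without
    // storing any data of its own.
    public String orderHistoryFor(String customerId) {
        String name = customers.findCustomerName(customerId);
        List<String> orderIds = orders.findOrderIdsByCustomer(customerId);
        return name + " has orders: " + String.join(", ", orderIds);
    }
}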

Of course, it’s essential for information to flow between these modules for them to operate smoothly. Entur wouldn’t want to create a new ticket if the order hadn’t been confirmed, and it wouldn’t want to confirm an order if it hadn’t been paid for. But how do you allow information to flow between these data modules without creating dependencies?

Apache Kafka for Data Propagation

To facilitate the spread of data between the various modules, Entur chose to employ Kafka. This high-performance, distributed streaming platform creates a central “cluster” of events, which applications (in this case, the data and function modules) can listen to and interact with at will.

The idea is simple. Each of Entur’s modules has a set of events associated with it. Looking at the payment data module, events include payment creation, completion, and cancellation, among others. Whenever one of these events takes place, the relevant details are published to the Kafka cluster.
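For illustration, here is a minimal sketch of what publishing such an event could look like with the standard Kafka Java client. The topic name, key, and JSON payload are assumptions made for this example; the talk does not describe Entur’s actual schemas or topic layout.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PaymentEventPublisher {
    public static void main(String[] args) {
        // Minimal producer configuration; the broker address is a placeholder.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical "Payment Completed" event, keyed by payment id so all
            // events for the same payment land in the same partition and stay ordered.
            String payload = "{\"event\":\"PaymentCompleted\",\"paymentId\":\"p-123\",\"orderId\":\"o-456\"}";
            producer.send(new ProducerRecord<>("payment-events", "p-123", payload));
        }
    }
}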

Meanwhile, all of the data and function modules are constantly “listening” to the event log for events that may concern them. For example, the order module might listen for payment-related events so that when a payment is completed, the order status can be updated.

“When the order module has read the “Payment Completed” event, it can react to that by changing the status of the order. But a change in the order status is information that the ticket module might be interested in knowing. So the ticket module is, in this example, listening for an “Order Confirmed” event, because it would like to react to that event by creating and enabling the ticket for this particular order.”
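In code, the order module’s side of that exchange could look roughly like the consumer loop below. The topic name, event format, and confirmOrder helper are hypothetical; the sketch only illustrates the listen-and-react pattern Henrik describes.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderModuleListener {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "order-module");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // The order module only cares about payment events.
            consumer.subscribe(List.of("payment-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // React to completed payments by confirming the corresponding order.
                    if (record.value().contains("\"PaymentCompleted\"")) {
                        confirmOrder(record.value());
                    }
                }
            }
        }
    }

    private static void confirmOrder(String paymentEvent) {
        // Placeholder: mark the referenced order as confirmed, which in turn
        // publishes an "Order Confirmed" event for the ticket module to pick up.
        System.out.println("Confirming order for payment event: " + paymentEvent);
    }
}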

Valuable Kafka Features

Aside from enabling smooth communication between separate modules, Kafka has at least two other features that make it an attractive starting point for your backend: seamless expansion and a dedicated query language.

Seamless Expansion

A huge benefit of building your backend around a streaming platform like Kafka is that you can seamlessly add new functionality. Henrik gives the example of an accounting department looking to start tracking sales: they can simply create their own module that listens to the Kafka stream, without having to make any changes to the existing modules.

What’s specific to Kafka, however, is its ability to keep track of past events. This means that new modules can read any (and all) of the events that have transpired since the inception of the cluster. For some purposes this won’t matter, but for others, it’s absolutely essential.
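A minimal sketch of that catch-up behavior, reusing the hypothetical topic names from above: a consumer group the cluster has never seen before, configured with auto.offset.reset set to earliest, will start from the oldest retained event rather than only from new ones.

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AccountingModuleBootstrap {
    public static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // A consumer group the cluster has never seen before...
        props.put("group.id", "accounting-module");
        // ...combined with "earliest" means this module replays every retained
        // event from the beginning of each topic, not just events from now on.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("order-events", "payment-events"));
        return consumer;
    }
}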

Query Language

Another awesome feature of Kafka is its dedicated query language, KSQL. It makes simple, real-time processing workflows remarkably easy to write. Henrik shares the example of a query that flags suspicious buying behavior (more than four orders confirmed within a ten-minute window) and collects the offenders in a dedicated, continuously updated table:

-- Flag customers who confirm more than four orders
-- within a ten-minute tumbling window
CREATE TABLE suspicious_sales AS
  SELECT customer_id, COUNT(*)
  FROM order_confirmed
  WINDOW TUMBLING (SIZE 10 MINUTES)
  GROUP BY customer_id
  HAVING COUNT(*) > 4;

Bonus: A Customer-Facing Kafka Stream

As demonstrated, Apache Kafka can be an excellent medium of communication for your service ecosystem — but that’s not all it can do! In his presentation, Henrik touches on how you can expose a customer-facing Kafka stream to provide an asynchronous API of sorts.

In the case of Entur, this means copying over a handful of events to an external Kafka stream (if they meet certain conditions) for partner/customer use. For example, all “Order Confirmed” events are shared to the external stream so that the public transport operator in question can immediately process the reservation.
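One way to implement this kind of selective copying (a sketch under assumed topic names, not necessarily how Entur does it) is a small Kafka Streams topology that filters the internal topic and forwards matching events to an externally exposed one:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ExternalEventForwarder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "external-event-forwarder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orderEvents = builder.stream("order-events");

        // Copy only "Order Confirmed" events to the partner-facing topic;
        // everything else stays internal.
        orderEvents.filter((key, value) -> value.contains("\"OrderConfirmed\""))
                   .to("partner-order-events");

        new KafkaStreams(builder.build(), props).start();
    }
}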

Exposing this event stream opens up plenty of possibilities for partner customization. As Henrik explains, allowing partners to send their own branded transactional emails — which might otherwise be a complicated task — is as simple as exposing these events to the external stream, and choosing not to react to them internally.

Conclusion

When it comes to platform architectures, there’s plenty more to choose from than just microservices. Henrik’s talk gives an overhead view of how Kafka can facilitate data propagation between a number of individual data and function modules, offering a scalable platform that’s easy to modify and even easier to build on. Combined with powerful features like the ability to replay past events and a dedicated query language, this makes Kafka an attractive tool in the service architect’s arsenal.