Scalable APIs Are Built From Consistency

Scalability is a highly sought-after quality when making architectural decisions for software products these days. Yet the most robust way to increase the likelihood that your product continues to scale to end user needs is through APIs. That’s probably old news for most. However, scalable APIs, especially from an enterprise point of view, are built from consistency. That’s probably classic news.

Those are generic terms that might conjure different mental images, so let’s break them down a bit. Enterprise in this sense doesn’t mean stodgy, old, and business conservative. Rather, it’s an acknowledgment that when an organization with shared goals reaches a certain number of people, transferring data between systems and humans becomes inherently more complex. For an example of this, take Netflix’s Falcor, an $80B enterprise’s strategy to increase the ease with which data is communicated.

In this context, consistency doesn’t imply that every system shares the same information, but rather that they share the same method of sharing information. When data is sent and received in a consistent format, the speed at which you are able to integrate, aggregate, and consume data from various sources increases dramatically.

Several areas make up the anatomy of a consistent API built for scale. In this article, we’ll review some essentials for designing consistent, scalable APIs:

  • Common Data Structures
  • Required Data Fields
  • Governance & Authentication
  • Eventing & Polling
  • Pagination and Search

We’ll see how each of these facets flows into the next, gradually building on one another to create well-formed enterprise APIs.

Common Data Structures

When integrating data from different applications, the biggest challenge is standardizing on a consumable format for your organization to use. The odds that your data structure matches another company’s are close to lottery odds. The odds that your data structure matches every application you are connecting to are close to Powerball odds. In a study by Cloud Elements, the average number of applications touching business processes in an organization was found to be 121*.

Creating common data structures for how you leverage your internal data, whether you decide to expose it publicly or not, creates consistency and familiarity for those who maintain the API. Common data structures build the familiarity necessary to expand and scale in a responsible way.

More Enterprise Please

This becomes especially fun when attempting to create commonalities between applications that are as complex as the organizations they live in. Many ERPs, from Oracle to NetSuite and SAP, fall into this category. Converting from SOAP/XML into REST/JSON is a favorite. Before you can even get to consistent data, a consistent structure is needed.
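To make that concrete, here’s a minimal sketch of the kind of normalization layer involved, assuming a hypothetical vendor XML contact format and an equally hypothetical canonical model:

```python
# A minimal sketch of normalizing a SOAP/XML contact into a canonical
# JSON structure. The <contact> layout and the canonical field names
# here are hypothetical, purely for illustration.
import json
import xml.etree.ElementTree as ET

SOAP_PAYLOAD = """
<contact>
  <FirstName>Ada</FirstName>
  <LastName>Lovelace</LastName>
  <Telephone>555-0100</Telephone>
</contact>
"""

def to_canonical(xml_text: str) -> dict:
    """Map a vendor-specific XML contact onto our common data structure."""
    root = ET.fromstring(xml_text)
    return {
        "first_name": root.findtext("FirstName"),
        "last_name": root.findtext("LastName"),
        "phone": root.findtext("Telephone"),
    }

print(json.dumps(to_canonical(SOAP_PAYLOAD), indent=2))
```

The point isn’t the mapping itself; it’s that every integration feeds the same canonical shape, so downstream consumers only ever learn one structure.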

Only 60% of applications are built on REST APIs*, so connecting to multiple applications becomes an unknown. Without a common data model you can expect an average of 60 development days to integrate a REST API, and closer to 90 days for a SOAP API. In the ERP space it quickly climbs to 120+ days by the time you add in sandboxes, testing, and expert support.

Required Data Fields

Is a phone number required for this invoice? I think it is in our CRM to create the opportunity, so we probably need that, right? How about this custom field for contract notes? Might be important, but not sure if accounting needs that. Looks like some are filled out by the reps and others are blank on some of the renewals. Can you Slack Jessica and see if they need it? Custom fields will take a bit longer.

Part of a consistent data structure is knowing what’s needed, and how secure it needs to be. For example, POST /contacts needs these fields to return a 201. DELETE /contacts needs these additional fields for the 200 and 404 tests. Knowing that data will interact consistently lends even more scalability. The difference between aggregating and aggravating can be small.
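As a rough illustration, here’s what encoding those requirements as data might look like; the endpoints and field lists below are hypothetical, not a prescription:

```python
# A minimal sketch of enforcing required fields per endpoint, assuming a
# hypothetical /contacts resource. The field lists are illustrative only.
REQUIRED_FIELDS = {
    ("POST", "/contacts"): {"first_name", "last_name", "email"},
    ("DELETE", "/contacts"): {"id"},
}

def validate(method: str, path: str, payload: dict) -> tuple[int, dict]:
    """Return (status, body): 201/200 on success, 400 listing missing fields."""
    missing = REQUIRED_FIELDS.get((method, path), set()) - payload.keys()
    if missing:
        return 400, {"error": f"missing fields: {sorted(missing)}"}
    return (201 if method == "POST" else 200), {"ok": True}

print(validate("POST", "/contacts", {"first_name": "Ada"}))
```

Because the requirements live in one table rather than scattered across handlers, the “do we need the phone number?” conversation happens once, not per integration.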

May I Have More Enterprise?

Going beyond required fields, consider what other metadata the integration can ingest. Larger organizations naturally generate even larger amounts of data at different levels, owing to the different types of internal and external customers they serve.

You may find yourself asking questions like: Can we add primary keys, and maybe some compound keys, to act as unique identifiers? We need to query fields; should we have mandatory query fields? You may want to clarify the chain of information and add entity relationships. For example, allowing developers to see parent-child relationships instead of an orphaned payload may fit your integration requirements.
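For instance, a payload that carries its relationships along might look something like the following sketch (all identifiers and fields invented for illustration):

```python
# A hedged example of exposing entity relationships: the invoice embeds its
# parent account and child line items rather than arriving as an orphaned
# payload. All identifiers and fields here are hypothetical.
invoice = {
    "id": "inv_1042",
    "account": {                      # parent: who this invoice belongs to
        "id": "acct_77",
        "name": "Globex Corp",
    },
    "line_items": [                   # children: what the invoice contains
        {"id": "li_1", "sku": "PLAN-PRO", "amount_cents": 9900},
    ],
}
```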

Governance & Authentication

Authentication is a bit more straightforward, so we can start there. OAuth 2.0 is an obvious favorite, being the preferred authorization method for 51% of API providers. Yet 31% of applications use just a key & secret, and 15% only use basic credentials.*
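For reference, a token request under the OAuth 2.0 client credentials grant might look like the sketch below, using the Python requests library against a placeholder token endpoint:

```python
# A minimal sketch of an OAuth 2.0 client-credentials token request
# (RFC 6749 §4.4) using the `requests` library. The token URL and
# credentials are placeholders, not a real service.
import requests

TOKEN_URL = "https://auth.example.com/oauth/token"  # hypothetical endpoint

def fetch_access_token(client_id: str, client_secret: str) -> str:
    resp = requests.post(
        TOKEN_URL,
        data={"grant_type": "client_credentials"},
        auth=(client_id, client_secret),  # client auth via HTTP Basic
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# token = fetch_access_token("my-client-id", "my-client-secret")
# headers = {"Authorization": f"Bearer {token}"}
```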

Governance for the Good of Scale

Much like the term “enterprise,” governance can be a broad term wrapped in a bureaucratic bow. To unwind it, we will use governance to speak to two things: permissions and responsibility. Authentication helps with external security, but thought also needs to be given to internal security around usage and permissions. Giving the right people the right data helps keep the APIs consistent from a usage standpoint. Scaling in this sense is more about reaching 100% of your audience than about growing usage by 100%.

Responsibility is an overarching aspect that gives credibility to the organization, and a key ingredient for scale. Would you let Yahoo endpoints interact with your data after learning that every account was hacked? Larger organizations are much more valuable targets, so the ability to effectively govern data flow becomes vital to scale.

Eventing and Polling

Modern communication is driven by millions of small pushes. Notifications for texts, calls, emails, news, apps, Slacks, alerts, downtimes, and tags, to name the more informative. So it’s somewhat baffling to find that only 40% of applications support an event framework*. Sure, the data is there, held like mail at a post office, but consumers would rather have it delivered straight to the kitchen table.

This is not to say that all data is created equal and therefore updated as frequently; it should be noted that additional resources must be provisioned if you intend to constantly poll for information.

Poll the Enterprise!

To avoid setting up individual scripts for every application, and every field within that application, organizations employ polling engines, effectively transforming a polling experience into a webhook-like experience. As much as we would love to wait for the world to adopt modern practices, large organizational investments can’t wait. This consistency in how information is pushed and notified allows for better, reactive programming. It can be especially helpful when transitioning to a microservices architecture that employs serverless capabilities.
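A toy version of such a polling engine, assuming a hypothetical endpoint that accepts an updated_since query parameter, might look like this:

```python
# A minimal sketch of a polling engine that emulates webhooks: poll an
# endpoint for records changed since the last check and hand each one to
# a callback, as a webhook delivery would. The URL and the updated_since
# parameter are assumptions for illustration.
import time
import requests

def poll_as_webhook(url: str, on_event, interval_seconds: int = 60):
    last_checked = "1970-01-01T00:00:00Z"
    while True:
        # Record the checkpoint before the request so no update is missed.
        checkpoint = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        resp = requests.get(url, params={"updated_since": last_checked}, timeout=10)
        resp.raise_for_status()
        for record in resp.json().get("records", []):
            on_event(record)          # deliver like a webhook payload
        last_checked = checkpoint
        time.sleep(interval_seconds)

# poll_as_webhook("https://api.example.com/contacts", print)
```

Downstream consumers subscribe to one callback shape regardless of whether the source app offered real webhooks or had to be polled.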

For another view on polling, read: Stop Polling and Consider Using REST Hooks

Pagination & Search

If you thought your date last night was verbose, many applications return pages and pages of data in the payload. This data comes back in various formats that not only make consumption complex, but also hinder the ability to search through the information. To add to the fun, the number of records available in an individual page is dictated by the endpoint, so even if the data is similar, the paging can vary based on the metadata available.

Search capabilities can vary from endpoint to endpoint, with some very well-written discovery APIs to pull in custom fields. A majority of the time, though, it will take several queries to get the correct path to the data you need. And that’s likely just for one application.

Salesforce.com, for example, provides SOQL for this purpose. Some services, Instagram for example, have a much narrower set of fields that users can filter by. Other services, like Pipedrive, handle search functions by providing JSON-defined filters that can be managed via APIs and attached to search calls to filter the returned data.
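To illustrate the filter-object idea, here’s a hedged sketch; the filter grammar and endpoints below are invented for illustration and are not Pipedrive’s actual API:

```python
# A hypothetical JSON-defined filter in the spirit of the Pipedrive
# approach described above: define the filter once via the API, then
# attach it to search calls. Endpoints and grammar are assumptions.
import requests

contact_filter = {
    "conditions": [
        {"field": "last_name", "op": "eq", "value": "Lovelace"},
        {"field": "updated_at", "op": "gt", "value": "2018-01-01"},
    ],
}

# resp = requests.post("https://api.example.com/filters", json=contact_filter)
# filter_id = resp.json()["id"]
# results = requests.get("https://api.example.com/search",
#                        params={"filter_id": filter_id}).json()
```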

Paging the Enterprise

There are three primary pagination types out in the wild: limit/offset, page/page size, and cursor-based.

  • limit/offset: The limit/offset approach uses a limit (the max number of results returned) and an offset (the position of the first result) to determine the result set.
  • page/page size: The page/page size approach is just like hitting next page on a Google search. The page size indicates how many results should be returned, while the page tells which set of results to return. A little basic arithmetic converts between the two schemes: offset = (page − 1) × page size.
  • Cursor-based: Cursor-based paging returns a token value or URL that is provided with the following call to retrieve the next set of results. The cursor token is usually combined with a value to determine the maximum results to return.

To make pagination consistent in the enterprise, all three pagination types can be emulated with a cursor-based scheme. This is done by returning a token that is an encoded version of the limit and offset of the next set of results. That token is then sent with the next call, and so on until your big data heart is full.
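A minimal sketch of that emulation, with the cursor as nothing more than base64-encoded JSON holding the next limit and offset:

```python
# A minimal sketch of emulating limit/offset pagination with a cursor:
# the token is just the next limit/offset pair, base64-encoded JSON.
import base64
import json

def encode_cursor(limit: int, offset: int) -> str:
    raw = json.dumps({"limit": limit, "offset": offset}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> tuple[int, int]:
    state = json.loads(base64.urlsafe_b64decode(token))
    return state["limit"], state["offset"]

# After serving results [offset, offset + limit), hand back the next token:
next_token = encode_cursor(limit=50, offset=50)
print(decode_cursor(next_token))  # (50, 50)
```

Page/page size maps onto the same token via the arithmetic above, so one cursor scheme can front all three styles.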

Final Thoughts

This post focuses on how to make consistent API structures, but why is consistency so crucial? The underlying assumption is that whoever is tasked with organizing a complex web of data can keep their head from exploding and running off to short the stock. By introducing more consistency into nearly all the data coming from different sources, you can scale that aggregation, that usage, that firehose of information that people use (or not) to make decisions. And sometimes the “Enterprise” label just means those decisions impact thousands of other people.

*Each of the metrics cited above comes from a sneak peek of The Cloud Elements 2018 State of API Integration Report, which will be made public soon.