Designing Evolvable APIs for the Web: Interaction

This is the second article in a series that aims to put the focus of Web APIs back on the Web, on its underlying architecture, and on what it means to build evolvable APIs for it. In the previous article we introduced the architecture of the Web and its first pillar, Identification. In this article, we describe a crucial pillar upon which the entire exchange of information over the Web rests: Interaction.


Getting the Most out of HTTP

HTTP (Hypertext Transfer Protocol) is an application level client-server protocol, where clients and servers interact by exchanging messages. Clients send request messages to servers, identifying a target resource and a method defining the operation to be performed over that resource. Servers reply with response messages, containing operation status (success or other), metadata, and representations. Both messages can have metadata about the messages or about the carried representations.

One should remember that HTTP isn’t just a representation transport protocol. It is a feature rich protocol for distributed systems, providing many application level features that shouldn’t be ignored when designing APIs for the Web. Some of those features are:

A common set of methods with resource independent semantics.
A rich set of status codes that can be used to inform clients of the request outcome.
Additional protocol features to handle application level concerns such as caching, access control, optimistic concurrency, and fault tolerance.

HTTP Request Methods

A request message is characterized by two very important parts. The target, expressed as a URI, identifies the resource on which an operation should be performed. The request method defines this operation and is the primary source of request semantics.

A defining characteristic of HTTP is that the set of available methods is always the same, independent of the target resource. This is in sharp contrast with other architectural styles, such as RPC (Remote Procedure Calls) or distributed objects, where each request target has its own specific operations.

The fixed set of available methods is as follows (descriptions adapted from RFC 7231):

Method	Description
GET	Transfer a current representation of the target resource.
HEAD	Same as GET, but only transfer the status line and header section.
POST	Perform resource-specific processing on the request payload.
PUT	Replace all current representations of the target resource with the request payload.
DELETE	Remove all current representations of the target resource.
CONNECT	Establish a tunnel to the server identified by the target resource.
OPTIONS	Describe the communication options for the target resource.
TRACE	Perform a message loop-back test along the path to the target resource.

This uniform method set does not mean that all resources must support all these methods. For instance, some resources may not be deletable, in which case the DELETE method never succeeds when applied over them. The method set is fixed and so are its semantics. As an example, the DELETE semantics are the same independent of the resource where it is applied.
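The distinction between a fixed method set and per-resource support can be sketched in a few lines of Python. This is only an illustrative sketch: the resource paths and the `SUPPORTED` table are hypothetical, and a real server would execute the operation instead of only returning a status.

```python
from http import HTTPStatus

# The method set is fixed by HTTP (RFC 7231); it does not vary per resource.
HTTP_METHODS = ("GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE")

# Hypothetical resources and the subset of methods each one supports.
SUPPORTED = {
    "/catalogue/item": {"GET", "HEAD", "PUT", "DELETE"},
    "/catalogue/audit-log": {"GET", "HEAD"},  # assumed to be non-deletable
}


def dispatch(method: str, path: str) -> tuple[int, dict[str, str]]:
    """Return (status code, headers) for a request, without executing it."""
    allowed = SUPPORTED.get(path)
    if allowed is None:
        return HTTPStatus.NOT_FOUND.value, {}
    if method not in allowed:
        # The method keeps its usual meaning; this resource just does not support it.
        headers = {"Allow": ", ".join(sorted(allowed))}
        return HTTPStatus.METHOD_NOT_ALLOWED.value, headers
    return HTTPStatus.OK.value, {}
```

Under these assumptions, `dispatch("DELETE", "/catalogue/audit-log")` yields a 405 with an Allow header listing the supported methods, while the same DELETE on `/catalogue/item` succeeds.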

At first sight it might seem that this uniform method set is only useful for simple CRUD systems (Create-Read-Update-Delete), where the POST, GET, PUT and DELETE methods are used to manipulate entries in a database. However, this view ignores an important characteristic of the Web and of HTTP: resources aren’t confined to files or entries in a database.

Resources can represent processes and not just data. For instance, a POST to a resource can be used to start a long-running operation, whose state can then be retrieved repeatedly by GETting the representation of an associated monitoring resource. This carries over into an IoT (Internet of Things) context, in which HTTP requests to resources such as sensors or actuators can be used to trigger behavior in the physical world.
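As a rough sketch of the long-running operation pattern, the following in-memory Python example models a POST that starts a job and a GET that reads its state. The `/jobs/{id}` paths, the function names, and the use of a Location header to point at the monitoring resource are assumptions made for illustration, not something HTTP mandates.

```python
import itertools

# In-memory stand-in for server state; a real API would persist this.
_job_ids = itertools.count(1)
_jobs: dict[str, dict[str, str]] = {}


def post_jobs(payload: dict) -> tuple[int, dict[str, str]]:
    """POST /jobs: accept the work and point at a monitoring resource."""
    job_path = f"/jobs/{next(_job_ids)}"
    _jobs[job_path] = {"state": "running"}  # payload is ignored in this sketch
    return 202, {"Location": job_path}


def get_job(job_path: str) -> tuple[int, dict[str, str]]:
    """GET /jobs/{id}: return a representation of the operation state."""
    job = _jobs.get(job_path)
    if job is None:
        return 404, {}
    return 200, dict(job)


def finish_job(job_path: str) -> None:
    """Simulate the background work completing."""
    _jobs[job_path]["state"] = "done"
```

A client would call `post_jobs`, read the monitoring URI from the response, and then poll `get_job` until the state changes.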


HTTP Status Codes

The HTTP protocol provides a rich set of status codes to represent different request processing outcomes. In the words of L. Richardson, M. Amundsen and S. Ruby in the book RESTful Web APIs, HTTP status codes “represent a basic set of semantics, defined in the most fundamental of all API standards. There’s no excuse for ignoring this gift.”

A Web API should take advantage of this richness and use the appropriate status code for each request processing result. For instance, a Web API shouldn’t return a 200 (OK) status if an error occurred while processing the request.

In response to a GET request, a server should only use the 200 status if the message body is indeed the representation of the request target resource. So, unless “Oops, something went wrong” is indeed the representation of the target resource, don’t use a 200 on that response.

Note also that these are status codes and not only error codes, meaning that they can be used to represent multiple processing outcomes, not just failures. Status codes are three-digit integers and are divided into five classes, identified by the first digit.
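Because the class is carried by the first digit, a client or intermediary can classify any response with simple integer arithmetic. A minimal sketch follows; the function name and the class labels are chosen here for illustration.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a three-digit status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```

This works even for codes the caller has never seen before, since an unrecognized code can still be handled according to its class.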

HTTP 200–299: Successful

The status codes between 200 and 299 inform that the request was successfully executed. However, success comes in multiple flavors. Some examples include:

A 200 (OK) on a GET request informs the client that the response body does indeed contain the representation of the target resource.
A 201 (Created) indicates that the request was successful and that a new resource was created.
A 202 (Accepted) indicates the request was only successfully accepted and that its processing will continue asynchronously.

HTTP 400–499: Client Error

The status codes between 400 and 499 inform the client that the request was not successfully processed due to a client error. As with the 2xx codes, multiple variations exist, such as:

A 400 (Bad Request) is a general code indicating that the server cannot process the request because of something perceived as a client error, such as malformed syntax.
A 401 (Unauthorized) indicates invalid or absent security information, such as authentication credentials.
A 403 (Forbidden) informs the client that it isn’t authorized to perform the request method over the target resource.
A 404 (Not Found) indicates that the target resource does not exist.

HTTP 500–599: Server Error

The status codes between 500 and 599 inform the client that the request was not successfully processed due to the server’s fault. Here are some example scenarios:

Server infrastructure error, such as the inability for the HTTP server to connect to a back-end database server or service.
Programming error that results in an invalid operation, such as an out-of-bounds index access.
Server unavailability due to scheduled maintenance.
When the server is acting as a reverse proxy, such as an API Gateway or HTTP-level load balancer, a 502 or a 504 can be used to inform the client that an error occurred on the request to the back-end server.

Here are some examples of errors that aren’t really the server’s fault and therefore shouldn’t use a 5xx code:

Invalid request information, such as a query string parameter that should be convertible to an integer and isn’t.
Server side state not compatible with the requested operation, such as a customer not having a valid configured payment method on a purchase request.
Unsupported requested HTTP method.

In these scenarios, a 4xx status code is the most appropriate way to represent the processing result.
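The split between client-side and server-side failures can be sketched as a small request handler. Everything here is hypothetical: the `handle_purchase` function, the `quantity` parameter, the `payment_method` field, and the `place_order` callback standing in for the back end. The choice of 409 (Conflict) for incompatible server-side state is one reasonable option, not the only one.

```python
from typing import Callable


def handle_purchase(
    method: str,
    query: dict[str, str],
    customer: dict[str, str],
    place_order: Callable[[int], None],
) -> int:
    """Return the status code for a hypothetical purchase request."""
    if method != "POST":
        return 405  # unsupported method: a client error, not a server fault
    try:
        quantity = int(query["quantity"])
    except (KeyError, ValueError):
        return 400  # missing or non-integer parameter
    if not customer.get("payment_method"):
        return 409  # server-side state incompatible with the operation
    try:
        place_order(quantity)
    except Exception:
        return 500  # unexpected failure while processing: the server's fault
    return 201
```

Only the last failure branch, where the back end itself fails, ends up in the 5xx class.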

HTTP 1xx & 3xx

The two remaining status code classes are informational (1xx) and redirection (3xx). The 1xx status class is used by servers to convey intermediate processing information before the final response message is produced. The most common example is the 100 (Continue), which indicates that the server has received the initial part of the request and has not rejected it, so the client should continue sending the rest of the request.

The 3xx status class indicates that further action must be taken by the client to complete the request. A typical example is the 302 (Found) status, which indicates to the client that the target resource temporarily resides under a different URI.


3 Benefits of Proper HTTP Status Code Usage

What are the advantages of using the proper HTTP status code? Why not just return a 200 (OK) for everything, and embed the result status in an application-specific manner? Here are some reasons that may convince you to properly adopt HTTP status codes.

Development simplicity: Reduce the effort required to consume your Web API by reusing a set of status codes that is already familiar to HTTP client developers and understood by HTTP libraries.
Operational visibility: A single 500 error on the server log should be a matter of concern. However, multiple 4xx errors may occur routinely (e.g. expired authentication credentials). Having the ability to differentiate these two error classes simply by looking into the status code integer, without having to analyze the response body, is a very valuable feature.
Intermediary visibility: Typically an intermediary will not interpret the response payload, because it does not have domain knowledge, and will base its behavior on the response status. So for intermediaries to operate correctly, the proper status codes should be used. For instance, a client-side cache will happily store a 200 response even if the payload is a Java stack trace.

Note: Status codes are uniform across resources. Documenting status codes per resource type should not be required.
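The operational visibility point can be illustrated with a short sketch that inspects only the status code integers from an access log, never the response bodies. The function names and the rule of flagging any 5xx are assumptions for this example; a real monitoring setup would likely use thresholds and time windows.

```python
from collections import Counter


def summarize(status_codes: list[int]) -> dict[str, int]:
    """Count responses per status class, e.g. {"2xx": 3, "4xx": 1}."""
    counts = Counter(f"{code // 100}xx" for code in status_codes)
    return dict(counts)


def needs_attention(status_codes: list[int]) -> bool:
    """Flag the log window if it contains even one 5xx response."""
    return summarize(status_codes).get("5xx", 0) > 0
```

A window full of 401 responses passes quietly, while a single 500 is enough to raise a flag.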

Caching

Caching is another area where the HTTP protocol shows its application-level characteristics.
To illustrate that, consider a Web API that provides slowly-changing information, such as catalog data. A way to improve the behavior of such an API is by using response caching:

On the server side, this means that requests can be fulfilled by using a previously stored response.
On the client side, this means requests will be handled locally, using previously received responses, without needing to access the network.

There are multiple ways to define the caching policy that a client could use. One option is out-of-band information, in which the Web API documentation states how long the content is valid and can be reused. A more dynamic solution could be based on application-specific caching information added to the response content, such as a “validUntil” property on the catalog entry JSON representations.

However, a better alternative to these two ad hoc and application specific protocols is to use the caching mechanisms already defined by the HTTP protocol. There are multiple advantages to this:

Richer set of well documented caching semantics, including aspects such as content age and staleness.
Set of request and response headers that allow both clients and servers to express caching information in a standard way.
Use of off-the-shelf caching components, such as standalone HTTP caches (e.g. Varnish) or software libraries, which already understand these headers and their semantics.

The Cache-Control header can be used on a response to represent its cacheability characteristics — i.e. if it can be cached or not and for how long.

HTTP/1.1 200 OK
Cache-Control: max-age=3600, public

In the previous response, the server tells the client, and any intermediaries in the request path, that the response can be reused for the next 3600 seconds (one hour). The public directive also states that the response can be stored by a shared cache. The Cache-Control header can also be used on request messages to convey client freshness requirements, such as the maximum acceptable age of cached content or whether the client is willing to accept stale content.

GET /catalogue/item HTTP/1.1
Host: api.example.com
Cache-Control: max-age=600

In the previous example, the client uses the Cache-Control header to state that it is willing to accept cached responses whose age is not greater than 600 seconds (10 minutes). As Web API developers, we should resist the urge to add application-specific caching information to our payloads, and instead try to reuse what the HTTP protocol already defines.
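To make the interplay of the two headers concrete, here is a simplified sketch of how a cache might decide whether a stored response can be reused. It handles only the max-age and no-store directives and treats everything else as unknown; real caches follow the full rules of the HTTP caching specification, so in practice an off-the-shelf cache is preferable. The function names are hypothetical.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if arg else None
    return directives


def can_reuse(response_cc: str, request_cc: str, age_seconds: int) -> bool:
    """Decide if a stored response of the given age satisfies both sides."""
    response = parse_cache_control(response_cc)
    request = parse_cache_control(request_cc)
    if "no-store" in response or "no-store" in request:
        return False
    server_limit = response.get("max-age")
    if server_limit is None:
        return False  # conservative choice for this sketch
    if age_seconds >= int(server_limit):
        return False  # the stored response is stale
    client_limit = request.get("max-age")
    if client_limit is not None and age_seconds > int(client_limit):
        return False  # older than the client is willing to accept
    return True
```

With the two messages shown above, a stored response that is 5 minutes old can be reused, while one that is 15 minutes old is still fresh for the server but too old for this client.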

Caching is just one of the many application-level features provided by HTTP. Other examples are optimistic concurrency management (using conditional requests), fault tolerance, and authentication.

The Uniform Interface and Intermediaries

HTTP defines a uniform interface, where the applicable request methods, response status codes and metadata are independent of the target resource type. This uniformity facilitates the use of intermediary components, such as caching proxies, which can act on the requests and responses without knowing the concrete resource semantics.

As another example, a caching server can autonomously perform a GET request because this method is safe: its execution does not change the relevant state visible on the origin server. Furthermore, any intermediary is allowed to retry a PUT request because this method is idempotent: the effect on the visible server state of multiple identical requests is the same as performing only one request. All these decisions can be made without any knowledge of the resource semantics. This resource independence increases the usefulness of domain-independent intermediaries that stand between clients and origin servers.
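The decisions described above depend only on the method name, which a short sketch makes visible. The method classification follows RFC 7231; the function names are hypothetical.

```python
# Safe methods do not change origin server state (RFC 7231).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})

# Idempotent methods: all safe methods plus PUT and DELETE (RFC 7231).
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def may_issue_autonomously(method: str) -> bool:
    """An intermediary may issue safe requests on its own, e.g. to pre-fetch."""
    return method.upper() in SAFE_METHODS


def may_retry(method: str) -> bool:
    """An intermediary may repeat idempotent requests after a failure."""
    return method.upper() in IDEMPOTENT_METHODS
```

Neither function takes the target URI as a parameter, which is precisely the point: the answer is the same for every resource.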

HTTP intermediaries can be useful even in an HTTPS world, where network intermediaries don’t have visibility over the exchanged HTTP messages. On the client side, application-independent proxies can be run in process or on the same machine to provide features such as caching, automatic retries, or pre-fetching. On the server side, the TLS tunnel can end at general-purpose intermediaries (reverse proxies) that provide features such as load balancing, caching and security. The growing field of API Gateways is an example of the value added by these HTTP intermediaries in the specific context of Web APIs.

On Deck: Part Three – Formats

This ends our second article in this series, where we aim to put the focus of Web APIs back on the Web, on its underlying architecture, and on what it means to build evolvable APIs for it. Our next and final article in this series will explore the third pillar — Formats — to see what format flexibility means for Web APIs.