Like many industries, the API space has its own jargon. Many terms are used so often that it’s commonly assumed everyone in the conversation understands what is being talked about. But for newcomers, these subtle definitions may not be as apparent.
Take the difference between stateless and stateful: an invaluable distinction in the development of APIs and the services that use them. In this piece, we’re going to briefly discuss what these terms actually mean. We’ll look at what makes statefulness and statelessness so different from one another, and what this means in terms of the API.
To understand statelessness, one must first understand statefulness. When we talk about computer systems, a “state” is simply the condition or quality of an entity at an instant in time. To be stateful is to rely on these moments in time: a stateful system’s output depends not only on its inputs but also on its current state.
If that’s unclear, don’t worry; it’s a hard concept to grasp, and doubly so with APIs. We can break this down even further. Consider binary, a language of 1s and 0s. What each digit functionally represents is either “on” or “off”. A binary digit cannot be both 1 and 0 at the same time, so the two values are mutually exclusive.
Now, consider a theoretical situation in which you are given a piece of paper with these simple instructions — “if the number is 0, say no, if 1, say yes” — and you were placed into a room with a binary display which changed between the number 0 and 1 every five seconds.
This is a stateful system. Your answer will depend entirely on whether or not that clock says “0” or “1” — you cannot answer independently of the state of the grand machine. This is statefulness.
Stateful Web Services
With this in mind, what does a stateful web service look like? Let’s say you log into a resource, and in doing so, you pass your username and password. If the web server stores this data on the back end and uses it to identify you as a constantly connected client, the service is stateful. Keep in mind that this is one very specific example, and statefulness shows up in other forms too. What seems stateful may not necessarily be stateful, as we’ll see later.
As you use the web service, everything you do is referenced back to this stored state. When you request an account summary, the web service asks two things:
- Who is making this request?
- Using the ID stored for who is making this request, what should their page look like?
In a stateful web service like this, the response formed from a simple GET request is entirely dependent on the state registered by the server. Without knowledge of that state, your request cannot be returned properly.
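The two questions above can be made concrete with a minimal Python sketch of a stateful server. It assumes an in-memory dictionary as the session store. The names `SESSIONS`, `login`, and `get_account_summary` are hypothetical and are used only for illustration. The point is that the GET handler cannot answer without the state the server recorded at login.

```python
import secrets

# Hypothetical in-memory session store: session ID -> user record.
# This dictionary is the server-side "state".
SESSIONS: dict[str, dict] = {}

def login(username: str, password: str) -> str:
    """Log a user in and remember them on the server (password check omitted)."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"username": username}
    # The client only holds an opaque pointer into server memory.
    return session_id

def get_account_summary(session_id: str) -> str:
    """A GET handler whose answer depends entirely on stored server state."""
    # Question 1: who is making this request?
    user = SESSIONS.get(session_id)
    if user is None:
        # Without the stored state, the request cannot be answered.
        return "401 Unauthorized: unknown session"
    # Question 2: what should this user's page look like?
    return f"Account summary for {user['username']}"
```

Clearing `SESSIONS`, for example after a server restart, makes the very same request fail even though the client sent exactly what it sent before.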
Another great example is FTP. When a user logs in to a traditional FTP server, they engage in an active connection with the server. Each change to the user’s state, such as the current working directory, is stored on the server as client state. Each change the user makes is registered as a change of state, and when the user disconnects, their state changes once more, to disconnected.
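Here is a rough sketch of the kind of per-connection state an FTP server keeps. It assumes a toy command set (`CWD`, `PWD`, `QUIT`) loosely modeled on FTP. This is not an FTP implementation, and the class name is hypothetical. The reply to `PWD` depends on every `CWD` that came before it on the same connection.

```python
import posixpath

class FtpLikeConnection:
    """Hypothetical per-client state a stateful, FTP-style server keeps in memory."""

    def __init__(self, user: str):
        self.user = user
        self.cwd = "/"          # current working directory, remembered between commands
        self.connected = True

    def handle(self, command: str, arg: str = "") -> str:
        if not self.connected:
            return "530 Not logged in"
        if command == "CWD":
            # Resolved relative to the *stored* directory, so the result depends on history.
            self.cwd = posixpath.normpath(posixpath.join(self.cwd, arg))
            return "250 OK"
        if command == "PWD":
            return f'257 "{self.cwd}"'
        if command == "QUIT":
            self.connected = False  # disconnecting is itself a change of state
            return "221 Goodbye"
        return "502 Command not implemented"
```

Sending `PWD` on its own tells you nothing. Its meaning comes from the conversation the server has been remembering.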
So far so good, right? Well, not quite. Stateful programming is fine in some very limited applications, but it has a lot of issues. First and foremost, when you have to reference a state, you open yourself up to a lot of incomplete sessions and transactions. Let’s say you make a call to present a piece of data. In a stateful system where the server holds the client’s state, how long is the system supposed to leave this connection open? How do we verify whether the client has crashed or disconnected? How do we track the user’s actions while maintaining the ability to document changes and roll back when necessary?
While there are certainly workarounds for all of these questions, more often than not, statefulness is only really useful when the functions themselves depend on state. Most consumers are able to respond to the server in intelligent, dynamic ways, and because of this, maintaining server state independent of the consumer, as if the consumer were simply a “dumb” client, is wasteful and unnecessary.
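One common workaround for the “how long do we keep this open?” question is an idle timeout. The sketch below expires sessions that have not been touched recently. It assumes a hypothetical `SessionStore` class, a 15-minute default limit, and an injectable clock so that it can be tested. Note that the timeout answers the question with a guess: the server still cannot tell a crashed client from a slow one.

```python
import time
from typing import Callable, Optional

class SessionStore:
    """Hypothetical server-side session store with idle-timeout expiry."""

    def __init__(self, idle_timeout: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._sessions: dict[str, tuple[dict, float]] = {}  # id -> (data, last_seen)

    def put(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = (data, self.clock())

    def get(self, session_id: str) -> Optional[dict]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, last_seen = entry
        if self.clock() - last_seen > self.idle_timeout:
            # Assume the client is gone; the server cannot actually know.
            del self._sessions[session_id]
            return None
        self._sessions[session_id] = (data, self.clock())  # refresh on activity
        return data

    def reap(self) -> int:
        """Drop every expired session and return how many were removed."""
        now = self.clock()
        expired = [sid for sid, (_, seen) in self._sessions.items()
                   if now - seen > self.idle_timeout]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

Something has to call `reap` periodically. That is one more moving part a stateless design does not need.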
The answer to these issues is statelessness. Stateless is the polar opposite of stateful, in which any given response from the server is independent of any sort of state.
Let’s go back to that binary room thought experiment. You are given the same binary clock, only this time, the paper simply has a name, “Jack”, and the instructions are to respond when someone says the password “fish”. You sit watching the clock slowly change, and each time someone says the password, you say the name “Jack”.
This is statelessness. There’s no need to even reference the clock, because the information you need is held locally and each request is self-contained. Your answer depends only on the request and the data you hold. The speaker could say the password, tell you to change the name, then walk away. He could then come back an hour later, say the password, and get the new name. Everything is contained within the request, and each exchange is handled in two distinct phases: a “request” and a “response”.
This is a stateless system. Your response is independent of the “0” or “1”, and each request is self-contained.
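The room can be sketched in a few lines of Python. The class `Room` and its `handle` method are hypothetical, chosen only to mirror the thought experiment. The name on the paper is data the room holds. Nothing about the speaker or earlier requests is remembered, and the clock is never consulted.

```python
from dataclasses import dataclass
from typing import Optional

SECRET = "fish"  # the password from the thought experiment

@dataclass
class Room:
    """The data the responder holds: just the name written on the paper."""
    name: str = "Jack"

    def handle(self, password: str, new_name: Optional[str] = None) -> Optional[str]:
        """Answer one self-contained request.

        No record of the speaker or of previous requests is kept,
        and the binary clock plays no part in the answer.
        """
        if password != SECRET:
            return None           # wrong password: no answer
        if new_name is not None:
            self.name = new_name  # the request itself carries the change
        return self.name
```

Calling `Room().handle("fish")` returns `"Jack"`. A later request that includes `new_name="Jill"` changes the answer for everyone, without the room ever knowing who asked.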
Stateless Web Services
Statelessness is a fundamental aspect of the modern internet, so much so that every single day, you use a variety of stateless services and applications. When you read the news, you are using HTTP to connect in a stateless manner, exchanging messages that can be parsed and handled in isolation from each other and from your state.
If you have Twitter on your phone, you are constantly using a stateless service. When the app requests a list of recent direct messages using the Twitter REST API, it issues a single, self-contained request.
The response that you get does not depend on any session state stored on the server. Whatever the app needs to remember is stored on the client side, in the form of a cache.
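A client-side cache of this kind might look roughly like the following. The sketch assumes a hypothetical `fetch` callable that performs the actual HTTP request. The 60-second freshness window is an arbitrary choice, not Twitter’s behavior. The server is never asked to remember anything; the client alone decides when its copy is too old.

```python
import time
from typing import Callable

class ClientCache:
    """Hypothetical client-side cache: the client, not the server, keeps the state."""

    def __init__(self, fetch: Callable[[str], str], max_age: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch        # performs one self-contained request
        self.max_age = max_age
        self.clock = clock
        self._entries: dict[str, tuple[str, float]] = {}  # url -> (body, fetched_at)

    def get(self, url: str) -> str:
        entry = self._entries.get(url)
        if entry is not None and self.clock() - entry[1] <= self.max_age:
            return entry[0]       # fresh enough: no network call at all
        body = self.fetch(url)    # each request stands alone; no session needed
        self._entries[url] = (body, self.clock())
        return body
```
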
Let’s take a look at another example. Below, we invoke a POST request, creating a record on HypotheticalService:
Content-Type: text/xml; charset=utf-8
<?xml version="1.0" encoding="utf-8"?>
<content>This is an example</content>
In this example, we are creating an entry, but this entry does not depend on any stored state. Keep in mind that this is a simple use case: it does not pass any authorization or authentication data, and the POST request itself contains only very basic data.
Even so, you can plainly see that issuing a POST in a stateless manner means you do not have to wait for server synchronization to ensure the process has been properly completed, as you would with FTP or other stateful services. You receive a confirmation, but this confirmation is simply an affirmative response rather than a mutually shared state.
As a quick note, REST is specifically designed to be stateless. The concept of Representational State Transfer (from which REST gets its name) hinges on each request carrying all of the data needed to handle it, so that the server never has to rely on stored context from earlier requests. REST should therefore be considered stateless. In fact, statelessness is one of the constraints that determine whether something is RESTful, as set out in Roy Fielding’s original dissertation describing the concept.
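A server might process that POST statelessly along the following lines. This is a sketch that assumes an in-memory list standing in for HypotheticalService’s storage. The `create_record` function and the status/message pairs it returns are illustrative, not the service’s real API. Everything the handler needs comes from the request’s headers and body, and it keeps no record of who sent it.

```python
import xml.etree.ElementTree as ET

RECORDS: list[str] = []  # hypothetical storage for created entries

def create_record(headers: dict[str, str], body: bytes) -> tuple[int, str]:
    """Handle one self-contained POST: all input arrives with the request."""
    if not headers.get("Content-Type", "").startswith("text/xml"):
        return 415, "Unsupported Media Type"
    try:
        # Passing bytes lets the XML declaration determine the encoding.
        root = ET.fromstring(body)
    except ET.ParseError:
        return 400, "Bad Request: malformed XML"
    if root.tag != "content" or not root.text:
        return 400, "Bad Request: expected a non-empty <content> element"
    RECORDS.append(root.text)
    # The confirmation is just an affirmative response, not a shared session.
    return 201, f"Created record {len(RECORDS)}"
```
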
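Paging through results shows the constraint in action. Instead of the server remembering that “this client is on page 3”, a stateless API can hand back a cursor, which the client sends with its next request. The sketch below assumes a hypothetical `list_items` endpoint and uses a base64-encoded offset as an opaque cursor. Real APIs choose their own cursor formats.

```python
import base64
import json
from typing import Optional

ITEMS = [f"item-{i}" for i in range(7)]  # hypothetical data set

def encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"offset": offset}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["offset"]

def list_items(limit: int, cursor: Optional[str] = None) -> dict:
    """Return one page. Where to resume comes from the request, not server memory."""
    offset = decode_cursor(cursor) if cursor else 0
    page = ITEMS[offset:offset + limit]
    next_offset = offset + limit
    return {
        "items": page,
        "next_cursor": encode_cursor(next_offset) if next_offset < len(ITEMS) else None,
    }
```

Because the cursor travels with each request, any server replica can serve the next page.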
Smoke and Mirrors
We need to be somewhat careful when we talk about web services as examples of stateful or stateless, though, because what seems to fall in one category may not actually be so. This is largely because stateless services have managed to mirror a lot of the behavior of stateful services without technically crossing the line.
Statelessness, just like our example above, is all about requests that are self-contained rather than dependent on an external frame of reference. The difference between it and statefulness is really where the state is stored. When we browse the internet or access our mail, we are generating a state, and that state has to go somewhere.
When the state is stored by the server, it generates a session. This is stateful computing. When the state is stored by the client, it takes the form of data the client sends along with its requests, such as a cookie or a token. This is technically “stateful” in that it references a state, but because the state is stored by the client, we refer to it as stateless.
This seems confusing, but it’s actually the best way to work around the limitations of statelessness. In a purely stateless system, we’d be interacting with a very limited service. When we ordered something online, that would be it: the system wouldn’t store our address, our payment methods, or even a record of our order. It would simply process our payment and, as far as the server was concerned, we’d cease to exist.
That’s obviously not the best-case scenario, so we made some concessions. In a client cookie, we store some basic authentication data. On the server side, we keep client data temporarily or in a database, linked to the piece of data the client holds. When we return to make another payment, it’s our cookie that establishes the state, not a server-side session.
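The cookie approach can be sketched with an HMAC-signed token. The client carries the state, and the server only needs a secret key to check that the state hasn’t been altered. This minimal illustration uses Python’s standard `hmac`, `hashlib`, `base64`, and `json` modules. The token format (`payload.signature`), the `SECRET_KEY` value, and the field names are assumptions. A production system would also add expiry and key rotation.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key

def _sign(payload: bytes) -> str:
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode()

def issue_cookie(state: dict) -> str:
    """Serialize client state and sign it; the client stores the result."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    return payload.decode() + "." + _sign(payload)

def read_cookie(cookie: str) -> Optional[dict]:
    """Verify and decode a cookie; return None if it is malformed or was tampered with."""
    try:
        payload_b64, signature = cookie.rsplit(".", 1)
    except ValueError:
        return None
    payload = payload_b64.encode()
    # Constant-time comparison of the expected and presented signatures.
    if not hmac.compare_digest(_sign(payload).encode(), signature.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

Here the cookie might carry only identifiers, such as a user ID or an address ID, that point at rows in the server’s database. Signing prevents tampering but not reading, so anything secret should still stay on the server.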
What’s So Bad About Sessions?
In terms of web services, the commonly accepted paradigm is to avoid sessions at all costs. While this certainly doesn’t apply to every single use case, using sessions as a method for communicating state is generally something you want to avoid.
To start with, sessions add a large amount of complexity with very little added value. Sessions make it harder to replicate and fix bugs. Sessions can’t really be “bookmarked”, as everything is stored on the server side. All of these are significant issues, but they pale in comparison to the simple fact that sessions are not scalable.
Let’s say you are a professional chess player, and you’d like to play multiple people at the same time. If you tried to remember every game and your strategy in each, you’d hit your capacity rather quickly. Now imagine you remembered nothing about those games, and simply reread the board on every move. You could, in principle, play 1,000,000 people at the same time, and it wouldn’t make any difference to you.
Now draw the analogy to your server. If your application gets a lot of load, you might have to distribute it across different servers. If you were using sessions, you’d suddenly have to replicate all sessions to all servers, making the system even more complex and error-prone.
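A toy round-robin load balancer makes the scaling argument concrete. The `StatelessServer` and `SessionServer` classes and the `round_robin` helper below are hypothetical. The session-based replica keeps carts in its own memory, so a request that lands on a different replica finds nothing. The stateless replica gets the same answer anywhere because the request carries the whole cart.

```python
import itertools

class StatelessServer:
    """Hypothetical replica that keeps no per-client memory."""

    def __init__(self, name: str):
        self.name = name

    def handle(self, request: dict) -> str:
        # The request carries the whole cart, so any replica can answer.
        return f"total={sum(request['cart'])}"

class SessionServer:
    """Hypothetical replica that keeps carts in its own local session table."""

    def __init__(self, name: str):
        self.name = name
        self.sessions: dict[str, list[int]] = {}

    def handle(self, request: dict) -> str:
        cart = self.sessions.setdefault(request["session"], [])
        if "add" in request:
            cart.append(request["add"])
        return f"total={sum(cart)}"

def round_robin(servers: list):
    """Yield servers in turn, like a naive load balancer."""
    return itertools.cycle(servers)
```

With two `SessionServer` replicas, adding an item on the first one and then asking the second one for the total returns `total=0`. The fix is either to replicate sessions or to pin each client to one server, and both add exactly the complexity described above.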
Simply put, sessions don’t do what they’re designed to do without introducing a ton of overhead, and their functionality can easily be replicated using cookies, client caching, and similar solutions. There are, of course, situations in which sessions make sense, especially when a server needs to store state without even the slight possibility of the client modifying it at runtime.
For instance, FTP is stateful for a very good reason: it replicates changes on both the client and server sides while delivering increased security suited to the nature of the requested access. This is workable because a single person needs to access a single server for a single, defined data transfer, even if the transfer involves multiple folders, files, and directories.
That’s not the case with something like a shared Dropbox folder, where stateful sessions would add complexity without adding value. In this case, stateless would be a much better choice.
We hope this has cleared up the difference between stateful and stateless architectures when it comes to APIs. This simple concept is the foundation on which most architectures and designs are built. Lofty concepts such as RESTful design are based around these ideas, so having a sound conceptual framing is extremely important.