Authentication as a Hypermedia API

User Authentication — the process of answering the question of who someone is — has evolved greatly over the last few years. From the dawn of computer security until fairly recently, User Authentication has been predominantly represented by password protection. This involves asking for a username and then verifying that the user behind that name knows the password tied to it.

As computing power increased and attacks became more sophisticated, it became apparent that passwords were not enough. Authentication now involves several methods combined into what’s called Multi-Factor Authentication. Depending on region, industry, and use case, these factors tend to differ. We also rely on other parties to figure out who the user is; social login is a good example.

This means that what used to be a simple matter of validating form data is now a lengthy user journey. It may involve visiting other sites (Facebook or Google), opening other apps (Duo, BankID), and clicking on all pictures that show traffic lights (no comment).

Authentication services are elaborate websites that guide the user through all these steps.

At Curity, we have been asked this question many times: why can’t we get an API to authenticate our users? The reasoning behind that question is simple: our customers don’t want to send their users to an authentication website from their app, but rather keep them in the app. The answer, however, is less simple; authenticating a user takes a few things for granted:

  • The user will follow a predefined step-by-step journey to complete the identification.
  • The browser will make sure this is the case.
  • No other party can intercept the messages and make use of the end result.

These expectations make for quite a different type of API. First of all, since authentication is a journey, an API client has to adhere to the protocol when communicating with the API. There must be a definite starting point and a definite end. This stands in stark contrast to methodologies such as GraphQL and the lower levels of REST (as defined by the Richardson Maturity Model). Both are designed to let the caller dive into the resources the way the caller needs.

Hypermedia Representation

Stepping back and looking at it from afar can give some important insights. This article started by describing how authentication is done on the web, with a stateful website that takes the user through the steps of authentication. In this context, the web browser is itself an API client, using REST and HTML to present the user interface. This is a Hypermedia API, which has been discussed on Nordic APIs many times before. (Especially interesting is “Using Hypermedia To Design Event-Driven UIs,” based on Asbjørn Ulsberg’s presentation at Nordic APIs’ Platform Summit.) Hypermedia lends itself very well to a process where the server and client need to walk in tandem through a flow. If we replace HTML with JSON and structure it more consistently, we have a well-suited way of designing an authentication API.

Basic Steps

Let’s look at an example of how a simple username and password screen can be represented using HTML:

<body class="login body-light">
   <main class="container clearfix" role="main">
       <div class="login-well form-light">
           <img class="login-logo" src="/assets/images/curity-logo.svg" alt="Logo" title="Logo" role="presentation">
           <form method="post" action="/dev/authn/authenticate/htmlSql">
               <div class="form-field">
                   <label for="userName" class="">Username</label>
                   <input type="text" name="userName" class="field-light" autocapitalize="none" autofocus>
               </div>
               <div class="form-field">
                   <label for="password" class="">Password</label>
                   <input type="password" name="password" class="field-light">
               </div>
               <button type="submit" class="button button-fullwidth  mt2">Login</button>
               <div class="login-actions">
                   <a href="/dev/authn/authenticate/htmlSql/forgot-password">Forgot your password?</a>
 
                   <a href="/dev/authn/register/create/htmlSql" class="mt2">
                       <i class="icon ion-android-person-add"></i>
                       Create account </a>
               </div>
           </form>      
       </div>
   </main>
</body>

Now, let’s take a look at the same screen using a JSON media type:

{
  "type": "authentication-step",
  "links": [
    {
      "href": "/dev/authn/authenticate/htmlSql/forgot-password",
      "rel": "forgot-password",
      "title": "Forgot your password?"
    },
    {
      "href": "/dev/authn/register/create/htmlSql",
      "rel": "register-create",
      "title": "Create account"
    }
  ],
  "actions": [
    {
      "template": "form",
      "kind": "login",
      "title": "Login",
      "model": {
        "href": "/dev/authn/authenticate/htmlSql",
        "method": "POST",
        "type": "application/x-www-form-urlencoded",
        "actionTitle": "Login",
        "fields": [
          {
            "name": "userName",
            "type": "username",
            "label": "Username"
          },
          {
            "name": "password",
            "type": "password",
            "label": "Password"
          }
        ]
      }
    }
  ]
}

Note how the possible actions are present. Two fields should be rendered and can be submitted. But there are also links that can be rendered where the user can take another path through the authentication flow if needed.
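
To make the client side of this concrete, here is a minimal sketch in Python of how a client might turn the representation above into a form submission. The helper names (`find_action`, `build_submission`) and the sample credentials are assumptions made for illustration; they are not part of any published client library, and a real client would also send the request and handle other content types.

```python
import json
from urllib.parse import urlencode

# Trimmed copy of the login representation shown above.
REPRESENTATION = json.loads("""
{
  "type": "authentication-step",
  "links": [
    {"href": "/dev/authn/authenticate/htmlSql/forgot-password",
     "rel": "forgot-password", "title": "Forgot your password?"}
  ],
  "actions": [
    {"template": "form", "kind": "login", "title": "Login",
     "model": {
       "href": "/dev/authn/authenticate/htmlSql",
       "method": "POST",
       "type": "application/x-www-form-urlencoded",
       "fields": [
         {"name": "userName", "type": "username", "label": "Username"},
         {"name": "password", "type": "password", "label": "Password"}
       ]
     }}
  ]
}
""")


def find_action(representation, kind):
    """Return the first action of the given kind, or None if absent."""
    for action in representation.get("actions", []):
        if action.get("kind") == kind:
            return action
    return None


def build_submission(action, values):
    """Turn a form action plus user input into (method, href, headers, body).

    Hypothetical helper: only the fields the server listed are sent, and a
    missing value is treated as an error. Assumes a URL-encoded form body.
    """
    model = action["model"]
    fields = model.get("fields", [])
    missing = [f["name"] for f in fields if f["name"] not in values]
    if missing:
        raise ValueError(f"missing values for fields: {missing}")
    body = urlencode({f["name"]: values[f["name"]] for f in fields})
    headers = {"Content-Type": model["type"]}
    return model["method"], model["href"], headers, body


login = find_action(REPRESENTATION, "login")
method, href, headers, body = build_submission(
    login, {"userName": "alice", "password": "s3cret"}
)
print(method, href)  # POST /dev/authn/authenticate/htmlSql
print(body)          # userName=alice&password=s3cret
```

The client never hardcodes the target URL or the field names; it reads them from the action, which is what lets the server stay in control of the flow.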

With the Hypermedia approach, the server can control the flow, with a single entry and exit point, but present the user with choices throughout the flow.

The server guides the client with details about how each screen should be constructed, but there are only a certain number of options to implement. Unlike HTML, where the markup, styling, and JavaScript form a massive framework to implement, a Hypermedia API is a small, well-constrained DSL for rendering user interfaces.

It’s quite clear that authentication is a state machine where the server guides the user through several states to complete the authentication process. I recommend reading “Designing a True REST State Machine” for a more in-depth walkthrough of why Hypermedia lends itself so well to this purpose.

Automated Steps

Not every step in the authentication process requires user interaction. A good example is the process of polling. Polling occurs when we are waiting for the user to perform out-of-band actions, such as when a user selects an email link as an authentication method. The user enters their email address, and then the login process waits for the user to click that link. When the user clicks the link, the server state is updated, and the authentication can proceed. During this time, we need to instruct the client to periodically poll the server to see if the user has clicked the link. By introducing a “polling” representation type, the server lets the client parse the message and know that some actions can be invoked without any user interaction.

{
  "type": "polling-step",
  "properties": {
    "recipientOfCommunication": "johnxxxx@xxxxple.com",
    "status": "pending"
  },
  "actions": [
    {
      "template": "form",
      "kind": "poll",
      "model": {
        "href": "/authenticate/email1/link-wait",
        "method": "GET"
      }
    },
    {
      "template": "form",
      "kind": "cancel",
      "title": "Restart the process",
      "model": {
        "href": "/authenticate/email1",
        "method": "GET",
        "type": "application/x-www-form-urlencoded",
        "actionTitle": "Restart the process"
      }
    }
  ]
}

The status pending instructs the client to continue the operation. As long as the polling-step is presented, the client will continue to poll.

On a regular web page, this would be a JavaScript client that polls the server waiting for a certain response, but with Hypermedia, even that case is improved. The client always receives a well-defined response that follows the same schema whether the user is authenticating using email, SMS, or some other method.
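
The polling behavior described above can be sketched as a small loop. This is only an illustration under stated assumptions: `poll_until_done`, its parameters, and the `fetch` callback (which stands in for a real HTTP client) are hypothetical names, and the interval and attempt limit are arbitrary choices, not values defined by the API.

```python
import time


def poll_until_done(first_step, fetch, interval_seconds=2.0, max_attempts=30,
                    sleep=time.sleep):
    """Follow the "poll" action until the server stops returning a polling-step.

    `fetch(method, href)` is a caller-supplied function that returns the
    parsed JSON representation (an assumption standing in for HTTP calls).
    """
    step = first_step
    attempts = 0
    while step.get("type") == "polling-step":
        if attempts >= max_attempts:
            raise TimeoutError("gave up waiting for the out-of-band action")
        poll = next(
            (a for a in step.get("actions", []) if a.get("kind") == "poll"),
            None,
        )
        if poll is None:
            raise ValueError("polling-step without a poll action")
        sleep(interval_seconds)
        step = fetch(poll["model"]["method"], poll["model"]["href"])
        attempts += 1
    return step  # the flow moved on; the caller handles the new step


# Demonstration with a fake server that answers "pending" twice.
pending = {
    "type": "polling-step",
    "properties": {"status": "pending"},
    "actions": [
        {"template": "form", "kind": "poll",
         "model": {"href": "/authenticate/email1/link-wait", "method": "GET"}}
    ],
}
next_step = {"type": "authentication-step", "actions": []}
responses = iter([pending, pending, next_step])
calls = []


def fake_fetch(method, href):
    calls.append((method, href))
    return next(responses)


result = poll_until_done(pending, fake_fetch, sleep=lambda seconds: None)
print(result["type"], len(calls))  # authentication-step 3
```

Because the loop only looks at the representation type and the `poll` action, the same code would work regardless of which out-of-band method the user chose.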

Schemas

In contrast to a regular web-based authentication application, the Hypermedia API counterpart needs to be strictly schematic. The client needs to understand each representation (response) by reading its top-level type, and then knowing what possible items may exist in that representation. This is not the case on a regular web page, since the browser is only limited by the HTML schema, not the subset necessary for authentication. It goes without saying that it would be an unreasonable requirement to have mobile applications implement something like HTML to be able to perform an authentication flow. Instead, we make the schema as small as possible and fully predictable. This doesn’t mean that the application knows what comes next, but it knows what to do with its current information.

Example:

type: authentication-step
purpose: interactive user step during authentication
possible top-level elements: properties, actions, messages, links

Example:

type: polling-step
purpose: automatic step
possible top-level elements: properties, actions

This can be formalized using any suitable schema language. At Curity, we have defined our schema definitions in JSON Schema.
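
As a rough illustration of what "strictly schematic" means for a client, the two examples above can be encoded as a lookup table and checked before rendering. This sketch is plain Python rather than JSON Schema, and it is not Curity's actual schema; `ALLOWED_TOP_LEVEL` and `check_representation` are hypothetical names, and rejecting unexpected elements (rather than ignoring them) is an assumption.

```python
# Allowed top-level elements per representation type, taken from the two
# examples above. "type" itself is always present.
ALLOWED_TOP_LEVEL = {
    "authentication-step": {"type", "properties", "actions", "messages", "links"},
    "polling-step": {"type", "properties", "actions"},
}


def check_representation(representation):
    """Return a list of problems; an empty list means the client can proceed."""
    step_type = representation.get("type")
    if step_type not in ALLOWED_TOP_LEVEL:
        return [f"unknown type: {step_type!r}"]
    unexpected = sorted(set(representation) - ALLOWED_TOP_LEVEL[step_type])
    return [f"unexpected element for {step_type}: {name}" for name in unexpected]


print(check_representation({"type": "polling-step", "properties": {}, "actions": []}))
# []
print(check_representation({"type": "polling-step", "links": []}))
# ['unexpected element for polling-step: links']
```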

Content Negotiation

When building a dynamic API such as a Hypermedia Authentication API, it’s important to ensure that the client can control what it supports. HTTP has a great mechanism for this called content negotiation.

Content negotiation is when the client tells the server what type of content it can accept, and the server then tries to serve content in the appropriate format. For example, when requesting an image, the browser might send:

GET https://example.com/assets/images/foo.jpg
Accept: image/webp,*/*

This means that it prefers an image in webp format, but if that doesn’t exist, it accepts anything (*/*). This list can be long, outlining all the specific formats supported. Interestingly, the file extension (.jpg) doesn’t really apply here; it’s redundant. Erik Michaels-Ober explains this in more detail in his presentation “Content Negotiation for REST APIs.” I also recommend reading Bill Doerrfeld’s introduction to content negotiation here on Nordic APIs.

When it comes to a Hypermedia API, or any REST API for that matter, the point of content negotiation is to let the client determine whether it supports the API.

When we discuss authentication, there are many ways a user can authenticate; some are pure forms-based authentication mechanisms where the user enters some credential or identifier, and the server validates it. But there are more involved methods as well, which might require the client to perform additional steps, such as communicating with the operating system or browser to perform biometric authentication. This may be beyond what certain clients are capable of.

So, by using content negotiation, we can form a contract between the client and the server.

GET https://idp.example.com/api/authenticate
Accept: application/vnd.auth+json

This tells the server that the client wants to start an authentication flow using the Hypermedia API defined by the application/vnd.auth+json media type. It also tells the server that the client is capable of handling the content it receives.
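
On the server side, the contract can be checked with a small amount of code. The following is a simplified sketch, not a complete implementation of HTTP content negotiation: `parse_accept` and `quality_for` are hypothetical helper names, and media type parameters other than `q` are ignored. A server that gets a quality of 0.0 for its media type could answer with 406 Not Acceptable.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0].lower()
        if not media_range:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality_for(media_type, header):
    """Return the q-value the client assigns to media_type (0.0 if not accepted).

    The most specific matching range wins: exact match, then "type/*",
    then "*/*".
    """
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in parse_accept(header):
        if media_range == media_type:
            specificity = 2
        elif media_range == f"{main_type}/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


AUTH_TYPE = "application/vnd.auth+json"
print(quality_for(AUTH_TYPE, "application/vnd.auth+json"))        # 1.0
print(quality_for(AUTH_TYPE, "text/html,application/xhtml+xml"))  # 0.0
print(quality_for("image/jpeg", "image/webp,*/*"))                # 1.0
```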

Localization

Normally, the web server is responsible for localizing the content served to the browser. This is also done using a form of content negotiation. The browser sends the Accept-Language header, in which it states the languages and locales it supports in a weighted list, and the server tries its best to fulfill this.

Accept-Language: en-US,en;q=0.5

When building a Hypermedia API, there is no reason to use a different mechanism. The client can tell the server how messages in the content should be localized. However, more freedom can be given by also responding with message keys so that the client can localize the messages itself if it so desires.

Example when Accept-Language: en-US,en;q=0.5:

{
  "type": "authentication-step",
  "actions": [
	{
  	"template": "selector",
  	"kind": "authenticator-selector",
  	"title": "Select Authentication Method",


Example when Accept-Language: sv-SE,sv;q=0.5:

{
  "type": "authentication-step",
  "actions": [
	{
  	"template": "selector",
  	"kind": "authenticator-selector",
  	"title": "Välj inloggningsmetod",
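
How a server might choose between the two titles above can be sketched as follows. This is a simplified illustration: `parse_accept_language` and `pick_language` are hypothetical names, the `TITLES` table holds only the two strings from the examples, and falling back to English when nothing matches is an assumption.

```python
# The two titles from the examples above, keyed by primary language.
TITLES = {
    "en": "Select Authentication Method",
    "sv": "Välj inloggningsmetod",
}


def parse_accept_language(header):
    """Return language tags ordered by descending q-value.

    Tags with equal weight keep their order from the header, because
    Python's sort is stable.
    """
    weighted = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0  # default weight when no q parameter is given
        name, _, value = params.partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0
        weighted.append((tag, q))
    weighted.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, q in weighted if q > 0]


def pick_language(header, supported, default="en"):
    """Pick the best supported language, falling back from "sv-se" to "sv"."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


print(TITLES[pick_language("en-US,en;q=0.5", TITLES)])  # Select Authentication Method
print(TITLES[pick_language("sv-SE,sv;q=0.5", TITLES)])  # Välj inloggningsmetod
print(pick_language("de-DE", TITLES))                   # en
```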

Security

An important part of building an authentication API is security. It’s crucial not to allow arbitrary applications on the internet access to your API since the potential for misuse is enormous. You don’t want attackers to collect credentials through a phishing site and then try them directly against your API.

When using a normal browser-based login (non-API), it’s standard to block framing of the authentication flow from anyone but whitelisted sites using content security policies (CSP). (Of course, CSP doesn’t prevent phishing, but it does help mitigate clickjacking and XSS.) But when delivering an API instead, these browser mechanisms are of less use.

OAuth should, of course, be used to protect this API. However, plain old OAuth is arguably not enough in this scenario. User interaction flows in OAuth are designed with the browser in mind. There is an underlying assumption that things like CORS are enforced properly by the browser and that the browser can prove that the site is running on the domain it says it is. These assumptions break when using a pure API-driven approach.

I recommend reading our whitepaper on the security model of the Hypermedia Authentication API that we built at Curity. It takes you through the subject in depth.

Hypermedia Authentication API: The Future of Authentication

Authenticating with an API has long been the holy grail for mobile developers and, more recently, for web developers as well. We need to listen and provide a secure protocol for these use cases to prevent homegrown, insecure solutions from emerging. At Curity, we work with standards such as OAuth and OpenID Connect, and when they fall short, we work with the standards organizations to develop new solutions. We believe that a standard Hypermedia Authentication API is the future of mobile and web authentication.