As the internet grows and more devices become interconnected, authorization is becoming more and more complex.
Early implementations of online services were easy to authorize against since they were tied to desktops, but modern authorization must consider varying environments, from mobile apps to IoT scenarios. Many of our new devices, such as smart TVs and voice-controlled speakers, don’t have traditional UIs like web browsers.
The growing prevalence of input-constrained devices raises a thorny question about how providers should actually authorize them. What is a secure way to enter a username and password on a device with little or no UI?
Today, we’re going to look at where this problem comes from, and what we can do to fix it. We’ll cover three OAuth flows to see how well each one solves the issue at hand: the Resource Owner Password Credentials (ROPC) Flow, the OAuth Code Flow, and the OAuth Device Flow. We’ll see which is the safest way to incorporate identity into these new environments, so that even your living room devices maintain a high level of security.
Identifying the Problem
To get to the root of the issue, let’s take a hypothetical case. Let’s assume that we want to stream music onto a TV from a music streaming provider we’ll call Musicbox. Musicbox requires authorization for all streaming sessions, and in our hypothetical situation, this applies to our TV as well.
To get Musicbox to stream on our TV, we have a few options. The most obvious is an app built for this specific purpose. This is commonplace on many modern smart televisions, and offers what is essentially a web browser overlay that handles authorization. Such an app would typically use a type of flow called the Resource Owner Password Credentials (ROPC) Flow.
ROPC OAuth Flow
In this approach, the user (the resource owner) enters their credentials into the TV app, which sends a POST request with a form URL-encoded body containing those credentials to the OAuth server. This server sits at the streaming service level, and uses the credentials to grant access to the internal systems. It does so by issuing an OAuth token, which is handed back to the TV and passed on to the streaming service with each request. This is a very traditional OAuth flow, but it has some significant problems in our use case.
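To make the shape of that request concrete, here is a minimal sketch of what the TV app would assemble. The endpoint URL, client ID, and credentials are made up for illustration; only the `grant_type=password` parameter and the form encoding come from the OAuth 2.0 specification. Nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical Musicbox token endpoint, for illustration only.
TOKEN_ENDPOINT = "https://auth.musicbox.example/oauth/token"


def build_ropc_request(username: str, password: str, client_id: str, scope: str) -> dict:
    """Describe the ROPC token request without sending it."""
    body = urlencode({
        "grant_type": "password",  # the ROPC grant type in OAuth 2.0
        "username": username,      # the user's real credentials pass through the app
        "password": password,
        "client_id": client_id,
        "scope": scope,
    })
    return {
        "method": "POST",
        "url": TOKEN_ENDPOINT,
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": body,
    }


# Example values are assumptions, not real accounts or clients.
request = build_ropc_request("alice", "correct-horse", "musicbox-tv-app", "stream")
print(request["body"])
```

Notice that the raw username and password sit in plain view of the client app, which is exactly the trust problem with this flow.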
First, there’s a major security concern. Most streaming providers are not going to want a TV running proprietary code to hold the login credentials of every user who chooses to use the app. The issue of trust is especially relevant for client apps built on the streaming service API by third parties. Even if the application is an official one, this creates a major point of failure that dramatically expands the attack surface of the API itself, and all but invites sophisticated token attacks.
More seriously, however, there’s the fact that the ROPC flow simply was never designed for this application. While it seems a perfect fit, the ROPC flow is designed for legacy systems – utilizing it for smart TVs is an incorrect application, and actually works against OAuth. As Jacob states:
“The resource flow is not really meant for this… It’s actually there to just solve legacy problems. If we’re building a new system, we should never use it. It’s why it has built-in antipatterns.”
OAuth Code Flow
The whole purpose of OAuth is to avoid giving passwords to third parties, which is exactly what the ROPC approach would do. Suppose we set ROPC aside and go for the more conventional OAuth code flow, where a browser on the device sends a GET request to the OAuth server and the user is prompted with an authorization page.
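As a rough sketch, the GET request in the code flow is just a URL with a handful of query parameters. The endpoint, client ID, and redirect URI below are hypothetical; the parameter names (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`) are the standard ones from OAuth 2.0.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical Musicbox authorization endpoint, for illustration only.
AUTHORIZE_ENDPOINT = "https://auth.musicbox.example/oauth/authorize"


def build_authorization_url(client_id: str, redirect_uri: str, scope: str, state: str) -> str:
    """Build the URL the device's browser would open to start the code flow."""
    query = urlencode({
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,           # random value used to tie the response to this request
    })
    return f"{AUTHORIZE_ENDPOINT}?{query}"


state = secrets.token_urlsafe(16)
url = build_authorization_url(
    "musicbox-tv-app", "https://tv.example/callback", "stream", state
)
print(url)
```

The password never touches the app here, but the user still has to type it into a login page rendered on the TV.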
In this case, we still run into a fact that we can’t avoid: this flow was designed for devices far more capable than our constrained TV. The browser would be terribly slow, and entering a username and password with a remote control is inefficient. As such, even if these were acceptable solutions from a security standpoint, they would result in bad user experiences.
A Question of User Experience
Even if we could ignore the technical problems with these flows, the fact is that using them often results in frustration because of the limitations on input and interaction. Using a tiny remote control to enter a complicated username/password pair, and to deal with any additional prompts that might pop up, is ultimately quite cumbersome.
Things get worse if there’s no controller at all. Imagine that, instead of our smart TV, we’re using a smart speaker such as an Alexa-enabled device. In this case, we no longer have a screen or an easy mode of input, and our issue becomes that much more complex.
What is our solution then? Luckily, there’s a new OAuth flow being standardized that could help here. Instead of the ROPC Flow or the Code Flow, let’s turn our attention to the OAuth Device Flow.
OAuth Device Flow
The OAuth working group recognized the issue inherent in authorizing constrained devices, and has drafted a new standard known as the OAuth Device Flow. The standard, currently under draft as “draft-ietf-oauth-device-flow-06”, is specifically designed for UI-incapable devices, such as browserless and input-constrained systems. It should therefore be a good fit for devices like our smart TV or voice-controlled system.
The flow looks similar to the traditional OAuth solutions, but breaks away quite significantly at a very key stage.
In this solution, we have an API, the OAuth server, and the TV requesting the content. The TV starts by sending a Device Authorization Request, passing with it the scope and the client ID of the requesting device. The OAuth server responds with a device code, a user code, and a verification URI. The response also carries an expiration time and a polling interval, which limit how long the codes stay valid and how often the device may ask about them.
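Here is a small sketch of that first exchange from the TV’s point of view. The client ID, codes, and verification URI are invented sample values, and the field names follow RFC 8628, the published form of the device flow draft, so treat the details as assumptions rather than a definitive client.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class DeviceAuthorization:
    device_code: str       # secret handle the device uses when polling
    user_code: str         # short code the user types on another device
    verification_uri: str  # where the user goes to enter the code
    expires_in: int        # lifetime of both codes, in seconds
    interval: int          # minimum seconds between polling requests


def build_device_authorization_body(client_id: str, scope: str) -> str:
    """Form-encoded body of the Device Authorization Request."""
    return urlencode({"client_id": client_id, "scope": scope})


def parse_device_authorization(raw_json: str) -> DeviceAuthorization:
    data = json.loads(raw_json)
    return DeviceAuthorization(
        device_code=data["device_code"],
        user_code=data["user_code"],
        verification_uri=data["verification_uri"],
        expires_in=int(data["expires_in"]),
        # RFC 8628 tells clients to assume 5 seconds if no interval is given.
        interval=int(data.get("interval", 5)),
    )


# Made-up sample response, not output from a real server.
sample_response = """{
  "device_code": "device-code-123",
  "user_code": "WDJB-MJHT",
  "verification_uri": "https://musicbox.example/activate",
  "expires_in": 1800,
  "interval": 5
}"""

body = build_device_authorization_body("musicbox-tv-app", "stream")
auth = parse_device_authorization(sample_response)
print(f"Go to {auth.verification_uri} and enter {auth.user_code}")
```

The TV only ever shows the verification URI and the user code; the device code stays on the device.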
From here on out, the OAuth flow breaks into a new form. The user visits the verification URI on another device, enters the requested data (typically the user code), and authorizes the TV. In the meantime, the OAuth server waits for the user and expects the user code. A countdown starts as soon as the codes are issued, and once the time limit passes they are automatically invalidated, giving the data an expiration time for security’s sake.
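On the server side, the countdown can be pictured as a small table of pending grants keyed by user code. This is only an in-memory sketch under our own assumptions: the class and method names are hypothetical, and a real OAuth server would also need persistence, rate limiting, and user authentication.

```python
import time
from dataclasses import dataclass


@dataclass
class PendingGrant:
    device_code: str
    expires_at: float
    approved: bool = False


class PendingGrants:
    """Hypothetical store of user codes that are waiting for approval."""

    def __init__(self, clock=time.monotonic):
        self._by_user_code = {}
        self._clock = clock  # injectable so the expiry logic can be tested

    def issue(self, user_code: str, device_code: str, expires_in: int) -> None:
        expires_at = self._clock() + expires_in
        self._by_user_code[user_code] = PendingGrant(device_code, expires_at)

    def approve(self, user_code: str) -> bool:
        """Mark a code as approved; unknown or expired codes are rejected."""
        grant = self._by_user_code.get(user_code)
        if grant is None or self._clock() >= grant.expires_at:
            self._by_user_code.pop(user_code, None)  # forget expired codes
            return False
        grant.approved = True
        return True


# A fake clock lets us jump forward in time without waiting.
now = [0.0]
grants = PendingGrants(clock=lambda: now[0])
grants.issue("WDJB-MJHT", "device-code-123", expires_in=600)
grants.issue("QRST-UVWX", "device-code-456", expires_in=600)

in_time = grants.approve("WDJB-MJHT")   # entered right away
now[0] = 601.0                          # more than ten minutes later
too_late = grants.approve("QRST-UVWX")  # the countdown has run out
print(in_time, too_late)
```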
While the user is entering the code, the device polls the OAuth server at the set interval. Once the user has approved the request, the server answers the next poll with an access token. The device then presents this token to the API to stream content.
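The polling step can be sketched as a simple loop. The `request_token` callable stands in for the HTTP call to the token endpoint and is our own assumption, as is the function name; the error codes `authorization_pending` and `slow_down`, and the five-second back-off, follow RFC 8628, the published form of the draft.

```python
import time


def poll_for_token(request_token, device_code, interval, expires_in,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the user approves, the server refuses, or the code expires.

    `request_token` is a hypothetical callable that posts the device code to
    the token endpoint and returns the parsed JSON response as a dict.
    """
    deadline = clock() + expires_in
    while clock() < deadline:
        sleep(interval)  # never poll faster than the server asked
        response = request_token(device_code)
        if "access_token" in response:
            return response["access_token"]
        error = response.get("error")
        if error == "authorization_pending":
            continue       # the user has not finished yet
        if error == "slow_down":
            interval += 5  # back off by five seconds, then keep polling
            continue
        raise RuntimeError(f"device authorization failed: {error}")
    raise TimeoutError("device code expired before the user approved")


# Simulated server responses: pending, then a slow-down hint, then success.
responses = iter([
    {"error": "authorization_pending"},
    {"error": "slow_down"},
    {"access_token": "example-token"},
])
waits = []  # records how long the loop would have slept each time
token = poll_for_token(
    lambda code: next(responses), "device-code-123",
    interval=5, expires_in=600, sleep=waits.append,
)
print(token, waits)
```

Injecting `sleep` keeps the example instant to run; a real device would simply use the default and wait between requests.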
How is this Different?
“[This discussion is] about how we can work with devices that are not as smart as we’re used to…It’s really about getting identity into a new box that we haven’t thought about before.”
This flow is different in some pretty significant ways. First and foremost, there is the obvious fact that authorization of this type occurs out of band. The device itself, in this case the TV, is not the system that accepts login information. Instead, the user turns to a separate device, such as their phone or laptop, to verify the request and grant access.
This also means that access is not restricted to physically typing in the login information. Logins can use NFC, Bluetooth, or biometrics, and the user code can be delivered in just as many ways. This is meant to be limited, of course, to nearby methods: the flow is not intended to stretch beyond the user’s immediate vicinity, as allowing approval from another city or country would open up a much wider attack surface.
A Solution of Gaps
Setting up authorization for devices with limited user interfaces presents an interesting UX challenge. The problem is not going away, and until every device using a service is required, whether by policy or by practical reality, to support advanced interactions and software suites, the issue will only get worse.
Ultimately, the OAuth Device Flow is the best solution we currently have. The draft is effective, and offers a wide range of options for authorization. As we saw in Jacob’s talk, it’s a standardized approach to logging in to a device that isn’t UX-friendly.
We live in a world of smart services, but the devices we use to interact with them are often quite dumb. What do you think? Is this the best solution for this issue? What other solutions can effectively bridge the divide between the smart service and the constrained client? Let us know below!