Best Practices for API Error Handling Kristopher Sandoval June 15, 2017 Error codes are almost the last thing that you want to see in an API response. Generally speaking, it means one of two things — something was so wrong in your request or your handling that the API simply couldn’t parse the passed data, or the API itself has so many problems that even the most well-formed request is going to fail. In either situation, traffic comes crashing to a halt, and the process of discovering the cause and solution begins. That being said, errors, whether in code form or simple error response, are a bit like getting a shot — unpleasant, but incredibly useful. Error codes are probably the most useful diagnostic element in the API space, and this is surprising, given how little attention we often pay them. Today, we’re going to talk about exactly why error responses and handling approaches are so useful and important. We’ll take a look at some common error code classifications the average user will encounter, as well as some examples of these codes in action. We’ll also talk a bit about what makes a “good” error code and what makes a “bad” error code, and how to ensure your error codes are up to snuff. The Value of Error Codes As we’ve already said, error codes are extremely useful. Error codes in the response stage of an API is the fundamental way in which a developer can communicate failure to a user. This stage, sitting after the initial request stage, is a direct communication between client and API. It’s often the first and most important step towards not only notifying the user of a failure, but jump-starting the error resolution process. A user doesn’t choose when an error is generated, or what error it gets — error situations often arise in instances that, to the user, are entirely random and suspect. Error responses thus are the only truly constant, consistent communication the user can depend on when an error has occurred. Error codes have an implied value in the way that they both clarify the situation, and communicate the intended functionality. Consider for instance an error code such as “401 Unauthorized – Please Pass Token.” In such a response, you understand the point of failure, specifically that the user is unauthorized. Additionally, however, you discover the intended functionality — the API requires a token, and that token must be passed as part of the request in order to gain authorization. With a simple error code and resolution explanation, you’ve not only communicated the cause of the error, but the intended functionality and method to fix said error — that’s incredibly valuable, especially for the amount of data that is actually returned. HTTP Status Codes Before we dive deeper into error codes and what makes a “good” code “good,” we need to address the HTTP Status Codes format. These codes are the most common status codes that the average user will encounter, not just in terms of APIs but in terms of general internet usage. While other protocols exist and have their own system of codes, the HTTP Status Codes dominate API communication, and vendor-specific codes are likely to be derived from these ranges. 1XX – Informational The 1XX range has two basic functionalities. The first is in the transfer of information pertaining to the protocol state of the connected devices — for instance, 101 Switching Protocols is a status code that notes the client has requested a protocol change from the server, and that the request has been approved. The 1XX range also clarifies the state of the initial request. 100 Continue, for instance, notes that a server has received request headers from a client, and that the server is awaiting the request body. 2XX – Success The 2XX range notes a range of successes in communication, and packages several responses into specific codes. The first three status codes perfectly demonstrate this range – 200 OK means that a GET or POST request was successful, 201 Created confirms that a request has been fulfilled and a new resource has been created for the client, and 202 Accepted means that the request has been accepted, and that processing has begun. 3XX – Redirection The 3XX range is all about the status of the resource or endpoint. When this type of status code is sent, it means that the server is still accepting communication, but that the point contacted is not the correct point of entry into the system. 301 Moved Permanently verifies that the client request did in fact reach the correct system, but that this request and all future requests should be handled by a different URI. This is very useful in subdomains and when moving a resource from one server to another. 4XX – Client Error The 4XX series of error codes is perhaps the most famous due to the iconic 404 Not Found status, which is a well-known marker for URLs and URIs that are incorrectly formed. Other more useful status codes for APIs exist in this range, however. 414 URI Too Long is a common status code, denoting that the data pushed through in a GET request is too long, and should be converted to a POST request. Another common code is 429 Too many Requests, which is used for rate limiting to note a client is attempting too many requests at once, and that their traffic is being rejected. 5XX – Server Error Finally the 5XX range is reserved for error codes specifically related to the server functionality. Whereas the 4XX range is the client’s responsibility (and thus denotes a client failure), the 5XX range specifically notes failures with the server. Error codes like 502 Bad Gateway, which notes the upstream server has failed and that the current server is a gateway, further expose server functionality as a means of showing where failure is occurring. There are less specific, general failures as well, such as 503 Service Unavailable. Making a Good Error Code With a solid understanding of HTTP Status Codes, we can start to dissect what actually makes for a good error code, and what makes for a bad error code. Quality error codes not only communicate what went wrong, but why it went wrong. Overly opaque error codes are extremely unhelpful. Let’s imagine that you are attempting to make a GET request to an API that handles digital music inventory. You’ve submitted your request to an API that you know routinely accepts your traffic, you’ve passed the correct authorization and authentication credentials, and to the best of your knowledge, the server is ready to respond. You send your data, and receive the following error code – 400 Bad Request. With no additional data, no further information, what does this actually tell you? It’s in the 4XX range, so you know the problem was on the client side, but it does absolutely nothing to communicate the issue itself other than “bad request.” This is when a “functional” error code is really not as functional as it should be. That same response could easily be made helpful and transparent with minimal effort — but what would this entail? Good error codes must pass three basic criteria in order to truly be helpful. A quality error code should include: An HTTP Status Code, so that the source and realm of the problem can be ascertained with ease; An Internal Reference ID for documentation-specific notation of errors. In some cases, this can replace the HTTP Status Code, as long as the internal reference sheet includes the HTTP Status Code scheme or similar reference material. Human readable messages that summarize the context, cause, and general solution for the error at hand. Include Standardized Status Codes First and foremost, every single error code generated should have an attached status code. While this often takes the form of an internal code, it typically takes the form of a standardized status code in the HTTP Status Code scheme. By noting the status using this very specific standardization, you not only communicate the type of error, you communicate where that error has occurred. There are certain implications for each of the HTTP Status Code ranges, and these implications give a sense as to the responsibility for said error. 5XX errors, for instance, note that the error is generated from the server, and that the fix is necessarily something to do with server-related data, addressing, etc. 4XX, conversely, notes the problem is with the client, and specifically the request from the client or the status of the client at that moment. By addressing error codes using a default status, you can give a very useful starting point for even basic users to troubleshoot their errors. Give Context First and foremost, an error code must give context. In our example above, 400 Bad Request means nothing. Instead, an error code should give further context. One such way of doing this is by passing this information in the body of the response in the language that is common to the request itself. For instance, our error code of 400 Bad Request can easily have a JSON body that gives far more useful information to the client: < HTTP/1.1 400 Bad Request < Date: Wed, 31 May 2017 19:01:41 GMT < Server: Apache/2.4.25 (Ubuntu) < Connection: close < Transfer-Encoding: chunked < Content-Type: application/json { "error" : "REQUEST - BR0x0071" } This error code is good, but not great. What does it get right? Well, it supplies context, for starters. Being able to see what the specific type of failure is shows where the user can begin the problem solving process. Additionally, and vitally, it also gives an internal reference ID in the form of “BR0x0071”, which can be internally referenced. While this is an ok error code, it only meets a fraction of our requirements. Human Readability Part of what makes error codes like the one we just created so powerful is that it’s usable by humans and machines alike. Unfortunately, this is a very easy thing to mess up — error codes are typically handled by machines, and so it’s very tempting to simply code for the application rather than for the user of said application. In our newly formed example, we have a very clear error to handle, but we have an additional issue. While we’ve added context, that context is in the form of machine-readable reference code to an internal error note. The user would have to find the documentation, look up the request code “BRx0071”, and then figure out what went wrong. We’ve fallen into that trap of coding for the machine. While our code is succinct and is serviceable insomuch as it provides context, it does so at the cost of human readability. With a few tweaks, we could improve the code, while still providing the reference number as we did before: < HTTP/1.1 400 Bad Request < Date: Wed, 31 May 2017 19:01:41 GMT < Server: Apache/2.4.25 (Ubuntu) < Connection: close < Transfer-Encoding: chunked < Content-Type: application/json { "error" : "Bad Request - Your request is missing parameters. Please verify and resubmit. Issue Reference Number BR0x0071" } With such a response, not only do you get the status code, you also get useful, actionable information. In this case, it tells the user the issue lies within their parameters. This at least offers a place to start troubleshooting, and is far more useful than saying “there’s a problem.” While you still want to provide the issue reference number, especially if you intend on integrating an issue tracker into your development cycle, the actual error itself is much more powerful, and much more effective than simply shooting a bunch of data at the application user and hoping something sticks. Good Error Examples Let’s take a look at some awesome error code implementations on some popular systems. Twitter Twitter API is a great example of descriptive error reporting codes. Let’s attempt to send a GET request to retrieve our mentions timeline. https://api.twitter.com/1.1/statuses/mentions_timeline.json When this is sent to the Twitter API, we receive the following response: HTTP/1.1 400 Bad Request x-connection-hash: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx set-cookie: guest_id=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Date: Thu, 01 Jun 2017 03:04:23 GMT Content-Length: 62 x-response-time: 5 strict-transport-security: max-age=631138519 Connection: keep-alive Content-Type: application/json; charset=utf-8 Server: tsa_b {"errors":[{"code":215,"message":"Bad Authentication data."}]} Looking at this data, we can generally figure out what our issue is. First, we’re told that we’ve submitted a 400 Bad Request. This tells us that the problem is somewhere in our request. Our content length is acceptable, and our response time is well within normal limits. We can see, however, that we’re receiving a unique error code that Twitter itself has denoted — “215”, with an attached message that states “Bad Authentication data”. This error code supplies both valuable information as to why the error has occurred, and also how to rectify it. Our error lies in the fact that we did not pass any authentication data whatsoever — accordingly, error 215 is referenced, which tells us the fix is to supply said authentication data, but also gives us a number to reference on the internal documentation of the Twitter API. Facebook For another great example, let’s look at another social network. Facebook’s Graph API allows us to do quite a lot as long as we have the proper authentication data. For the purposes of this article, all personal information will be blanked out for security purposes. First, let’s pass a GET request to ascertain some details about a user: https://graph.facebook.com/v2.9/me?fields=id%2Cname%2Cpicture%2C%20picture&access_token=xxxxxxxxxxx This request should give us a few basic fields from this user’s Facebook profile, including id, name, and picture. Instead, we get this error response: { "error": { "message": "Syntax error \"Field picture specified more than once. This is only possible before version 2.1\" at character 23: id,name,picture,picture", "type": "OAuthException", "code": 2500, "fbtrace_id": "xxxxxxxxxxx" } } While Facebook doesn’t directly pass the HTTP error code in the body, it does pass a lot of useful information. The “message” area notes that we’ve run into a syntax error, specifically that we’ve defined the “picture” field more than once. Additionally, this field lets us know that this behavior was possible in previous versions, which is a very useful tool to communicate to users a change in behavior from previous versions to the current. Additionally, we are provided both a code and an fbtrace_id that can be used with support to identify specific issues in more complex cases. We’ve also received a specific error type, in this case OAuthException, which can be used to narrow down the specifics of the case even further. Bing To show a complex failure response code, let’s send a poorly formed (essentially null) GET request to Bing. HTTP/1.1 200 Date: Thu, 01 Jun 2017 03:40:55 GMT Content-Length: 276 Connection: keep-alive Content-Type: application/json; charset=utf-8 Server: Microsoft-IIS/10.0 X-Content-Type-Options: nosniff {"SearchResponse":{"Version":"2.2","Query":{"SearchTerms":"api error codes"},"Errors":[{"Code":1001,"Message":"Required parameter is missing.","Parameter":"SearchRequest.AppId","HelpUrl":"http\u003a\u002f\u002fmsdn.microsoft.com\u002fen-us\u002flibrary\u002fdd251042.aspx"}]}} This is a very good error code, perhaps the best of the three we’ve demonstrated here. While we have the error code in the form of “1001”, we also have a message stating that a parameter is missing. This parameter is then specifically noted as “SearchRequestAppId”, and a “HelpUrl” variable is passed as a link to a solution. In this case, we’ve got the best of all worlds. We have a machine readable error code, a human readable summary, and a direct explanation of both the error itself and where to find more information about the error. Spotify Though 5XX errors are somewhat rare in modern production environments, we do have some examples in bug reporting systems. One such report noted a 5XX error generated from the following call: GET /v1/me/player/currently-playing This resulted in the following error: [2017-05-02 13:32:14] production.ERROR: GuzzleHttp\Exception\ServerException: Server error: `GET https://api.spotify.com/v1/me/player/currently-playing` resulted in a `502 Bad Gateway` response: { "error" : { "status" : 502, "message" : "Bad gateway." } } So what makes this a good error code? While the 502 Bad gateway error seems opaque, the additional data in the header response is where our value is derived. By noting the error occurring in production and its addressed variable, we get a general sense that the issue at hand is one of the server gateway handling an exception rather than anything external to the server. In other words, we know the request entered the system, but was rejected for an internal issue at that specific exception address. When addressing this issue, it was noted that 502 errors are not abnormal, suggesting this to be an issue with server load or gateway timeouts. In such a case, it’s almost impossible to note granularly all of the possible variables — given that situation, this error code is about the best you could possibly ask for. Conclusion Much of an error code structure is stylistic. How you reference links, what error code you generate, and how to display those codes is subject to change from company to company. However, there has been headway to standardize these approaches; the IETF recently published RFC 7807, which outlines how to use a JSON object as way to model problem details within HTTP response. The idea is that by providing more specific machine-readable messages with an error response, the API clients can react to errors more effectively. In general, the goal with error responses is to create a source of information to not only inform the user of a problem, but of the solution to that problem as well. Simply stating a problem does nothing to fix it – and the same is true of API failures. The balance then is one of usability and brevity. Being able to fully describe the issue at hand and present a usable solution needs to be balanced with ease of readability and parsability. When that perfect balance is struck, something truly powerful happens. While it might seem strange to wax philosophically about error codes, they are a truly powerful tool that go largely underutilized. Incorporating them in your code is not just a good thing for business and API developer experience – they can lead to more positive end user experience, driving continuous adoption and usage. The latest API insights straight to your inbox