How to Implement Input Validation for APIs

Browsing the OWASP API Security Top 10 reveals that problems with validation are responsible for some of the most severe API security risks. Every time you consume data from a third-party API, allow users to input data, or allow users to make read/write requests via POST, you open your API to possible injection-style attacks or unauthorized use. That’s where input validation comes in.

Input validation, also called data validation, is the process of testing any input or data against expected criteria. It’s one of the most dependable ways to keep risky data out of your API, and it’s a best practice for making sure that only quality data enters your system in any capacity. Validating input helps ensure that only properly formatted data enters your system, reaches your database, or triggers downstream components.

With that in mind, we’ve put together some tips on how to implement input validation for your APIs to make them as secure as possible.

1. Validate Content-Type Header and Data Format

The Content-Type header indicates the media type of the body carried by an HTTP request or response. Verifying the Content-Type to ensure that the posted data matches the expected format is one of the most straightforward ways to protect an API from invalid or malicious data.

Imagine an API endpoint that expects a JSON object for a specific application. Checking both that the header declares JSON and that the body actually parses as JSON helps protect your API from data corruption and injection attacks, among other things.
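As a rough illustration of that check, here is a minimal, framework-neutral sketch in Python. The function name parse_json_request and the two exception classes are hypothetical names introduced for this example; a real service would wire the same logic into whatever request object its framework provides.

```python
import json


class UnsupportedMediaType(ValueError):
    """Hypothetical error; maps naturally to an HTTP 415 response."""


class MalformedBody(ValueError):
    """Hypothetical error; maps naturally to an HTTP 400 response."""


def parse_json_request(content_type, body):
    """Return the JSON object in `body`, or raise if the request is not JSON."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type != "application/json":
        raise UnsupportedMediaType(f"unexpected Content-Type: {content_type!r}")

    # The header is only a claim, so confirm the body really is JSON.
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError as exc:  # covers UnicodeDecodeError and JSONDecodeError
        raise MalformedBody("body is not valid UTF-8 JSON") from exc

    # This endpoint is assumed to expect a JSON object, not a list or scalar.
    if not isinstance(payload, dict):
        raise MalformedBody("expected a JSON object")
    return payload
```

Keeping the two failures separate lets the caller tell a client that sent the wrong media type apart from one that sent a broken body.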

2. Prevent Entity Expansion

Denial of service (DoS) attacks like Billion Laughs, also known as an XML bomb, rely on vulnerable XML parsers. The attacker sends a small XML document that defines nested entities, each one referring several times to the one before it, so the parser expands them into an enormous amount of data and consumes an excessive amount of resources. Limiting the number of entities the parser will expand, or rejecting entity declarations altogether when you don’t need them, shuts down this type of DoS attack.
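One way to sketch this in Python uses the standard library’s xml.parsers.expat module. Note that this example takes the stricter route of refusing any entity declaration rather than counting expansions, which is an assumption that only suits APIs with no need for custom entities. The names EntityDeclarationForbidden and parse_without_entities are hypothetical.

```python
import xml.parsers.expat


class EntityDeclarationForbidden(ValueError):
    """Hypothetical error raised when a document declares its own entities."""


def parse_without_entities(xml_text):
    """Parse XML and return its element names, refusing entity declarations."""
    element_names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation_name):
        # Called for every <!ENTITY ...> declaration; abort before expansion.
        raise EntityDeclarationForbidden(f"entity declaration not allowed: {name}")

    def on_start_element(name, attributes):
        element_names.append(name)

    parser.EntityDeclHandler = on_entity_decl
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return element_names
```

Because the handler raises as soon as the first declaration is seen, the parser never reaches the point where the nested entities would be expanded.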

3. Limit the Size of Posted Data

When users can interact with your API via input forms, file uploads, or POST requests, it’s important to restrict how much data can be sent. Excessively large payloads can consume excess resources, increase processing time, or even, in some circumstances, cause an API to crash.
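A size limit can be sketched as follows. The 1 MiB ceiling, the PayloadTooLarge exception, and the read_limited helper are all assumptions made for illustration; many frameworks and reverse proxies offer an equivalent setting that is usually preferable to hand-rolled code.

```python
import io

MAX_BODY_BYTES = 1_048_576  # assumed limit of 1 MiB; tune per endpoint


class PayloadTooLarge(ValueError):
    """Hypothetical error; maps naturally to an HTTP 413 response."""


def read_limited(stream, declared_length=None, max_bytes=MAX_BODY_BYTES):
    """Read a request body from `stream`, refusing anything over `max_bytes`."""
    # Fast path: reject early when the client admits the body is too large.
    if declared_length is not None and declared_length > max_bytes:
        raise PayloadTooLarge(f"declared length {declared_length} exceeds limit")

    # The declared length can be missing or wrong, so also cap the actual read.
    # Asking for one extra byte is enough to detect an oversized body.
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise PayloadTooLarge(f"body exceeds {max_bytes} bytes")
    return data
```

For example, read_limited(io.BytesIO(b"hello"), declared_length=5) returns the five bytes, while a stream holding more than max_bytes raises before the whole body is held in memory.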

4. Compare User Input Against Injection Flaws

Injection flaws are pervasive in SQL, LDAP, and NoSQL queries, OS commands, XML parsers, and ORMs. They’re also very easy to discover using tools like scanners or fuzzers. An injection flaw occurs when untrusted input is passed to an interpreter as part of a command or query, letting an attacker run malicious code. This can compromise both the backend and any third-party clients connected to the affected application.

To reduce the risk of injection flaws, you should validate input from every potentially untrusted source. This includes everything from internet-facing web clients to backend feeds over extranets and data from suppliers, vendors, or regulators. Any of these could be compromised, putting your system at risk.
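To make this concrete for the SQL case, the sketch below validates the input first and then passes it to the database as a bound parameter. Parameterized queries are a complementary technique rather than input validation proper, and they are included here as an assumption about how the two are commonly paired. The users table and the find_user helper are hypothetical.

```python
import sqlite3


def find_user(conn, username):
    """Look up a user by name without ever splicing input into the SQL text."""
    # Validation: reject values that cannot be a legitimate username.
    if not isinstance(username, str) or not 1 <= len(username) <= 64:
        raise ValueError("username must be a string of 1 to 64 characters")

    # The "?" placeholder sends the value separately from the query,
    # so quotes inside `username` are treated as data, not as SQL.
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchall()
```

With this in place, a classic payload such as alice' OR '1'='1 is simply compared against the name column as a literal string and matches nothing.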

5. Validate All Levels

To help ensure API security, input validation should be performed on both the syntactic and semantic levels. Syntactic validation checks that structured fields are formatted correctly, such as ensuring the right currency symbols are used or that the proper hyphenated structure is followed for things like phone or social security numbers. Semantic validation checks that the data makes sense in its specific context, such as falling within a particular date or price range.
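The two levels can be sketched together in Python. Everything specific here is an assumption chosen for illustration: the validate_order function, the NNN-NNN-NNNN phone format, the price bounds, and the 90-day delivery window.

```python
import re
from datetime import date, timedelta
from decimal import Decimal

# Syntactic rules: what a well-formed value looks like.
PHONE_RE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
PRICE_RE = re.compile(r"[0-9]{1,6}\.[0-9]{2}")
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def validate_order(phone, price, delivery_date, today):
    """Return a list of validation errors; an empty list means the input is OK."""
    errors = []

    if not PHONE_RE.fullmatch(phone):
        errors.append("phone: expected NNN-NNN-NNNN")

    if not PRICE_RE.fullmatch(price):
        errors.append("price: expected digits with two decimals")
    # Semantic rule: a well-formed price must also be in a sensible range.
    elif not Decimal("0.01") <= Decimal(price) <= Decimal("10000.00"):
        errors.append("price: out of range")

    if not DATE_RE.fullmatch(delivery_date):
        errors.append("delivery_date: expected YYYY-MM-DD")
    else:
        try:
            parsed = date.fromisoformat(delivery_date)
        except ValueError:
            errors.append("delivery_date: not a real calendar date")
        else:
            # Semantic rule: delivery must fall within the next 90 days.
            if not today <= parsed <= today + timedelta(days=90):
                errors.append("delivery_date: outside the next 90 days")

    return errors
```

Passing today in as an argument, rather than calling date.today() inside the function, keeps the semantic check easy to test.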

6. Choose The Right Implementation For Your Programming Language

Most programming languages and frameworks have some method for input validation. Django supports Django Validators, for instance, while Apache Commons Validator is useful for Java-based applications. Type conversion with strict exception handling is another option, using functions like Integer.parseInt() in Java or int() in Python, both of which reject input that isn’t a valid integer. Input can be checked against JSON and XML schemas, as well. An array of allowed values works well for small sets of string parameters, like days of the week or time of day.
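Here is a small sketch of the last two ideas in Python: type conversion with int() and a fixed collection of allowed values. The helper names parse_quantity and parse_day and the 1 to 100 range are hypothetical choices for this example.

```python
ALLOWED_DAYS = (
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
)


def parse_quantity(raw, minimum=1, maximum=100):
    """Convert a string to an int, rejecting anything that is not in range."""
    if not isinstance(raw, str):
        raise ValueError("quantity must be supplied as a string")
    try:
        # int() raises ValueError for input such as "4.2" or "abc".
        # It does tolerate surrounding whitespace and a leading sign.
        value = int(raw)
    except ValueError:
        raise ValueError(f"not an integer: {raw!r}") from None
    if not minimum <= value <= maximum:
        raise ValueError(f"quantity must be between {minimum} and {maximum}")
    return value


def parse_day(raw):
    """Accept only one of the seven known day names, ignoring case."""
    day = raw.strip().lower()
    if day not in ALLOWED_DAYS:
        raise ValueError(f"unknown day: {raw!r}")
    return day
```

Both helpers return a cleaned value rather than a boolean, so the rest of the code works with the converted data instead of the raw string.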

7. Use Allow Lists Instead Of Block Lists

Many developers use block lists to try to prevent common attack patterns, such as the apostrophe (') character, the 1=1 string, or <script> tags. This approach is easy for attackers to circumvent, though. Valid inputs can trigger the filter, too, as in the case of last names like O’Grady.

Using allow lists is a better approach for user inputs. They let you specify what is allowed rather than trying to prevent any potential risk. This is easy to implement for structured data, like addresses or social security numbers. Input fields with limited options, like a drop-down menu, are even easier to validate, as the selection needs to exactly match one of the available options.
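The difference between the two approaches can be sketched in a few lines of Python. The blocked fragments come from the examples above; the social security number pattern, the plan options, and all function names are assumptions made for illustration.

```python
import re

# Block list: try to enumerate everything that is bad.
BLOCKED_FRAGMENTS = ("'", "1=1", "<script>")


def passes_block_list(value):
    lowered = value.lower()
    return not any(fragment in lowered for fragment in BLOCKED_FRAGMENTS)


# Allow lists: describe exactly what is good and reject everything else.
SSN_RE = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
PLAN_OPTIONS = frozenset({"free", "pro", "enterprise"})  # e.g. a drop-down


def is_valid_ssn(value):
    return SSN_RE.fullmatch(value) is not None


def is_valid_plan(value):
    return value in PLAN_OPTIONS


# The block list rejects a legitimate name and misses a trivial variation.
print(passes_block_list("O'Grady"))    # False: a valid name is refused
print(passes_block_list("x OR 2=2"))   # True: the attack pattern slips through

# The allow lists have neither problem for their fields.
print(is_valid_ssn("123-45-6789"))     # True
print(is_valid_ssn("x OR 2=2"))        # False
print(is_valid_plan("pro"))            # True
```

The allow-list checks never need updating when a new attack string appears, because anything that is not explicitly permitted is already refused.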

8. Validate Free-Form Unicode Input Properly

Free-form inputs, such as text, have a reputation for being the most difficult to validate, as there’s such a wide range of possible values. To validate free-form Unicode input, first normalize it so that the same text always uses one canonical encoding and no invalid characters are present. Then create an allow-list of acceptable character categories, such as letters or decimal digits, which covers alphabets like Arabic or Cyrillic as well as Latin. You can also create allow-lists for individual characters, like the apostrophe in a name field.
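These steps can be sketched with Python’s standard unicodedata module. The choice of NFC normalization, the letter and combining-mark categories, the extra allowed characters, and the validate_name helper are all assumptions suited to a name field; other fields would need different rules.

```python
import unicodedata

# Unicode general categories start with a letter for the major class:
# "L" covers letters in any script, "M" covers combining marks.
ALLOWED_MAJOR_CATEGORIES = frozenset({"L", "M"})
# Individual characters permitted in addition to the categories above.
ALLOWED_CHARACTERS = frozenset({"'", "-", " "})


def validate_name(raw, max_length=100):
    """Normalize a name and reject characters outside the allow-lists."""
    # Step 1: normalize so equivalent sequences compare and store identically.
    text = unicodedata.normalize("NFC", raw)
    if not text or len(text) > max_length:
        raise ValueError("name must be between 1 and 100 characters")

    # Step 2: check every character against the allow-lists.
    for char in text:
        if char in ALLOWED_CHARACTERS:
            continue
        if unicodedata.category(char)[0] in ALLOWED_MAJOR_CATEGORIES:
            continue
        raise ValueError(f"character not allowed: {char!r}")
    return text
```

A name such as O'Grady passes because the apostrophe is individually allowed, while Cyrillic or Arabic names pass through the letter category without needing script-specific rules.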

9. Validate on the Server Side As Well As the Client Side

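JavaScript input validation is easily bypassed by attackers, who can either disable JavaScript or use a web proxy to alter requests after they leave the browser. Client-side validation is still useful for giving users quick feedback, but only validation on the server side can be trusted, so every check performed in the browser should be repeated on the server.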

10. Manage User Uploads Properly

We already mentioned how attackers can use uploads for DoS attacks. Putting size limits on user uploads is just one approach for validating them, though. If your API only accepts certain file types, you can use input validation to make sure the uploaded file matches the expected file type, checking the file’s content as well as its extension. Once the file is uploaded, make sure to change its name in your system. Uploaded files should also be analyzed for malicious content, such as malware. The user shouldn’t be able to dictate where the file is stored, either.
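A minimal sketch of these upload checks might look like the following. The upload directory, the 5 MiB limit, the two accepted file types, and the plan_upload helper are hypothetical, and malware scanning is left out because it depends on an external scanner.

```python
import uuid
from pathlib import Path

UPLOAD_DIR = Path("/srv/app/uploads")  # hypothetical, chosen by the server
MAX_UPLOAD_BYTES = 5 * 1024 * 1024     # assumed limit of 5 MiB

# Allowed extensions mapped to the bytes each file type starts with.
FILE_SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".pdf": b"%PDF-",
}


def plan_upload(client_filename, content):
    """Validate an upload and return the server-chosen path to store it at."""
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file is too large")

    # Only the extension of the client's name is used, and only as a lookup key.
    extension = Path(client_filename).suffix.lower()
    signature = FILE_SIGNATURES.get(extension)
    if signature is None:
        raise ValueError(f"file type not accepted: {extension!r}")

    # The extension is just a claim, so confirm the content matches it.
    if not content.startswith(signature):
        raise ValueError("file content does not match its extension")

    # Generate a fresh name so the client controls neither name nor location.
    return UPLOAD_DIR / f"{uuid.uuid4().hex}{extension}"
```

Because the stored name is generated on the server, a client filename such as ../../etc/report.png has no influence on where the file ends up.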

Final Thoughts On Input Validation

User inputs are one of the most vulnerable areas in your API. Handling them poorly is like outfitting a building with a state-of-the-art security system and then leaving the door open. Furthermore, several common attack patterns rely on vulnerabilities that involve user inputs. Even if this weren’t the case, input validation is a useful best practice to have in place, as it helps ensure that data is formatted correctly for your system or falls within a particular range. Input validation is not hard to set up, either, once you learn how.