5 Major Modern API Data Breaches (And What We Can Learn from Them)

The world is becoming more connected by the moment. Twenty years ago, the idea of an internet-connected toothbrush was about as alien as the idea of the internet itself was a hundred years ago. Development is moving swiftly, and new inventions are continually changing the face of our interactions in the digital age.

With that frame of reference, it’s shocking to see some developers not taking API security seriously. API data breaches are often the work of third-party attackers. Yet, in many cases, a simple failure to treat API security with respect has led to significant data breaches affecting millions of users.

Today, we’re going to look at five major modern API data breaches, and see what lessons we can distill from them.

1 – Venmo

Venmo’s API allowed anyone to scrape millions of transactions that users didn’t realize were public.

To say Venmo had a data breach is somewhat misleading, as Venmo is “social” by design. The payment platform shares all transaction descriptions (which are typically fun and filled with emojis) by default, and users have to change their settings to keep this data private. The “breach” in this case came from the fact that the API that served these descriptions was unsecured, allowing for the mass scraping of 200 million transactions.

This scrape included data such as the full names of senders, the memos attached to each transaction, the value of the transaction, and more. Notably, many Venmo transaction descriptions openly described private activities, suggesting that many users were not aware of Venmo’s public-by-default policy (or, at the very least, were not aware of the extent to which this information might be shared). Venmo (and many of its defenders) responded that the API functioned as intended, but this is not the best defense: poor security by design is still poor security.
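To make the “secure by design” point concrete, here is a minimal Python sketch of a feed that is private by default, requires a signed-in viewer, and caps page size. It is not Venmo’s code; the names `Transaction`, `Visibility`, `public_feed`, and `MAX_PAGE_SIZE` are all hypothetical, and leaving amounts out of the payload is simply one assumption about what a feed needs to show.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class Transaction:
    sender: str
    memo: str
    amount_cents: int
    # Assumption: a transaction is private unless the user explicitly shares it.
    visibility: Visibility = Visibility.PRIVATE


MAX_PAGE_SIZE = 50  # hypothetical cap that makes bulk scraping slower


def public_feed(
    transactions: list[Transaction], viewer_id: Optional[str], limit: int = 20
) -> list[dict]:
    """Return one page of explicitly public transactions for a signed-in viewer."""
    if viewer_id is None:
        raise PermissionError("authentication required")
    page_size = max(0, min(limit, MAX_PAGE_SIZE))
    visible = [t for t in transactions if t.visibility is Visibility.PUBLIC]
    # Expose only the fields the feed needs; amounts stay out of the payload.
    return [{"sender": t.sender, "memo": t.memo} for t in visible[:page_size]]
```

None of this stops a determined scraper on its own, but it changes the default from “everything is public to anyone” to “only what users chose to share, in small pages, to known callers.”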

Lesson Learned

The problem with this kind of data breach is not any single piece of data, but rather the data in aggregate. Providers tend to think that their limited data sets carry limited risk of harming users in significant ways. Because of this, they typically don’t give these data sources the protection they deserve.

For instance, while a batch of transactions relating to concert tickets or shared dinners from a sender might seem innocuous, the data becomes more impactful when paired with the same sender’s transactions concerning medical choices or protected activities. Venmo is used by mainstream businesses, but it is also used in activist circles. Not only does this aggregated data expose the behavior of the people making those payments, but bad actors can also use it (alongside other public and non-public sources) to track users and their habits.

All told, while this may seem a minor breach to some, it was seen by many of its users to be a fundamental breach of privacy, trust, and ethical security.

2 – Facebook’s Breaches

Over the years, Facebook has had numerous API vulnerabilities exploited within various services.

Facebook is notorious for a variety of things, not the least of which is a spotty track record with security.

In September of 2018, hackers used a vulnerability in Facebook’s Developer API to expose the accounts of millions of users. The data scraped was rather comprehensive and, in many cases, could circumvent the purposeful obfuscation and privacy opt-ins that otherwise protected information. The issue at hand, which was left unpatched for 20 months, was due to a feature in the “View As” function in the Developer API that allowed developers to render pages as a user. Unfortunately, this function also delivered the authentication token for that user directly to the developer.
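One general defense against this class of leak is to build responses from an explicit allowlist of fields, so that credentials can never ride along by accident. The sketch below is illustrative only and is not how Facebook’s “View As” was implemented or fixed; `PREVIEW_FIELDS` and `render_view_as` are hypothetical names.

```python
# Hypothetical field names; the allowlist is the point, not the schema.
PREVIEW_FIELDS = ("display_name", "bio", "public_posts")


def render_view_as(profile: dict) -> dict:
    """Build a "view as" preview from an explicit allowlist of fields.

    Anything not listed in PREVIEW_FIELDS, including tokens or other
    credentials stored on the profile record, is never copied into the output.
    """
    return {field: profile[field] for field in PREVIEW_FIELDS if field in profile}
```

The design choice here is allowlisting over blocklisting: a new sensitive field added to the profile record later stays hidden by default, instead of leaking until someone remembers to exclude it.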

Facebook’s API woes did not end there. In December 2018, a flaw in a Facebook photo API exposed private data in a breach affecting up to 6.8 million users and 1,500 apps. The vulnerability was due to a bug in Facebook Login that allowed third-party developers to retain access to photos shared with the service, letting apps keep content well after users had demanded that access be ended and their data deleted.

Another massive data breach could not be pinpointed to Facebook’s API per se – over 267 million Facebook IDs, phone numbers, and names were scraped into a publicly accessible database that was discovered by security researcher Bob Diachenko. The database most likely came from one of two sources – either Facebook profiles were illegally scraped, or the Facebook API had another hole in it that allowed for this sort of activity. While it’s not been confirmed either way, some of those affected say they had already locked down their privacy settings before the leak, which suggests this was not the result of compiling publicly accessible information, but instead, was the result of an API leak.
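The retained-access problem described above comes down to checking permission once and then trusting it indefinitely. A minimal Python sketch of the alternative, checking the grant on every read, might look like this. `GrantStore` and `fetch_photos` are hypothetical names, and the in-memory set stands in for whatever store a real service would use.

```python
class GrantStore:
    """In-memory record of which app may read which user's photos."""

    def __init__(self) -> None:
        self._grants: set[tuple[str, str]] = set()

    def grant(self, app_id: str, user_id: str) -> None:
        self._grants.add((app_id, user_id))

    def revoke(self, app_id: str, user_id: str) -> None:
        self._grants.discard((app_id, user_id))

    def is_allowed(self, app_id: str, user_id: str) -> bool:
        return (app_id, user_id) in self._grants


def fetch_photos(
    grants: GrantStore, photos_by_user: dict[str, list[str]], app_id: str, user_id: str
) -> list[str]:
    """Return a user's photo IDs only if the app's grant is still active right now."""
    # The check runs on every call, so a revocation takes effect immediately.
    if not grants.is_allowed(app_id, user_id):
        raise PermissionError("access revoked or never granted")
    return list(photos_by_user.get(user_id, []))
```

This only covers future reads. Data an app has already copied is a separate problem that needs contractual and deletion controls, not just code.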

Lessons Learned

Facebook’s woes exemplify what happens when you try to run a monolith of its size, holding the amount of data it has, in an agile way. When each bit of code is its own module with its own team, you gain a great deal of flexibility, scalability, and efficiency, but you also introduce the possibility of failed connections in terms of team communication and project management.

While it’s not been confirmed that these API breaches were the result of failed internal management, the fact that they’ve occurred across seemingly disparate service clusters and functions suggests that a combination of legacy code, over-simplification, division of teams, and ineffective auditing was the primary cause of these breaches.

It should also be noted that Facebook has designed much of its dataset to be minable in its core form. Facebook applications are given high levels of access to profiles that “opt-in” to their use (even if the apps in question may fail to disclose their secret uses of this data). Cambridge Analytica was a high-profile example of this in 2016, where an app known as “This Is Your Digital Life” harvested data from millions of users – it was allowed to do this due to a fault in the Facebook data structure design that allowed the app to not only capture data from those who had opted into the collection but also the data from the friends of those who had opted in.

The lesson here is simple – when your code is massive and borderline unmanageable, you need to fundamentally rethink how you handle data, code structure, and even internal team alignment.

3 – USPS’ Corporate Database Exposure

In 2018, a major API exposure involved the USPS’s Informed Visibility system.

In 2018, an issue with the web API for USPS was published. The weakness allowed an attacker to query the USPS website and scrape a database of over 60 million corporate users, including email addresses, account numbers, addresses, campaign data, and phone numbers. It was originally reported a year previously but went unaddressed for a long time.

The underlying problem, as in many breaches, was an authentication failure. It allowed improper access to an API service called “Informed Visibility,” which was designed to deliver real-time tracking data for large-scale shipping operations. This tracking system was tied into the web API in such a way that a user could set the wildcard search parameters to arbitrary values without escalated privileges. In essence, while finding a specific target was difficult, exposing a great many users at once was very easy. Since there wasn’t a robust anti-scraping system in place (for instance, rate limiting), automated and unfettered access only compounded this mass exposure.
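As a rough illustration of the check that appears to have been missing, the Python sketch below refuses wildcard lookups and verifies that the caller actually owns the record being requested. This is not USPS code; `Account`, `lookup_account`, and the `WILDCARDS` set are hypothetical and chosen only to show the pattern.

```python
from dataclasses import dataclass

# Assumption: these are the characters the search backend treats as wildcards.
WILDCARDS = set("*%?")


@dataclass(frozen=True)
class Account:
    account_id: str
    owner_id: str
    email: str


def lookup_account(accounts: dict[str, Account], caller_id: str, account_id: str) -> Account:
    """Return exactly one account, and only if the caller owns it."""
    if not account_id or any(ch in WILDCARDS for ch in account_id):
        raise ValueError("wildcard or empty identifiers are not accepted")
    account = accounts.get(account_id)
    # Same error for "missing" and "not yours" so callers cannot probe for valid IDs.
    if account is None or account.owner_id != caller_id:
        raise PermissionError("not found or not authorized")
    return account
```

The key idea is that being logged in is not the same as being authorized: every lookup is tied to the identity of the caller, not just to the presence of a valid session.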

Lesson Learned

The USPS example shows a tension that pops up again and again in different breaches: ease of access versus security. Too many providers are willing to give extreme powers to a specific service or function. When they fail to secure every permutation of that service or function’s interaction flow, this power can be leveraged to inflict harm.

API owners should develop each aspect of their API with the concept that someday, it’s very likely it will be abused – not just internally, but by external forces. If coding is conducted under this assumption, then proper steps can be taken to mitigate future damage, or at least put into place systems that will detect this ill use.
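One simple way to both limit and detect this kind of abuse is a per-caller rate limiter. The sketch below is a minimal sliding-window version in Python; the class name and thresholds are hypothetical, and the timestamp is passed in explicitly so the behavior is easy to test.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per caller in any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, caller_id: str, now: float) -> bool:
        hits = self._hits[caller_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # a natural place to log or alert on suspected scraping
        hits.append(now)
        return True
```

In a real service, `now` would come from something like `time.monotonic()`, and the state would usually live in a shared store rather than in process memory. The rejected branch is also where an alert can be raised, which turns a throttle into an early-warning system.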

4 – Federation of Industries of the State of São Paulo

In 2018, Brazil’s FIESP database exposed millions of company records.

In 2018, Brazil’s largest professional industrial body, the FIESP, was accused of exposing millions of data points for over 130,000 companies across three databases. The records included names, identification and social security numbers, addresses, emails, and more, with the largest of the three sources containing 34.8 million entries.

While the breach was clearly the result of some sort of API or database failure, it’s unclear what specific mechanics were responsible for the exposure. This is in large part because the organization refused to acknowledge the breach, and has stated that it believes that no “personal information from the database has been exposed.”

Lesson Learned

While this is a pretty cut-and-dried data exposure case, the real lesson here comes down to responsibility. In this case, the FIESP refused to claim responsibility for the data breach, going so far as to say that the breach never even happened. A few things come out of such a failure to disclose and take responsibility.

First and foremost, the investigation becomes extremely difficult. Security researchers can only look at so much data, and in many cases, they only know as much as the average layperson who can also access the data. Responsible disclosure was attempted in the FIESP case, but without the responsible party willing to coordinate efforts or accept the reality of a breach, such disclosure does not deliver any tools for mutual investigation or general aid. Researchers can only guess and estimate, essentially finding themselves locked out of the very ecosystem they’re trying to protect.

Second, failure to acknowledge a breach means that those who are affected cannot be adequately warned and prepared. Because of this, poor security practices such as reused passwords or missing two-factor authentication (which has issues of its own) amplify the impact of the data breach as it propagates. If users are warned, at the very least, they become more aware of the dangers lurking in their accounts – if they are not warned, everything continues the same as before.

5 – JustDial

JustDial’s 100 million accounts were exposed through publicly accessible API endpoints in 2019.

JustDial, the largest local search engine and social market in India, was accused in 2019 of leaking its entire database of customer information – data points for its over 100 million users included names, emails, mobile phone numbers, date of birth, gender, occupation, photos, and more. Essentially, any piece of data provided through the use of its website, its app, its customer support system, everything, was leaked.

How did this happen? It turns out that JustDial’s API endpoints were publicly accessible, existing with basically no authentication, authorization, or encryption. Access to the database had been provided via unfettered APIs since at least 2015, allowing anyone to reach the database and pull whatever data they wanted.

During testing, it was also discovered that, unlike in other high-profile data breaches, this data was not from a limited or development server – testing showed conclusively that the exposed data was being fetched in real time, indicating that the access was to an actual live production server. Additional testing across the JustDial platform showed that the main API was not the only one exposed – additional APIs, such as an OTP request system, were found to be unsecured as well.
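For contrast, here is a minimal Python sketch of what “some authentication” can look like at the handler level: a decorator that rejects any request without a known API key. It is a generic pattern, not JustDial’s stack; `VALID_API_KEYS`, `require_api_key`, `get_user`, the `X-API-Key` header name, and the dict-shaped request are all assumptions made for illustration.

```python
import hmac
from functools import wraps

# Hypothetical key store; a real deployment would use a secrets manager.
VALID_API_KEYS = {"demo-key-123"}


def require_api_key(handler):
    """Reject any request that does not carry a known API key."""

    @wraps(handler)
    def wrapper(request: dict):
        supplied = request.get("headers", {}).get("X-API-Key", "")
        # compare_digest avoids leaking key contents through timing differences.
        known = any(
            hmac.compare_digest(supplied.encode(), key.encode())
            for key in VALID_API_KEYS
        )
        if not known:
            return {"status": 401, "body": "authentication required"}
        return handler(request)

    return wrapper


@require_api_key
def get_user(request: dict) -> dict:
    return {"status": 200, "body": {"user_id": request["params"]["user_id"]}}
```

An API key alone is only authentication. It says who is calling, not what they may read, so per-record authorization and transport encryption still need to sit on top of it.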

Lessons Learned

JustDial is a great example of something technical being overlooked for business logic. JustDial’s concept is strong, and their code does what it is supposed to do – despite this, their failure to secure their APIs exposed an incredible amount of information. The strange dichotomy between a professional company and a frankly amateur mistake (not securing your production server with authentication or authorization) is a common one.

This could, of course, be easily rectified – simple audits should be enough to find possible data exposures and ensure that they are plugged before they become a problem. The greater concern here is whether this is a single oversight or the first to be exposed in a series of oversights. It could be argued that failure to do something as simple as securing a production server could indicate other internal issues, both in code and otherwise; this remains to be validated in the coming years.
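A “simple audit” can be as small as a script that walks the route table and flags anything reachable without authentication. The Python sketch below assumes a hypothetical `Route` record with a `requires_auth` flag and a deliberate `public_by_design` override; real frameworks expose this information in different ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    method: str
    path: str
    requires_auth: bool
    # Must be set deliberately, for example on a health check endpoint.
    public_by_design: bool = False


def find_unprotected(routes: list[Route]) -> list[Route]:
    """Return routes that are reachable without auth and were not marked public on purpose."""
    return [r for r in routes if not r.requires_auth and not r.public_by_design]
```

Run as part of a build or deployment check, a non-empty result can fail the pipeline, so an unauthenticated production endpoint has to be an explicit decision instead of an accident.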

Conclusion

The ultimate takeaway from all of this is simply that security should be of prime importance. Developing with security first in mind should mitigate many of these problems – in fact, failure to code with security first in mind is a major problem both in the API space and in the general tech space, and is one that should be fixed urgently. With so much of the world connecting through APIs, the Internet of Things developing connections between things as diverse as a hairdryer and a pacemaker, and ever more data points being created as we move about the world, lacking security is not only unacceptable – in parts of the world such as the EU, it’s borderline criminal.

What do you think are some of the most significant API breaches of the last few years? Did we miss any major ones? What is your solution for the balancing act between the necessity of access and the insecurity of such access? Let us know in the comments below!