Milton Carranza shares an interesting personal project that relied heavily on APIs — smartly purchasing a house in a volatile Prague market
Buying property in the current Prague real estate market has become hard, time-consuming, and expensive. This is due to multiple factors: prices, incomes, unwillingness to commute, strong demand, and a shrinking supply of new properties.
According to Deloitte’s Property Index of 2018, affording a 70 square meter flat in the Czech Republic requires 11.3 years of gross salary, the highest figure in Europe, followed by the UK at 9.8 years; the lowest is Belgium at 3.7 years. According to experts, this trend will continue and, in the best-case scenario, will take several years to stabilize.
My goal was to buy the best property that would satisfy or exceed my needs while staying within the budget. In order to do so, I leveraged several APIs collecting data about public transport, traffic, noise, and the retail properties themselves. This helped to make a data-driven decision that would both support things that realtors usually tell you about the neighborhood and discover the things they don’t.
Personally, several factors pushed me to actually “make the move,” including rising mortgage interest rates and changes in the law. My search criteria included the following attributes:
- Property type
- Livable condition: I didn’t want to go through a big reconstruction.
- Commute time to my place of work
- Good public transportation
Real Estate Market In The Czech Republic
If you go to one of the most popular Czech real estate websites, you will be overwhelmed by over 80,000 properties, and not all of them are interesting. After doing my homework and applying some filters on the websites, only 0.58% (508 properties) were left, which is still a big number to go through one by one. Plus, you would have to check these sites every day and remember the properties you had already seen. In the end, I reduced the number to 120 properties per month, which was a big improvement.
How Did APIs Get Into The Equation?
Traditionally, before APIs existed, we used to scrape web pages to get the information we needed. However, a week after you’d finished the perfect web page scraper, the page would be updated and the whole HTML structure would change; then you were stuck: you had to inspect the new HTML page structure and modify your code accordingly.
These days, APIs work on a contract basis, which means the API provider promises to keep the lights on. They do so by not altering the data structures and maintaining a stable API that doesn’t break the client application. To design my real estate filtering application, APIs were thus the option of choice.
The idea of buying a property came three years ago, and at the beginning, like a normal human being, I was searching manually, getting duplicated results, spending a lot of time, and not being very efficient. In July 2017, I realized I could make asynchronous calls to one of the most popular real estate web pages, and like Newton feeling the apple hit his head or Archimedes taking a bath, that was my eureka moment.
Retrieving Property Data and Price
At that moment I started inspecting network calls and understanding the data structures and parameters behind every single payload. After devoting some time to understanding, and basically reverse engineering, the undocumented API, I decided to run an automated property retrieval and storage job every two hours (it could just as well have run once a day). This solved the problem of remembering the properties I had already visited.
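A minimal sketch of that two-hour polling loop, assuming each listing payload carries a unique id field (the field names and the fetch function are hypothetical, since the real API was undocumented):

```python
import time

def filter_new(listings, seen_ids):
    """Keep only listings whose id hasn't been seen before, and
    remember the new ids so the next poll skips them."""
    fresh = [p for p in listings if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in fresh)
    return fresh

def poll(fetch_listings, seen_ids, interval_s=2 * 60 * 60):
    """Fetch every two hours and report only unseen properties.
    fetch_listings is whatever function calls the site's API."""
    while True:
        for prop in filter_new(fetch_listings(), seen_ids):
            print(prop["id"], prop.get("price"))
        time.sleep(interval_s)
```

Persisting seen_ids to disk (a small JSON file is enough) keeps the deduplication across restarts.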
At this point in time, I was able to check in with a quick glance and see houses, flats, and parcels. This improved the initial situation and gave better visibility into the market, supplying the first criteria of consideration: the price.
Geolocation and Actual Travel Time
What’s the shortest distance between two points? A straight line, right? Wrong. Most real estate websites tell you things like “you have a restaurant 300 meters from the property” or “you have a supermarket 500 meters from the property.” Unfortunately, they leave out that you must risk your life crossing a highway or drive straight through a cornfield to reach these locations.
A straight line gives a false sense of distance and time. What you want to know is how much time it would take to get there, either by public transportation or by car. To find real distances to nearby locations, the Google Maps API came to the rescue. Usually, every property comes with geographical coordinates, and from there, with a simple API call, you can figure out the travel time and distance to drive from each property to your place of work, city center, or place of interest.
Using such data, I was able to add important filters to the research:
- For houses and parcels, travel time should be less than 30 minutes and the distance no more than 25 kilometers from my point of interest.
- For flats, travel time should be less than 20 minutes and the distance no more than 10 kilometers from my point of interest.
Public Transportation Data
Who uses a car to go to work? Not many people who live within a city such as Prague. Thus, public transportation data was an important criterion for me. This added another important filter to my research:
- For houses and parcels, travel time should be less than 50 minutes.
- For flats, travel time should be less than 35 minutes and the distance no more than 10 kilometers from my point of interest.
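Taken together, the driving and transit criteria reduce to a small predicate per property type. A sketch (the argument names are mine, not from any API):

```python
def passes_filters(kind, car_min, car_km, transit_min):
    """Apply the travel-time and distance criteria listed above.

    kind        -- "flat", "house", or "parcel"
    car_min     -- driving time to the point of interest, in minutes
    car_km      -- driving distance, in kilometers
    transit_min -- public transport travel time, in minutes
    """
    if kind == "flat":
        return car_min < 20 and car_km <= 10 and transit_min < 35
    # houses and parcels share the looser thresholds
    return car_min < 30 and car_km <= 25 and transit_min < 50
```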
Again, Google Maps came to the rescue. I recycled the geographical coordinates and requested public transportation travel time from the properties to particular points of interest. So far, I had enhanced the data by two dimensions, uncovering hidden value that wasn’t originally queryable.
For each property in question, a call like this was fired:
$ curl 'https://maps.googleapis.com/maps/api/directions/json?origin=<ORIGIN>&destination=<DESTINATION>&mode=transit&key=<MY API KEY>'
Here we are interested in the distance and duration values returned by the API. These values come from a timetable, so they are not influenced by traffic (a problem in the case of buses).
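In Python, the same call and the extraction of those two fields might look like this (assuming the response contains a single route with a single leg, which holds for a simple A-to-B request):

```python
import json
import urllib.parse
import urllib.request

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

def extract_leg(directions_json):
    """Pull distance (meters) and duration (seconds) from a Directions
    API response; assumes one route with one leg."""
    leg = directions_json["routes"][0]["legs"][0]
    return leg["distance"]["value"], leg["duration"]["value"]

def transit_route(origin, destination, api_key):
    """Fire the transit request and return (distance_m, duration_s)."""
    query = urllib.parse.urlencode({
        "origin": origin,
        "destination": destination,
        "mode": "transit",
        "key": api_key,
    })
    with urllib.request.urlopen(f"{DIRECTIONS_URL}?{query}") as resp:
        return extract_leg(json.load(resp))
```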
After finding potential candidates, I realized I needed to dig deeper into certain aspects. For example, any public travel connection finder can respond with the timetable from point A to point B, but what if you’re taking a bus and it happens to be rush hour? You’re stuck.
Public transportation (bus, tram) travel times are affected by traffic, but how do we confirm this? Again, Google Maps! In terms of predicting traffic, the API is very accurate. A simple API call every 15 minutes returns a lot of information about time lost in traffic. In my particular case, the data debunked some realtor claims and confirmed others.
For example, one realtor’s claim that the bus takes 9 minutes to a point of interest happened to be true in most cases. Checking the traffic API, I found a 15-minute travel time during heavy traffic, which is not the end of the world.
For each property in question, an API request like this was triggered:
$ curl 'https://maps.googleapis.com/maps/api/directions/json?origin=<ORIGIN>&destination=<DESTINATION>&mode=driving&key=<MY API KEY>'
In this instance, we are interested in the distance and duration values returned by the API. As these results are given in real time, they will vary according to the time of day you retrieve them. Unfortunately, right now you can’t get data from a past date.
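When the driving request also includes a departure_time=now parameter, the leg gains a duration_in_traffic field next to the timetable-free duration; polling every 15 minutes and comparing the two shows how bad rush hour really is. A sketch of that comparison (response parsing only; the HTTP request itself is the curl call shown above plus the extra parameter):

```python
def traffic_spread(directions_json):
    """Return (scheduled_s, in_traffic_s) from a driving Directions
    response; duration_in_traffic is only present when the request
    included departure_time=now."""
    leg = directions_json["routes"][0]["legs"][0]
    scheduled = leg["duration"]["value"]
    # fall back to the scheduled duration if no live estimate is present
    in_traffic = leg.get("duration_in_traffic", {"value": scheduled})["value"]
    return scheduled, in_traffic
```

Logging both numbers over a few weekdays gives exactly the kind of evidence that confirmed or debunked the realtor claims above.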
Planes and Noise Pollution
Have you ever had the feeling you liked something very much, until you uncovered something horrible about it? That’s how I felt after saying “this is the one,” only to have a friend quickly tell me “be careful, that property may be near a plane runway”.
I almost dropped it; then I recalled some web pages providing information about noise pollution. After that, I found another source that provided aircraft elevation. The noise produced by an aircraft is roughly determined by its elevation and size, so I derived a simple model that takes the aircraft’s elevation as input, assumes a medium-sized aircraft, and outputs a noise level in decibels.
I had a model, but I still didn’t have the data. As with the properties, I started inspecting network calls and figured out the web page’s undocumented API and its cryptic fields. After analyzing the data with the model, I discovered that arriving planes fly on average 750 meters above the property, making the sound barely audible.
High-level algorithm:
1. Get flight information.
   a. Extract the aircraft’s altitude.
   b. Adjust the altitude based on the property’s altitude, as either may be given in meters or feet above sea level.
   c. Determine whether the aircraft is ascending or descending.
2. Compare with an [Aircraft Noise level chart](https://www.nats.aero/environment/aircraft-noise/).
   a. Consider height.
   b. Consider whether the aircraft is ascending or descending.
   c. Assume a medium-sized aircraft.
3. Get the noise level.
   a. Interpolate the values if applicable.
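The interpolation in step 3 can be sketched as a lookup over a reference table. The chart values below are illustrative placeholders, not the NATS figures; a real implementation would transcribe the linked chart:

```python
import bisect

# (height above the property in meters, noise in dB) for a medium-sized
# aircraft -- placeholder values, to be replaced with real chart data.
NOISE_CHART = [(100, 90.0), (300, 80.0), (500, 74.0), (1000, 65.0), (2000, 57.0)]

def noise_db(aircraft_alt_m, property_alt_m):
    """Estimate noise at the property from the aircraft's altitude."""
    height = aircraft_alt_m - property_alt_m  # step 1b: adjust for terrain
    heights = [h for h, _ in NOISE_CHART]
    if height <= heights[0]:
        return NOISE_CHART[0][1]
    if height >= heights[-1]:
        return NOISE_CHART[-1][1]
    i = bisect.bisect_left(heights, height)
    (h0, n0), (h1, n1) = NOISE_CHART[i - 1], NOISE_CHART[i]
    # step 3a: linear interpolation between the two nearest chart rows
    return n0 + (n1 - n0) * (height - h0) / (h1 - h0)
```

With the real chart, ascending and descending aircraft (step 2b) would simply use two different tables.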
API Call to retrieve live aircraft info:
$ curl 'https://opensky-network.org/api/states/all?lamin=45.8389&lomin=5.9962&lamax=47.8229&lomax=10.5226'
From the output, we are interested in the element at index 7 of each nested array, which contains the altitude of the aircraft.
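A sketch of pulling those altitudes out of the response (index 7 of each OpenSky state vector is the barometric altitude in meters; it is null for aircraft on the ground, so those entries are skipped):

```python
def baro_altitudes(states_json):
    """Extract barometric altitudes (meters) from an OpenSky
    /states/all response, dropping aircraft without an altitude."""
    states = states_json.get("states") or []
    return [s[7] for s in states if s[7] is not None]
```

Each altitude can then be fed straight into the noise model above.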
Good Practices Around APIs
These days APIs are driving businesses, capabilities, and digital transformation, among other initiatives. Because of this, they must be treated as first-class citizens, not only from a technical point of view but also from the perspectives of product management, documentation, marketing, security, and user/developer experience (yes, we developers have feelings as well), among many others.
Of the APIs used here, two had no access control. There were no security mechanisms on top of them, so you could access and potentially abuse them. For public APIs, proper rate limiting, quotas, throttling, HTTPS everywhere, input validation, auditing, monitoring, and authentication and authorization mechanisms must be in place.
If you are productizing your API, make sure you document it well. In the article “Why Developers Hate Your API,” documentation was listed as the #1 API problem. There are plenty of API description formats, e.g. Swagger (OpenAPI), API Blueprint, RAML, and WADL, to help with this process.
Regarding automation: there is always a lot of repetitive, manual work in software development (deployments, builds, testing, etc.) that is worth investing some time to automate. Automation pays off in the long run. Remember the prerequisite for any automation: crystal-clear, well-established processes.
These days, all the data needed to make an informed, data-driven decision is out there. But keep in mind: numbers can tell you amazing things, but if you don’t feel comfortable in the property, there’s no power on earth that will change that.