Why you should be using drivers to integrate with disparate data sources
As enterprises grow, so too does the number and variety of data sources they use to drive business. The average company is using at least sixteen SaaS applications and has data in at least that many on-premises data stores and internal apps.
With so many disparate data sources, each tied to a unique API, building, managing, and maintaining integrations for all of a company’s data creates a whole new set of challenges. Thankfully, solutions exist that enable enterprises to rely on data to drive business without causing undue strain. In this article, we’ll explore and compare the different options for solving the data integration problem and explain why you should be using standards-based drivers to abstract your API integrations.
Common Options for API Integration
As you look to solve the challenge of integration, you are presented with many options. The most popular tend towards the following:
- Direct integration: making calls directly to an API or service
- SDKs: language-specific bindings and libraries
- Middleware: ESBs, ETL products, or an enterprise scheduler
Which option is best for you depends largely on how you’re using data now (and how you plan to use it in the future), your time and budget constraints, and the context of your integration needs.
If you have a straightforward use-case, like checking the success or failure of an operation or displaying a simple response, then any of the above solutions will work.
However, as your business flows and requirements become more extensive, you may find yourself looking for a data-centric approach to your integration. In this case, a new option for integration presents itself: standards-based drivers. With drivers, your development tasks are simplified through a unified interface.
What is a Standards-Based Driver?
You are probably familiar with drivers for databases (think technologies like JDBC, ODBC, ADO.NET, etc.). Thanks to the trend towards disparate data, many companies are beginning to either create or commission drivers that build on these established technologies to make APIs look like databases, translating SQL queries into API requests. For example, NetSuite has created their own JDBC and ODBC Drivers and ADO.NET Provider for use in their Suite Analytics platform. Additionally, companies like CData Software often create third-party standards-based drivers on behalf of enterprises (e.g. HarperDB and SlicingDice).
Modeling an API through SQL
- Tables and views correspond roughly to resource collections (Orders, Accounts, Users, etc.)
- Individual elements from a resource generally correspond to a row, with fields mapped to columns.
- Sub-collections can have a foreign-key relationship with parent collections (Orders and Order Line Items)
- CRUD operations are roughly translated to SQL statements:
|HTTP Request||Equivalent SQL|
- Operations and entities that are not easily represented are implemented through stored procedures
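As a rough illustration of the mapping in the list above, the following Python sketch translates a few single-table statements into the HTTP request a driver might issue. The verb pairing (SELECT to GET, INSERT to POST, UPDATE to PUT, DELETE to DELETE) is the conventional REST pairing and is an assumption here, as are the `sql_to_request` helper, the `Id` key column, and the `api.example.com` base URL. A real driver would use a full SQL parser and source-specific rules.

```python
import re

# Conventional CRUD pairing (an assumption, not taken from any specific driver).
SQL_TO_HTTP = {"SELECT": "GET", "INSERT": "POST", "UPDATE": "PUT", "DELETE": "DELETE"}

# Where the table (resource collection) name appears for each statement type.
TABLE_PATTERNS = {
    "SELECT": r"\bFROM\s+(\w+)",
    "INSERT": r"\bINTO\s+(\w+)",
    "UPDATE": r"\bUPDATE\s+(\w+)",
    "DELETE": r"\bFROM\s+(\w+)",
}


def sql_to_request(sql, base_url="https://api.example.com"):
    """Translate a tiny subset of SQL into an (HTTP method, URL) pair."""
    verb = sql.strip().split()[0].upper()
    if verb not in SQL_TO_HTTP:
        raise ValueError(f"unsupported statement: {verb}")

    table = re.search(TABLE_PATTERNS[verb], sql, re.IGNORECASE)
    if table is None:
        raise ValueError("could not find a table name")

    # The table maps to a resource collection, e.g. Orders -> /orders
    url = f"{base_url}/{table.group(1).lower()}"

    # A filter on the (hypothetical) Id column maps to a single resource.
    row_id = re.search(r"\bWHERE\s+Id\s*=\s*'?(\w+)'?", sql, re.IGNORECASE)
    if row_id:
        url += f"/{row_id.group(1)}"

    return SQL_TO_HTTP[verb], url


print(sql_to_request("SELECT * FROM Orders WHERE Id = 42"))
print(sql_to_request("INSERT INTO Orders (Total) VALUES (10)"))
```

The point of the sketch is the shape of the translation: the statement type selects the HTTP method, the table selects the collection, and a key filter selects one element of it.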
Now that we’ve explained the basics of a standards-based driver, let’s explore the common approaches to data integration and the benefits and considerations surrounding each.
Common Integration Approaches
Direct API Integration
|Benefits||Considerations|
|Greatest control||Longest integration|
|Simplified integration||Lifetime maintenance|
| ||Limited reuse across data sources|
Direct API integration is great for the most straightforward use-cases, like checking the success or failure of an operation. Direct integration gives you the most control over how a user or application is authenticated, how security is maintained, and how data is requested and processed.
However, building a direct integration is likely the most difficult, particularly if you are building integrations for disparate data sources. Each source may use different protocols, or even leverage similar protocols differently. Once you have the integration built, you are still responsible for optimizing performance when it comes to data processing (not to mention testing, failover/redundancy mitigation, etc.). Also, because SaaS APIs are constantly growing and changing, you will be responsible for updating and maintaining the direct integration for its lifetime.
Integrating directly with an API offers advantages when it comes to granular control of the integration, but the cost of lifetime maintenance may outweigh those benefits.
SDKs
|Benefits||Considerations|
|Tightly bound data access||Lack of shared data model|
|Simplified integration||Limited vendor support|
| ||Limited reuse across data sources|
SDKs offer one of the fastest ways to integrate data from client applications and APIs thanks to technology-specific developer libraries. With SDKs, you often get a logical coupling between the SDK and the underlying API. For example, a “document” entity at the API level will likely be represented by a “document” object or structure at the SDK level.
While you can quickly develop an integration using an SDK, you may run into issues with unwanted complexity in your code. And since SDKs are managed and maintained by the community or the API provider and rarely look similar across data sources, there is little opportunity for you to transfer knowledge gained from integrating with one source into an integration with another.
Additionally, using an SDK opens the door to “dependency hell,” where your application often requires updates, recompilation, and redeployment based on changes to the SDK definitions.
If you are a developer tasked with working in a specific language or environment, with a relatively small number of data sources (or a larger number of sources with similar SDKs), then using an SDK may be the right choice. The question you need to ask is whether the tightly bound data access is worth the potential for recompilation and redeployment based on an updated SDK.
Integration Middleware
|Benefits||Considerations|
|Abstract data connectivity||High cost|
|Shared data model||Greatest complexity|
| ||Least developer control|
| ||Limited data source integrations|
For many businesses, integration middleware, like an enterprise service bus (ESB) or an extract-transform-load (ETL) solution, has become a critical piece of the back office. ESB and ETL solutions enable integration between disparate systems and typically empower the business to configure and maintain its own data extraction and automation.
Unfortunately, these solutions tend to be large, complicated, and costly. And ESBs and ETL solutions usually end up shifting the burden (and control) of integration away from you as the developer into the hands of corporate IT. Once there, the solution can demand technology and monetary investment to configure, secure, and maintain. Once the integration solution is out of your hands, you may discover a limited ability to manipulate data to meet the needs of the applications you have been tasked with developing and deploying.
With integration middleware, you will often see the benefits of a shared, but abstract data model, easing the burden of integration within the application. However, you may find yourself wrestling with a cumbersome solution with limited integration capabilities that lies outside of your ability to directly configure and maintain.
Standards-Based Drivers
|Benefits||Considerations|
|Low learning curve||Not suitable for event-based operations|
|Minimal documentation||Additional layer of abstraction|
|Insulation from API changes|| |
As a developer, whenever you are integrating data from various sources into your application, you will often want to request and aggregate data, collate related data points, or otherwise summarize data. While data aggregation is possible through various APIs and SDKs, standards-based drivers offer you the ability to work with all of your data using a single interface: SQL.
With standards-based drivers, you only need to know SQL and the frameworks or libraries used to work with SQL databases in your language of choice in order to get at disparate data. Because the data source endpoints and resources are discoverable through standard metadata queries, you’ll be able to spend more time connecting to data and less time searching through documentation.
There are other benefits to standards-based drivers that lie outside the scope of development but still warrant mentioning: easier (and code-less!) connectivity to disparate data from popular BI, reporting, and ETL tools; a detachment from API changes and updates (thanks to the drivers being developed and maintained by a third party); and a marked increase in API accessibility.
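To make the idea of metadata discovery concrete, here is a minimal Python sketch that lists tables and columns using nothing but queries. It uses the standard library’s `sqlite3` module as a stand-in for a driver, so the catalog queries (`sqlite_master`, `PRAGMA table_info`) are SQLite-specific; a real driver would expose its own catalog or the metadata calls of its standard. The `describe_source` helper and the `Accounts` and `Orders` tables are hypothetical.

```python
import sqlite3


def describe_source(conn):
    """Return {table_name: [column_name, ...]} using only metadata queries."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }


# Stand-in data source: in a real integration these tables would be
# resource collections exposed by the driver, not tables you create.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (Id INTEGER PRIMARY KEY, Name TEXT, Industry TEXT)")
conn.execute("CREATE TABLE Orders (Id INTEGER PRIMARY KEY, AccountId INTEGER, Total REAL)")

schema = describe_source(conn)
conn.close()
print(schema)
```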
Advantages of SaaS Drivers
Standards-based drivers for Software-as-a-Service (and other non-database platforms) grant key benefits over native SDKs or other common methods of API consumption. It’s likely that you already know how to work with database data in your preferred languages and environments. Data processing steps, like opening a connection, sending queries, processing results, and closing the connection, are routine, meaning that with standard drivers, you already know how to work with all of your data.
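The four steps named above look roughly like this in Python’s standard database interface (DB-API 2.0). This is a sketch under assumptions: `sqlite3` stands in for a SaaS driver, and `fetch_rows`, `connect_demo`, and the `Accounts` table are hypothetical names. The `?` placeholder is SQLite’s parameter style; other drivers may use a different one.

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query, params=()):
    """Open a connection, send one query, process the results, and close."""
    with closing(connect()) as conn:              # open ... close connection
        with closing(conn.cursor()) as cur:
            cur.execute(query, params)            # send the query
            columns = [d[0] for d in cur.description]
            # Process results into plain dictionaries keyed by column name.
            return [dict(zip(columns, row)) for row in cur.fetchall()]


def connect_demo():
    """Stand-in for a driver's connect call; seeds a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (Id INTEGER, Name TEXT)")
    conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    return conn


rows = fetch_rows(connect_demo, "SELECT Id, Name FROM Accounts WHERE Id = ?", (2,))
print(rows)
```

Because `fetch_rows` only relies on the connection and cursor interface, swapping the data source means swapping the `connect` callable, not the calling code.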
Since the drivers are based on ubiquitous standards, you can readily browse and work with your SaaS application data in IDEs. With the driver installed, you’ll be able to work in environments like Eclipse, IntelliJ, Visual Studio, and in popular products like MS Excel and MS Access.
“While HarperDB embeds developer-friendly REST interfaces for developer integration, it was critical that our platform offer standards-based drivers to extend our access to the broader world of BI, Analytics, Reporting and ETL integration.”
– Kyle Bernhardy, CTO, HarperDB
Simple SQL access to all of your data provides measurable advantages when it comes to time spent integrating with data. Once you consider complex operations, like querying an e-commerce SaaS application such as Magento to generate a list of your top-spending customers in each state, the advantages of SQL vs. native language bindings become even more apparent. With SQL, you can use JOIN, GROUP BY, and SUM to combine and aggregate your data, ORDER BY to sort it, and WHERE clauses to filter it. With native language bindings, you would be responsible for coding that logic yourself, and it would look different for each data source. With SQL, you get the same functionality across virtually all data sources with no need to replicate data to a database.
Many SaaS drivers support caching, whether in memory or in a common database, which means that you get optimized performance for repeated queries (the driver will hit the API for the first request, then go to the cache for subsequent, repeated requests).
As you move through your development stack, you often find yourself moving from one language to the next, all within a single application, which highlights the obstacles of using language-specific APIs even more. With drivers, your data looks the same and is accessible using the same queries regardless of whether you are working with ADO.NET, JDBC, or ODBC.
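The top-spending-customers example might look like the following sketch. It runs against an in-memory `sqlite3` database as a stand-in for a driver connection, and the `Customers` and `Orders` tables, their columns, and the sample rows are all hypothetical; they are not Magento’s actual data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT, State TEXT);
    CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER,
                         Total REAL, Status TEXT);
    INSERT INTO Customers VALUES (1, 'Ada', 'NC'), (2, 'Grace', 'NC'), (3, 'Linus', 'CA');
    INSERT INTO Orders VALUES
        (1, 1, 120.0, 'complete'),
        (2, 1,  30.0, 'complete'),
        (3, 2, 200.0, 'complete'),
        (4, 2, 500.0, 'canceled'),
        (5, 3,  75.0, 'complete');
""")

# JOIN relates orders to customers, WHERE filters out unfinished orders,
# GROUP BY + SUM totals spend per customer, and ORDER BY ranks customers
# within each state from highest to lowest spend.
TOP_SPENDERS = """
    SELECT c.State, c.Name, SUM(o.Total) AS TotalSpent
    FROM Customers AS c
    JOIN Orders AS o ON o.CustomerId = c.Id
    WHERE o.Status = 'complete'
    GROUP BY c.State, c.Id, c.Name
    ORDER BY c.State, TotalSpent DESC
"""

top_spenders = conn.execute(TOP_SPENDERS).fetchall()
conn.close()

for state, name, total in top_spenders:
    print(state, name, total)
```

Doing the same with native bindings would mean paging through both collections, matching orders to customers, and summing and sorting in application code, repeated for each data source.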
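The caching behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only, not how any particular driver implements it: `CachedSource` and `fake_api` are hypothetical, the cache is a plain in-memory dictionary keyed by query text, and there is no expiry or invalidation.

```python
class CachedSource:
    """Serve repeated queries from memory; call the API only on a cache miss."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that actually hits the remote API
        self._cache = {}         # query text -> rows
        self.api_calls = 0       # counts real round trips, for illustration

    def query(self, sql):
        key = sql.strip()
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def fake_api(sql):
    """Stand-in for a slow remote call; returns canned rows."""
    return [("Acme", 3)]


source = CachedSource(fake_api)
first = source.query("SELECT Name, OpenOrders FROM Accounts")
second = source.query("SELECT Name, OpenOrders FROM Accounts")
print(source.api_calls)
```

A production cache would also need a freshness policy, since data in the SaaS application can change between requests.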
What’s the Takeaway?
In the end, you’re going to make the integration decision based on your own needs and context. For some developers, direct API integration or SDK use makes the most sense, thanks to straightforward data needs, common development environments, and a need for granular control over the integration. For others, using integration middleware is the right choice, based on their corporate, back-end IT configuration or their need to collect or consolidate their data outside of the development environment.
For many developers, though, standard drivers present a solid solution for their data integration needs. Thanks to a uniform interface across data sources and development environments, ubiquitous standards, self-describing documentation (through metadata discovery), and insulation from API changes and updates, developers are able to focus on what they do best: building solutions that drive business.