Using APIs in Sound Engineering

APIs have brought massive processing power and functionality to the world of sound engineering and design. This innovation, which pairs powerful hardware with software and browser-based solutions through sound design and presentation APIs, is an often overlooked aspect of hypermedia in today’s emerging, media-based internet culture.

Let’s look at a few sound-oriented web APIs, their applications, and the future of the industry. We’ll give a few examples of how they’re changing the game, and how other API providers can benefit by understanding them, considering them, and possibly even implementing them into web applications.

Narrowing Down the Field

Sound engineering encompasses both traditional music recording and production as well as highly experimental offerings in sound manipulation, file convergence and conversion, and even instrument creation and design. That being said, the APIs we’re going to discuss here fall into one of three categories:

  • Sound Recording — APIs that allow for the recording of audio and the basic manipulation of audio data in a browser experience;
  • Sound Manipulation — APIs that allow for further manipulation of the audio signal above and beyond that of the “Sound Recording” APIs; and
  • Experimental — APIs that do not fit into either of the two moulds established above, and instead provide a framework for highly experimental and non-traditional audio implementation.

Defining the DAW

Before we delve too deeply into the solutions on offer here, we must also discuss a key industry term that will be used throughout — DAW. The DAW, or Digital Audio Workstation, is a key element of any digital sound engineering solution, porting audio tracks from the electrical impulses generated by instruments into digital tracks that can be edited and manipulated.

A range of DAWs is in use today. The most popular, Avid’s Pro Tools, is a streamlined interface used in most professional audio settings, and is considered the “industry standard” for voice over work, scoring, and music production. Other solutions, like Logic Pro and Ableton, offer a “prosumer” (a portmanteau of “professional” and “consumer”) grade interface, providing a DAW at a lower price point while sacrificing or simplifying some functionality.

Other DAWs such as certain free versions of Cakewalk, Fruity Loops, and Audacity are seldom used in professional settings, but offer completely free functionality at the cost of technical assistance and modification through plugins and other such tools.

There are several significant drawbacks to the traditional DAW setup:

  • These tools (barring the free offerings) have an incredibly high cost of entry, requiring investments of hundreds of dollars;
  • DAWs are limited to desktop and laptop use — while some headway has been made for browser-based editing and mobile recording, these are often tied to specific hardware, and have yet to offer a true DAW experience;
  • There is no direct upload functionality on the vast majority of products to allow artists to share bounces or stems to SoundCloud, Bandcamp, or social media outlets;
  • DAWs are tied directly to the local processing and memory power of the machine which is hosting the DAW; and
  • Help and technical support is often limited or non-existent.

That’s not to say the concept or even common application of the DAW is necessarily a bad one — after all, much of the entertainment industry is based around one of these aforementioned tools. What should be gleaned from this, however, is that things can — and should — be done better.

Why Move the DAW?

It’s clear that the DAW needs to change because as it is, it is woefully unprepared for the future of professional audio engineering.

The natural progression that services are following in the modern era is one away from local processing. We’ve seen this time and time again, specifically in the movement away from local servers to SaaS, PaaS, and IaaS solutions. Take Adobe Creative Cloud or Google Drive, where files are stored remotely and made accessible from any of a user’s devices. Powerful browser-based photo editing is now possible with Pixlr, which offers many Photoshop-style capabilities.

Audio engineering is not immune to change, and as professionals adopt cloud migration and web programmability solutions, browser-based solutions could become the norm, rather than the exception.

Part of this sea change in approach is due to the removal of limitations that have long restricted the viability of cloud solutions. While WiFi and hardline network speeds were once a concern, this concern is largely being mitigated as true high-speed internet rolls out across the world.

Sound Recording

As any musician can tell you, the world of audio recording is fraught with complex hardware interactions and sprawling mixing interfaces. Audio signals are brought into the production environment through bulky hardware interfaces, and are then mixed either physically on gigantic workstations or virtually in complex digital audio workstations that require large amounts of RAM and processing power to function.

At least, that was the world before cloud computing and browser-based solutions. As APIs have expanded in power and capability, solutions once relegated only to physical and local applications have slowly crept to the browser. A great example of this sort of expansion of functionality into a space never before considered is Soundation.

Soundation is essentially a Digital Audio Workstation, or DAW. By leveraging the power of server processing and pairing it with local resources, Soundation can provide a professional-quality DAW, complete with samples, modulators, and effects, all in a web-based application that requires no installation.

soundation – example of a browser-based DAW

This alone should prove the power of the API-driven web-based DAW environment, but it gets better. While the Soundation API is largely internal and undocumented, it does use one API that is somewhat more public and well-known — the Google Hangouts API.

This API enables Soundation users to collaborate in real time using not only the Google Hangouts text chat functionality, but video chat as well. This fills a huge gap in the sound engineering landscape: live collaboration. Software DAWs are far more portable than physical studios, but they essentially mirror them, meaning equipment needs to be lugged around, versions need to be matched, source files need to be compiled into a standard format (even between different versions of the same application), and so on.

Soundation is incredibly powerful because of how it closes distance – users utilize a simple web application to create content and collaborate in real-time, as if they were next to their fellow musician using the same DAW.

This is a huge step, and a great use of a powerful API. Likewise, Soundation utilizes other third party APIs effectively, allowing content to be uploaded to platforms like YouTube, SoundCloud, and Facebook. The way Soundation utilizes these APIs should be a yardstick for other developers, and is a perfect demonstration of just how far browser-based solutions have come.

Of note is the WebRTC specification from the W3C. While applications such as Soundation allow for real-time collaboration using third party APIs, WebRTC is designed as an open specification that allows real-time communication directly between browsers, without depending on limited third party APIs.

Sound Manipulation

Part and parcel of the concept of Sound Recording is Sound Manipulation: the addition of effects such as reverb, delay, chorus, distortion, compression, and others to an audio file.

Manipulation is different from recording. For one, sound recording applications tend to focus more on the provision of a clean baseline signal to ensure the data captured to track is as clear as possible. These applications rely on external data input, so this input needs to be as clean, as low-latency, and as unaltered as possible.

Accordingly, much of the technology within sound recording applications that does focus on sound manipulation tends to work within the confines of the original sound. In other words, these applications want to manipulate, say, a guitar signal, while maintaining the fact that it is indeed a guitar signal.

Sound manipulation in this class tends to rely on reverb, echo, sometimes reversal, and equalization in order to reach the desired tonality. But sound manipulation can be far more expansive. Tools within this range include things like bit crushing, which reduces the bit depth (and often the sample rate) of audio to generate “chiptune” or “mechanical” sounds; multi-cuts, which take the same sample and apply a rhythmic range of effects to create a beat; and other strange sound tools and generators.
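To make the bit crushing idea concrete, here is a minimal sketch in Python. It is not taken from any of the tools discussed here; the function name `bit_crush` and its `bits` and `hold` parameters are assumptions chosen for illustration. It quantizes each sample to a coarse grid and optionally repeats each value for several samples, which is a crude stand-in for lowering the sample rate.

```python
import math


def bit_crush(samples, bits, hold=1):
    """Coarsely quantize samples in [-1.0, 1.0] (hypothetical helper).

    bits -- target resolution; fewer bits means a harsher, "steppier" sound
    hold -- repeat each quantized value for this many samples
    """
    if bits < 1 or hold < 1:
        raise ValueError("bits and hold must both be at least 1")
    steps = 2 ** (bits - 1)  # quantization steps on each side of zero
    crushed = []
    held = 0.0
    for index, sample in enumerate(samples):
        if index % hold == 0:
            # Clip to the valid range, then snap to the nearest step.
            clipped = max(-1.0, min(1.0, sample))
            held = round(clipped * steps) / steps
        crushed.append(held)
    return crushed


# One cycle of a sine wave, 16 samples long, crushed to 3 bits.
sine = [math.sin(2 * math.pi * n / 16) for n in range(16)]
crushed_sine = bit_crush(sine, bits=3, hold=2)
```

The smooth sine becomes a staircase of values on a 0.25 grid, and that staircase shape is where the gritty character of the effect comes from.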


Audiotool – microservices for audio effects

A great example of this is Audiotool. Audiotool is a fully featured audio manipulation workstation based around a modular, node-based concept of sound tools and generators. While you can load your own samples and utilize the internal API to modulate them, the real power of this tool comes from the pre-formed waveforms and the included synthesizers and emulators, each using an internal API to harness server power to create dynamic and complex sounds.

Think of Audiotool as an emulator for the floor in a music studio. Generators, modulators, parametric equalizers, synthesizers, loops, and other such gear are spread out on an empty workspace, connected through draggable cables. These tools are then editable and interconnectable, allowing for creative sound generation.

Like Soundation, Audiotool also ties into a variety of APIs, allowing for sharing to its own internal platform as well as SoundCloud, YouTube, and Facebook.


Experimental

The world of audio APIs isn’t limited to the realm of recording and remixing, however. As with any emergent technology, sound engineering APIs have developed additional off-shoots from the main applications that are best placed in a broad category simply termed “experimental”.

Many of the best examples of this sort of experimental API usage are built on the Web Audio API, a W3C specification implemented in Chrome and other modern browsers. This API greatly expands the audio functionality of the browser, and opens up a wide range of functions that would otherwise be unavailable.

A great example of this is the Infinite Gangnam Style site. Utilizing the Web Audio API, this app analyzes audio and breaks it into individual component “beats”. These beats are then compared with one another, and each beat is linked to its nearest neighbors, meaning the beats whose similarity falls within a specified “similarity threshold”.

What this essentially means is that the site generates an infinite version of “Gangnam Style”, linking similar beats to other similar beats, and creating brand new passages from new combinations of beats and loops.
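The beat-linking idea can be sketched in a few lines of Python. This is not the site's actual code: it assumes each beat has already been reduced to a small feature vector (the analysis step is out of scope here), and the names `build_jump_table` and `endless_playlist`, along with the `jump_probability` parameter, are hypothetical.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two beat feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_jump_table(beats, threshold):
    """Map each beat index to the other beats within the similarity threshold."""
    return {
        i: [j for j, other in enumerate(beats)
            if j != i and distance(beat, other) <= threshold]
        for i, beat in enumerate(beats)
    }


def endless_playlist(beats, threshold, steps, jump_probability=0.3, seed=0):
    """Return `steps` beat indices, occasionally jumping to a similar beat."""
    rng = random.Random(seed)
    table = build_jump_table(beats, threshold)
    order = []
    current = 0
    for _ in range(steps):
        order.append(current)
        following = (current + 1) % len(beats)  # wrap so playback never ends
        # Instead of the beat that normally follows, sometimes play one
        # that sounds similar to it, creating a new passage.
        similar = table[following]
        if similar and rng.random() < jump_probability:
            following = rng.choice(similar)
        current = following
    return order


# Four toy beats: beats 0 and 2 are nearly identical.
beats = [(0.0, 0.0), (1.0, 0.0), (0.05, 0.0), (5.0, 5.0)]
table = build_jump_table(beats, threshold=0.1)
```

A lower threshold keeps the jumps seamless but offers fewer of them; a higher one produces more variety at the cost of audible seams.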

Echoing some of the power of the other sound recording and manipulation solutions, another great example here is Tibersynth, a synth that uses vector input to generate noise. Built on the Web Audio API with an interface coded in Raphael.js, the synth is essentially a controller-mapped input device that generates a white noise tone.



The big thing here isn’t necessarily what Tibersynth is now, as it’s still in its early stages, but what it represents. Interactive audio synthesis driven by vector input is hard enough to do on hardware, so to see it done so effectively and powerfully in a browser solution is amazing.

Perhaps the best example of the experimental power here is the Graphical Filter Editor, a website that allows you to utilize a sound sample and draw a graphical filter in real time, editing the parametric qualities of a sound file. At one time, this sort of thing took an entire bank of EQ hardware devices, whereas now, it takes a single webpage. That is the power of APIs in Sound Engineering.
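A single band of a parametric EQ can be sketched as a “peaking” biquad filter. The Python below uses the peaking-filter formulas from Robert Bristow-Johnson's Audio EQ Cookbook. This is an assumption made for illustration, since nothing here says how the Graphical Filter Editor actually computes its filters, and all function names are hypothetical.

```python
import cmath
import math


def peaking_eq_coefficients(center_hz, gain_db, q, sample_rate=44100.0):
    """Return normalized (b, a) biquad coefficients for one EQ band."""
    amplitude = 10 ** (gain_db / 40)
    omega = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(omega) / (2 * q)
    b = [1 + alpha * amplitude, -2 * math.cos(omega), 1 - alpha * amplitude]
    a = [1 + alpha / amplitude, -2 * math.cos(omega), 1 - alpha / amplitude]
    # Normalize so that a[0] == 1, the usual convention for difference equations.
    return [c / a[0] for c in b], [c / a[0] for c in a]


def gain_db_at(frequency_hz, b, a, sample_rate=44100.0):
    """Evaluate the filter's gain in dB at a given frequency."""
    z_inv = cmath.exp(-2j * math.pi * frequency_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(numerator / denominator))


def apply_filter(samples, b, a):
    """Run samples through the biquad (direct form I)."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# A 6 dB boost centered at 1 kHz.
b, a = peaking_eq_coefficients(center_hz=1000.0, gain_db=6.0, q=1.0)
boost_at_center = gain_db_at(1000.0, b, a)
```

A drawn filter curve could then be approximated by chaining several such bands, one per control point, though a real editor may well take a different approach.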

While at first glance this might seem disconnected from the professional concept of sound engineering, one must consider the possible applications these sorts of solutions offer. The Infinite Gangnam Style page, for example, provides an algorithm with which an engineer can develop ambient sounds and looping music for applications such as video games and web design. The Graphical Filter Editor is a wonderful example of a standalone solution that could easily be implemented into a service allowing for clearer audio transcription and text-to-speech through dynamic filtering of noise.

These sorts of solutions will, over time, become less “experimental” and more “applied”, but much of the power of these experimental solutions has yet to truly be unlocked.

Future Programmable Sound Production

This is by no means an exhaustive discussion, and the power of the emerging audio API field has yet to be fully seen or tapped. What this is, however, is a glimpse into a world seldom discussed when APIs are considered, and one that is sadly under-appreciated by musicians and sound designers everywhere.