Virtual Assistants Harness Third Party Developer Power

In a recent article we talked about home automation and the concept of virtual assistants. Although Amazon’s Alexa is one of the most exciting virtual assistants (VAs) around, she’s still not the best known; that honor has to go to Apple’s Siri, who is currently undergoing some big changes.

In June 2016, at their latest WWDC, Apple announced that third party apps can now (with several key limitations) be operated by Siri. That’s a far cry from the past state of affairs – before iOS 10, Siri could launch native Apple apps, but that was it. For what’s touted as a big move, the response has been mixed at best – Seeking Alpha has called SiriKit “dated,” while Ars Technica sees “untapped potential” in it.

Maybe that’s not surprising, since other services like Google Now and Alexa are already doing a lot of what Apple seems to be suggesting is totally new. In this post we’ll take a look at some of the mechanics of SiriKit and how it compares to similar products on the market, shedding light on how AI assistants with more open door attitudes towards third party developers are being set up for future success.

SiriKit – Basics and Limitations

As one can see from the SiriKit Programming Guide, SiriKit currently works with eight different domains:

  • Ride Booking (e.g. car share and taxi-like services)
  • Messaging
  • Photo Search
  • Payments
  • VoIP Calling
  • Workouts
  • Restaurant reservations (via Apple Maps only)
  • Climate/Audio (e.g. changing the car thermostat or radio station)

At first glance, this seems like a ragtag collection of services with few common threads holding them together. However, after a little more thought, it becomes clear that Apple views Siri very much as a multifaceted digital assistant that’s capable of competing with services like Amazon’s Alexa.

Apple's Sirikit

Apple’s SiriKit will soon embrace third party apps

SiriKit relies on the Intents and Intents UI frameworks, which define how each supported intent works. Apple provides the example of the Payments domain, which uses one intent for sending payments and a separate one for requesting them. This means that, in theory, you can’t accidentally send a payment when you’re trying to receive one.
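
To make that separation concrete, here’s a minimal sketch of an Intents extension entry point that routes each payment intent to its own handler. The SendPaymentHandler and RequestPaymentHandler classes are hypothetical stubs; INExtension, INSendPaymentIntent, and INRequestPaymentIntent are real types from Apple’s Intents framework, though the exact Swift method signatures can vary between SDK versions.

```swift
import Intents

// Hypothetical stub: handles the "send money" intent only.
class SendPaymentHandler: NSObject, INSendPaymentIntentHandling {
    func handle(intent: INSendPaymentIntent,
                completion: @escaping (INSendPaymentIntentResponse) -> Void) {
        // A real app would move money here; this stub just reports success.
        completion(INSendPaymentIntentResponse(code: .success, userActivity: nil))
    }
}

// Hypothetical stub: handles the "request money" intent only.
class RequestPaymentHandler: NSObject, INRequestPaymentIntentHandling {
    func handle(intent: INRequestPaymentIntent,
                completion: @escaping (INRequestPaymentIntentResponse) -> Void) {
        completion(INRequestPaymentIntentResponse(code: .success, userActivity: nil))
    }
}

// The extension's principal class hands each incoming intent to the
// matching handler, so a "send" request can never reach "request" code.
class IntentHandler: INExtension {
    override func handler(for intent: INIntent) -> Any {
        switch intent {
        case is INSendPaymentIntent:    return SendPaymentHandler()
        case is INRequestPaymentIntent: return RequestPaymentHandler()
        default: fatalError("Intent type not declared by this extension")
        }
    }
}
```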

Currently, Siri can only interact with apps through what’s known as the Intents extension, which is built primarily around three objects:

  • An intent object defines the user’s intent and contains the data that Siri gathered from the user.
  • A handler object is a custom object that you define and use to resolve, confirm, and handle an intent.
  • A response object is a data object containing your response to an intent.
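
To show how those three objects fit together, here’s a minimal sketch of a handler for the Messaging domain. The contact-matching and sending logic is left as comments because it would be app-specific; the protocol, the resolve/confirm/handle methods, and the response codes are real iOS 10-era Intents framework APIs, though signatures can differ slightly between SDK versions.

```swift
import Intents

// A minimal sketch of a Messaging-domain handler: Siri gathers the user's
// words into an INSendMessageIntent (the intent object), this handler
// resolves and confirms it, then hands back a response object.
class SendMessageHandler: NSObject, INSendMessageIntentHandling {

    // Resolve: validate the data Siri gathered from the user.
    func resolveRecipients(for intent: INSendMessageIntent,
                           with completion: @escaping ([INPersonResolutionResult]) -> Void) {
        guard let recipients = intent.recipients, !recipients.isEmpty else {
            completion([.needsValue()])  // ask Siri to prompt for a recipient
            return
        }
        // Accept each recipient as-is. A real app would match names against
        // its own contact store and return .disambiguation(with:) when
        // several contacts match, letting the user pick the right one.
        completion(recipients.map { .success(with: $0) })
    }

    // Confirm: tell Siri the app is able to act on this intent.
    func confirm(intent: INSendMessageIntent,
                 completion: @escaping (INSendMessageIntentResponse) -> Void) {
        completion(INSendMessageIntentResponse(code: .ready, userActivity: nil))
    }

    // Handle: perform the action and return the response object.
    func handle(intent: INSendMessageIntent,
                completion: @escaping (INSendMessageIntentResponse) -> Void) {
        // Hypothetical send step, e.g. MyMessenger.send(intent.content, to: recipients).
        completion(INSendMessageIntentResponse(code: .success, userActivity: nil))
    }
}
```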

When everything is working as it’s supposed to, this process seems fine. But what about when it’s not? Let’s say that Siri’s voice recognition accurately receives 95% of a command but fails at the final hurdle, choosing the wrong name from a contact list. The process outlined above is so rigid that users will almost certainly have to discard everything that was correct and start all over again.

Custom Vocabulary

One of the problems with virtual assistants stems from the fact that we expect them to interact like another human being. You can’t ask a virtual assistant a question like “what time does the game start?” because it lacks the contextual knowledge to understand that you’re talking about your favorite sports team’s game.

One interesting feature of SiriKit, designed in part to nullify some of the problems caused by misunderstanding, is that it allows developers to specify custom vocabulary, i.e. define a custom term and the circumstances in which it might be used. Currently though, these categories are limited to contact names, photo tags/album names, workout names, and ride options.
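
As a sketch of how this looks in practice, here’s how an app might register user-specific terms with the real INVocabulary API from the Intents framework (the workout names and photo tags are made-up examples):

```swift
import Intents

// Register user-specific vocabulary so Siri can recognize terms it would
// otherwise mishear. Each call replaces whatever strings were previously
// registered for that type, so always pass the full, current set.
func registerCustomVocabulary() {
    let vocabulary = INVocabulary.shared()

    vocabulary.setVocabularyStrings(
        NSOrderedSet(array: ["power hour", "hill sprints"]),
        of: .workoutActivityName
    )
    vocabulary.setVocabularyStrings(
        NSOrderedSet(array: ["holiday snaps", "food pics"]),
        of: .photoTag
    )
}
```

App-wide terms such as ride option names are registered differently, through a global vocabulary file (AppIntentVocabulary.plist) bundled with the app.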

Unfortunately, this is another example of Apple constricting what developers can do with SiriKit. Even the list of SiriKit’s compatible domains feels rather limited – no mapping, and no audio or video control outside of CarPlay? Seeking Alpha points out how restrictive SiriKit feels:

Contrast SiriKit with Amazon’s Alexa Skills Kit: Unlike in SiriKit, developers have complete freedom over what skills they want to develop. By June, over 1,000 third-party skills had been registered. Further, Alexa can manage complex multi-stage conversations.

Alexa, Cortana, and Google Now

The positive sentiment towards Alexa is echoed by users, one of whom said the following about his Amazon Echo hub when speaking to The Guardian: “When she [Alexa] does something wrong, it’s not like a broken vending machine. My frustration with her is more like with a human who’s learning.”

Siri has been the butt of jokes for so long – “I asked Siri to do X, and look at what it came back with!” – that it’s difficult to imagine people seeing it in this sort of light. Rene Ritchie of iMore comments that:

Natural language, sequential inference, and voice interface are incredibly enabling technologies, including and especially when it comes to accessibility and inclusivity. We need them everywhere.

Amazon’s Echo dot can now control Alexa from anywhere in a home

Even so, Siri, Google Now, and Cortana have an advantage over Alexa in that they’re available both inside the home and while people are on the move. Interestingly, Google and Apple are both working on products to bring their virtual assistants into the home too, but there’s little talk of Amazon trying to bring Alexa to smartphones…yet.

We’ve already mentioned above that Alexa has over 1,000 skills, and Cortana can integrate with Android and web apps as well as Windows 10 Universal apps. PC Mag takes a closer look at how Cortana functions behind the scenes, concluding:

For third-party apps, the sky’s the limit when it comes to what an Action can do. One example Brown gave was Peel, a home-automation app whose Action was to turn on the lights and turn up the heat when Cortana sensed you were on the way home. She also mentioned Petzi, whose Action is to remotely feed your dog.

Google Now can also do some cool stuff along these lines, such as adding items to Wunderlist or Trello and sending messages via WhatsApp. It seems more and more likely, then, that all of the major players in the virtual assistant space will inevitably (and quickly) need to open things up as much as possible to third party developers. This comes with its own set of risks – exploits, NSFW content, and processes failing to work correctly – but these are risks that Apple, Amazon, Microsoft, etc. may need to take to stay competitive.

Enter Viv

SiriKit may be off to a bit of a slow start, but that doesn’t mean it’s doomed to failure. Far from it – Apple actually has a history of being slow out of the gate only to ramp things up further down the line. Take Apple Maps, for example, which was initially shunned by the vast majority of iPhone users in favor of Google Maps. A few years on, Apple Maps is now three times more popular on iOS than its Google equivalent.

Impressive looking Viv could provide “an intelligent interface to everything”

The baby steps being taken into the custom vocabulary space by Apple are exciting because they may eventually put an end to those rather abrupt “Sorry, I didn’t understand that” messages. It’s not quite true artificial intelligence, but a vocabulary library that “learns” from the input of thousands of programmers working with SiriKit is an interesting possibility – Siri already requires an internet connection to work, so collecting mass user input to analyze and improve its AI is hardly a new idea.

The really interesting thing about SiriKit, along with Alexa, Google Now, and even IBM Watson, is that it suggests voice commands are here to stay. Viv, built by the ex-Apple team that created Siri, is already impressing with its ability to understand complex, conversational language like “was it raining in Seattle three Thursdays ago?”

If you’ve ever asked Siri about the results of a recent sports event only to have it throw back the results of a random matchup from several years ago, you’ll know that it’s far from being at this conversational level.

Viv can also interact with various third party apps – during a demonstration at TechCrunch Disrupt 2016 it was used to send money via Venmo, order flowers, and book a hotel. What remains to be seen is whether these interactions will come with some of the same limitations that SiriKit has, or whether Viv will behave more like a helpful, deeply intelligent virtual assistant that truly comprehends its users’ commands.

The Future of Virtual Assistants

No matter how advanced virtual assistants get, there’s no way the maker of an operating system can prepare its AI assistant to understand everything that gets thrown at it. It’s tough to imagine, without a ton of contextual knowledge built in, a virtual assistant that understands the difference between “isle” and “aisle”, or between the deck of a ship and the one you might have in your back garden – homophones, homonyms, and homographs are still a common source of frustration for anyone using virtual assistants.

Making intelligent VAs more emotive through visual cues is an ongoing goal for Cortana

With that being the case, third party development and APIs are, and will continue to be, crucial to progress, as they add value to the AI through new use cases and unforeseen abilities. With Amazon’s more transparent approach to third parties and Google Now’s excellent voice recognition and breadth of knowledge, Apple has its work cut out if it hopes to compete. SiriKit is a step in the right direction, and the fact that Siri is still by far the most used virtual assistant – in most places, anyway – means that Apple should have plenty of data to work with.

Ultimately, it appears that the winner of the VA wars may well come down not to who has the best smartphone or home hub app, but to who offers the best overall experience across all devices in the evolving Internet of Things. That inevitably means open APIs that enable voice integration for the many devices on the market – as hyper-specialization and competition among device manufacturers continue, a degree of standardization is necessary for a cohesive system.

In other words, Apple and Amazon probably aren’t going to make that smart egg minder or wine bottle capable of talking to you, but someone else might if the API is there to do so.