Are we ready for Voice UI?

Artem Petrov
Reinvently Insights
5 min read · Mar 31, 2017

Caught in the act… talking to her smart phone.

It’s awkward. How many times have you been caught talking to yourself? Now we have an excuse, Voice UI: “I was just talking to my smartphone.” Uh huh. Instead of using graphical UIs to point and click, we can use our voice to identify and instruct. A few years ago, you’d have gotten that look:

Are you like.. a crazy person — V for Vendetta

But today? You may not get that look, but you probably suspect others are still thinking it. Talking to your smartphone, computer, home’s smart system, car or even a vending machine… it should be downright fashionable!

Some of the Biggest Names in Tech like Google, Apple, Microsoft and Amazon would like it to be chic. They’ve invested massively in their own proprietary voice recognition technologies to make it easier for you to interact with your digital devices.

There’s no doubt that voice recognition is improving by leaps and bounds, but are we ready for Voice UI to replace our point-and-click graphical UIs?

Yes. No. Maybe? There are some issues to address…

The Crazy Catch-22

You aren’t crazy if you think you’re crazy. You know if someone catches you talking to your smartphone, they may think you are crazy. So, you only talk to it when you’re alone — which is kind of crazy.

A study by Creative Strategies shows that while 98% of iPhone users have used Siri, fewer than 10% are comfortable talking to Siri in public. Typical usage is either at home (39%) or while driving in their car (51%).

So perhaps, to be more precise, Voice UI (VUI) is simply too new.

Not everyone is conditioned to it yet. It’s not for a lack of effort. Look at all of the sci-fi movies with “cyborg” characters that arguably either warm us up to, or make us more skeptical about, what is likely to be the “new normal” of tomorrow:

  • I, Robot
  • HAL from 2001: A Space Odyssey
  • The Terminator series
  • My Girlfriend’s a Cyborg
  • Ghost in the Shell (upcoming)
  • Westworld

Recognition Issues, Still

Sometimes, in fact too frequently, it simply doesn’t recognize what you say. Would I use voice when making a financial transaction? Hell no.

I have an accent, so sometimes I have to repeat my complicated request a few times.

It’s necessary to speak clearly and loudly to be reliably recognized. I wake up at 5 a.m. and use Amazon’s Alexa to control the lights in my apartment. Whispering commands to avoid waking my wife can take several attempts. It can be like a freakin’ Verizon Wireless commercial gone bad:

Siri, can you hear me now? <silence>
Siri, can you hear me now? <silence>
Siri… (expletive), I’ll do it myself… (expletive).

Not Smart Enough, Yet

Voice systems are getting smarter. Developers are adding more and more features. But these systems are not smart enough yet to be useful for every occasion. They are okay when it comes to answering short questions about time, weather, news or sports, or even getting information from Wikipedia.

During the last Oscars ceremony I couldn’t stop asking Siri, “How old is Matt Damon? Jennifer Aniston?” and so on. But it is still far, far away from being a real voice assistant with anything even close to artificial intelligence.

Not Suited for Complicated Tasks

Have you ever tried to write or dictate an email using Siri while you’re driving? We’re talking about an email with a couple of paragraphs, not just a short SMS message. Argggh… not a great experience.

In fact, presently about 99% of VUI usage is directed to asking very simple questions or giving easy instructions, like:

  • “Alexa, turn on my kitchen lights.”
  • “Okay Google, play Life During Wartime by Talking Heads.”
  • “Hey Siri, does Santa Claus exist?”

Plenty of visual elements are missing in VUI compared with GUI: visible text, structure and composition, underlining of typos or grammatical mistakes, inserting links. You can’t immediately read what you just wrote, er… dictated.

IVR Services and Bad Elevator Music

We’ve gotten used to hating our experience with bank and insurance company IVR (interactive voice response) services. We’ve been struggling with them for decades. Calling your insurance company or bank is like going through the Nine Circles of Hell before your request can be processed or you can even talk to an operator.

Sometimes Doing it Manually is Faster

You still need to activate your voice assistant with the press of a button. While Alexa is permanently listening for your command, Siri, Google voice and Cortana have to be activated first. Sometimes it’s just faster to do something manually than to activate Siri and try to formulate your request.

Too Many Proprietary Services

Each Voice UI (by Google, Amazon, etc.) has its own behavior patterns, names, functions and limitations. At home I talk to Alexa. Siri “lives” in my iPhone, Mac and Apple TV. My car has a proprietary voice system from BMW.

Every one of these systems has its own issues and advantages, and you have to get used to them. When I press voice control on the steering wheel of my car, in 50% of cases I say “Alexa… (censored)…” and then something like “What’s the best route to Mineta San José International Airport?”

Perhaps there should be a universal standard?

Opportunities for Developers

In 2016, Apple and Google opened their voice APIs to application developers. This means that you can use Siri / Google Voice services to create a companion VUI for your GUI inside your iOS and Android applications. Does it make sense to add voice commands to every single feature of every mobile app? Probably not, but in some cases, voice could be an excellent alternative to GUI.

Some examples:

  • Operation of motor vehicles and machinery
  • Accessibility for the vision-impaired
  • Third-party platforms for PCs and Macs, like VoiceAttack
  • Voice authentication mechanisms instead of or in conjunction with fingerprints and passwords
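
For the developers reading this, here is roughly what a “companion VUI” could look like in code. Treat it as a minimal sketch, not how Siri, Alexa or Google actually work: it assumes the platform has already turned speech into text, and every name in it (WAKE_WORDS, strip_wake_word, VoiceCompanion) is made up for illustration. The idea is that simple commands go straight to actions your GUI already has, and anything more complicated is handed back to the screen.

```python
from typing import Callable, Dict, Optional

# Hypothetical wake words for this sketch; real assistants detect these on-device.
WAKE_WORDS = ("alexa", "hey siri", "okay google")


def strip_wake_word(utterance: str) -> Optional[str]:
    """Return the command part of an utterance, or None if no wake word leads it."""
    text = utterance.strip().lower()
    for wake in WAKE_WORDS:
        if text.startswith(wake):
            # Drop the wake word, then trim separators and trailing punctuation.
            return text[len(wake):].lstrip(" ,.!?").rstrip(" .!?")
    return None


class VoiceCompanion:
    """Maps simple spoken commands onto actions the GUI already exposes."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def register(self, phrase: str, handler: Callable[[], str]) -> None:
        # Phrases are matched exactly (case-insensitive) to keep the sketch simple.
        self._handlers[phrase.lower()] = handler

    def handle(self, utterance: str) -> str:
        command = strip_wake_word(utterance)
        if command is None:
            return "ignored: no wake word"
        handler = self._handlers.get(command)
        if handler is None:
            # Anything complicated goes back to the touchscreen or keyboard.
            return "fallback: please use the screen"
        return handler()


companion = VoiceCompanion()
companion.register("turn on my kitchen lights", lambda: "lights on")
print(companion.handle("Alexa, turn on my kitchen lights."))
print(companion.handle("Hey Siri, write a two-paragraph email"))
```

The exact-match lookup is deliberately naive. A real app would lean on the platform’s own intent handling, but the split between “simple command” and “fall back to the GUI” is the part worth designing for.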

AR / VR and Voice Interfaces

I remember my excitement when I received Google Glass back in 2013. I could control basic features of the device with my voice, “OK Google, open camera,” and then take a photo by winking. It felt like I was in the middle of a sci-fi movie.

Today, the VoiceAttack service adds voice control to your Oculus or Vive experience, and it looks like in virtual reality it’s much better to use voice commands than to type on a virtual keyboard. So, I think for Augmented and Virtual Reality, voice could work amazingly well as an input and output source.

Conclusion

Though voice UI is still not perfect, it is steadily becoming a “normal” part of our everyday life and business operations. Voice assistants are getting smarter; they are getting new skills and features. Reinvently has worked with VUI on client projects to tie their apps in with Alexa. And though it is early, we are also exploring a variety of options for an out-of-this-world Virtual Reality User Interface. However, I personally think that voice assistants could be used as a standalone device only for managing simple online or offline activities, like turning the lights on, getting recent news, playing music, setting the temperature and even checking the fuel level in your brand-new 2017 BMW 5 Series. For more complicated workflows, VUI will remain a companion to traditional input/output sources like keyboards, touchscreens, touchpads and gestures.

--

Artem is a coach, serial entrepreneur, angel investor and executive in product and consulting technology companies. Artem lives and works in Silicon Valley.