The Ultimate UI: Voice User Interface

In the popular sci-fi franchise Star Trek, the characters interact with their computer through a combination of touch controls and voice commands. From a very early age I remember thinking that the natural way they converse with the computer was truly remarkable. At the time, I thought computers would never be sophisticated enough to achieve that level of intuitive control. Fast-forward to today, and speaking to our devices is a reality that is evolving rapidly: advancements in what is often called the Voice User Interface (VUI) now arrive in months rather than years. I believe the level of intelligence exhibited by the Star Trek computer will be achieved soon, perhaps within the next decade. In this article I will explain why I believe VUI is the ultimate form of user interface and what that will mean for the future of computing.

In the early days of voice control it was difficult to build a truly responsive system. Primitive speech-recognition software converted speech to text, and that text was then scanned for keywords that triggered the execution of commands on a system. This was tedious and not at all intuitive: every command, and what it would do, had to be pre-programmed by a human, and users had to know those commands ahead of time. It was not enough to simply say “hey, turn my TV on”; a rigidly structured command like “Computer, turn TV on” was required. Eventually a breakthrough would occur that allowed natural speech recognition to be combined with intuitive command execution.1
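To make the rigidity concrete, here is a minimal Python sketch of that old keyword-triggered approach. The command table and action names are hypothetical, invented purely for illustration: the point is that only exact, pre-programmed phrases work.

```python
# Early-style voice control (illustrative): the recognizer's text output
# is looked up in a fixed, human-authored command table. Any phrasing
# that was not pre-programmed simply fails.

COMMANDS = {
    "computer, turn tv on": "tv_power_on",
    "computer, turn tv off": "tv_power_off",
    "computer, lights on": "lights_on",
}

def execute(transcript: str) -> str:
    """Match the recognized phrase exactly against the command table."""
    action = COMMANDS.get(transcript.strip().lower())
    if action is None:
        return "error: unrecognized command"
    return action

print(execute("Computer, turn TV on"))  # the rigid phrasing succeeds
print(execute("hey, turn my TV on"))    # natural phrasing is rejected
```

Every new capability meant another hand-written table entry, which is exactly why this approach could never scale to natural conversation.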

In 2008 Google leveraged the power of machine learning to create “the Google Voice Search app for iPhone”.2 Major breakthroughs in machine learning enabled Natural Language Processing (NLP), a “subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.”3 With NLP, users can simply say what they want, the way they would naturally make a request of another human, and NLP does the heavy lifting of figuring out what was said. That covers only one component of a VUI; the next big innovation was taking the user’s request and making the computer understand what to do with it. For this, the big tech companies once again turned to artificial intelligence.4 To return a useful result, AI assistants such as Siri, Cortana, and Alexa take the user’s request and determine the user’s intent. For example, when a user says “Alexa, what time is it?”, Alexa replies “it’s 2:30 pm”. In years past a programmer would have had to specifically program Alexa to know the command “what time is it”; with the help of AI, the request is analyzed and the appropriate action is decided at that moment. This is still limited, and not every possible request can yet be handled this way, but with each passing year these AIs grow more intelligent, rapidly assimilating new commands.5
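The shift from exact commands to intent recognition can be sketched with a toy Python example. This is not how Siri, Cortana, or Alexa actually work internally (real assistants use trained machine-learning models); the intent names and example phrasings below are assumptions made for illustration. The idea is that the assistant scores an utterance against example phrasings for each intent and acts on the best match, so varied natural wordings still resolve to the right action.

```python
# Toy intent resolution (illustrative, not any vendor's real pipeline):
# instead of requiring an exact phrase, score the utterance against
# sample phrasings for each intent by simple word overlap.

INTENT_EXAMPLES = {
    "get_time": ["what time is it", "tell me the time", "current time please"],
    "set_timer": ["set a timer", "start a timer", "remind me in"],
}

def classify(utterance: str) -> str:
    """Return the intent whose examples share the most words with the utterance."""
    words = set(utterance.lower().replace("?", " ").replace(",", " ").split())
    best_intent, best_score = "unknown", 0
    for intent, examples in INTENT_EXAMPLES.items():
        score = max(len(words & set(ex.split())) for ex in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

print(classify("Alexa, what time is it?"))           # resolves to get_time
print(classify("please set a timer for 20 minutes"))  # resolves to set_timer
```

Even this crude overlap score accepts phrasings no one pre-programmed verbatim, which is the essential leap NLP made; production systems replace the word-overlap heuristic with learned models that generalize far better.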

Now that this technology is quickly taking off, what makes it so useful? I believe there is a reason Gene Roddenberry chose voice as a key method of interacting with the computer. The characters in Star Trek converse with the computer almost as though it were an artificial person and a member of the crew: it has expertise, access to knowledge, and even seems to have its own personality. Voice control allows for hands-free operation; you don’t even need to be in the same room as the listening device. You can keep working with your hands, for example while cooking a meal: you say “Siri, set a timer for 20 minutes” or “Siri, how many grams is 2 ounces?” and get a response without touching anything. VUI works for people who are blind and for people with physical impairments who can’t use traditional input methods. VUI also more closely matches how we interact with other humans, which means the learning curve for using a computer is much lower. New users don’t need to learn to type on a keyboard, nor how to use a mouse and a GUI. In the same way a child learns to ask their parents for things, that same child could speak to a computer.6

In the near future these AI assistants will become extremely important tools in day-to-day business. Medical doctors are already using IBM’s Watson to research medical issues.7 Consulting an AI will soon become standard practice when troubleshooting a problem. An AI such as Google’s Assistant has access to the internet and a database of knowledge so vast that no human could ever remember it all. New workflows will be incorporated into business life as people are freed from their keyboards to use their hands and bodies for other tasks. A VUI can be mobile, needing only a microphone and a speaker. This will allow computing to go to places where a traditional mouse and keyboard (or even a touch device) can’t, such as riding a bike or a police officer out on patrol. VUI truly represents the ultimate interface in human-computer interaction.

  1. https://medium.com/swlh/the-past-present-and-future-of-speech-recognition-technology-cf13c179aaf
  2. ibid.
  3. https://en.wikipedia.org/wiki/Natural_language_processing
  4. https://www.forbes.com/sites/herbertrsim/2017/11/01/voice-assistants-this-is-what-the-future-of-technology-looks-like/#381c04b6523a
  5. https://medium.com/swlh/the-past-present-and-future-of-speech-recognition-technology-cf13c179aaf
  6. https://medium.com/@goodrebels/how-voice-user-interface-is-taking-over-the-world-and-why-you-should-care-54474bd56f81
  7. https://www.fool.com/investing/2018/01/12/4-ways-ibm-watsons-artificial-intelligence-is-chan.aspx