November 2017
Text by Alan Houser

Alan Houser is a technical publishing consultant, trainer, and developer. He is a past president and fellow of the Society for Technical Communication, and sits on the OASIS DITA Technical Committee and Lightweight DITA Subcommittee. In his spare time, Alan enjoys engaging his Amazon Echo voice assistant.


arh[at]groupwellesley.com
https://twitter.com/arh

Available voice applications


True voice-based user assistance may still lie in the near future, but these applications, already available in the Amazon Echo Skills Store, provide insight into possible designs for dialogue-based user assistance.

  • Tide
  • Stain Remover
  • Guitar Teacher
  • My Workouts
  • Glad Recycler

Exploring voice applications for user assistance

Voice assistants such as Amazon Alexa or Apple Siri have been around long enough to become much-relied-upon aides for millions of users. They also offer great opportunities for user assistance. However, while the tools are already out there, the technical communication community has yet to fully seize these possibilities.

"Alexa: Who is the Chancellor of Germany?"

"The Chancellor of Germany is Angela Merkel."

"Alexa: When was she born?"

"Angela Merkel was born on July 17th, 1954."

This natural-language spoken interaction happened very recently, between me and a device: the Amazon Echo voice assistant in my home. "Alexa" is the trigger word for the device; I can interact with the Amazon Echo at any time by saying the trigger word. Calling the device by name, Alexa, contributes to the feeling that the interaction is natural.

In yesterday's science fiction, humans often engage computers by voice. And why not? Virtually all humans are capable of speech. We use our voices all day, every day, to interact with each other. For most of us, speech is the most natural and effortless way to communicate.

When humans engage computers, however, voice interaction has been relatively rare. Voice has long been confined to niche areas, such as automated telephone attendants and dictation software. Until recently, voice control of computers was not widely available.

Now, however, a convergence of improvements in speech recognition, natural-language processing, and cloud computing is quickly making voice interaction possible, even preferable.

The possibilities of science fiction are here today, through the increased popularity of voice assistants: Apple Siri, Google Assistant, Amazon Alexa, Microsoft Cortana, and others. These voice assistants have been most popular for personal tasks (setting timers, checking calendars, setting reminders), for playing music and other audio under voice control, and for web searches and fact-checking. But the devices can also empower application developers to provide rich user assistance experiences, all based on the human voice.

The growing popularity of voice assistants

The first popular voice assistant was Apple Siri, released in 2011. Siri is available only on Apple computers, tablets, and smartphones. Google Assistant is a similar service for Android tablets and smartphones. With Windows 10, Microsoft introduced its voice assistant, Cortana.

More recently, manufacturers have sold voice assistants as dedicated devices. Amazon launched the Amazon Echo in 2015. Google followed with Google Home, and Apple has announced a Siri-enabled device called the HomePod, due to be released before the end of this year.

The research company eMarketer estimates that these dedicated devices are already in 35 million U.S. homes, roughly double last year's number, and that they will be in three-quarters of U.S. households by 2020.

Soon voice assistants will be even more widely available. Amazon recently announced the Alexa Voice Service, which allows developers to deploy voice-based applications on any Internet-connected device with a microphone and speaker. This capability may help to launch a new wave of smart voice-enabled devices. Most importantly for technical communication, this capability will allow developers to provide voice-based user assistance in virtually any product or device.

Voice-based user assistance?

Voice assistants are particularly suitable for technical communication. They can provide user assistance in a natural dialogue. We just need to program them to do so. The good news is that most manufacturers of voice assistants support third-party developers. They provide software development kits (SDKs) and frameworks that spare developers the technical minutiae of deploying a voice-based application.

Some manufacturers of voice assistants focus on supporting developers who wish to provide voice control of Internet-connected devices, such as lights, thermostats, and security cameras. Their SDKs and developer documentation reflect this focus. In fact, Apple calls its developer framework "HomeKit," as it focuses on home automation.

Google and Amazon are more expansive in their developer support. Both Google and Amazon allow developers to deploy custom voice applications. Google calls these applications actions, while Amazon refers to them as skills. Both Amazon and Google provide excellent documentation as well as copious examples and tutorials. Individuals with rudimentary development skills can successfully deploy a basic application.

According to Amazon, more than 15,000 third-party Amazon Echo skills are available. Currently Amazon supports skills in U.S. English, U.K. English, and German.

These devices are clearly at a tipping point in capabilities, market penetration, and support for third-party developers. It's time to consider them viable platforms for delivering user assistance.

Concepts and terminology for designing voice applications

If you wish to program voice applications for user assistance, you must be familiar with the concepts and terminology of voice application development. Let's consider the language necessary to express, at a high level, the design of a voice application; a short code sketch after the list shows how the pieces fit together. Here, we will use terms from the Amazon Echo developer documentation, though the concepts are universal across voice platforms.

  • The trigger word is the word or phrase that signals to your device that you are about to begin an interaction. For Amazon Echo, the trigger word is "Alexa" (customers can configure a different trigger word from a small set of options, such as "computer"). "Hey Siri" and "OK Google" are other trigger words for Siri and Google voice assistants respectively. Most voice assistants are always on and listening for the trigger word.
  • The intent is an action that your user requests, like "search flights" or "get help." When you develop your application, you may choose to confirm specific intents. For example: "Alexa: Tell me how to print." (Alexa response) "Do you want help on printing?" An application may offer many intents within the flow of a dialogue.
  • An utterance is a literal word or phrase that customers might say to initiate an action. These are the literal words and phrases that form the user interface of your voice application. You must anticipate your users' utterances; as in real life, different people may speak the same request in different ways. One user may say "Search for help on printing." Another may say "Find printing help." Or "Tell me how to print." Your application must handle all reasonable variants of these phrases.
  • A slot further defines an intent. In the utterance "Find flights to New York," New York is a slot that your application will use to constrain the search to New York flights.
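
To make these terms concrete, here is a minimal sketch of how they might come together for a hypothetical printer-help skill. The invocation name, intent, utterances, and slot type below are invented for illustration; the structure mirrors the JSON interaction model of Amazon's skill builder, written here as a Python dictionary.

# A sketch of an interaction model for a hypothetical printer-help skill,
# expressed as a Python dict in the shape of the skill builder's JSON.
interaction_model = {
    "languageModel": {
        "invocationName": "printer help",
        "intents": [
            {
                # The intent: the action the user is requesting.
                "name": "GetHelpIntent",
                # Utterances: literal phrases that map to this intent.
                # Anticipate reasonable variants of the same request.
                "samples": [
                    "search for help on {Topic}",
                    "find {Topic} help",
                    "tell me how to {Topic}",
                ],
                # The slot further defines the intent.
                "slots": [{"name": "Topic", "type": "TOPIC_LIST"}],
            }
        ],
        # A custom slot type listing the topics the skill understands.
        "types": [
            {
                "name": "TOPIC_LIST",
                "values": [
                    {"name": {"value": "print"}},
                    {"name": {"value": "scan"}},
                ],
            }
        ],
    }
}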

You must program your application to accept a reasonably wide variety of wordings for the same intent. For example, a user may say "Yes," "Sure," or "OK" to confirm an action. You must also provide a response if your application does not understand the user's utterance. For example, your application may respond to an unknown utterance: "I'm sorry, I do not understand what you said. Can you please repeat or phrase your question another way?"

Just as a voice application may provide many different intents, it will likely need to respond to different utterances, in different contexts, throughout its flow.
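
A minimal sketch of the other half, again with invented names: the back end that fulfills these intents. The Alexa service sends each request as JSON to your endpoint, commonly an AWS Lambda function; the handler below reads the intent name and the Topic slot, and answers in the plain-text response envelope Alexa expects. A real skill would handle many more intents and failure cases.

def build_response(text, end_session=False):
    # Wrap plain text in the JSON envelope the Alexa service expects.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context):
    request = event["request"]

    if request["type"] == "LaunchRequest":
        return build_response("Welcome to printer help. What do you need?")

    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "GetHelpIntent":
            # The slot value constrains the request ("print", "scan", ...).
            topic = intent.get("slots", {}).get("Topic", {}).get("value") or "print"
            return build_response("Here is how to " + topic + ". First, ...")

    # Anything we did not anticipate falls through to a gentle reprompt.
    return build_response(
        "I'm sorry, I do not understand what you said. "
        "Can you please repeat or phrase your question another way?"
    )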

Candidates for voice-based user assistance

Consider the possibilities of voice-based user assistance for the following products and services:

  • Products/services that are not computer-based, such as mechanical devices or machinery.  
  • Products/services in which voice interaction is natural, such as smartphone apps.
  • Keyboard-based products or services in which voice assistance might be helpful, such as desktop software applications. Here, voice-based assistance might be less likely to interrupt a user's workflow than conventional keyboard-based help.
  • Tasks done outdoors, or in environments that aren't conducive to getting help by typing on a computer, tablet, or smartphone. For example, when stuck on a roadside with a flat automobile tire, having a voice application guide you through changing the tire could be very useful.
  • Tasks that require the use of both hands. Imagine a surgeon, airline mechanic, or thousands of other roles or tasks that don't allow the user to easily type on a keyboard.

Design considerations for voice

Similar to a conversation between humans, a voice application consists of requests by the user and responses by the application. And just like a normal conversation, the interaction between user and voice application can go back and forth in a natural progression.

Voice applications present new user interface challenges compared to the graphical user interfaces of conventional applications and help systems. When designing voice applications, especially complex question-and-response applications that will provide robust user assistance, developers must consider user interface issues:

Flow of control

Especially if you are creating a complex, multi-turn voice application with many options and paths, you will want to flowchart the flow of control. Map the entire possible user journey through the application. For complex voice assistance applications, you may want to divide your flowchart into specific modules or areas of the application.
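
Such a flowchart can translate almost directly into data. As a sketch, the table below encodes a hypothetical flat-tire walkthrough, the roadside example from earlier, with each dialogue state naming its prompt and its successor.

# A flowcharted dialogue captured as a table of states. Step names and
# prompts are hypothetical; None marks the end of the walkthrough.
DIALOGUE_FLOW = {
    "start": {"prompt": "Do you want help changing a flat tire?",
              "next": "find_jack"},
    "find_jack": {"prompt": "First, locate the jack in the trunk.",
                  "next": "loosen_nuts"},
    "loosen_nuts": {"prompt": "Loosen the lug nuts before jacking up the car.",
                    "next": "done"},
    "done": {"prompt": "You're all set. Drive safely.",
             "next": None},
}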

Context and navigation

Your application must maintain the user's context as he or she navigates through the application; the sketch after this list shows one approach. How will your users:

  • Invoke the application?
  • Go to the first step?
  • Go to the next step?
  • Go back a step?
  • Repeat a step?
  • Start over?
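
On the Echo, one way to answer these questions is with session attributes: whatever attributes your response sets, Alexa sends back with the user's next request. Continuing the sketch, with hypothetical intent names and the build_response() and DIALOGUE_FLOW definitions from earlier:

def handle_navigation(event):
    # Recover where the user was; Alexa echoes back the attributes
    # stored by the previous response.
    attrs = (event.get("session") or {}).get("attributes") or {}
    step = attrs.get("step", "start")
    intent_name = event["request"]["intent"]["name"]

    if intent_name == "NextStepIntent":
        new_step = DIALOGUE_FLOW[step]["next"] or step
    elif intent_name == "PreviousStepIntent":
        new_step = attrs.get("previous", "start")
    elif intent_name == "StartOverIntent":
        new_step = "start"
    else:
        # e.g. a RepeatIntent: stay put and re-read the current prompt.
        new_step = step

    response = build_response(DIALOGUE_FLOW[new_step]["prompt"])
    # Store the new position for the next turn of the dialogue.
    response["sessionAttributes"] = {"step": new_step, "previous": step}
    return response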

Understanding your user

Users can express the same intention in many different ways, such as, "Help me print," "Help me with printing," or "Tell me how to print." Your application will need to handle the full range of utterances your users are likely to employ.

Confirmation

In your voice application, you may want to confirm some or all of the user's utterances. For example, your application may confirm by saying "Do you want help with printing?" However, excessive confirmation, particularly for utterances like "yes" and "no," can be tedious for the user.

Error handling

You will need to handle circumstances in which your application does not understand the user's utterance.
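
A sketch of both considerations together: accepting several phrasings of "yes," and falling back gracefully when the reply is unrecognized. On the Echo, the built-in AMAZON.YesIntent and AMAZON.NoIntent absorb most affirmative and negative variants for you; the plain-text matching below simply makes the logic visible.

# Variants a user might say to confirm or decline. A real skill would
# rely on built-in intents rather than matching raw text.
AFFIRMATIONS = {"yes", "sure", "ok", "okay", "yeah"}
NEGATIONS = {"no", "nope", "cancel"}

def confirm(user_reply, pending_action):
    reply = user_reply.strip().lower()
    if reply in AFFIRMATIONS:
        return "Okay. " + pending_action
    if reply in NEGATIONS:
        return "Okay, never mind. What else can I help you with?"
    # Unknown utterance: apologize and invite the user to rephrase.
    return ("I'm sorry, I do not understand what you said. "
            "Can you please repeat or phrase your question another way?")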

Prototyping and user testing

Because your voice application must understand the user, you will want to prototype your application with real users. This will help you to prove or disprove your assumptions about the user, and improve your voice interaction model. You may discover that your users do not interact with your application as expected, particularly with regard to the words and phrases they use to express their intents.

Conclusion

Voice assistants enable natural interaction with computers through spoken dialogue. As voice assistants and voice-enabled devices become ubiquitous, they offer exciting opportunities for delivering user assistance through voice. We can begin to design voice-based user assistance today.
