Touchtone and Speech Recognition

By Lizanne Kaiser, Ph.D.

Chances are, most people you know have used a speech-enabled automated telephone system (e.g., “Please say your account number”). Speech recognition technology is advancing rapidly, and call centers are now looking to speech to provide a higher level of customer service than traditional touchtone phone systems can offer. Caller acceptance of speech recognition is also growing. In fact, 85% of people say that speech is easier to use than touchtone, and 90% feel that speech adds value to phone-based transactions.

Despite that, many contact centers still rely on traditional touchtone for caller interactions, even while implementing or migrating to speech. For some, touchtone input is necessary for security or legal reasons; for others, it’s a matter of preference. Either way, using both technologies can be beneficial. The key is knowing how best to marry these two different modes of input into one seamless automated caller experience.

Speech and Touchtone: Different Caller Experiences

Contact centers migrating existing touchtone applications to speech, or creating new self-service applications using speech, often assume that the same general call flow architecture and usability best practices that apply to touchtone can be applied to speech applications. In fact, a caller’s experience in a speech-enabled application is very different from the experience in a touchtone-only application, and callers place different expectations on a speech system than on a touchtone-only system.

Having a “conversation” with a touchtone-only system is mechanical and generally non-intuitive. The caller can only interact with the system by listening to instructions and providing a mechanical response (using keys on the telephone keypad) within limited menu options.

When a caller interacts with a speech application, he or she subconsciously compares it to having a conversation with a real person, even though the caller knows it’s an automated system. Because of this, the caller places a higher expectation on the speech-enabled system, similar to the expectation placed on a live agent.

Callers perceive a higher level of control with speech and even have a pre-existing mental model of how the conversation will unfold, based on prior experiences with agents. If the system’s call flow does not mirror the mental model of that task, callers tend to interrupt the system to ask for what they want. The best speech-enabled systems guide the caller unobtrusively, without sacrificing the caller’s sense of control or conversational naturalness.

Speech allows automated systems to leverage a caller’s natural ability for language and conversation to provide a better caller experience. The intent is not to fool callers into thinking they’re speaking with a real person, but to avoid “deal breakers” – points where the dialog flow becomes so awkward or unnatural that the caller begins focusing more on the system than on the interaction they need to accomplish.

Marrying Speech and Touchtone Happily

When creating a contact center system incorporating both speech and touchtone, call centers need to design for speech first, and then incorporate touchtone into the design as a secondary feature. Wherever possible, the system should support touchtone as an alternative input mode to speech. Avoid randomly switching back and forth between speech-only and touchtone-only, as this can confuse callers. For instance, if the system encourages the caller to say information, the caller should be able to say or touchtone a response, whichever they prefer:

System:      Please tell me your 10-digit home phone number.
Caller:       5554492350
or
System:      Please tell me your 10-digit home phone number.
Caller:       [Caller enters 5554492350 on their telephone keypad]
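
One way to treat the two modes as interchangeable is to normalize both into the same value before validating it. The sketch below is a minimal illustration in Python. It assumes a hypothetical platform that hands the application either a recognized transcript or a string of keypad digits; the function name, the mode labels, and the digit-word table are illustrative and not taken from any particular product.

```python
import re

# Spoken digit words a caller might use; "oh" is commonly said for zero.
# This table is an assumption for illustration, not a complete grammar.
SPOKEN_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def normalize_phone_number(raw, mode):
    """Return a 10-digit string, or None if the input is not a valid number.

    mode is "speech" (raw is a transcript) or "dtmf" (raw is keypad digits).
    """
    if mode == "dtmf":
        digits = raw
    elif mode == "speech":
        # Split the transcript into words and single digits.
        tokens = re.findall(r"[a-z]+|[0-9]", raw.lower())
        converted = []
        for token in tokens:
            if token.isdigit():
                converted.append(token)
            elif token in SPOKEN_DIGITS:
                converted.append(SPOKEN_DIGITS[token])
            else:
                return None  # a word that is not a digit: reject the input
        digits = "".join(converted)
    else:
        raise ValueError(f"unknown input mode: {mode!r}")

    # Both modes end in the same validation step.
    return digits if re.fullmatch(r"[0-9]{10}", digits) else None


spoken = normalize_phone_number(
    "five five five four four nine two three five oh", "speech")
keyed = normalize_phone_number("5554492350", "dtmf")
print(spoken == keyed)
```

Because both paths converge on one validation step, the rest of the call flow does not need to know which mode the caller chose.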

Touchtone also offers an effective fallback for speech. Well-designed and fully tuned speech recognition systems can achieve recognition accuracy rates in the 90 percent range. Nevertheless, speech recognition systems can have difficulty accurately recognizing what the caller is saying, especially if the caller is in a noisy environment, on a cell phone, using a speaker phone, or has an accent or voice quality that is challenging for the recognition engine. When speech recognition errors occur, the dialog design should offer callers context-specific, hierarchical error messages, with each level providing new or additional prompting that helps guide the caller to use the fallback touchtone function.

System:      Which type of account are you calling about – Checking, Savings, or CD?
Caller:       I’m calling about my checking account. [loud background noise]
System:      Sorry, I didn’t get that. Please say Checking, Savings, or CD. Or if you’re calling about something else, just say Other.
Caller:       Checking! [loud background noise]
System:      I still couldn’t quite catch that. Let’s try the phone keypad instead. For checking, please press one. For savings, press two. For CDs, press three. For all other questions, press four.
Caller:       [Caller enters 1 on the phone keypad]
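
The escalation in this dialog can be sketched as a small loop that plays a more directive prompt after each failed attempt and ends at a keypad menu. This is a rough illustration, not a production design: the event format, the confidence threshold of 0.5, and the keyword matching are all assumptions made for the example, and the choice to accept only keypad input at the final level is one possible design among several.

```python
import re

# One prompt per error level; wording follows the sample dialog above.
PROMPTS = [
    "Which type of account are you calling about: Checking, Savings, or CD?",
    "Sorry, I didn't get that. Please say Checking, Savings, or CD. "
    "Or if you're calling about something else, just say Other.",
    "I still couldn't quite catch that. Let's try the phone keypad instead. "
    "For checking, please press one. For savings, press two. "
    "For CDs, press three. For all other questions, press four.",
]
KEYPAD_CHOICES = {"1": "checking", "2": "savings", "3": "cd", "4": "other"}
SPOKEN_CHOICES = ("checking", "savings", "cd", "other")
MIN_CONFIDENCE = 0.5  # assumed threshold; real engines score differently


def match_speech(transcript, confidence):
    """Return the menu choice found in a transcript, or None."""
    if confidence < MIN_CONFIDENCE:
        return None  # e.g. loud background noise
    words = re.findall(r"[a-z]+", transcript.lower())
    for choice in SPOKEN_CHOICES:
        if choice in words:
            return choice
    return None


def run_menu(events):
    """Walk the prompt levels; events are (mode, value, confidence) tuples.

    Returns (choice, prompts_played). choice is None if every level failed.
    """
    played = []
    events = iter(events)
    last_level = len(PROMPTS) - 1
    for level, prompt in enumerate(PROMPTS):
        played.append(prompt)
        event = next(events, None)
        if event is None:
            break  # no further caller input
        mode, value, confidence = event
        if mode == "dtmf":
            choice = KEYPAD_CHOICES.get(value)  # keypad works at every level
        elif level < last_level:
            choice = match_speech(value, confidence)
        else:
            choice = None  # final level is keypad-only in this sketch
        if choice:
            return choice, played
    return None, played


noisy_call = [
    ("speech", "I'm calling about my checking account.", 0.2),
    ("speech", "Checking!", 0.3),
    ("dtmf", "1", 1.0),
]
choice, played = run_menu(noisy_call)
print(choice, len(played))
```

What happens when all three levels fail (for instance, a transfer to an agent) is left outside the sketch, since the right answer depends on the contact center.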

Avoid wording like “For checking, say or press one,” which neither sounds conversationally natural nor helps constrain callers’ possible responses effectively. Even though this prompt directs callers to just say “One,” callers will still say a wide range of responses to a speech system, based on whatever seems natural in that conversational context (such as “I’m calling about my Checking account.”).

For security or legal issues, some companies encourage or require callers to input information via touchtone, such as Social Security Numbers, account numbers, or confirmations. Depending on the security or legal requirements, the system can be designed to guide the caller to use touchtone, but recognize either speech or touchtone as a valid input; or guide the caller to use touchtone and accept this as the only valid input. If touchtone is required, the error handling prompts need to clearly guide a caller who attempts to use speech instead of touchtone.
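
The two options described here can be sketched as a per-field input policy. The example below is a minimal illustration: the policy names, the function, and the reprompt wording are hypothetical, and a real deployment would take its wording and rules from its own security or legal requirements.

```python
from enum import Enum


class InputPolicy(Enum):
    # Guide the caller to the keypad, but accept speech as well.
    TOUCHTONE_PREFERRED = "touchtone_preferred"
    # Guide the caller to the keypad and accept nothing else.
    TOUCHTONE_REQUIRED = "touchtone_required"


# Hypothetical reprompt for a caller who speaks when the keypad is required.
KEYPAD_ONLY_REPROMPT = (
    "For your security, I can only take this number from your phone keypad. "
    "Please enter it now using the keys on your phone."
)


def handle_input(policy, mode, value):
    """Return (accepted_value, reprompt) for one caller input.

    mode is "speech" or "dtmf". Exactly one of the two results is None.
    """
    if mode == "dtmf":
        return value, None  # keypad input is valid under both policies
    if policy is InputPolicy.TOUCHTONE_REQUIRED:
        # Do not use the spoken value; tell the caller what to do instead.
        return None, KEYPAD_ONLY_REPROMPT
    return value, None  # speech tolerated when touchtone is only preferred


print(handle_input(InputPolicy.TOUCHTONE_REQUIRED, "speech", "1234"))
print(handle_input(InputPolicy.TOUCHTONE_PREFERRED, "speech", "1234"))
```

Keeping the policy as data attached to each field lets the same dialog code serve a required-keypad field, such as a Social Security Number, and a more relaxed field in the same call.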

Meeting Expectations

To provide the interaction callers expect, both speech and touchtone applications must provide value to the caller. With speech, it’s important to anticipate what callers will say at any given point in the dialog and build those utterances into the recognition grammar active at that point in the application. Touchtone limits callers to a fixed set of options, but with speech, poorly designed dialog prompts and recognition grammars can lead callers to say things that the automated system cannot recognize. Such deal breakers destroy the natural rhythm of the interaction and leave callers feeling that the system has not lived up to expectations.

With speech recognition, it’s a balancing act. The grammar needs enough different utterances to maximize in-grammar coverage (the share of what callers say that is listed in the grammar). At the same time, no single grammar should be overloaded with so many possibilities that it significantly compromises in-grammar accuracy (the share of in-grammar utterances that the recognition engine matches correctly). With touchtone, the limited options can often remove the uncertainty of in-grammar coverage and maximize in-grammar accuracy, but they also provide a less satisfying caller interaction.
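
To make the two measures concrete, the sketch below computes them from a small, made-up log of calls. It assumes each log entry pairs what the caller actually said (for example, from a human transcription) with what the recognition engine returned; the grammar, the log, and the function name are invented for illustration and the numbers are not measurements from any real system.

```python
def grammar_metrics(grammar, log):
    """Return (in_grammar_coverage, in_grammar_accuracy) as fractions.

    grammar: set of utterances the recognizer can match at this dialog point.
    log: list of (said, recognized) pairs; recognized may be None.
    """
    # Coverage: how much of what callers said is listed in the grammar.
    in_grammar = [(said, rec) for said, rec in log if said in grammar]
    coverage = len(in_grammar) / len(log) if log else 0.0

    # Accuracy: of the in-grammar utterances, how many were matched correctly.
    correct = sum(1 for said, rec in in_grammar if said == rec)
    accuracy = correct / len(in_grammar) if in_grammar else 0.0
    return coverage, accuracy


# Hypothetical grammar and transcribed calls for an account-type menu.
grammar = {"checking", "savings", "cd", "other"}
log = [
    ("checking", "checking"),
    ("savings", "savings"),
    ("cd", "checking"),      # in grammar, but misrecognized
    ("my mortgage", None),   # out of grammar
    ("checking", "checking"),
]
coverage, accuracy = grammar_metrics(grammar, log)
print(coverage, accuracy)  # 0.8 0.75
```

Adding “my mortgage” to the grammar would raise coverage in this toy log, but every added phrase is one more candidate the engine can confuse with the others, which is the tradeoff described above.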

When choosing to merge touchtone and speech applications, remember to evaluate your callers, their expectations, and your call center’s requirements, knowing that the goal is to offer callers the best way of getting self-service through an automated phone system. All systems yield the best return on investment when they are designed to address business requirements, match the callers’ needs and expectations, and offer a conversational, helpful, and intuitive interface that represents your organization’s brand.

Lizanne Kaiser, Ph.D., is a Senior Principal Consultant of Voice Services for Genesys Telecommunications Laboratories, Inc.

Benefits of Speech Recognition

Flatten call flows, compared to touchtone
Shorten calls up to 50% vs. touchtone
Increase self-service usage from 20-60%
Decrease hold time by as much as 35%
85% more effective in routing calls vs. 54% with touchtone

[From the February/March 2006 issue of AnswerStat magazine]