Watson Voice Agent

This is the landing page for Watson Voice Agent. In late 2019, IBM announced a new integrated solution called Watson Assistant for Voice Interaction (WAVI). It gives customers an easy way to modernize a traditional Interactive Voice Response (IVR) system by letting end users describe their problem in natural language. Say goodbye to complex phone trees (“press 1 for new reservations, press 2 for existing reservations”) and hello to simple requests (“I need help changing my existing reservation for my trip to Hawaii on August 26”).

What is Watson Assistant for Voice Interaction (WAVI)?

Let’s break down the four components behind the covers of Watson Voice Agent.

Watson Assistant

Create your Watson Assistant skill much as you would for a chatbot: train intents and entities, then write the dialog flow. The difference with a voice solution is that the experience must be tailored to callers. Keep Watson’s responses short, and train the skill to handle common spoken utterances such as “um”, “sure”, “yep”, and “nope”.
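As a rough illustration of training for short spoken replies, the sketch below groups example utterances into intents and prints them as JSON. The intent names (`yes`, `no`), the utterances, and the helper `build_intents` are hypothetical. The output shape (an `intent` name with a list of `examples`, each holding a `text`) is an assumption based on how exported skill JSON is commonly laid out, so check it against your own skill export before relying on it.

```python
import json

# Hypothetical training data: short spoken replies grouped by intent.
VOICE_INTENTS = {
    "yes": ["yes", "yep", "sure", "uh huh", "that's right"],
    "no": ["no", "nope", "nah", "not really"],
}


def build_intents(examples_by_intent):
    """Return a list of intents with normalized, de-duplicated examples."""
    intents = []
    for name, utterances in examples_by_intent.items():
        seen = set()
        examples = []
        for text in utterances:
            key = text.strip().lower()
            # Skip blanks and repeats so each example appears once.
            if key and key not in seen:
                seen.add(key)
                examples.append({"text": key})
        intents.append({"intent": name, "examples": examples})
    return intents


print(json.dumps(build_intents(VOICE_INTENTS), indent=2))
```

Keeping these examples in a small data file makes it easy to add new utterances as you review transcripts of real calls.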

Speech to Text (STT)

As the name implies, this service transcribes the caller’s speech into text, which is then passed to the text-based Watson Assistant. The base Speech to Text model is very good, and you may not need any customization for your solution. However, custom training might be needed for specialized vocabulary (e.g. so that Watson recognizes terms such as Gastroenterology or Otolaryngology). Acoustic training can be done to handle accents or background noise.
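To make the idea of custom training for specialized terms more concrete, here is a minimal sketch that assembles a list of custom words as a JSON payload. The helper `build_stt_words` and the sounds-like spellings are hypothetical. The field names (`word`, `sounds_like`, `display_as`) reflect my understanding of the Speech to Text customization interface and should be verified against the current documentation. The sketch only builds the payload and makes no network call.

```python
import json


def build_stt_words(terms):
    """Build a custom-words payload from {display term: [sounds-like, ...]}."""
    words = []
    for term, sounds_like in terms.items():
        # Store the word in lowercase and keep the original casing for display.
        entry = {"word": term.lower(), "display_as": term}
        if sounds_like:
            entry["sounds_like"] = list(sounds_like)
        words.append(entry)
    return {"words": words}


# Hypothetical sounds-like spellings for the two terms mentioned above.
payload = build_stt_words({
    "Gastroenterology": ["gastro enter ology"],
    "Otolaryngology": ["oto larin gology"],
})
print(json.dumps(payload, indent=2))
```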

Text to Speech (TTS)

The Text to Speech service takes the text output (response) from Watson Assistant and synthesizes it into audio that is played back to the caller. There are two things here that you should be aware of.

  1. Voices - It’s important that you select a Watson voice that resonates with the end user. To address the common customer complaint about the voice sounding robotic, in 2019 we announced neural voices - a HUGE advancement in our Speech capabilities and a major difference maker for us. To help you select a voice, check out this page with voice samples.

  2. Custom Words - If you notice Watson mispronouncing words, you can easily create custom words (with pronunciation). Many applications will not require custom words, but some caller scenarios need Watson to be taught how to pronounce specific terms (such as Gastroenterology and Otolaryngology). For more information check out the documentation on customization.
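The custom words described above can be pictured as a small mapping from a word to the way it should be spoken. The sketch below builds such a mapping as a JSON payload. The helper `build_tts_words` and the example pronunciations are hypothetical, and the field names (`word`, `translation`) are an assumption based on the Text to Speech customization interface, so confirm them in the customization documentation.

```python
import json


def build_tts_words(pronunciations):
    """Build a custom-words payload from {word: spoken form}."""
    words = []
    for word, translation in pronunciations.items():
        # Skip entries that have no usable pronunciation.
        if not word.strip() or not translation.strip():
            continue
        words.append({"word": word.strip(), "translation": translation.strip()})
    return {"words": words}


# Hypothetical spoken forms, written as sounds-like spellings.
payload = build_tts_words({
    "Gastroenterology": "gastro enter ology",
    "Otolaryngology": "oto larin gology",
})
print(json.dumps(payload, indent=2))
```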

Voice Gateway

IBM’s Voice Gateway is a SIP (Session Initiation Protocol) orchestrator, an important piece of voice-over-IP technology that handles the orchestration between the caller (telephone) and Watson. No setup steps are required, since the WAVI solution handles them for you.
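Conceptually, the four components cooperate on every caller turn: speech is transcribed, the text is answered, and the answer is synthesized back into audio. The sketch below illustrates that flow only. The function `handle_turn` and the three stand-in callables are hypothetical and do not represent Voice Gateway’s actual implementation or any Watson SDK call.

```python
from typing import Callable


def handle_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one caller turn: speech in, assistant reply, speech out."""
    text = transcribe(audio_in)   # role played by Speech to Text
    reply = respond(text)         # role played by Watson Assistant
    return synthesize(reply)      # role played by Text to Speech


# Stand-in callables so the flow can be exercised without any service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_assistant(text: str) -> str:
    return "You said: " + text


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(handle_turn(b"change my reservation", fake_stt, fake_assistant, fake_tts))
```

Passing the three stages in as callables keeps the sketch independent of any particular service client.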

Note: Each of these components (Cloud services) is provisioned through WAVI’s simple user interface, and the solution does the integration behind the covers. You do need a SIP trunk so that callers can dial a phone number to reach the solution. Enterprises use major providers such as Avaya, Cisco, or Mitel, but you can sign up for a Twilio trial account and get one for free.

References for Further Reading