Thursday, August 13, 2009

What does Google Voice have to do with Customer Experience Intelligence?

Transcript of message (feel free to listen to the message above by clicking the "play" triangle)
hello mister panic managing my name is scott B Q and i'm calling from think london and thinklondon is official economic development agencies funded unsupported by the mayor's office of london england we provide free confidential assistance to cos planning a considering a physical presence in london i'm calling to ask all clear bridges firm plans for your extension what you're considering opening office or facility in london also calling to let you know that if you have plans for a physical presence in europe within the next 2 to 3 years this is upcoming opportunity to meet directly with the special deli kitchen that will be in washington D C from september 10th the 11th of 2009 mister locations compostible by system a refunded advisorsto this is london 2012 summer olympic games and executives from think ones in we're currently scheduling individual meetings with this delegate chin for cos of the defined interest in the physical presence in europe with the next 2 to 3 acres once more my name is scott Q and i'm calling from that sink one's an exam with the bowman agency you may contact me at(703) 770-8052 again scott Q think london (703) 770-8052 and also be sending you a followup email i thank you for your time and i hope you have a wonderful weekend

I got a Google Voice account shortly after the service opened up to new subscribers.

The service lets you establish a single phone number that follows you, ringing your work, cell, home office, etc., according to rules as simple or as complex as you like. You can pick up, transfer, conference, or send a call to voicemail, and you can even "listen" to a call while the message is being recorded and cut in if you like, just like you used to do with the old-fashioned tape answering machines.
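Those routing rules boil down to an ordered ruleset where the first match wins. Here's a minimal sketch of that idea; the rules, contact names, and line labels are all hypothetical, not Google Voice's actual configuration model:

```python
from dataclasses import dataclass

@dataclass
class Call:
    caller: str
    hour: int  # 0-23, local time

def route_call(call, rules):
    """Return the list of lines to ring for this call.

    `rules` is an ordered list of (predicate, lines) pairs;
    the first matching rule wins, with voicemail as the fallback.
    """
    for predicate, lines in rules:
        if predicate(call):
            return lines
    return ["voicemail"]

# Hypothetical ruleset: family rings cell and home, business hours
# ring the office, and everything else goes straight to the cell.
rules = [
    (lambda c: c.caller in {"mom", "spouse"}, ["cell", "home"]),
    (lambda c: 9 <= c.hour < 18, ["office", "cell"]),
    (lambda c: True, ["cell"]),
]

print(route_call(Call("mom", 22), rules))      # family rule wins
print(route_call(Call("unknown", 11), rules))  # business-hours rule
```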

The most useful feature, in theory, is the free transcription service: once a message is recorded, Google runs speech-to-text on it and forwards the result to your Gmail account (or the Google Voice phone client) so you can read the message without listening to it.

Here's the connection to customer experience intelligence: vendors like Clarabridge ingest customer feedback (from call centers, surveys, web sites, blogs, social media, etc.), and for a while the nirvana of customer experience applications has been the ability to quickly and seamlessly ingest voice recordings as a data source. Most customer experience vendors prefer to operate on feedback that is already in text form, for a number of reasons:
- text is the predominant media (in surveys, web sites, emails, and call center notes)
- voice files are notoriously tricky to transcribe with accuracy due to noise, sound quality, accents, speaking styles, etc.
- to truly assess sentiment and category of feedback, and to perform meaningful analytics on voice-of-the-customer information, you need to apply Natural Language Processing (NLP) somewhere (either during transcription or during analysis of the resulting text) to accurately determine the words in the recording, assess their meaning (part of speech, usage), categorize conversation topics, and map customer sentiment to the specific feedback contained in a conversation.
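To make the categorize-and-score step above concrete, here is a deliberately toy sketch of topic categorization and sentiment mapping using keyword lists. The category names and cue words are invented for illustration; a real engine like Clarabridge's uses full NLP models, not keyword matching:

```python
import re

# Invented categories and sentiment cues, purely for illustration.
CATEGORIES = {
    "billing": {"invoice", "charge", "refund", "bill"},
    "support": {"help", "broken", "fix", "crash"},
}
POSITIVE = {"great", "love", "wonderful", "thanks"}
NEGATIVE = {"terrible", "broken", "angry", "slow"}

def analyze(text):
    """Tag a piece of feedback with topics and an overall sentiment."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    topics = [c for c, cues in CATEGORIES.items() if words & cues]
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = ("positive" if score > 0
                 else "negative" if score < 0
                 else "neutral")
    return topics, sentiment

topics, sentiment = analyze(
    "The app is broken and support was slow, please help")
```

Note how fragile this is to transcription errors: if "broken" had been transcribed as "Brooklyn," both the topic and the sentiment would be missed, which is exactly why mis-transcription wrecks downstream analytics.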

Applying NLP to mis-transcribed voice calls can produce humorous (and incorrect) results, and as far as I know, no speech-to-text vendor has yet integrated NLP into the speech-processing stage itself (to correctly determine what a word is, and what part of speech it plays, for instance).

I was excited to try Google Voice to see if the world's biggest name in search had cracked the code on good transcription. If they'd done it, then perhaps we'd all be closer to being able to talk to our computers and analyze the spoken word with robust text mining technology.

My conclusion for now - the technology is STILL not ready for prime time.

Some findings:
  • Even though Google knows my name (it's in my username, my address book, etc.), it transcribes voice messages with every possible name EXCEPT mine:
"Hello Steve"
"Hello slid" (a personal favorite).
"Hello said..."
"Hello Mister Panic" (another personal favorite)

Luckily, Google Voice didn't deem it necessary to call me by my college nickname
"Hello squid" (though I wouldn't have minded it as much as I did back then...)

Google needs to match its transcriptions against the names in my address book and against my user name.
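That kind of matching is a straightforward fuzzy lookup. A minimal sketch using the standard library's difflib (the contact names here are hypothetical stand-ins, not my real address book):

```python
import difflib

def correct_name(heard, known_names, cutoff=0.6):
    """Snap a transcribed name to the closest known contact name.

    Uses difflib's similarity-ratio matching; returns the heard word
    unchanged when nothing in the address book is close enough.
    """
    matches = difflib.get_close_matches(
        heard.lower(),
        [n.lower() for n in known_names],
        n=1, cutoff=cutoff)
    return matches[0] if matches else heard

contacts = ["sid", "steve", "scott"]
print(correct_name("slid", contacts))   # snaps to the closest contact
print(correct_name("zzz", contacts))    # no close match, left as heard
```

This wouldn't rescue a wild miss like "Mister Panic," but it would clean up near-misses like "slid," which seem to make up a lot of the errors.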

  • Transcription is DEFINITELY affected by call quality. Messages left from my home phone (a high-quality VOIP line) transcribe with much more accuracy than messages from my cell, which sound fuzzier and contain road noise.
  • The service does a poor job finding the beginnings and ends of sentences; most messages come across as one big run-on sentence. If NLP were applied to the fragment, it could estimate the ends of sentences and clauses and connect related words and concepts, but without basic punctuation the job of processing ideas is much harder right now, particularly with longer messages. You can see an example transcription, and listen to its source, at the top of this post.
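Even without real NLP, a crude pass can recover some structure from a punctuation-free transcript by breaking before common discourse cues. This is a rough heuristic sketch (the cue list is invented; a real system would use a trained sentence-boundary model):

```python
import re

# Invented cue phrases that often start a new thought in voicemail.
CUES = r"\b(?:also|again|once more|i'm calling)\b"

def rough_segments(transcript):
    """Split a punctuation-free transcript before discourse cue phrases.

    Uses a zero-width lookahead so the cue word stays attached to the
    segment it introduces. Requires Python 3.7+ for zero-width re.split.
    """
    parts = re.split(f"(?={CUES})", transcript)
    return [p.strip() for p in parts if p.strip()]

segments = rough_segments(
    "hello my name is scott i'm calling from think london "
    "also calling to let you know about an event")
```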

At Clarabridge we HAVE actually run text transcriptions through our customer experience solution, and it works. But the accuracy of categorization suffers, and with it both precision (how reliably a category or sentiment gets mapped to the right customer calls) and recall (how many of the relevant messages are actually retrieved when analyzing a specific type of customer feedback).
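For readers less familiar with those two metrics, here's the standard computation on a made-up run (the message IDs and category are hypothetical):

```python
def precision_recall(predicted, actual):
    """Precision and recall for one category, given sets of message IDs.

    precision = true positives / everything the engine tagged
    recall    = true positives / everything that truly belongs
    """
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical run: messages the engine tagged "billing complaint"
# vs. the messages a human analyst says truly belong there.
tagged = {1, 2, 3, 5, 8}
truth = {2, 3, 5, 8, 13}
p, r = precision_recall(tagged, truth)
```

Transcription errors hurt both numbers at once: garbled words cause wrong tags (lower precision) and missed tags (lower recall).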

Sometimes "good" is good enough, and your mileage will vary when you try to directly connect voice calls into text analytics solutions.

By comparison, the poorly formed sentences and misspellings found in typewritten feedback fare much better than auto-transcriptions of voice calls, because the NLP algorithms in the Clarabridge engine can accurately and consistently decipher intent even with typing errors. Speech transcriptions, at least today, simply contain more errors and are harder to decipher accurately.

We'll keep hoping for improvements in the space -- if Google isn't there yet, most other vendors likely aren't either.

In the meantime, if you want to ingest speech content, we generally recommend using a speech-to-text vendor that also runs a sample of transcribed recordings through a human-assisted "error correction" stage. It costs more, but it raises the accuracy of the text into the 80+% range, and at that level of accuracy the text can sail through text analytics solutions and produce very high quality customer experience insights.
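The standard way to measure that accuracy is word error rate (WER): word-level edit distance between the raw transcript and a human-corrected reference, divided by the reference length. Word accuracy is then 1 minus WER, so the 80+% target above corresponds to a WER under 0.2. A minimal sketch (the sample phrases are made up, echoing the "Mister Panic" error above):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)

wer = word_error_rate("hello mister sid this is scott",
                      "hello mister panic this is scott")
accuracy = 1 - wer
```

One substituted word out of six gives roughly 83% word accuracy, which is right around the threshold where the human-corrected transcripts start producing good analytics.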