Transcript of message (feel free to listen to the message above by clicking the "play" triangle)
I got a Google Voice account shortly after the service opened up to new subscribers.
The service lets you establish a single phone number that can follow you, ringing your work, cell, home office, etc., according to rules as simple or as complex as you like. It also lets you pick up, transfer, or conference a call, or send it to voicemail - even letting you "listen in" while a message is being recorded and cut in if you like, just like you used to do with the old-fashioned tape answering machines.
The most useful feature, in theory, is the free transcription service - once a message is recorded, Google runs a speech-to-text transcription and forwards the text to your Gmail account (or Google Voice phone client) so you can read the message without listening to it.
Here's the connection to customer experience intelligence: vendors like Clarabridge ingest customer feedback (from call centers, surveys, web sites, blogs, social media, etc.), and for a while the nirvana of customer experience applications has been the ability to quickly and seamlessly ingest voice recordings as a data source. Most customer experience vendors prefer to operate on feedback that is already in text form, for a number of reasons:
- text is the predominant medium (in surveys, web sites, emails, and call center notes)
- voice files are notoriously tricky to transcribe with accuracy due to noise, sound quality, accents, speaking styles, etc.
- to truly assess sentiment and category of feedback, and to perform real, meaningful analytics on voice of the customer information, you need to apply Natural Language Processing (NLP) somewhere (either in the transcription or in the analysis of the text) to accurately determine the words in a recording, assess their meaning (part of speech, usage), categorize conversation topics, and map customer sentiment to the specific feedback contained in a conversation.
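To make that last point concrete, here's a toy sketch (pure Python, with invented keyword and sentiment lexicons - nothing like a real commercial engine) of what categorization and sentiment mapping look like once the feedback is clean text:

```python
# Toy illustration only: keyword-based topic categorization and
# lexicon-based sentiment scoring over clean, typed feedback.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "charge", "bill", "refund"},
    "support": {"agent", "wait", "help", "call"},
}
SENTIMENT = {"great": 1, "love": 1, "slow": -1, "angry": -1, "refund": -1}

def categorize(text):
    """Return every topic whose keywords overlap the message."""
    words = set(text.lower().split())
    return sorted(t for t, kw in TOPIC_KEYWORDS.items() if words & kw)

def sentiment(text):
    """Sum per-word sentiment scores; unknown words count as neutral."""
    return sum(SENTIMENT.get(w, 0) for w in text.lower().split())

msg = "the agent was slow and I am angry about the charge"
print(categorize(msg))  # ['billing', 'support']
print(sentiment(msg))   # -2
```

The point of the sketch: one mis-transcribed keyword ("charge" heard as "large", say) silently drops a category and a sentiment signal, which is exactly why transcription accuracy matters downstream.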
Applying NLP to mis-transcribed voice calls can produce humorous, and incorrect, results - and so far I believe no speech-to-text vendor has integrated NLP into the speech-processing stage (using it to correctly determine what a word is, and what part of speech it plays, for instance).
I was excited to try Google Voice to see if the world's biggest name in search had cracked the code on good transcription. If they'd done it, then perhaps we'd all be closer to being able to talk to our computers and analyze the spoken word with robust text mining technology.
My conclusion for now - the technology is STILL not ready for prime time.
- Even though Google knows my name (it's in my username, in my address book, etc.), it transcribes voice messages with every possible name EXCEPT mine.
"Hello slid" (a personal favorite).
"Hello Mister Panic" (another personal favorite).
Luckily, Google Voice didn't deem it necessary to call me by my college nickname:
"Hello squid" (though I wouldn't have minded it as much as I did back then...).
Google needs to match its transcription against the names in my address book and against my username.
- Transcription is DEFINITELY affected by call quality. Calls from my home phone (a high-quality VOIP line) transcribe with much more accuracy than calls from my cell (which sound fuzzier and contain road noise).
- The service does not do a good job of finding the beginnings and ends of sentences, so most messages come across as one big run-on sentence. If NLP were applied to the transcript, it should be able to estimate sentence and clause boundaries and connect related words and concepts; lacking basic punctuation, the job of processing ideas is much harder right now, particularly with longer messages. You can see the message transcription, and listen to the source, at the top of this post.
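On the name problem above: even a crude post-processing pass could snap a mis-heard greeting to a known contact. A minimal sketch using Python's standard difflib, with hypothetical contact names (this is my speculation about a fix, not anything Google actually does):

```python
import difflib

def correct_greeting(heard, contacts, cutoff=0.6):
    """Snap a mis-transcribed name to the closest known contact, if any.

    Matching is done case-insensitively; if no contact is similar
    enough, the transcribed name is returned unchanged.
    """
    lowered = {c.lower(): c for c in contacts}
    matches = difflib.get_close_matches(heard.lower(), list(lowered),
                                        n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else heard

# Hypothetical address book:
contacts = ["Sid", "Maria"]
print(correct_greeting("slid", contacts))   # Sid
print(correct_greeting("squid", contacts))  # Sid
```

A transcriber with access to the account's address book could apply exactly this kind of bias toward names it already knows.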
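On the run-on-sentence problem: even without full NLP, a heuristic can guess clause boundaries in an unpunctuated transcript. A rough sketch (the marker list is invented, and a real system would use a trained model rather than a word list):

```python
import re

# Crude heuristic, not a real NLP model: voicemail transcripts arrive
# as one long run-on, so we guess clause boundaries by breaking before
# discourse markers that often open a new spoken "sentence".
MARKERS = r"\s+(?=(so|okay|anyway|also|but)\b)"

def rough_segments(transcript):
    """Split an unpunctuated transcript into estimated clauses."""
    marked = re.sub(MARKERS, " | ", transcript.lower())
    return [seg.strip() for seg in marked.split("|") if seg.strip()]

print(rough_segments(
    "hi this is bob calling about the order so please call me back okay thanks bye"))
# ['hi this is bob calling about the order', 'so please call me back',
#  'okay thanks bye']
```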
At Clarabridge we HAVE actually run text transcriptions through our customer experience solution, and it works. But accuracy of categorization can suffer - and with it precision of analysis (how reliably a category or sentiment gets mapped to customer calls) and recall (how many of the relevant messages are actually found when analyzing a specific type of customer feedback).
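For readers unfamiliar with the terms, precision and recall reduce to simple ratios. The counts below are invented purely to illustrate how noisy transcripts can depress both for a category like "billing complaint":

```python
# Toy arithmetic with invented counts - not real Clarabridge figures.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of messages tagged with the category, how many truly belong
    recall = tp / (tp + fn)     # of messages that truly belong, how many were found
    return precision, recall

# Clean typed feedback vs. noisy auto-transcripts (hypothetical counts):
print(precision_recall(tp=90, fp=10, fn=10))  # (0.9, 0.9)
print(precision_recall(tp=60, fp=25, fn=40))  # roughly (0.71, 0.60)
```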
Sometimes "good" is good enough, and your mileage will vary when you try to directly connect voice calls into text analytics solutions.
By comparison, the poorly formed sentences and misspellings associated with typewritten feedback fare better than auto-transcriptions of voice calls, because the NLP algorithms in the Clarabridge engine can very accurately and consistently decipher intent even with typing errors. Speech transcriptions are simply more error-prone, at least today, and harder to decipher accurately.
We'll keep hoping for improvements in this space - if Google isn't there yet, most other vendors likely aren't either.
In the meantime, if you want to ingest speech content, we generally recommend using a speech-to-text vendor who also runs a sample of transcribed recordings through a human-assisted "error correction" stage. It costs more, but it raises the accuracy of the text to the 80+% range, and at that level the text can sail through text analytics solutions and produce very high quality customer experience insights.
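Transcript accuracy of this kind is usually measured as word error rate (WER): word-level edit distance divided by the length of the reference. A self-contained sketch, with made-up example sentences, showing how a raw transcript scores before any human correction:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

truth = "hello sid please call me about the invoice"   # made-up reference
raw   = "hello slid please all me about the in voice"  # made-up raw transcript
print(word_error_rate(truth, raw))  # 0.5, i.e. only 50% word accuracy
```

A human correction pass that fixes most of those errors is what pushes word accuracy into the 80+% range where text analytics becomes reliable.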