Speech Recognition – Page 3

FJ Cruiser Car Computer Project

16 April 2006 by Terry Gold

I decided at the beginning of the year that I would try to surround myself with Speech Recognition products. I heard Bill Joy speak years ago and he talked about trying to understand future technology by figuring out ways to live with it today. I’m not sure he said these words, but I think of it as "prototyping the future."

I’ve decided it is time to build a car computer. This is a personal project even though there might be some interesting business and technology lessons to be learned. (In other words, I’m paying for this out of my pocket and I don’t have to be limited by an ROI or a business case.)

Since I bought my last car ten years ago, I decided that a new car with a factory navigation system wasn’t going to give me everything I wanted and it would certainly be obsolete well before my ten-year trade-in target. I’ve played with the navigation system in an Acura TL, and it’s pretty good, especially the speech recognition, but in a couple of years it is going to be as obsolete as my first Palm Pilot.

I looked at a lot of cars and settled on the new Toyota FJ Cruiser.

I’ve owned three Jeeps, but I always wanted one of the old Toyota Land Cruisers. Just this month Toyota released the FJ Cruiser and I was fortunate (with some help) to get the first one to arrive at Longmont Toyota. I’ll write about my experience with the dealership another time – it’s a good sales and service story and I was very happy with the experience.

This is not a fancy "leather and burled walnut" kind of vehicle. It is a lot more complicated than my old Jeeps, but the dash is metal, plastic and vinyl, so installation of the car computer will (I hope) be a lot easier.

In phase one of my project, I’m installing an aftermarket system from Pioneer, the AVIC-Z1. It’s also brand new and I’m already experiencing the joy of trying to put a complicated audio/navigation system that no one has ever seen, into a vehicle that no one has ever seen. The Z1 is on order at ~~Ultimate Electronics~~ in Boulder who is going to do the installation for me, because while I’ve installed a few stereos before and I can solder fairly well, I don’t have the confidence to be cutting wires under the dash of my new FJ. (Yet – that will happen in Phase Two) Update – Ultimate couldn’t deliver so after waiting for weeks, I cancelled the order and now I’m scheduled to get it installed at Extreme Automotive.

The Z1 has speech recognition built in, GPS navigation ("Go to Denver"), Bluetooth phone integration and a 30GB hard drive for ripping MP3s from CDs. When I get it installed I’ll do a full review, particularly about how well the speech recognition works.

Phase two of the project will be to add a real car computer. The aftermarket car computer market is just getting started but it is an exciting place – it reminds me of the early days of home computers where hackers would get together and figure out how to build their own computers. If you want to follow along, there is a great community of developers at mp3car.com and I’ve found some good books that are (for the moment) very current. Car PC Hacks – Tips & Tools for Geeking Your Ride is my favorite. It’s an O’Reilly book written by Damien Stolarz with a lot of help from other enthusiasts. It has the most references to websites and sources for parts and I’ve already dog-eared many pages. I also bought Geek My Ride – Build the Ultimate Tech Rod, which has a lot of good photos, and finally Build Your Own Car PC, a book from the UK which also has a lot of photos and specifics about hardware.

As car computers start to go mainstream, plenty of people are going to say "why would I want a car computer?" Just remember, plenty of people said "why would I want a home computer" and "why would I want a cell phone" and those two markets have done just fine. I believe that speech recognition will play an important role in making it happen, which is why I’m going to devote a few weekends to understanding where it is going.

gotspeech.net

14 April 2006 by Terry Gold

There is a new Speech Recognition blog on the web – gotspeech.net. As they say in their introduction:

This site has been designed as an information resource where speech developers can gather to exchange ideas, source code and just hang out with other speech developers. You will find informative blogs, current news and articles written by some of the best known names in the industry. So, go ahead and sign up. Check out the forums. Read some of the articles and blogs. Come back often and above all participate in our community.

It was created by some Microsoft Speech Server developers, but I’ll bet that over time it will be a place where anyone interested in speech recognition can go to learn about building applications.

My first Video Blog

24 March 2006 by Terry Gold

My first video blog. I’m still working out the technical details, so I may have to republish this a few times.

speech conference

31 January 2006 by Terry Gold

I’m at SpeechTEK West this week. People often ask me, what is the best dictation software? It depends, I answer. If you just want to play around with it, and you’re running Windows XP, you already have a speech recognition engine installed on your computer. Simply go to the control panel and turn it on. According to most reviews, though, if you want to do serious speech recognition, you should look at Dragon NaturallySpeaking 8.

By now you may have guessed that I’m doing this post using just speech recognition. I am using Nuance’s Dragon NaturallySpeaking 8. To be fair, I spent about 10 minutes training it, and I’m also watching Groundhog Day on the TV. I’m really amazed it’s working at all. I haven’t opened the manual, and I spent very little time training the speech engine.

I really think that with a little practice, I could use this instead of a keyboard. I think this has great potential and I’m going to keep working with it. Good night!

Terry

Assistant 1.0

25 January 2006 by Terry Gold

It still feels funny to say I have an assistant. Most entrepreneurs probably wait too long to find someone to help them. I waited almost 10 years, which was about 8 years too long.

Suzanne joined Gold Systems as my first assistant and trained me to be more effective, and she took on jobs that I didn’t even know needed to be done. We became a better company, and my life was easier. Then she got married, moved to Alaska and I was fortunate (thanks Judy!) to have a chance to hire Angela. Angela is doing a fantastic job and like Suzanne, when I adopt new technology, she has to adapt to it. Angela has been thinking about how my new Treo 700w is changing her job, and she wishes for a new product to go along with it. Here is Angela’s first "guest blog."

Terry
—————————————–

If you normally read Terry’s blog, then you know that he recently retired both his Nokia 3650 cell phone and Axim X30 Pocket PC and upgraded to the new Palm 700w. My life as his assistant has not been the same since!

This morning I received three calls from Terry first thing. Each time he told me something new, but then claimed there was something else that he forgot. Within a few seconds, the phone was ringing again with the forgotten thought. Secretly, I knew he was driving in his car playing with his new phone and saying “Dial Angela Watson Office” using the voice recognition feature because it’s just cool!

It’s funny how gadget-driven this man I work for is, but I’ve got to admit, that thing really is cool. Terry has been teasing me with the idea of getting another one for my own use. (Mind you, I would opt for a more stylish carrying case than the leather belt clip, maybe something pink.)

As much fun as having a 700w of my very own would be, it misses the most valuable tool that all portable devices miss in order to aid the professional executive assistant. The capability to pull up your executive’s schedule!

50% of my day is spent combing over Terry’s calendar, contacts and tasks and making sure I’ve allowed enough time for him to eat and sleep. Assistants struggle with the problem of not having their executive’s calendar while working remotely, and we either carry a tickler file with a print-out of their schedule, or we bribe the company IT wizard to come to our home to set up the VPN.

I luckily do have the ability to work in full force at home. (Jerry likes home cooked meals.) But until Microsoft comes out with Windows Mobile – Assistant Version 1.0, the best I can do for a mobile device is still the good ol’ laptop and separate cell phone.

Respectfully Submitted by:

Angela Watson

Senior Executive Assistant to Terry Gold

Gold Systems

——————-

Terry here – Jerry, the IT wizard that Angela bribed in the story above, has just discovered that Angela can in fact get to my schedule or anyone else’s that she has access to. We run Exchange Server here with OWA (Outlook Web Access) enabled. That means that we can use any web browser, including the web browser on the 700w or my old Axim, to access the Exchange Server. Since Angela has permission to access my calendar, she can use OWA to check my schedule. It’s not as nice as a client on her PocketPC, but it is a step in the right direction.

This is the last post (for a while) about my new phone. I expect I’ll be reviewing some new speech recognition products in the next few weeks as I continue to try to surround myself with the technology.

Voice Command Cheat Sheet

21 January 2006 by Terry Gold

I’ve lived with my Treo 700w phone for almost two weeks now, and despite having to do a hard reset yesterday, I’m loving it. I’ve quit carrying my Pocket PC, and even gave it away a few days ago, so you know I’m serious about this new device.

Right now I’m in an airport using my phone as a high-speed modem. Despite what Verizon says, it can be done, you just need a little piece of software called pdaNet. I’m connected with a USB cable, but I expect the software will evolve to do the trick over Bluetooth.

Last week I mentioned that a lot of speech recognition applications suffer from a lack of documentation and "cheat sheets". Piyush Dogra from Microsoft forwarded this cheat sheet to me the very next day. Eric Badger, one of the developers of the product, created it and has given me permission to share it. Thank you Eric! This is one of the coolest pieces of software I’ve seen in a long time. As you point out, "Knowing what to say makes all the difference when using Voice Command." As of today I have 829 contacts in Outlook, and Voice Command never misses when I say a person’s name.

I also find myself saying "What’s my next appointment" because it is just easier than opening the schedule and scrolling around the screen. Speech recognition really shines when you have deep trees of information that you need to directly access. It’s a long story, but even though my phone came with Voice Command, I ended up buying a copy at the local computer store. The retail product does come with very good documentation that should get you going. If you have Voice Command or a Windows Mobile phone, you’ve got to give it a chance. Learn a few commands and you will wonder how you got by without it.

Here’s Eric’s Cheat Sheet – Enjoy!

Terry

Voice Command Cheat Sheet for Treo 700w

Knowing what you can say makes all the difference when using Voice Command.

===== CALLING A CONTACT =====

Commands:

Call <contact>
Call <contact> at home
Call <contact> at work
Call <contact> on mobile
Call <contact> on cell
Call <contact> on cellular
Call <contact> at home two
Call <contact> at work two
Call <contact> at car
Call <contact> on radio
Call <contact> on pager
Call <contact> at assistant

To confirm that you want to make the call after Voice Command responds:

You can say "Yes" or "Correct" to call.
You can say "No" or "Incorrect" to try again.

If Voice Command asks you which location, you can:

Repeat one of the locations that Voice Command offers to call.
Say "No" to try again.

Related commands:

You can say "Call back" to call back the last call that you received.
You can say "Redial" to call back the last call that you made.

Examples:

Call Karen Archer on cell
Call Frank Miller
Call City Light and Power
Call Barbara Sparrow Home

Notes:

Voice Command indexes by the Contact’s first and last name if it exists. If you have a nickname entered, you can use that too. Voice Command will only let you call by company name if there is no first or last name.

You must prefix contact calling with the "call" keyword. If you use "dial", it won’t work!
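The calling rules above can be sketched in a few lines of Python. This is only an illustration of the rules as the cheat sheet states them, not how Voice Command is actually built. The function names, the record fields (`first`, `last`, `nickname`, `company`) and the location labels are assumptions made up for the example, and it works on already-recognized text rather than audio.

```python
# Location phrases come from the cheat sheet; the labels on the right are
# hypothetical names chosen for this sketch.
LOCATIONS = {
    "at home": "home",
    "at work": "work",
    "on mobile": "mobile",
    "on cell": "mobile",
    "on cellular": "mobile",
    "at home two": "home2",
    "at work two": "work2",
    "at car": "car",
    "on radio": "radio",
    "on pager": "pager",
    "at assistant": "assistant",
}


def build_index(records):
    """Index contacts by full name and nickname; company only if unnamed."""
    index = {}
    for record in records:
        parts = (record.get("first"), record.get("last"))
        full_name = " ".join(p for p in parts if p)
        if full_name:
            index[full_name.lower()] = record
        elif record.get("company"):
            # Company name is only usable when there is no first or last name.
            index[record["company"].lower()] = record
        if record.get("nickname"):
            index[record["nickname"].lower()] = record
    return index


def parse_call(utterance, index):
    """Return (contact record, location or None), or None if not understood."""
    text = utterance.strip().lower()
    if not text.startswith("call "):
        return None  # contact calling must start with "call", never "dial"
    rest = text[len("call "):]
    location = None
    for phrase, label in LOCATIONS.items():
        if rest.endswith(" " + phrase):
            location = label
            rest = rest[: -len(phrase) - 1]
            break
    if rest in index:
        return index[rest], location
    return None
```

With an index built from a few sample records, `parse_call("Call Karen Archer on cell", index)` returns Karen's record with the location `"mobile"`, while `parse_call("Dial Karen Archer", index)` returns `None` because of the keyword rule.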

===== DIALING A NUMBER ======

Commands:

Dial <7-digit number>
Dial <10-digit number>
Dial <1+10-digits>
Dial <N-1-1>

Examples:

Dial 555-0200
Dial 800-555-1212
Dial 1-800-555-1212
Dial 411

You must prefix digit dialing with the "dial" keyword. If you use "call", it won’t work!
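As a rough sketch of the dialing rules above, the following Python function accepts only the four number shapes listed and insists on the "dial" keyword. The function name is hypothetical, the input is assumed to be recognized text with digits already written out as numerals, and treating N in N-1-1 as a digit from 2 to 9 is an assumption, not something the cheat sheet states.

```python
import re


def parse_dial(utterance):
    """Return the digit string to dial, or None if the utterance is rejected."""
    text = utterance.strip().lower()
    if not text.startswith("dial "):
        return None  # digit dialing must start with "dial", never "call"
    # Drop dashes, spaces and anything else that is not a digit.
    digits = re.sub(r"\D", "", text[len("dial "):])
    if re.fullmatch(r"[2-9]11", digits):
        return digits  # N-1-1 service numbers such as 411 (assumed N = 2-9)
    if len(digits) in (7, 10):
        return digits  # 7-digit or 10-digit number
    if len(digits) == 11 and digits.startswith("1"):
        return digits  # 1 + 10 digits
    return None
```

For example, `parse_dial("Dial 1-800-555-1212")` returns `"18005551212"`, and `parse_dial("Call 411")` returns `None`.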

===== CHECKING CALENDAR =====

Commands:

What are my appointments today?
What are my appointments tomorrow?
What’s my next appointment?

===== START MENU =====

Commands:

Start <program>

Examples:

Start Solitaire
Start Messaging
Start Internet Explorer
Start Pictures and Video

Notes:

Voice Command will index any file that is in or inside of \windows\program files
You have to say the file name exactly as it is written. It may be helpful to rename shortcuts.
Also, you can put links to web pages here and go straight to a saved web page this way.
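The note about saying the file name exactly as it is written suggests a simple lookup keyed on the shortcut's file name. Here is a minimal sketch of that idea in Python. The function names and the `\Example\Shortcuts` folder are made up for illustration, and the real product's indexing is not documented here.

```python
from pathlib import PureWindowsPath


def build_program_index(shortcut_paths):
    """Map each shortcut's file name (without extension) to its full path."""
    index = {}
    for raw_path in shortcut_paths:
        path = PureWindowsPath(raw_path)
        index[path.stem.lower()] = str(path)
    return index


def parse_start(utterance, index):
    """Return the path for a 'Start <program>' utterance, or None."""
    text = utterance.strip().lower()
    if not text.startswith("start "):
        return None
    # Exact match only: the spoken name must equal the file name.
    return index.get(text[len("start "):])
```

Because the match is exact, renaming a shortcut from something like "IE Mobile Browser.lnk" to "Internet Explorer.lnk" (both names hypothetical) changes what you have to say, which is why renaming shortcuts to something easy to remember can help.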

===== MEDIA =====

Commands:

Play music
Play media
Play artist
Play album
Play genre

Play <artist name>
Play <album name>
Play <genre name>
Play <everything>

Play
Pause
Stop
Next
Previous (track)
Shuffle on
Shuffle off
What song is this?
What track is this?

Examples:

Play The Beatles
Play The White Album
Play Rock
Play Everything

Notes:

You cannot play individual tracks using voice
Voice Command will index the media based on the metadata. You can use a metadata editor to groom the fields.

Speech Recognition and the trough of disillusionment

12 January 2006 by Terry Gold

In 1995 Gartner came up with what they call The Hype Cycle to explain how new technologies get hyped, fall out of favor with the press, and then ultimately (sometimes) go on to be mainstream. One phase is the Trough of Disillusionment, and I believe that Speech Recognition may be in the trough now. All great technologies must go through it. Even as the technology continues to improve and some amazing things are happening, it seems to me that some people are getting tired of hearing how great it is going to be and they just want it to understand everything they say with little tolerance for errors.

There are two issues that have little to do with the science of speech recognition. The first is Human Factors. (I capitalize it because I believe it is so important.) No one would disagree with me that Human Factors is important, but we still see applications being built that seem to go out of their way to make life difficult for the user. That’s another soap box for another time – I’ll just say that it is very hard to make something very simple, but it is worth the effort.

The other issue is documentation, or at least expectation setting. If you encounter speech recognition on the telephone, there is almost never documentation in hand for what the system can understand, and since we’re years away from a system that can understand everything a person might say (hey, people can’t even do it!) you have to guess at what you might be able to say, or you have to wait for the system to prompt you.

Lately I’ve been trying to surround myself with speech recognition, just to live with it and understand what works and what doesn’t work. I have "Wise Crackin’ Shrek and Donkey" and all sorts of gadgets that do speech recognition. My latest is the new Palm 700W, which is a Palm Treo phone that runs Windows Mobile. Sort of like an Intel-based Apple – they both came out this month, causing many people to wonder if in fact Hell has frozen over. My very first Palm was made by U.S. Robotics and until switching to the Pocket PC a few years ago I always liked the Palms, so I was happy to have the best of both worlds when the Treo came out last week.

I quickly loaded a cool little application called Microsoft Voice Command. (I think it comes with it – not sure)

It’s been around for a while and runs on Windows Mobile and Pocket PC Phone Edition. You push a button on the phone and then speak to it. You can say "Call Terry Gold at work", or anyone else that is in your contacts. No training required. I tried 20 different names and it got every one right except for "Dan DeGolier". I have over 900 contacts, so there were a lot to differentiate. Now, I just looked in my contacts and I had Dan in as "Daniel B. DeGolier". When I changed it to "Dan DeGolier" and let it automatically sync, it immediately got it right. (Sorry Dan, the text-to-speech still makes a mess of your last name.)

It isn’t just for speed dialing though. You can say things like "What are my appointments", "What calls have I missed", and even "Start Program", where Program is any program you have loaded on your phone. I’m going to see if I can do most of the command and control using just speech recognition. This is what Bill Joy once called "prototyping the future." You figure out some way to live with the technology of the future, and that lets you think even farther ahead.

But back to the second challenge of great speech recognition. The one thing it couldn’t recognize was me saying "Display Terry Gold". According to the website, it is supposed to bring up my contact. In fact it wouldn’t work on any other name. Determined to make it work, I kept at it. No matter how carefully I spoke it, up would pop Media Player and Bill Monroe would start to sing "Long Black Veil". Bill Monroe is the Father of Bluegrass music and I’m an amateur Bluegrass mandolin player, so I took it as a great compliment that Voice Command was getting us confused. After all, we did grow up only 37 miles and 50 years apart.

Since Voice Command had worked so well up to this point, I didn’t give up. After figuring out that I could say "Help" and then "Contacts", I realized that the software was actually looking for me to say "Show Terry Gold", not "Display Terry Gold."

My guess is that the developers realized that "Display" and "Play" were too similar late in the product life cycle, especially for guys like me who pronounce "Display" as two words – "Dis" "Play". "Terry Gold" sounds enough like "Bill Monroe" that I can see that mistake. The documentation on the web didn’t get changed, and now some people are having a bad experience through no fault of the technology. The product is so great though, that hopefully this won’t turn anyone off. I’ll see if I can get to them to point out the typo.
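One cheap way to spot this kind of collision early is to compare the command keywords against each other before shipping. The sketch below uses spelling similarity from Python's standard `difflib` module as a stand-in. That is a big assumption, since spelling is only a rough proxy for how words sound, and nothing here reflects how Voice Command's developers actually tested. The function names and the 0.6 threshold are arbitrary choices for the example.

```python
from difflib import SequenceMatcher


def spelling_similarity(a, b):
    """Ratio between 0.0 and 1.0 of how much two words share, by spelling."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def confusable_pairs(keywords, threshold=0.6):
    """List keyword pairs whose spelling similarity meets the threshold."""
    pairs = []
    for i, first in enumerate(keywords):
        for second in keywords[i + 1:]:
            score = spelling_similarity(first, second)
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs
```

Running `confusable_pairs(["call", "dial", "play", "display", "show", "start"])` flags "play" and "display" as the one pair worth a second look, while "show" and "play" share no letters at all.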

Simply having a cheat sheet is a great help with speech recognition devices. That’s how I found this mistake – I was making my own little cheat sheet. It is easy enough to just ask for help, but I wanted something on paper that I could have on my desk until I figured out the common commands that I would be using.

I have another application that can recognize hundreds of commands, and it does a great job, but the documentation listed all of the commands in alphabetical order. Again I made my own cheat sheet of the ten commands that I cared about, and now I can’t imagine not using the product. I’ll bet most people tried it, didn’t know exactly what to say, and gave up on it. I’ll write about that one another day.

When I first learned the vi editor, someone gave me a dog-eared card of the most common commands. It made all the difference in the world and I was soon raving about how superior vi was to any other editor in the world, especially Emacs. All because of that card. Until speech recognition advances to the point where we really can just say anything, let’s see more cheat sheets, more obvious commands and help prompts that don’t make the user feel like an idiot.

Speech Recognition in the mountains of Colorado

27 April 2005 by Terry Gold

Mike Castillo, an engineer at Gold Systems wrote the following email, and with his permission I’m publishing it below. It is entertaining, it illustrates that people on the street are starting to see the potential in speech recognition, and it shows what can happen when everyone in a company understands that they are in sales.

The only thing I’ve edited out is the name of the county where this happened and the name of the airline mentioned. We did not build the application mentioned, but I do fly that airline regularly and I don’t want them canceling my frequent flier account. I’m also publishing this because I’m toying with the idea of a Gold Systems blog that is totally focused on our industry so I want to encourage Mike and others to step up and contribute so I don’t have to do all of the writing myself.

Mike’s Email:

I just participated in my all too frequent Jury duty (for some odd reason, I get called very frequently; something about <this> county is weird). So what does this have to do with Gold? Well, I made several potential sales contacts without really trying, and it was kind of entertaining for me. Here’s how it got started. D.A. is questioning jurors to try and decide final pool. D.A. gets to me and says, "you’re the computer guy" (we had to fill out a questionnaire, and I had listed my master’s degree in computer science.). He asks for more details about what I do. "Well, I write Speech Recognition applications" was my short summary answer. "Is that speech recognition stuff ready for prime time yet?" "I think so. Speech recognition vendors are touting 95% accuracy rates". "But isn’t it expensive?" "Well actually Microsoft jumped into the market and lowered the prices. Other vendors seem to also be willing to give more in price in order to preserve their market share." "Hmm… we (<The> County) may need to look into this". "Please give me a call and let’s talk. I think we might be able to help you." …. (moves on to next juror). At lunch I give him my contact information. Nothing will probably come from it (especially since we – the jury – voted Not Guilty, so the D.A. wasn’t real happy at the end of the trial), but you never know.

But, that’s not all. At lunch, I went over to the county recorder’s office to take care of some personal business and a woman comes up to me and says, "Are you the speech rec guy?" So she is building homes for disabled people and wants to voice automate them. Turns out she was in the original "large pool" of potential jurors and got "voted out". I explain that Gold Systems doesn’t really do that type of application, but she pushes and really wants help and really decides she wants my number anyways, and I said I might be able to do something on the side (since I have played around with some of the PC based speaker recognition, and actually have played around with home automation stuff.).

Then, during our jury breaks, all the other jurors were all over me with every possible question (e.g. "my I-Mac says it has speaker recognition, should I try and use it?" "I called the <Big> Airlines application once and I cussed it out because it was so bad, and that thing hung up on me. Did your company write that application?") about speech recognition. I’ve never had so much attention in my life. So I educated 12 other people for a good 15 minutes in speech recognition and basically its ROI ‘story’. Who knows, I may have hit another potential customer in all my lectures.

Yes, I do love brownies, but, no, this isn’t a brownie point kind of letter. Just think of it as my very first blog posting. Hopefully it’s a little interesting.

Mike

VisiCalc creator comments on speech recognition

13 April 2005 by Terry Gold

Brad Feld sent me a link to a post by Bob Frankston, the co-creator of VisiCalc, about his experiences with IVR and speech recognition. I had already discovered the blog of Dan Bricklin, the other co-creator of VisiCalc, but to see one of my software heroes commenting on my industry’s technology was really cool. To find out that I knew someone who knew them was even better. VisiCalc, followed closely by Flight Simulator and BASIC, opened my eyes to the potential of small computers. I completely changed the path I was on when I discovered software.

In Bob’s post (can I call him Bob, or should it be Mr. Frankston?), he talks about how speech recognition systems are better than touch-tone applications and then goes on to speculate how future applications could be even more useful if they adapted to the user and even taught them how to be a better user in the future. (Read the post, he says it better than I can summarize)

He’s right, and it is fun to think about how computers could get better about understanding what we want. We’ve actually got a lot of experience with this, but unfortunately it is all happening slower than I’d like to see. I was talking about this with a Human Factors Expert here at Gold Systems – that’s not her title, I just like how it sounds – and she sent me the following comments on the subject.

Here are Paula’s comments:

What he seems to be complaining about is that the system does not adapt to what he wants to do. He’s arguing that people can and do "learn paths".

One method for providing "paths" is by setting up shortcuts. The experienced caller can complete tasks in fewer steps by linking commands together. In fact, the shortcuts we built into the (Large Insurance Company in the Midwest) application are used relatively infrequently.

Another method, which is what I think he has in mind, is to tailor the interaction for each user once he/she has been identified. So, once he logs into the system, the menus are organized according to his usage pattern. This is certainly doable but very expensive to code and test. Most of our customers do not see the value (ROI) for undertaking such a complicated design because, in reality, he will not spend more money with United Airlines if his IVR interaction was better. (Is that true? I’m not convinced. – Terry)

His 4th paragraph is really pointing at the issue that I harp on, which is that the designs I strive for are ones that GET OUT OF THE CALLER’S WAY. He says this a little differently by saying that he doesn’t want it to be friendlier or salesy. He just wants to make his reservation and get off the phone. The IVR shouldn’t be memorable or flashy; it’s a tool and only a tool. You respect it as such.

With respect to the 6th paragraph, he’s really talking about getting the design right. He’s pointing out that his ISV (software developer) failed to understand that there is information people need that they are not providing (800 number). I hit this frustration a lot: many times we let customers tell us what they want and we fix-bid contracts to include ONLY these features. When we have time built into schedules to work with the customer to define the application, we can help identify the information that callers are seeking and ensure a better application.

My thoughts-
Paula

Password Reset and security

18 March 2005 by Terry Gold

The market is really starting to notice one of our newer products that makes it easier, cheaper and more secure for people to reset lost passwords using speech recognition. Rich Schneier comments on the concept here and then Jason Groshart, a Gold Systems engineer, wrote a great post about how it works and the future of the product. He talks about our version for the Microsoft Speech Server, but we’ve also built this application on the Avaya IR IVR. One of our customers, A Large Midwestern Insurance Company, saves the equivalent of twenty help desk agents with this application. Who would have thought that there were that many lost passwords in the world?

Speech Recognition – who gets it?

2 February 2005 by Terry Gold

I’m working on a white paper about speech recognition and I’d like to try out a few ideas. The tagline for this blog is Entrepreneurship, Speech Recognition, VoIP, Wireless and . . . and so far I’ve been pretty much stuck on entrepreneurship.

Actually when Jim and I started Gold Systems, speech recognition was something that you only saw working on an episode of Star Trek. We build telephone self service applications for large enterprises and most of them are speech recognition enabled now. Our very first paying customer in 1991 was a bank in Fargo, North Dakota and we built an automated banking system for them. Yes – one of those touch-tone systems that you use when you can’t talk to a real person. Or at least that is how a lot of people view telephone self service, whether it is touch-tone or one of the newer ones that use speech recognition (we did our first speech rec application in 1995). And yet I remember this bank being pleasantly surprised that customer satisfaction actually rose after the system went into service. In fact the job satisfaction for the real people answering the phone also rose.

How can that be? This was a bank whose President had declared that they would never have voice mail. First of all this was a small regional bank when we started working with them. They didn’t have a 24 hour follow-the-sun call center. When the agents went home, the callers got a message telling them to call again tomorrow. It was a big improvement just to allow customers to get their account information 24 hours a day. But a strange thing happened – people who had been talking to the agents for years started using the automated system even during business hours. For one thing they never had to wait for the automated system, but also a lot of people apparently didn’t want to discuss their finances with a real person.

If you aren’t in the business you may not realize that an awful lot of people manage their finances by calling to check their account balance every day. You might also be surprised at how many people still use the phone as their primary communications device. If you are reading this – you are not like most people – and you may wonder why they don’t just check their balance on the web. Back then the web didn’t exist, but still MOST people – not necessarily those that read and write blogs – but most people still find their telephone to be a great way to get information. For the people who really did need to talk to a real human being, they weren’t stuck in queue behind a bunch of people who were just checking their checking account balance for the third time in one day.

Everybody loves to hate touch-tone systems that make it difficult to get the information they need while making it impossible to speak to a real person. Guess what – I hate them too, because they don’t have to be that way. Today most of our business is in developing speech recognition applications, but some companies are still deploying touch-tone applications. What really matters is whether the application is designed so that normal people, who are eating a burrito, driving their car and JUST WANT TO GET THEIR BANK BALANCE, can get it easily and quickly and get back to eating that burrito. I was actually driving back from a meeting today, eating a burrito and even I wouldn’t try to log onto the web with my PocketPC while driving and eating. (One or the other, but not both)

From the very beginning Jim and I stressed to the developers that a great application, one that people will love to use, is first and foremost designed to get the job done and get out of the way. Developers, and I was one, love to think up new features. Do you know why some bad touch-tone applications have ten options on the main menu? Because there aren’t more buttons on the telephone. Some developers would put twenty options on the menu if they could get away with it. The first key to a great application whether it is touch-tone, speech recognition or web is to do a great job on the human factors design. It may be massively complex behind the interface but the part of the application that the user deals with must be simple, natural and easy to use. And if the caller DOES want to talk to a real human, LET THEM! Who doesn’t know how to ultimately get to a human – the only question is how mad the caller is going to be by the time they get there, so our philosophy is to make the caller want to use the self service option, but let them opt out if that is what they want.

With the coming-of-age of speech recognition we’re being handed a double-edged sword. Now the human factors work that we’ve always done with touch-tone is even more important. One of the keys to getting good recognition performance is to ask questions that generate consistent responses. If you confuse the caller with a complex question – and remember they are driving or whatever and not always paying close attention – they will answer something like “ah, uh oh let’s see, I think uh, yes my account number is uh CLICK.” It is going to be a long time before speech recognition engines can get much out of that sort of response.

I’ll close this post with this thought. If it were easy there wouldn’t be so many bad systems in the world, so please don’t try this at home just because it sounds fun to make a computer talk and you have a few spare developers walking around. Leave it to the professionals and put your efforts into filling all the orders that your happy customers will give you when they discover that you’ve made it easier than ever to do business with your company over the phone.