Speaking Out in Favor of Voice Software
As a legal secretary, Regina Schneider could once type 105 words per minute. But in 1992 she fell victim to an occupational injury: carpal tunnel syndrome, a wrist condition increasingly common among typists and other keyboard operators.
“Despite surgery, use of my hands was absolutely minimal and typing was out of the question,” the Novato, Calif., woman said recently. That seemed to rule out secretarial work or any office job requiring the use of a computer.
Until 1995, that is, when she persuaded her workers’ compensation insurer to buy her a $700 version of Kurzweil VOICE, a computer program that can recognize tens of thousands of spoken words and thus enable users to write, calculate and even program computers.
“Technology put me out of work, and now technology is going to put me back to work,” said Schneider, who uses a word processor, database and other computer programs in her full-time studies to become a rehabilitation counselor.
People like Schneider who suffer from keyboard-related injuries represent one of the few specialized groups that have benefited so far from voice-recognition software. Others include doctors and lawyers, who use voice-activated programs to give dictation, and Wall Street brokers, who use them to execute trade orders.
But the dream machine--one that can understand spoken English just like a regular person--is still stuck on the drawing board after decades of dedicated research. Most consumers have had to settle for glitch-prone programs that execute a limited number of rudimentary spoken commands.
Building a computer that responds to the sound of the human voice has been the computer industry’s goal since before popular culture allowed “Star Trek’s” Capt. Kirk and the astronauts of “2001: A Space Odyssey” to speak to their machines. Many in the computer industry see a speech-based interface as the missing link that will convert the last of today’s technophobes into computer users.
Optimists confidently predict that a usable voice program will be on the market by about 2000. But substantial hurdles still exist on both the hardware and software sides.
“ ‘Star Trek’ takes place in the 24th century, so we’ve got a little time yet,” said Roger Matus, director of marketing for Dragon Systems, a Newton, Mass., company that makes one of the leading computer-dictation programs.
Speech-recognition systems work in the reverse of a person turning a thought into a sentence. A human being starts with an idea, chooses the words to express it, strings them together and then creates the sounds necessary to utter them.
A computer, by contrast, starts with a “heard” sentence and subjects it to a series of statistical algorithms that model human speech to determine the string of words that was spoken. Finally, it checks the positions of the words in context, to confirm that the sentence makes sense according to the rules of English syntax.
While the approach sounds simple, the task itself can be overwhelming: The number of possible 20-word sentences that draw on a 25,000-word vocabulary is roughly nine followed by 87 zeros. There are additional complications when accents and homonyms--“to,” “too” and “two”--are added to the mix.
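The size of that number can be checked with a few lines of arithmetic. The sketch below assumes that any of the 25,000 words may fill any of the 20 positions, with no grammar rules applied; the variable names are illustrative only.

```python
# Count every possible 20-word sequence drawn from a 25,000-word vocabulary,
# assuming any word may appear in any position (no grammar rules applied).
VOCABULARY_SIZE = 25_000
SENTENCE_LENGTH = 20

possible_sentences = VOCABULARY_SIZE ** SENTENCE_LENGTH

# Python integers are exact, so the digits can be read off the string form.
as_text = str(possible_sentences)
digits = len(as_text)
leading_digit = as_text[0]

print(f"{possible_sentences:.2e}")  # 9.09e+87
print(f"leading digit {leading_digit}, followed by {digits - 1} more digits")
```

The result is an 88-digit number that begins with a nine, which is where the "nine followed by 87 zeros" approximation comes from.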
Rather than consider so many possibilities, speech-recognition software makes a series of assumptions and educated guesses to narrow things down. Systems for sale today, promoters say, manage to get it right about 95% of the time.
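One way to picture such an educated guess is to score sound-alike words by how well they fit with their neighbors. The sketch below is a toy illustration only, not the method used by any product named in this article; the word-pair counts are invented, and the function names `pair_score` and `pick_word` are hypothetical.

```python
# Toy illustration of using context to choose among sound-alike words.
# The counts below are made-up numbers, not data from any real system.
HOMOPHONES = {"to", "too", "two"}

# How often (hypothetically) each word pair was seen in some training text.
BIGRAM_COUNTS = {
    ("want", "to"): 50, ("want", "too"): 1, ("want", "two"): 4,
    ("to", "go"): 40, ("too", "go"): 0, ("two", "go"): 0,
    ("bought", "two"): 30, ("bought", "to"): 1, ("bought", "too"): 2,
    ("two", "tickets"): 25, ("to", "tickets"): 0, ("too", "tickets"): 0,
}


def pair_score(left, right):
    """Return a smoothed count so unseen pairs are unlikely but not impossible."""
    return BIGRAM_COUNTS.get((left, right), 0) + 1


def pick_word(previous_word, next_word):
    """Choose the homophone that fits best between its two neighbors."""
    return max(
        sorted(HOMOPHONES),
        key=lambda w: pair_score(previous_word, w) * pair_score(w, next_word),
    )


print(pick_word("want", "go"))         # to
print(pick_word("bought", "tickets"))  # two
```

Scoring only three candidates against their immediate neighbors avoids weighing every possible sentence, which is the kind of narrowing the paragraph above describes.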
In large measure, that degree of success is the result of the systems’ narrow functions. Programs that need to recognize only a limited number of words and phrases are comparatively easy to design and require a relatively small amount of computing power.
Indeed, simple applications of voice recognition have already infiltrated most people’s daily lives.
Telephone companies are replacing directory assistance operators with electronic versions that can often find a phone number based on a spoken request. (Human operators are supposed to step in if the computer is confounded.) Other operator-assisted functions are being transferred to computers as well.
“AT&T has a system which handles collect calls automatically using speech recognition,” said William Meisel, editor of the Encino-based Speech Recognition Update newsletter. “The system only has to recognize a handful of words, and it’s saving them $100 million a year and the need for about 75,000 operators.”
With voice-recognition technology, phone customers can place calls by simply picking up the phone and saying, “Call Mom.” That capability could be critical for cellular phone companies, since support is growing in several states for laws requiring drivers to dial by voice, rather than push-button, for safety’s sake.
Dictation systems that turn spoken words into computer text are gaining popularity as functionality goes up and prices come down. Most dictation systems have been aimed at narrow markets where the vocabulary is limited and the sentence structure is relatively easy to predict.
For example, IBM this week will release MedSpeak/Radiology, a system that allows radiologists to dictate their observations of a patient’s X-rays to a Pentium-equipped PC. The doctor’s words flash on a screen about five seconds after they are spoken--significantly faster than standard practice, in which a doctor dictates a report onto tape, which may not be transcribed until as much as 72 hours later.
It’s also cheaper. Massachusetts General Hospital in Boston and Memorial Sloan-Kettering Cancer Center in New York, which helped IBM develop MedSpeak/Radiology, routinely pay $12,000 a year per radiologist for transcription services. IBM’s system will sell for $4,500 and can be shared by many doctors.
Finally, there are the people who suffer from repetitive stress injuries or other disabilities that prevent them from typing.
Clothing retailer L.L. Bean spent thousands of dollars to install Dragon Systems’ DragonDictate software on about 20 computers at its Freeport, Maine, headquarters. The goal was to help employees who suffer from keyboard-related injuries and head off more problems.
“The price isn’t that expensive when you put it up against the cost of a disability,” said Ted Rooney, the company’s manager of employee health management. “If you’ve got good, trained employees, you want to be able to use them, and this gives them a chance.”
But technological hurdles remain. The most advanced systems on the market today require users to pause between words so the computer has a fighting chance of following along. A truly successful product will be able to understand continuous speech and a wide vocabulary.
For that, researchers must devise more powerful models of human language. When they do, they’ll need computers that can perform computations even faster and store hundreds of thousands of words in their random access memory.
“There are two things working in our favor,” said David Nahamoo, manager of IBM’s human language technologies research division: Every two to three years, the error rate of their algorithms is cut in half and the power of conventional computers doubles.
“With all of these things working together,” he said, “we should be able to bring out the technology to handle any kind of vocabulary in continuous speech in the three- to five-year time frame.”
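The halving trend Nahamoo describes compounds quickly, as a short calculation shows. The sketch below is illustrative only: the 5% starting error rate is an assumption borrowed from the "about 95%" accuracy that promoters claim for today's systems, which pause between words. It is not a figure Nahamoo gave, and the function name `projected_error` is hypothetical.

```python
# Project an error rate that is cut in half at a fixed interval.
# The 5% starting point is an assumption taken from the "about 95%" accuracy
# figure cited by promoters; it is not a number supplied by IBM.
def projected_error(start_error, years, halving_period):
    """Error rate after `years` if it halves every `halving_period` years."""
    return start_error * 0.5 ** (years / halving_period)


START_ERROR = 0.05

for halving_period in (2, 3):
    for years in (3, 5):
        error = projected_error(START_ERROR, years, halving_period)
        print(f"halving every {halving_period} years, after {years} years: {error:.1%}")
```

Under those assumptions the error rate lands between about 0.9% and 2.5% within three to five years, though the figures say nothing about how much harder continuous speech is than word-by-word dictation.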
Karen Kaplan covers technology and careers.