When my father passed away recently, he left an extensive collection of handwritten material in the form of a journal, letters, and thoughts. While the material contains an extensive history of our family, few others will want to read the many handwritten pages. Voice recognition software has substantially simplified the work of managing the collection and reduced the time it takes. My process is as follows:
- Scan each page of the collection for digital preservation.
- Using speech recognition software (such as Dragon NaturallySpeaking 10), I read the handwritten text aloud directly into a Word document (with the aid of a microphone connected to a USB port), with all the correct punctuation and formatting. The software recognizes over 98% of what I read. I am able to read at normal speeds exceeding 100 words a minute.
- Once the text is captured, I make any needed edits, and the text is then ready for further research. I can search the document by keyword, cut and paste information, and so forth.
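To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. The 98% accuracy and 100 words a minute come from my own experience described above; the 250-word page length is only an assumption for illustration, and the function names are my own.

```python
def expected_corrections(word_count: int, accuracy: float) -> float:
    """Expected number of misrecognized words in a passage."""
    return word_count * (1.0 - accuracy)


def dictation_minutes(word_count: int, words_per_minute: float) -> float:
    """Time needed to read a passage aloud at a steady pace."""
    return word_count / words_per_minute


PAGE_WORDS = 250        # assumed length of one handwritten page
ACCURACY = 0.98         # "over 98%" recognition rate
WORDS_PER_MINUTE = 100  # normal reading speed

corrections = expected_corrections(PAGE_WORDS, ACCURACY)
minutes = dictation_minutes(PAGE_WORDS, WORDS_PER_MINUTE)
print(f"About {corrections:.0f} corrections and {minutes:.1f} minutes per page")
```

Under these assumptions, a page takes about two and a half minutes to read in and needs roughly five corrections afterward.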
What is voice recognition?
Voice or speech recognition is the ability of a machine or program to receive and interpret dictation, or to understand and carry out spoken commands.
For use with computers, analog audio must be converted into digital signals. This requires analog-to-digital conversion. For a computer to decipher the signal, it must have a digital database (or vocabulary) of words or syllables, and a speedy means of comparing this data with signals. The speech patterns are stored on the hard drive and loaded into memory when the program starts. A comparator checks these stored patterns against the output of the A/D converter.
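The idea of digitizing a signal and comparing it against stored patterns can be sketched in a few lines of Python. This is only a toy illustration: the two-word "vocabulary" is made of synthetic tones, and the function names and the squared-difference comparison are my own assumptions. Commercial products rely on far more sophisticated statistical models than this.

```python
import math


def analog_to_digital(signal, levels=256):
    """Toy A/D converter: map samples in [-1.0, 1.0] to integers 0..levels-1."""
    digital = []
    for sample in signal:
        clipped = max(-1.0, min(1.0, sample))  # clip out-of-range input
        digital.append(round((clipped + 1.0) / 2.0 * (levels - 1)))
    return digital


def distance(a, b):
    """Sum of squared differences between two digitized patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(digital, vocabulary):
    """Toy comparator: return the stored word whose pattern is closest."""
    return min(vocabulary, key=lambda word: distance(digital, vocabulary[word]))


def tone(cycles, n=64):
    """Synthetic stand-in for a spoken word: a sine wave with `cycles` periods."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]


# Stored patterns, loaded into memory when the program starts.
vocabulary = {
    "low": analog_to_digital(tone(2)),
    "high": analog_to_digital(tone(8)),
}

# A slightly quieter, offset version of the "high" pattern.
spoken = [0.9 * s + 0.05 for s in tone(8)]
print(recognize(analog_to_digital(spoken), vocabulary))
```

Even with the distortion, the input is still closest to the stored "high" pattern, so that is the word the comparator picks.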
In practice, the size of a voice-recognition program’s effective vocabulary is directly related to the random access memory capacity of the computer in which it is installed. A voice-recognition program runs many times faster if the entire vocabulary can be loaded into RAM, as compared with searching the hard drive for some of the matches. Processing speed is critical as well, because it affects how fast the computer can search the RAM for matches.
All voice-recognition systems or programs make errors. Screaming children, barking dogs, and loud external conversations can produce false input. (Much of this can be avoided by using the system in a quiet room.) There is also a problem with words that sound alike but are spelled differently and have different meanings–for example, “hear” and “here.”
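One common way to deal with sound-alike words is to look at the surrounding words. The sketch below picks between "hear" and "here" based on the word spoken just before. The counts are invented purely for illustration, and the names are hypothetical; I do not know how any particular product handles this internally.

```python
# Words that sound the same, keyed by either spelling.
HOMOPHONES = {
    "hear": ("hear", "here"),
    "here": ("hear", "here"),
}

# Invented counts of how often each spelling follows a given word.
CONTEXT_COUNTS = {
    ("can", "hear"): 9,
    ("can", "here"): 1,
    ("over", "here"): 9,
    ("over", "hear"): 1,
}


def pick_spelling(previous_word, heard):
    """Choose the spelling that most often follows previous_word.

    Words with no known sound-alikes are returned unchanged. If the
    context gives no evidence, the first listed spelling is returned.
    """
    heard = heard.lower()
    candidates = HOMOPHONES.get(heard, (heard,))
    return max(
        candidates,
        key=lambda word: CONTEXT_COUNTS.get((previous_word.lower(), word), 0),
    )


print(pick_spelling("can", "here"))
print(pick_spelling("over", "hear"))
```

With these made-up counts, "can" is followed by "hear" and "over" is followed by "here", whichever spelling the recognizer guessed first.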
When is voice recognition used?
- Dictation: Dictation is the most common use. This includes medical transcriptions, legal and business dictation, as well as general word processing. In some cases special vocabularies are used to increase the accuracy of the system.
- Command and Control: Systems designed to perform functions and actions on the computer in response to spoken commands are known as Command and Control systems. Utterances like “Open Word” and “Start a new (command)” will do just that.
- Telephony: Some PBX/Voice Mail systems allow callers to speak commands instead of pressing buttons to send specific tones.
- Wearables: Because inputs are limited for wearable devices, speaking is a natural possibility.
- Medical/Disabilities: Many people have difficulty typing due to physical limitations such as repetitive strain injuries (RSI), muscular dystrophy, and many others. Voice recognition lets them dictate instead. Similarly, people who have difficulty hearing could use a system connected to their telephone to convert the caller’s speech to text.
- Embedded applications: Some cellular phones and automobiles include voice recognition that allows utterances such as “Call Home.”
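The command-style uses in the list above ("Open Word", "Call Home") boil down to matching a recognized phrase against a table of known commands. Here is a minimal sketch of that idea. The command table and the placeholder actions are hypothetical; a real system would launch a program or dial a number rather than return a string.

```python
def open_word():
    """Placeholder action: a real system would start the word processor."""
    return "launching Word"


def call_home():
    """Placeholder action: a real system would dial the stored number."""
    return "dialing Home"


# Table of spoken commands and the action each one triggers.
COMMANDS = {
    "open word": open_word,
    "call home": call_home,
}


def handle_utterance(text):
    """Run the matching command, or return None if the text is not a command."""
    key = " ".join(text.lower().split())  # ignore case and extra spaces
    action = COMMANDS.get(key)
    if action is None:
        return None  # not a command; the caller can treat it as dictation
    return action()


print(handle_utterance("Open Word"))
print(handle_utterance("my father kept a journal"))
```

Anything that is not in the table falls through as ordinary dictation, which is roughly how a program can offer both dictation and commands.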
Tips on using speech recognition
The following tips are based on my experience with Nuance Dragon NaturallySpeaking 9:
- Ensure that your computer meets the hardware specifications.
- Use the latest version of Dragon NaturallySpeaking.
- Install Dragon NaturallySpeaking after you have installed Microsoft Office.
- Make sure that you select the U.S. vocabulary when you set up your user. Sometimes people mistakenly install the UK or Australian English model. This will cause many problems.
- Ensure that you have selected the correct microphone type when setting up your user specifications. If you are using a digital microphone then select it; if you’re using a desktop array microphone then select it, and so on.
- When completing the initial training period, speak in a consistent and even tone of voice. Do not speak too softly or too loudly, and do not speak too quickly or too slowly.
- If the program is not responding well after you complete the first session of general training, then do additional training. Ensure that your hardware is set up properly before you conduct the additional training.
- Run the audio setup wizard once a week or when you find that the program is not responding well to your voice.
- Ensure that the microphone is positioned consistently at the same distance from your mouth. The microphone should be positioned approximately 3 to 4 cm from your mouth and slightly below your bottom lip.
- You must make corrections with Dragon every time it makes a mistake. If you do not make the proper corrections with Dragon NaturallySpeaking, it will not improve. The program's ability to become more accurate depends entirely on you making the corrections.
- In order to correct, you must say either “correct that” or “correct the wrong words.”
- You are not making corrections if you select incorrect words with the mouse and then type over them. You are only making an edit. Dragon will not register this as a correction and will not improve as a result.
- Dragon is optimized for Microsoft Word and Microsoft Outlook. It will work in other applications with various degrees of success. It does not work well with Microsoft Excel.
- The key to getting good accuracy from speech recognition is to think about a sentence before you say it, and then speak a phrase or a sentence at a time. Do – not – speak – each – word – separately. All language programs need words to be spoken as sentences, not as individual words.
- If you purchase a new microphone, especially if you are going from an analogue to a digital microphone, it is best to start a new user and complete a short general training session. This is because different microphones sound different to the program.
- Dragon NaturallySpeaking (version 10) has some built-in tolerance for “um” and “ah,” but it is not really capable of telling the difference between words and these filler sounds. Again, think of a sentence carefully before you pronounce it to the program. This will help enormously.
- Back up your speech files to another medium every few months. With Dragon NaturallySpeaking this is called the USER file and is located in the Dragon NaturallySpeaking program directory. This will save you a lot of heartache if something goes wrong with your computer. Speech files can become corrupted quite easily.
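If you would rather not copy the speech files by hand, the backup tip above can be automated with a short Python script. This is only a sketch: the function name is my own, and the paths in the example comment are placeholders, because the exact location of the USER folder depends on your installation. Close Dragon before running anything like this so that the files are not in use.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def backup_speech_files(source: Path, backup_root: Path,
                        today: Optional[date] = None) -> Path:
    """Copy the speech-file folder into a dated folder under backup_root.

    Raises FileExistsError if a backup with today's date already exists,
    so an earlier copy is never silently overwritten.
    """
    today = today or date.today()
    destination = backup_root / f"{source.name}-{today.isoformat()}"
    shutil.copytree(source, destination)  # also creates backup_root if needed
    return destination


# Example (both paths are placeholders; substitute your own):
# backup_speech_files(Path("C:/path/to/USER"), Path("E:/dragon-backups"))
```

Keeping each backup in its own dated folder means a corrupted set of speech files cannot overwrite the last good copy.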