Spelling as a Complementary Strategy for Speech Recognition
Interspeech '12: Proceedings of the International Conference on Spoken Language Processing, 2012.
We compare a variety of strategies for incorporating spelling to create more robust voice-only speech interfaces. These strategies use different combinations of speaking the word, spelling the word, and spelling the word using a phonetic alphabet. For correcting a single recognition error, either spelling the word or both speaking and spelling it reduced error rates substantially. Phonetic spelling was very accurate, with error rates on a 5K task approaching zero. Most importantly, multiple input strategies could be used simultaneously with only a modest degradation in performance compared to allowing only a single input strategy. Thus, our work shows that spelling-based input strategies enable speech interfaces to provide users with a simple, natural, and effective way to both avoid and correct recognition errors.