|
I've made available the scripts I used to train an HTK recognizer using the CMU pronunciation dictionary, the Wall Street Journal WSJ0 corpus and, optionally, the TIMIT and WSJ1 corpora. You'll also need the sph2pipe utility to decompress the WSJ audio files. The training regimen is mostly based on the tutorial presented in the HTKBook. You'll need a Unix system with Perl installed. I have only tested this on my own system, so I doubt the recipe is completely baked, but hopefully it will provide a good starting point. Let me know if you find ways to improve it. A variety of acoustic models trained using this recipe are available for download. There is also a similar CMU Sphinx recipe available. You can read about all the gory details here. You should be able to get a system that performs similarly to the gender-independent SI-84 systems described in the paper by Woodland et al., Large Vocabulary Continuous Speech Recognition Using HTK. To evaluate the system, I used the WSJ 5K non-verbalized closed vocabulary set and the WSJ standard 5K non-verbalized closed bigram language model. On the November 1992 ARPA WSJ evaluation (330 sentences from 8 speakers), the word-internal system had a word error rate of 7.85% and the cross-word system had a word error rate of 6.91%. Results on other test sets are given below. The scripts and configuration files in the recipe are released under a new BSD license. This excludes the decision tree phonetic questions (tree_ques.hed) and the test set index files (si_dt_05_odd.ndx, si_dt_s2.ndx, si_dt_s6.ndx), which I did not write.
|
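For readers unfamiliar with the metric quoted above, word error rate is the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation in Python. It is not part of the recipe and not how the figures above were produced; the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Return WER as a fraction: word-level edit distance / reference length.

    Illustrative helper only (hypothetical name, not from the recipe).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not taken from the WSJ test sets.
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER: {100 * word_error_rate(ref, hyp):.2f}%")
```

In a corpus-level evaluation the errors and reference word counts are normally summed over all sentences before dividing, rather than averaging the per-sentence rates.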