Sphinx WSJ training recipe
--------------------------
http://www.keithv.com/software/sphinx/

Here is a recipe to train the CMU Sphinx speech recognizer using the
CMU pronouncing dictionary, the WSJ0 corpus, and optionally the WSJ1
corpus. The Resource Management corpus is also used to perform the
initial forced alignment of the WSJ training data.

This is mostly based on the tutorial from:
http://www.speech.cs.cmu.edu/sphinx/tutorial.html

I evaluated on the November 1992 ARPA WSJ test set (Nov'92, 330
sentences) and on the San Jose Mercury sentences from the WSJ1 Hub 2
test set (si_dt_s2, 207 sentences). Nov'92 was evaluated using the
WSJ 5K non-verbalized closed vocabulary and the WSJ standard 5K
non-verbalized closed bigram language model. si_dt_s2 was evaluated
using a 60K vocabulary and a bigram language model trained on the
English Gigaword corpus (this language model is not included in the
recipe). Models were evaluated using the Sphinx-3 decoder operating
at close to real time.

Results (in % word error rate):

   +-----------------+---------+----------+
   | Training data   | Nov'92  | si_dt_s2 |
   +-----------------+---------+----------+
   | WSJ SI-84       | 28.91%  | 52.42%   |
   | WSJ SI-284      |  7.34%  | 24.27%   |
   | WSJ all         |  6.33%  | 21.26%   |
   +-----------------+---------+----------+

Basic steps:

1) If you want to evaluate results, install the sctk scoring
   utilities from NIST. Available from:
      http://www.nist.gov/speech/tools/index.htm

2) Download the Resource Management database from CMU:
      http://www.speech.cs.cmu.edu/databases/rm1/index.html

   Unpack it in a directory below the sphinx_recipe directory. For
   example:
      wget http://www.speech.cs.cmu.edu/databases/rm1/rm1_cepstra.tar.gz
      tar -xvvzf rm1_cepstra.tar.gz

3) Obtain the WSJ0 corpus, and optionally the WSJ1 corpus, from the
   LDC. (The WSJ audio is licensed by the LDC and is not included in
   this recipe.)

4) Set up the environment variables contained in:
      add_to_your_env

5) Download and build the trainer:
      cd sphinx_recipe
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/SphinxTrain
      cd SphinxTrain
      ./configure
      make

6) Now prep and train the Resource Management acoustic model:
      perl scripts_pl/setup_tutorial.pl rm1
      cd ../rm1
      perl scripts_pl/RunAll.pl

7) Download and build the sphinxbase package:
      cd ..
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
      cd sphinxbase
      ./autogen.sh --prefix=$CMU_ROOT
      make
      make check
      make install

8) Download and build the Sphinx-3 decoder:
      cd ..
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx3
      cd sphinx3
      ./autogen.sh --prefix=$CMU_ROOT
      make
      make check
      make install

9) (Optional) If real-time performance is crucial, you might want to
   use semi-continuous models and decode with Sphinx-2:
      cd ..
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx2
      cd sphinx2
      ./autogen.sh --prefix=$CMU_ROOT
      make
      make check
      make install

10) (Optional) If you want to try out recognition on a mobile device,
    you might want to install PocketSphinx. Note: you'd want to pass
    --enable-fixed to autogen.sh if your platform doesn't have a
    floating point unit.
      cd ..
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx
      cd pocketsphinx
      ./autogen.sh --prefix=$CMU_ROOT --build=i686-linux
      make
      make check
      make install

11) You'll need lm3g2dmp for evaluation:
      cd ..
      svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/share/lm3g2dmp
      cd lm3g2dmp
      make

12) Copy all the SphinxTrain and lm3g2dmp executables to
    $CMU_ROOT/bin:
      cd ..
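    To check the sph2pipe install (see step 14 below), you can convert
    a SPHERE file to an ordinary RIFF wav. This is a minimal sketch;
    input.sph and output.wav are placeholder file names, not files
    shipped with the recipe:
      # -f wav selects RIFF/wav output, -p forces 16-bit linear PCM
      # (decompressing shorten-encoded WSJ audio along the way)
      sph2pipe -f wav -p input.sph output.wav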
      cp $CMU_ROOT/SphinxTrain/bin*/* $CMU_ROOT/bin
      cp $CMU_ROOT/lm3g2dmp/lm3g2dmp $CMU_ROOT/bin

13) Download the CMU dictionary from:
      http://www.speech.cs.cmu.edu/cgi-bin/cmudict
    Copy it to $CMU_COMMON/c0.6. For example:
      cd $CMU_COMMON
      wget ftp://ftp.cs.cmu.edu/afs/cs.cmu.edu/data/anonftp/project/fgdata/dict/c06d.gz
      gunzip c06d.gz
      mv c06d c0.6
      cd $CMU_ROOT

14) Install sph2pipe from the LDC:
      http://www.ldc.upenn.edu/
    sph2pipe must be on your path. For example (see the sanity check
    sketched above):
      wget ftp://ftp.ldc.upenn.edu/pub/ldc/misc_sw/sph2pipe_v2.5.tar.gz
      tar -xvvzf sph2pipe_v2.5.tar.gz
      cd sph2pipe_v2.5
      gcc -o sph2pipe *.c -lm
      cp sph2pipe /usr/local/bin

15) Rock and roll (hopefully):
      go.sh wsj_si84   - train using SI-84 data, eval on Nov'92
      go.sh wsj_si284  - train using SI-284 data, eval on Nov'92
      go.sh wsj_all    - train using all WSJ data, eval on Nov'92

The default is to train continuous 3-state HMMs with 8000 tied states
and 32 Gaussians per state, and to decode with Sphinx-3. Training and
decoding behavior can be changed in the config files pointed to by
CONFIG_TRAIN and CONFIG_DECODE in go.sh.

Have fun!
Keith Vertanen

Revision history:
-----------------
Mar 10th, 2008
   - Removed parameters from the scripts that are no longer present
     in the current CMU SphinxTrain build.
   - Updated to work with the latest CMU dictionary.
   - Added example commands for obtaining RM1, etc.
   - Added single quotes around find commands.
Dec 18th, 2006
   - Added PocketSphinx decoding support.
   - Changed Sphinx-2 decoding beam widths to perform similarly to
     Sphinx-3 on the Nov'92 test set.
Oct 16th, 2006
   - Initial release of Sphinx recipe.
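
Appendix: scoring by hand
-------------------------
The go.sh scripts take care of scoring, but you can also run the NIST
sclite tool (part of the sctk package from step 1) directly on a
hypothesis/reference pair. This is a minimal sketch, not part of the
recipe: ref.trn and hyp.trn are placeholder names for transcripts in
"trn" format (one utterance per line, with the utterance id in
parentheses at the end of the line, the way Sphinx decoders typically
write their hypothesis files):

   # score hyp.trn against ref.trn; -i rm tells sclite the utterance
   # ids follow the RM/WSJ convention, -o sum prints a summary report
   sclite -r ref.trn trn -h hyp.trn trn -i rm -o sum stdout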