forked from romanows/Sphinx-4-Acoustic-Model-Adaptation-Data
-
Notifications
You must be signed in to change notification settings - Fork 0
Data to generate acoustic model adaptation prompts/transcripts from CMU Arctic prompts
License
hmeutzner/Sphinx-4-Acoustic-Model-Adaptation-Data
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Sphinx 4 Acoustic Model Adaptation Data by Brian Romanowski This project contains data used during the CMU Sphinx 4 acoustic model adapatation process. It includes nearly all utterances from the CMU ARCTIC project (http://festvox.org/cmu_arctic/), about 1108 more than the 20 prompts provided in the CMU adaptation tutorial (http://cmusphinx.sourceforge.net/wiki/tutorialadapt) A blog post describes some of the issues surrounding acoustic model adaptation in Sphinx that are out-of-scope in this narrow project (http://pwnetics.wordpress.com/2011/07/01/sphinx-4-language-model-adaptation) Provided are the four files required during the adaptation process: * prompts for speakers * transcriptions of the to-be recorded speech * pronunciation dictionary entries for transcribed words * miscellaneous bookkeeping information And there is an additional file that has pronunciation alternatives and the dictionary words inline with the transcriptions. Originally, the idea was that a human annotator could quickly delete the unused pronunciations, but there are so many that this is not practical. Note that 4 ARCTIC prompts that contained numerals were left-out of the generated prompt data, as the mapping from numerals to speech is not always clear. The Python 2.7 code used to generate the adaptation prompt data is included. To use it, you'll have to edit filesystem paths and download the CMU pronunciation dictionary and the original CMU ARCTIC prompts. Please let me know about the bugs you find! Feel free to fork the code to generate prompts for other corpora!
About
Data to generate acoustic model adaptation prompts/transcripts from CMU Arctic prompts
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published