Unsupervised acoustic and language model training with small amounts of labelled data

Scott Novotney, Richard M. Schwartz, J. (DOI: 10.1109/ICASSP.2009.4960579)

We measure the effects of a weak language model, estimated from as little as 100k words of text, on unsupervised acoustic model training and then explore the best method of using word confidences to estimate n-gram counts for unsupervised language model training. Even with 100k words of text and 10 hours of training data, unsupervised acoustic modeling is robust, with 50% of the gain recovered when compared to supervised training. For language model training, multiplying the word confidences together to get a weighted count produces the best reduction in WER by 2% over the baseline language model and 0.5% absolute over using unweighted transcripts. Oracle experiments show that a larger gain is possible, but better confidence estimation techniques are needed to identify correct n-grams.
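As a rough illustration of the confidence-weighted counting scheme the abstract describes, the sketch below accumulates fractional n-gram counts by multiplying the per-word confidences within each n-gram. The input format (lists of (word, confidence) pairs) and all names are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

def weighted_ngram_counts(utterances, n=3):
    """Accumulate fractional n-gram counts from confidence-scored ASR output.

    Each utterance is assumed to be a list of (word, confidence) pairs; an
    n-gram's count is incremented by the product of the confidences of its
    constituent words, rather than by 1 as with verbatim transcripts.
    """
    counts = defaultdict(float)
    for utt in utterances:
        words = [w for w, _ in utt]
        confs = [c for _, c in utt]
        for i in range(len(words) - n + 1):
            ngram = tuple(words[i:i + n])
            weight = 1.0
            for c in confs[i:i + n]:
                weight *= c          # product of word confidences
            counts[ngram] += weight
    return counts

# Example with two hypothetical decoded utterances and made-up confidences.
hyps = [
    [("the", 0.9), ("cat", 0.7), ("sat", 0.8)],
    [("the", 0.95), ("cat", 0.6), ("ran", 0.5)],
]
print(weighted_ngram_counts(hyps, n=2))
```

The resulting fractional counts would then feed a standard n-gram estimator in place of integer counts from supervised transcripts.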