Bayesian sensing hidden Markov models for speech recognition

George Saon, Jen-Tzung Chien. Bayesian sensing hidden Markov models for speech recognition. DOI: 10.1109/ICASSP.2011.5947493

We introduce Bayesian sensing hidden Markov models (BS-HMMs) to represent speech data based on a set of state-dependent basis vectors. By incorporating the prior density of the sensing weights, the relevance of a feature vector to the different bases is determined by the corresponding precision parameters. The BS-HMM parameters, consisting of the basis vectors, the precision matrices of the sensing weights, and the precision matrices of the reconstruction errors, are jointly estimated by maximizing the likelihood function marginalized over the weight priors. We derive recursive solutions for the three sets of parameters, expressed via maximum a posteriori estimates of the sensing weights. Experimental results on an LVCSR task show consistent gains over conventional HMMs with Gaussian mixture models for both ML and discriminative training scenarios.

Index Terms— Speech recognition, Bayesian learning, basis representation, acoustic model

One related approach is buried Markov models [5], which relaxed the conditional independence assumption in the representation of speech features: a set of state-dependent basis vectors was trained to express the conditionally dependent feature vectors. In yet another approach, subspace Gaussian mixture models [6] were constructed to represent speech features using state-dependent weights and a common large-scale GMM structure; the feature representation was seen as sensing based on different subspaces of a global GMM. In this study, we address the basis representation of speech features for hidden Markov modeling and present a Bayesian sensing framework that ensures model regularization for speech recognition. The resulting BS-HMMs are constructed from a set of basis vectors, the precision matrix of the sensing weights, and the precision matrix of the reconstruction errors. The precision matrix of the weights naturally reflects how relevantly the input feature is encoded by the basis vectors, similar to the perspective of the relevance vector machine (RVM) [7].
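Reading the description above — a state-dependent basis expansion with Gaussian priors on the sensing weights and Gaussian reconstruction errors — in the usual RVM-style form (the symbols below are our own shorthand, not taken from the paper), the per-state generative model would be:

```latex
x_t = \Phi_i w_t + \varepsilon_t, \qquad
w_t \sim \mathcal{N}(0, A_i^{-1}), \qquad
\varepsilon_t \sim \mathcal{N}(0, R_i^{-1}),
```

where $\Phi_i$ collects the basis vectors of state $i$, $A_i$ is the precision matrix of the sensing weights, and $R_i$ the precision matrix of the reconstruction errors. Marginalizing over $w_t$ then gives the state likelihood $p(x_t \mid i) = \mathcal{N}(x_t;\, 0,\, R_i^{-1} + \Phi_i A_i^{-1} \Phi_i^{\top})$, which is the marginal likelihood that the joint estimation maximizes.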
Importantly, we maximize the marginal likelihood of the training data over random weights and jointly estimate the three sets of parameters. Multivariate solutions are derived by maximum likelihood (ML) type II estimation and expressed through recursive formulas. These formulas are interpreted in terms of the mean vector and covariance matrix of the a posteriori distribution of the sensing weights. The maximum a posteriori (MAP) estimate of the sensing weights plays a central role in BS-HMMs. Experimental results on an LVCSR task show consistent improvements over standard HMMs with Gaussian mixture models.
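The recursive formulas mentioned above are expressed through the posterior mean and covariance of the sensing weights. A minimal sketch of that posterior computation for a single state, assuming the standard Gaussian linear model; all names (`Phi`, `A`, `R`) and the toy dimensions are our own illustrative choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 4  # feature dimension, number of state-dependent basis vectors

Phi = rng.standard_normal((d, k))  # basis vectors (columns) for one state
A = 2.0 * np.eye(k)                # precision matrix of the sensing weights
R = 5.0 * np.eye(d)                # precision matrix of the reconstruction errors
x = rng.standard_normal(d)         # one observed feature vector

# Posterior over the sensing weights w given x (conjugate Gaussian algebra):
#   Sigma = (A + Phi^T R Phi)^{-1},   mu = Sigma Phi^T R x
# mu is the MAP estimate of the sensing weights.
Sigma = np.linalg.inv(A + Phi.T @ R @ Phi)
mu = Sigma @ Phi.T @ R @ x

# Reconstruction of the feature vector from the basis at the MAP weights.
x_hat = Phi @ mu
```

In the full model these posterior moments would feed the recursive re-estimation of the basis vectors and the two precision matrices, accumulated over all frames assigned to the state.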
    • ...Undeterred by this state of affairs, we experiment with discriminative training for a new class of acoustic models called Bayesian sensing HMMs [6], which combine ideas from relevance vector machines [7] and Bayesian dictionary learning...
    • ...In [6], we discuss the estimation of BS-HMM parameters according to the ML type II criterion by maximizing the marginal likelihood of the training data X = {x_t}...
    • ...It is noteworthy that the ML type II solutions for Φ_i and R_i are obtained as a special case of (9) and (11) by setting γ_t^den(i) and D_i to zero, as shown in the companion paper [6]...

    George Saon et al. Discriminative training for Bayesian sensing hidden Markov models
