On Smoothing and Inference for Topic Models

Arthur Asuncion, Max Welling, Padhraic Smyth, Yee Whye Teh

Citations: 16
Latent Dirichlet analysis, or topic modeling, is a flexible latent variable framework for modeling high-dimensional sparse count data. Various learning algorithms have been developed in recent years, including collapsed Gibbs sampling, variational inference, and maximum a posteriori estimation, and this variety motivates the need for careful empirical comparisons. In this paper, we highlight the close connections between these approaches. We find that the main differences are attributable to the amount of smoothing applied to the counts. When the hyperparameters are optimized, the differences in performance among the algorithms diminish significantly. The ability of these algorithms to achieve solutions of comparable accuracy gives us the freedom to select computationally efficient approaches. Using the insights gained from this comparative study, we show how accurate topic models can be learned in several seconds on text corpora with thousands of documents.
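The abstract's claim that the algorithms differ mainly in how they smooth the counts is easiest to see in the CVB0 update, the cheapest of the collapsed variants studied in the paper. Below is a minimal Python sketch of CVB0 for LDA written from the paper's description; the function and variable names (cvb0, Nwk, Njk, gammas) and the hyperparameter defaults are illustrative, not from the paper.

    import numpy as np

    def cvb0(docs, W, K, alpha=0.1, beta=0.01, iters=50, seed=0):
        # docs: list of documents, each a list of word ids in [0, W).
        # Each token i in document j keeps a responsibility vector
        # gamma_ij over the K topics; expected counts are maintained
        # by subtracting the old gamma and adding the updated one.
        rng = np.random.default_rng(seed)
        gammas = [rng.dirichlet(np.ones(K), size=len(d)) for d in docs]
        Nwk = np.zeros((W, K))          # expected word-topic counts
        Njk = np.zeros((len(docs), K))  # expected document-topic counts
        Nk = np.zeros(K)                # expected topic totals
        for j, d in enumerate(docs):
            for i, w in enumerate(d):
                g = gammas[j][i]
                Nwk[w] += g; Njk[j] += g; Nk += g
        for _ in range(iters):
            for j, d in enumerate(docs):
                for i, w in enumerate(d):
                    g = gammas[j][i]
                    # remove this token's own contribution ("not-ij" counts)
                    Nwk[w] -= g; Njk[j] -= g; Nk -= g
                    # CVB0 update: plain smoothed counts, no digamma corrections
                    g = (Nwk[w] + beta) / (Nk + W * beta) * (Njk[j] + alpha)
                    g /= g.sum()
                    gammas[j][i] = g
                    Nwk[w] += g; Njk[j] += g; Nk += g
        return Nwk, Njk

The other algorithms in the paper fit the same template: VB passes each count through exp(digamma(.)) terms, MAP offsets the smoothed counts by -1, and CGS uses a single sampled topic per token instead of a dense responsibility vector.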
Citation contexts from citing publications:
    • ...For some applications, topic models are sensitive to the hyperparameters (Asuncion et al. 2009) and it is necessary to get the right values for the hyperparameters...

    Jie Tang et al. Topic level expertise search over heterogeneous networks
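The hyperparameter sensitivity noted in the excerpt above is why the paper optimizes the hyperparameters rather than fixing them. The paper uses Minka's fixed-point iteration for the Dirichlet-multinomial (Polya) likelihood; one step of the update for the document-topic hyperparameter, in count notation (N_jk = tokens assigned to topic k in document j, N_j = tokens in document j), is roughly:

    \alpha_k \leftarrow \alpha_k \cdot \frac{\sum_j \left[ \Psi(N_{jk} + \alpha_k) - \Psi(\alpha_k) \right]}{\sum_j \left[ \Psi(N_j + \sum_{k'} \alpha_{k'}) - \Psi(\sum_{k'} \alpha_{k'}) \right]}

where \Psi is the digamma function; the same form, applied to expected counts, serves the variational algorithms.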

    • ...Another reason is that VB achieves a generalization power comparable with CGS and CVB [1]...

    Tomonari Masada et al. Steering Time-Dependent Estimation of Posteriors with Hyperparameter I...

    • ...the importance of properly adapting the priors (hyperparameters) in LDA-based models [11, 12]...
    • ...For an excellent comparison of different inference methods such as variational Bayes (VB), collapsed Gibbs sampling (CGS), collapsed variational Bayes (CVB) and maximum a-posteriori (MAP) inference for (unsupervised) LDA, we refer to [12]...
    • ...While the extension of [12] to the supervised case might appear straightforward at first sight, several new aspects arise in the supervised case:...
    • ...In the EM algorithm for the MAP solution in LDA [12], the E-step involves the computation of γ_jk = P(z_ij | w_ij, θ_j) and the M-step involves maximization w.r.t...
    • ...θ and φ. To ensure that the γ_jk's are valid probabilities, [12] impose the constraint α > 1, β > 1 in their MAP solution...
    • ...In (2), the hyperparameters are optimized using maximum likelihood (ML) estimation for the Dirichlet distribution, whereas in the MAP solution by [12], the hyperparameters are optimized using ML for the Polya distribution...
    • ...As in LDA, the collapsed variational distribution is assumed to factorize as follows [12, 14]:...
    • ...The computational complexity in the training stage for Supervised-LDA is similar to that of LDA (see Section 5 in [12] for a related discussion)...

    Balaji Lakshminarayanan et al. Inference in Supervised latent Dirichlet allocation
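The α > 1, β > 1 constraint quoted in the excerpts above follows from the M-step of the MAP solution, where the smoothed counts are offset by -1. A sketch in the usual LDA count notation (symbols are mine where the excerpts were garbled):

    \hat{\theta}_{kj} = \frac{N_{jk} + \alpha - 1}{N_j + K(\alpha - 1)}, \qquad
    \hat{\phi}_{wk} = \frac{N_{wk} + \beta - 1}{N_k + W(\beta - 1)}

These are guaranteed to be valid (nonnegative) probabilities only when α ≥ 1 and β ≥ 1, and the strict inequalities keep the estimates strictly positive.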

    • ...Our algorithm is modeled on the CVB0 variational approximation to LDA described in (Asuncion, et al. 2009)...

    Daniel Ramage et al. Characterizing Microblogs with Topic Models

    • ...N_kd are expected counts derived from q(z). For more details, see Asuncion et al. [7]...
    • ...Perplexity is a widely-used metric for topic models that indicates the quality of the model (and a lower perplexity indicates a better model) [7]...

    Hazeline U. Asuncion et al. Software traceability with topic modeling
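To make the perplexity excerpt above concrete, here is a hedged Python sketch (array names and shapes are mine, not from either paper) computing perplexity as the exponential of the negative mean per-token log-likelihood:

    import numpy as np

    def perplexity(docs, theta, phi):
        # docs: list of documents, each a list of word ids
        # theta: (D, K) document-topic proportions, rows sum to 1
        # phi: (W, K) per-topic word probabilities, columns sum to 1
        log_lik, n_tokens = 0.0, 0
        for j, d in enumerate(docs):
            for w in d:
                # p(w | d) = sum_k theta[j, k] * phi[w, k]
                log_lik += np.log(phi[w] @ theta[j])
                n_tokens += 1
        return np.exp(-log_lik / n_tokens)

Lower perplexity means the model assigns higher probability to the evaluated tokens, which is the sense in which it indicates a better model.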
