Keywords
(6)
Empirical Likelihood
Evaluation Method
Evaluation Metric
Exact Computation
Harmonic Mean
Latent Dirichlet Allocation
Evaluation Methods for Topic Models
(Citations: 15)
Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov, David M. Mimno
A natural evaluation metric for statistical topic models is the probability of held-out documents given a trained model. While exact computation of this probability is intractable, several estimators for this probability have been used in the topic modeling literature, including the harmonic mean method and empirical likelihood method. In this paper, we demonstrate experimentally that commonly-used methods are unlikely to accurately estimate the probability of held-out documents, and propose two alternative methods that are both accurate and efficient. In this paper we consider only the simplest topic model, latent Dirichlet allocation (LDA), and compare a number of methods for estimating the probability of held-out documents given a trained model. Most of the methods presented, however, are applicable to more complicated topic models. In addition to comparing evaluation methods that are currently used in the topic modeling literature, we propose several alternative methods. We present empirical results on synthetic and real-world data sets showing that the currently-used estimators are less accurate and have higher variance than the proposed new estimators.
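As an illustration of the harmonic mean method named in the abstract, here is a minimal Python sketch. It assumes you already have per-sample log-likelihoods log p(w | z_s) of a held-out document, with the z_s drawn from the posterior (for example by Gibbs sampling); the function name and inputs are hypothetical and this is not the authors' code. The abstract's own conclusion is that estimators of this kind are unlikely to be accurate, so the sketch is for understanding only.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Harmonic mean estimate of log p(w).

    log_likelihoods: list of log p(w | z_s) for S posterior samples z_s
    (assumed to be supplied by the caller; hypothetical input).
    Computes log S - log(sum_s exp(-log p(w | z_s))) in a numerically
    stable way, so that very small likelihoods do not overflow.
    """
    if not log_likelihoods:
        raise ValueError("need at least one sample")
    negated = [-ll for ll in log_likelihoods]
    m = max(negated)  # shift by the maximum before exponentiating
    log_sum = m + math.log(sum(math.exp(v - m) for v in negated))
    return math.log(len(negated)) - log_sum
```

Because the sum is dominated by the samples with the smallest likelihood, a single unusual sample can change the estimate substantially, which is consistent with the high variance that the citing papers on this page attribute to the method.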
Conference: International Conference on Machine Learning (ICML), pp. 1105-1112, 2009
DOI: 10.1145/1553374.1553515
Citation Context
(11)
...Efficiently and reliably computing the marginal likelihood in topic models is an active research area [37], [38] due to the intractable sum over correlated y in (17)...
...We take the view of [37], [38] and define an importance sampling approximation to the marginal likelihood...
...The new importance sampling proposal in (19) results in much faster and more accurate estimation of the marginal likelihood (17) than the standard approach [39], [37] of using posterior Gibbs samples...
...The latter results in the harmonic mean approximation for the likelihood [39], [37] and suffers from 1) requiring a (slow) Gibbs sampler at test time (prohibiting online computation) and 2) the high variance of the harmonic mean estimator (making classification inaccurate)...
...We compared this against the classification performance of WSJTM using the commonly used harmonic mean likelihood approximation [37], [39]...
Timothy M. Hospedales, et al.
Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic...
...However, Wallach et al. have pointed out in [3] that when sampling from high-dimensional distributions the variance of empirical likelihood may be very large...
Shaoze Lei, et al.
Topic model with constrainted word burstiness intensities
...Evaluation Plan. Evaluating topic models is a difficult task, due to a lack of ground truth [16]...
Stephen W. Thomas.
Mining software repositories using topic models
...We point out that reliably computing the marginal likelihood in topic models is an active research area [16, 17] due to the intractable sum over correlated y in Eq. (5)...
...We take the view of [16, 17] and define an importance sampling approximation p(x ∗ c) ≈ 1...
...Note that this importance sampling proposal (Eq. (7)) is much faster and more accurate than the standard approach [16, 18] of using posterior Gibbs samples in Eq. (6), which results in the unstable harmonic mean approximation to the likelihood – an estimator which has huge variance in theory and in practice [16]...
Jian Li, et al.
Learning Rare Behaviours
...Wallach et al. [5] gives a evaluation method of this generative ability using Latent Dirichlet Allocation (LDA)...
Yonghui Wu, et al.
A comparative study of topic models for topic clustering of Chinese we...
References
(18)
Latent Dirichlet allocation
(Citations: 1957)
David M. Blei, Andrew Y. Ng, Michael I. Jordan
Journal: Journal of Machine Learning Research (JMLR), vol. 3, pp. 993-1022, 2003
Marginal Likelihood from the Gibbs Output
(Citations: 607)
Siddhartha Chib
Journal: Journal of the American Statistical Association (J AMER STATIST ASSN), vol. 90, no. 432, pp. 1313-1321, 1995
Studies in Lower Bounding Probability of Evidence using the Markov Inequality
(Citations: 6)
Vibhav Gogate, Bozhena Bidyuk, Rina Dechter
Conference: Uncertainty in Artificial Intelligence (UAI)
Pachinko allocation: DAG-structured mixture models of topic correlations
(Citations: 98)
Wei Li, Andrew McCallum
Conference: International Conference on Machine Learning (ICML), pp. 577-584, 2006
MALLET: A Machine Learning for Language Toolkit
(Citations: 118)
Andrew Kachites McCallum
Published in 2002.
Citations
(15)
Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model
Timothy M. Hospedales, Jian Li, Shaogang Gong, Tao Xiang
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 33, no. 12, pp. 2451-2464, 2011
Topic model with constrainted word burstiness intensities
Shaoze Lei, JianWen Zhang, Shifeng Weng, Changshui Zhang
Conference: International Symposium on Neural Networks (ISNN), pp. 68-74, 2011
Mining software repositories using topic models
Stephen W. Thomas
Conference: International Conference on Software Engineering (ICSE), pp. 1138-1139, 2011
Tree-Structured Stick Breaking Processes for Hierarchical Data
Ryan Prescott Adams, Zoubin Ghahramani, Michael I. Jordan
Published in 2010.
Learning Rare Behaviours
Jian Li, Timothy M. Hospedales, Shaogang Gong, Tao Xiang
Conference: Asian Conference on Computer Vision (ACCV), pp. 293-307, 2010