Estimating the Size of a Multinomial Population
Lalitha Sanathanan
This paper deals with the problem of estimating the number of trials of a multinomial distribution from an incomplete observation of the cell totals, under constraints on the cell probabilities. More specifically, let $(n_1, \cdots, n_k)$ be distributed according to the multinomial law $M(N; p_1, \cdots, p_k)$, where $N$ is the number of trials and the $p_i$'s are the cell probabilities, $\sum^k_{i=1}p_i$ being equal to 1. Suppose that only a proper subset of $(n_1, \cdots, n_k)$ is observable, that $N, p_1, \cdots, p_k$ are unknown, and that $N$ is to be estimated. Without loss of generality, $(n_1, \cdots, n_{l-1})$, $l \leqq k$, may be taken to be the observable random vector. For fixed $N$, $(n_1, \cdots, n_{l-1}, N - n)$ has the multinomial distribution $M(N; p_1, \cdots, p_l)$, where $n$ denotes $\sum^{l-1}_{i=1}n_i$ and $p_l$ denotes $1 - \sum^{l-1}_{i=1}p_i$. If the parameter space is such that $N$ can take any nonnegative integral value and each $p_i$ can take any value between 0 and 1 such that $\sum^{l-1}_{i=1}p_i < 1$, then, clearly, the only inference one can make about $N$ is that $N > n$. In specific situations, it might, however, be possible to postulate constraints of the type \begin{equation*}\tag{1.1} p_i = f_i(\theta),\quad i = 1, \cdots, l,\end{equation*} where $\theta = (\theta_1, \cdots, \theta_r)$ is a vector of $r$ independent parameters and the $f_i$ are known functions. This may lead to estimability of $N$. The problem of estimating $N$ in such a situation is studied here.
The present investigation is motivated by the following problem. Experiments in particle physics often involve visual scanning of film containing photographs of particles (occurring, for instance, inside a bubble chamber). The scanning is done with a view to counting the number $N$ of particles of a predetermined type (these particles will be referred to as events). But owing to poor visibility caused by such characteristics as low momentum, the distribution and configuration of nearby track patterns, etc., some events are likely to be missed during the scanning process. The question, then, is: how does one get an estimate of $N$?
The usual procedure for estimating $N$ is as follows. Film containing the $N$ (unknown) events is scanned separately by $w$ scanners (ordered in some specific way) using the same instructions. For each event $E$ let a $w$-vector $Z(E)$ be defined such that the $j$th component $Z_j$ of $Z(E)$ is 1 if $E$ is detected by the $j$th scanner and 0 otherwise. Let $\mathscr{J}$ be the set of $2^w$ $w$-vectors of 1's and 0's and let $I_0$ be the vector of 0's. Let $x_I$ be the number of events $E$ whose $Z(E) = I$. For $I \in \mathscr{J} - \{I_0\}$, the $x_I$'s are observed. A probability model is assumed for the results of the scanning process. That is, it is assumed that there is a probability $p_I$ that $Z(E)$ assumes the value $I$ and that these $p_I$'s are constrained by equations of the type (1.1). (These constraints vary according to the assumptions made about the scanners and events, thus giving rise to different models. An example of $p_I(\theta)$ would be $E(\nu^{\Sigma^w_{j=1}I_j}(1 - \nu)^{w - \Sigma^w_{j=1}I_j})$, where $I_j$ is the $j$th component of $I$ and the expectation is taken with respect to the two-parameter beta density for $\nu$. This is the result of assuming that all scanners are equally efficient in detecting events, that the probability $\nu$ that an event is seen by any scanner is a random variable, and that the results of the different scans are locally independent. For a discussion of various models, see Sanathanan (1969), Chapter III.) $N$ is then estimated using the observed $x_I$'s and the constraints on the $p_I$'s, provided certain conditions (e.g., the minimum number of scans required) are met.
The following formulation of the problem of estimating $N$, however, leads to a systematic study, including a development of the relevant asymptotic distribution theory for the estimators. The $Z(E)$'s may be regarded as realizations of $N$ independent identically distributed random variables whose common distribution is discrete with probabilities $p_I$ at $I$. (In particle counting problems, it is usually true that the particles of interest are sparsely distributed throughout the film on account of their Poisson distribution with low intensity. Thus, in spite of the factors affecting their visibility outlined earlier, the events can be assumed to be independent.) The joint distribution of the $x_I$'s is, then, multinomial $M(N; p_I, I \in \mathscr{J})$. The problem of estimating $N$ is now in the form stated at the beginning of this section.
Since the estimate depends on the constraints imposed on the $p_I$'s, it is important to test the fit of the model selected. The conditional distribution of the $x_I$'s $(I \neq I_0)$ given $x$ is multinomial $M(x; p_I/p, I \neq I_0)$, where $x$ is defined as $\sum_{I\neq I_0} x_I$ and $p$ as $\sum_{I\neq I_0}p_I$. The corresponding $\chi^2$ goodness of fit test may therefore be used to test the adequacy of the model in question. Various estimators of $N$ are considered in this paper, and among them is, of course, the maximum likelihood estimator of $N$.
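The beta example and the conditional $\chi^2$ check described above can be made concrete with a short sketch. It assumes the beta density for $\nu$ has parameters $\theta = (a, b)$ (our labels). Under that assumption the expectation has the closed form $E(\nu^{s}(1-\nu)^{w-s}) = B(a+s, b+w-s)/B(a, b)$ with $s = \sum_j I_j$. The functions `cell_probs_beta`, `conditional_probs` and `pearson_chi2` are hypothetical helpers, not code from the paper, and fitting $\theta$ is deliberately left out.

```python
import itertools
import math


def log_beta(a, b):
    """log B(a, b) computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def cell_probs_beta(w, a, b):
    """p_I for every 0/1 vector I of length w, assuming nu ~ Beta(a, b) and
    scans independent given nu:  p_I = B(a + s, b + w - s) / B(a, b), s = sum(I)."""
    probs = {}
    for I in itertools.product((0, 1), repeat=w):
        s = sum(I)
        probs[I] = math.exp(log_beta(a + s, b + w - s) - log_beta(a, b))
    return probs


def conditional_probs(probs, w):
    """Cell probabilities p_I / p of the conditional multinomial M(x; p_I/p, I != I_0),
    where p is the sum of p_I over I != I_0."""
    I0 = (0,) * w
    p = sum(q for I, q in probs.items() if I != I0)
    return {I: q / p for I, q in probs.items() if I != I0}


def pearson_chi2(x_counts, cond):
    """Pearson statistic comparing the observed x_I with expected counts x * p_I / p."""
    x = sum(x_counts.values())
    return sum((x_counts.get(I, 0) - x * q) ** 2 / (x * q) for I, q in cond.items())


if __name__ == "__main__":
    probs = cell_probs_beta(w=3, a=2.0, b=3.0)   # hypothetical (a, b)
    cond = conditional_probs(probs, w=3)
    observed = {I: 10 for I in cond}             # hypothetical counts, for illustration only
    print(pearson_chi2(observed, cond))
```

If $\theta$ is estimated efficiently from the same conditional likelihood, the usual Pearson theory would refer the statistic to a $\chi^2$ distribution with $2^w - 2 - r$ degrees of freedom. That is the textbook result under standard regularity conditions, not a claim taken from this paper.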
Asymptotic theory for maximum likelihood estimation of the parameters of a multinomial distribution has been developed before for the case where $N$ is known, but not for the case where $N$ is unknown.
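The sketch below shows what maximum likelihood with $N$ unknown involves. It uses a deliberately simple model that we assume for illustration and that the abstract does not single out: all $w$ scanners detect each event independently with the same fixed probability $\nu$, so $p_I = \nu^{s}(1-\nu)^{w-s}$ with $s = \sum_j I_j$ and $\theta = \nu$. For fixed $N$ the likelihood in $\nu$ is binomial in the total number of detections $D = \sum_I s\,x_I$ out of $Nw$ scan-event pairs, so $\hat\nu(N) = D/(Nw)$. The profile log-likelihood is then maximized over integers $N \geq x$ by direct search up to a cap `N_max`. The cap and the function names are hypothetical.

```python
import math


def xlogy(a, b):
    """a * log(b) with the convention 0 * log(0) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def profile_loglik(N, x, D, w):
    """Multinomial log-likelihood of (N, nu), maximized over nu, up to an additive
    constant, for the illustrative model p_I = nu**s * (1 - nu)**(w - s).
    For fixed N the nu-part is binomial: D detections out of N*w trials."""
    nu = D / (N * w)
    return (math.lgamma(N + 1) - math.lgamma(N - x + 1)   # log N! / (N - x)!
            + xlogy(D, nu) + xlogy(N * w - D, 1.0 - nu))


def ml_estimate_N(counts, w, N_max):
    """counts maps each observed pattern I (0/1 tuple, not all zeros) to x_I.
    Returns the integer N in [x, N_max] with the largest profile likelihood."""
    x = sum(counts.values())
    D = sum(sum(I) * c for I, c in counts.items())   # total detections over all scans
    return max(range(x, N_max + 1), key=lambda N: profile_loglik(N, x, D, w))


if __name__ == "__main__":
    # Hypothetical two-scanner record: 25 events seen by both, 25 by each scanner alone.
    counts = {(1, 1): 25, (1, 0): 25, (0, 1): 25}
    print(ml_estimate_N(counts, w=2, N_max=1000))
```

For the hypothetical counts in the usage lines the estimate lands near 100. The two-scanner ratio $n_1 n_2 / m = 50 \cdot 50 / 25 = 100$ is a convenient sanity check.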
Asymptotic theory for the case where $N$ is unknown is developed in Section 4. The result on the asymptotic joint distribution of the relevant maximum likelihood estimators is stated in Theorem 2. A second method of estimation considered is that of maximizing the likelihood based on the conditional probability of observing $(n_1, \cdots, n_{l-1})$, given $n$. This method is called the conditional maximum likelihood (C.M.L.) method. The C.M.L. estimator of $N$ is shown (Theorem 2) to be asymptotically equivalent to the maximum likelihood estimator. Section 5 contains an extension of these results to the situation involving several multinomial distributions. This situation arises in the particle scanning context when the detected events are classified into groups based on some factor, such as momentum, that is related to the visibility of an event, and a separate scanning record is available for each group. A third method of estimation considered is that of equating certain linear combinations of the cell totals (presumably chosen on the basis of some criterion) to their respective expected values.
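The C.M.L. idea can be sketched under an illustrative equal-efficiency model with a fixed detection probability $\nu$, which is an assumption made here: $p_I = \nu^{s}(1-\nu)^{w-s}$, $s = \sum_j I_j$. Conditional on $x$, the observed $x_I$ then have likelihood $\prod_{I \neq I_0}(p_I/p)^{x_I} = \nu^{D}(1-\nu)^{xw-D}/(1-(1-\nu)^w)^{x}$, where $D$ is the total number of detections. The sketch maximizes this by golden-section search, assuming the function is unimodal on $(0, 1)$. It then takes $\hat N = x / p(\hat\nu)$, a natural form of the C.M.L. estimator that we assume rather than quote from the paper. All function names are hypothetical.

```python
import math


def cond_loglik(nu, x, D, w):
    """Conditional log-likelihood of nu given x: sum over I != I_0 of x_I log(p_I / p),
    with p_I = nu**s (1 - nu)**(w - s) and p = 1 - (1 - nu)**w."""
    p = 1.0 - (1.0 - nu) ** w
    return D * math.log(nu) + (x * w - D) * math.log(1.0 - nu) - x * math.log(p)


def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def cml_estimate_N(counts, w):
    """counts maps each observed pattern I (0/1 tuple, not all zeros) to x_I.
    Returns (N_hat, nu_hat) with N_hat = x / p(nu_hat)."""
    x = sum(counts.values())
    D = sum(sum(I) * c for I, c in counts.items())   # total detections
    nu_hat = golden_max(lambda nu: cond_loglik(nu, x, D, w), 1e-9, 1 - 1e-9)
    p_hat = 1.0 - (1.0 - nu_hat) ** w
    return x / p_hat, nu_hat


if __name__ == "__main__":
    # Hypothetical two-scanner record: 25 events seen by both, 25 by each scanner alone.
    print(cml_estimate_N({(1, 1): 25, (1, 0): 25, (0, 1): 25}, w=2))
```

The design choice is that only the observed cells enter the fit. The unobserved count $N - x$ never appears in the conditional likelihood and is recovered afterwards through $p(\hat\nu)$.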
Asymptotic theory for this method is given in Section 6. This discussion is motivated by a particular case, applicable to some models in the particle scanning problem, in which the criterion is based on the method of moments for the unobservable random variable given by the number of scanners detecting an event. (Discussion of the particular case can be found in Sanathanan (1969), Chapter III.) In the next section we give some definitions and a preliminary lemma.
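One way to read the third method is sketched below. It is an assumption on our part, since the abstract does not say which model or which linear combinations the paper uses. Take an illustrative model in which every scanner detects each event independently with the same fixed probability $\nu$, so that the number $S$ of scanners detecting an event is binomial$(w, \nu)$. Let $m_s$ be the number of detected events seen by exactly $s$ scanners. The linear combinations $\sum_s s\,m_s$ and $\sum_s s(s-1)\,m_s$ involve only observable cells, because the $s = 0$ term vanishes. Their expected values are $Nw\nu$ and $Nw(w-1)\nu^2$. Equating the two pairs gives $\hat\nu = \sum_s s(s-1)m_s / ((w-1)\sum_s s\,m_s)$ and $\hat N = \sum_s s\,m_s / (w\hat\nu)$. The function name is hypothetical.

```python
def moment_estimate_N(m, w):
    """m[s] = number of detected events seen by exactly s scanners (s = 1..w).
    Solves  sum s*m_s = N*w*nu  and  sum s(s-1)*m_s = N*w*(w-1)*nu**2
    (first and second factorial moments of a binomial(w, nu) count)."""
    t1 = sum(s * c for s, c in m.items())
    t2 = sum(s * (s - 1) * c for s, c in m.items())
    if w < 2 or t2 == 0:
        raise ValueError("need w >= 2 and at least one event seen by two or more scanners")
    nu_hat = t2 / ((w - 1) * t1)
    N_hat = t1 / (w * nu_hat)
    return N_hat, nu_hat


if __name__ == "__main__":
    # Hypothetical two-scanner summary: 50 events seen once, 25 seen twice.
    print(moment_estimate_N({1: 50, 2: 25}, w=2))
```

Factorial moments are used because they cancel the unobservable $s = 0$ cell exactly. Ordinary moments would also work but mix in the unknown $N$ less conveniently.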
Journal: The Annals of Mathematical Statistics, vol. 43, pp. 142-152, 1972
DOI: 10.1214/aoms/1177692709
Citation Context (18)
Xin-Sheng Hu, et al. Estimating Animal Abundance in Ground Beef Batches Assayed with Molecu...
...This could sound puzzling for those who often appeal to the well-known result of Sanathanan (1972), which states that marginal likelihood and conditional likelihood are asymptotically equivalent...
Alessio Farcomeni, et al. Reference Bayesian methods for recapture models with heterogeneity
...There are two classical approaches for estimating N in such problems (Sanathanan, 1972)...
...In simple models either approach is relatively straightforward to implement, and both estimators are asymptotically equivalent (Sanathanan, 1972)...
J. Andrew Royle, et al. Analysis of Multinomial Models With Unknown Index Using Data Augmentat...
...Abundance was estimated with the Huggins estimator (Sanathanan 1972; Huggins 1989, 1991; Alho 1990)...
Kevin R. Bestgen, et al. Population Status of Colorado Pikeminnow in the Green River Basin, Uta...
...the conditional multinomial likelihood (Sanathanan, 1972), given the...
...information matrix of the parameters (Sanathanan, 1972)...
Francesco Bartolucci, et al. A Class of Latent Marginal Models for Capture–Recapture Data With Cont...
Citations (68)
Estimating Animal Abundance in Ground Beef Batches Assayed with Molecular Markers
Xin-Sheng Hu, Janika Simila, Sindey Schueler Platz, Stephen S. Moore, Graham Plastow, Ciaran N. Meghen
Journal: PLOS One, vol. 7, no. 3, 2012
Estimating species richness by a Poisson-compound gamma model (Citations: 2)
Ji-Ping Wang
Journal: Biometrika, vol. 97, no. 3, pp. 727-740, 2010
Estimation of T-cell repertoire diversity and clonal size distribution by Poisson abundance models (Citations: 3)
Nuno Sepúlveda, Carlos Daniel Paulino, Jorge Carneiro
Journal: Journal of Immunological Methods, vol. 353, no. 1, pp. 124-137, 2010
Reference Bayesian methods for recapture models with heterogeneity (Citations: 1)
Alessio Farcomeni, Luca Tardella
Journal: Test, vol. 19, no. 1, pp. 187-208, 2010
Uncovering a Latent Multinomial: Analysis of Mark-Recapture Data with Misidentification
William A. Link, Jun Yoshizaki, Larissa L. Bailey, Kenneth H. Pollock
Journal: Biometrics, vol. 66, no. 1, pp. 178-185, 2010