A Bernoulli Two-armed Bandit

Author: Donald A. Berry
DOI: 10.1214/aoms/1177692553

Citations: 35
One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
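The sequential problem described in the abstract can be illustrated with a small dynamic program. The sketch below is not taken from the paper: it assumes, purely for illustration, that $\rho$ and $\lambda$ have independent Beta priors (the abstract does not say which example distributions are used), and the function name `bandit_value` and its parameters are hypothetical. It computes the maximal expected number of successes over $n$ stages by backward induction on the posterior parameters.

```python
from functools import lru_cache


def bandit_value(n, prior_rho=(1, 1), prior_lam=(1, 1)):
    """Return (expected successes under optimal play, arm to select first).

    Assumption (not from the source): rho ~ Beta(a, b) and lambda ~ Beta(c, d),
    independent, with the Beta parameters given as (a, b) and (c, d).
    """
    a0, b0 = prior_rho
    c0, d0 = prior_lam

    @lru_cache(maxsize=None)
    def value(m, a, b, c, d):
        # m is the number of stages remaining; (a, b) and (c, d) are the
        # current posterior Beta parameters for rho and lambda.
        if m == 0:
            return 0.0, None
        p = a / (a + b)  # posterior mean of rho = chance of success on arm rho
        q = c / (c + d)  # posterior mean of lambda
        # Expected total successes if arm rho is selected now, then play is optimal.
        v_rho = (p * (1 + value(m - 1, a + 1, b, c, d)[0])
                 + (1 - p) * value(m - 1, a, b + 1, c, d)[0])
        # Same for arm lambda.
        v_lam = (q * (1 + value(m - 1, a, b, c + 1, d)[0])
                 + (1 - q) * value(m - 1, a, b, c, d + 1)[0])
        # Ties are broken in favor of arm rho (an arbitrary choice).
        return (v_rho, "rho") if v_rho >= v_lam else (v_lam, "lambda")

    return value(n, a0, b0, c0, d0)
```

Under uniform priors, `bandit_value(2)` gives an expected 13/12 successes, compared with 1 for a fixed choice of arm. Calling `bandit_value(2, (2, 1), (1, 1))`, the state reached after a success on arm `rho` from uniform priors with two stages left, again selects `rho`, which is consistent with the stay-on-a-winner rule in this instance. The recursion visits every reachable posterior state, so it is only practical for small $n$.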
Journal: The Annals of Mathematical Statistics, vol. 43, pp. 871-897, 1972