Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions
(Citations: 4)
Lucian Busoniu, Damien Ernst, Bart De Schutter, Robert Babuska
This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
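The optimization loop the abstract describes can be sketched generically: sample candidate parameter vectors (standing in for the flattened BF locations, shapes, and action assignments) from a parametric distribution, score each by its estimated return, and refit the distribution to the elite samples. The following is a minimal illustrative sketch of the cross-entropy method, not the authors' implementation; the function names and the toy score function are invented for the example, and a real use would score candidates with Monte Carlo rollouts from the representative initial states.

```python
import numpy as np

def cross_entropy_search(score, dim, n_samples=50, n_elite=10,
                         n_iters=30, seed=0):
    """Generic cross-entropy optimization of a real parameter vector.

    `score` maps a parameter vector to an estimated return; in the
    paper's setting this would be a Monte Carlo estimate of the policy's
    return over a representative set of initial states.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)           # Gaussian sampling distribution
    std = np.full(dim, 2.0)
    for _ in range(n_iters):
        # Draw candidate parameter vectors from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([score(s) for s in samples])
        # Keep the elite fraction with the highest estimated return.
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling distribution to the elite samples.
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Toy stand-in for the return estimate, maximized at theta = (1, -2).
def toy_return(theta):
    return -np.sum((theta - np.array([1.0, -2.0])) ** 2)

best = cross_entropy_search(toy_return, dim=2)
```

The elite-refitting step is what distinguishes cross-entropy search from plain random search: the sampling distribution progressively concentrates on high-return regions of parameter space.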
Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B (TSMCB), vol. 41, no. 1, pp. 196-209, 2011
DOI: 10.1109/TSMCB.2010.2050586
Citation Context (3)

...For instance, policy search was combined with evolutionary computation in [55], [56, Ch. 3], and with cross-entropy optimization in [57], [58]...
Lucian Busoniu, et al., Approximate reinforcement learning: An overview

...STI control [21], [23], [24] assumed a one-to-one mapping between drug application and effectiveness, so that whenever a drug is fully applied, its effectiveness is equal to some maximum value...
...Note that the expected values of ε1 and ε2 when the drugs are applied are, respectively, 0.7 and 0.3, equal to their deterministic values in [21], [23], [24]...
...as soon as V becomes nonzero due to the introduction of virus copies, the patient becomes infected and the state drifts away from xn. More interesting is the unhealthy equilibrium xu = [163573, 5, 11945, 46, 63919, 24]^T, which is stable and...
...This solution is better than our previous one in [24], which keeps one drug on in steady state...
Lucian Busoniu, et al., Optimistic planning for sparsely stochastic systems

...Another related class of algorithms with adaptive bases are those concerned with direct policy improvement (or actor-only algorithms) [8, 9]...
Dotan Di Castro, et al., Adaptive Bases for Reinforcement Learning
Citations (4)

Approximate policy iteration: a survey and some new methods (Citations: 3)
Dimitri P. Bertsekas
Journal: Journal of Control Theory and Applications, vol. 9, no. 3, pp. 310-335, 2011

Approximate reinforcement learning: An overview (Citations: 1)
Lucian Busoniu, Damien Ernst, Bart De Schutter, Robert Babuska
Published in 2011.

Optimistic planning for sparsely stochastic systems
Lucian Busoniu, Rémi Munos, Bart De Schutter, Robert Babuska
Published in 2011.

Adaptive Bases for Reinforcement Learning (Citations: 1)
Dotan Di Castro, Shie Mannor
Journal: Computing Research Repository - CoRR, vol. abs/1005.0, pp. 312-327, 2010