Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions

DOI: 10.1109/TSMCB.2010.2050586

This paper introduces an algorithm for direct search of control policies in continuous-state discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 41, no. 1, pp. 196-209, 2011
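The procedure described in the abstract can be sketched as follows: policy parameters (BF locations plus per-BF action assignments) are sampled from a distribution, each sample is scored by its Monte Carlo return from a set of representative initial states, and the distribution is refit to the elite samples. The toy 1-D dynamics, BF count, and all hyperparameters below are illustrative assumptions, not the paper's actual benchmarks or settings.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BF, N_SAMPLES, N_ELITE, N_ITERS = 4, 50, 10, 15
ACTIONS = np.array([-1.0, 1.0])                  # discrete action set

def policy(theta, x):
    """Nearest-BF policy: each BF center carries one discrete action."""
    centers = theta[:N_BF]                       # BF locations (optimized)
    assign = theta[N_BF:]                        # soft action assignment per BF
    k = np.argmin(np.abs(centers - x))           # activated BF
    return ACTIONS[int(assign[k] > 0)]           # threshold -> discrete action

def mc_return(theta, x0, horizon=30):
    """Return of one rollout from initial state x0 (deterministic toy system)."""
    x, g = x0, 0.0
    for _ in range(horizon):
        u = policy(theta, x)
        x = 0.9 * x + 0.1 * u                    # toy linear dynamics
        g += -abs(x)                             # reward: drive state to 0
    return g

def score(theta, init_states):
    """Average empirical return over the representative initial states."""
    return np.mean([mc_return(theta, x0) for x0 in init_states])

init_states = np.linspace(-1, 1, 5)              # representative initial states
mu, sigma = np.zeros(2 * N_BF), np.ones(2 * N_BF)
for _ in range(N_ITERS):
    samples = rng.normal(mu, sigma, size=(N_SAMPLES, 2 * N_BF))
    scores = np.array([score(th, init_states) for th in samples])
    elite = samples[np.argsort(scores)[-N_ELITE:]]   # keep best-return samples
    mu = elite.mean(axis=0)                      # refit the sampling
    sigma = elite.std(axis=0) + 1e-3             # distribution to the elites
print("mean return of final policy:", score(mu, init_states))
```

The key point, matching the abstract, is that the BF centers themselves are part of the search space, so the representation adapts to the problem rather than being fixed on an equidistant grid.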
    • ...For instance, policy search was combined with evolutionary computation in [55], [56, Ch. 3], and with cross-entropy optimization in [57], [58]...

    Lucian Busoniu et al. Approximate reinforcement learning: An overview

    • ...STI control [21], [23], [24] assumed a one-to-one mapping between drug application and effectiveness, so that whenever a drug is fully applied, its effectiveness is equal to some maximum value...
    • ...Note that the expected values of ε1 and ε2 when the drugs are applied are, respectively, 0.7 and 0.3, equal to their deterministic values in [21], [23], [24]...
    • ...As soon as V becomes nonzero due to the introduction of virus copies, the patient becomes infected and the state drifts away from xn. More interesting is the unhealthy equilibrium xu = [163573, 5, 11945, 46, 63919, 24]⊤, which is stable and...
    • ...This solution is better than our previous one in [24], which keeps one drug on in steady state...

    Lucian Busoniu et al. Optimistic planning for sparsely stochastic systems

    • ...Another related class of algorithms with adaptive bases are those concerned with direct policy improvement (or actor-only algorithms) [8,9]...

    Dotan Di Castro et al. Adaptive Bases for Reinforcement Learning
