Academic
Publications
Numerical analysis of continuous time Markov decision processes over finite horizons
Numerical analysis of continuous time Markov decision processes over finite horizons   (Citations: 4)
BibTex | RIS | RefWorks Download
Continuous time Markov decision processes (CTMDPs) with a finite state and action space have been considered for a long time. It is known that under fairly general conditions the reward gained over a finite horizon can be maximized by a so-called piecewise constant policy which changes only finitely often in a finite interval. Although this result is available for more than 30 years, numerical analysis approaches to compute the optimal policy and reward are restricted to discretization methods which are known to converge to the true solution if the discretization step goes to zero. In this paper, we present a new method that is based on uniformization of the CTMDP and allows one to compute an ε-optimal policy up to a predefined precision in a numerically stable way using adaptive time steps.
Journal: Computers & Operations Research - CoR , vol. 38, no. 3, pp. 651-659, 2011
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
    • ...While efficient approximation [7,17,22] is of interest for a practitioner, being unable to determine whether or not optimal control exists—or if the optimal quality which was describedmorethanhalfacenturyagobyBellman[5]maybealimitthatcannotbereached— is very dissatisfying from a theoretical point of view...

    Markus N. Rabeet al. Finite optimal control for time-bounded reachability in CTMDPs and con...

    • ...Uniformization and discretization have been applied to approximate the optimal rewards for both infinite and finite horizons [26], [18], [27], [28], [29]...

    Martin R. Neuhäußeret al. Time-Bounded Reachability Probabilities in Continuous-Time Markov Deci...

    • ...The method we present is an adaption and extension of a recent algorithm [11] to compute the accumulated reward in a CTMDP over a finite interval...
    • ...This then provides both theoretical and practical insights: (i) on the theoretical side, we have that all maximal (or minimal) values arising in model checking can be obtained by finite memory policies, (ii) on the practical side, we exploit recent algorithmic advances [11] to arrive at an efficient approximation algorithm—providing upper and lower bounds— based on the well known notion of uniformization...
    • ...The following notations are mainly taken from [12] and are used similarly in [11]...
    • ...It is based on uniformization [16] for CTMCs and its recent extension to CTMDPs with rewards [11], which, in our notation, treats the approximation of R max (C [0,T] true)...
    • ...Eqns. (9) and (10), combined with the uniformization approach (8), can be used to derive (see [11]) the following sequences of vectors:...
    • ...Notice that for zero rewards for probabilistic reachability, we have w (0) = w (0) = r = 0. From the known bounds for g ∗ , new bounds for g ∗−δ can then be computed as follows (see [11, Theorem 3]): Model Checking Algorithms for CTMDPs 235...
    • ...where d is the decision vector chosen by the selection procedure using g t .A s shown in [11] the local error of a step of length δ is in O(δ 2 ) such that theoretically the...

    Peter Buchholzet al. Model Checking Algorithms for CTMDPs

Order by: