Some contributions to Markov decision processes

Chu, Shanyun
Some contributions to Markov decision processes. PhD thesis, University of Liverpool.

Available under License Creative Commons Attribution.



In a nutshell, this thesis studies discrete-time Markov decision processes (MDPs) on Borel spaces, with possibly unbounded costs, under both the expected (discounted) total cost and the long-run expected average cost criteria. In Chapter 2, we systematically investigate a constrained absorbing MDP with the expected total cost criterion and cost functions that are possibly unbounded (from both above and below). We apply the convex analytic approach to derive the optimality and duality results, along with the existence of an optimal finite mixing policy. We also provide mild conditions under which a general constrained MDP model with state-action-dependent discount factors can be equivalently transformed into an absorbing MDP model. Chapter 3 treats a more constrained absorbing MDP than that of Chapter 2. The dynamic programming approach is applied to a reformulated unconstrained MDP model and the optimality results are obtained. In addition, the correspondence between policies in the original model and in the reformulated one is illustrated. In Chapter 4, we attempt to extend the dynamic programming approach for standard MDPs with the expected total cost criterion to the case where the (iterated) coherent risk measure of the cost is taken as the performance measure to be minimized. The cost function under consideration is allowed to be unbounded from below, and possibly arbitrarily unbounded from above. Under a fairly weak version of the continuity-compactness conditions, we derive the optimality results for both the finite and infinite horizon cases, and establish value iteration as well as policy iteration algorithms. The standard MDP and the iterated conditional value-at-risk of the cost function are illustrated as two examples. Chapters 5 and 6 tackle MDPs with the long-run expected average cost criterion. In Chapter 5, we consider a constrained MDP with cost functions that are possibly unbounded (from both above and below). Under Lyapunov-like conditions, we show the sufficiency of stable policies for the constrained problem under consideration. Furthermore, we introduce the corresponding space of performance vectors and characterize each of its extreme points by a deterministic stationary policy. Finally, the existence of an optimal finite mixing policy is justified. Chapter 6 concerns an unconstrained MDP with cost functions unbounded from below and possibly arbitrarily unbounded from above. We provide a detailed discussion of the issue of sufficient policies in the denumerable case, establish the average cost optimality inequality (ACOI) and show the existence of an optimal deterministic stationary policy. In Chapter 7, an inventory-production system is taken as an example of real-world applications to illustrate the main results of Chapters 2 and 5.
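As a rough illustration of the value iteration idea mentioned for Chapter 4, the following is a minimal sketch on a finite toy model; it is not the thesis's algorithm, which works on Borel spaces with unbounded costs. It assumes one common reading of an iterated risk measure, namely the nested recursion V(s) = min over a of [c(s, a) + rho(V(next state))], with rho taken to be a conditional value-at-risk defined as the mean of the worst alpha-fraction of outcomes (so alpha = 1 recovers the standard expected total cost MDP). The model, the function names `cvar` and `risk_value_iteration`, and all numbers are hypothetical.

```python
def cvar(costs, probs, alpha):
    """Mean of the worst alpha-fraction of a discrete cost distribution.

    Assumed convention: alpha in (0, 1], and alpha = 1 gives the expectation.
    """
    remaining = alpha
    total = 0.0
    # Visit outcomes from the most costly to the least costly.
    for cost, prob in sorted(zip(costs, probs), key=lambda cp: cp[0], reverse=True):
        if remaining <= 0:
            break
        weight = min(prob, remaining)
        total += weight * cost
        remaining -= weight
    return total / alpha


def risk_value_iteration(model, alpha, tol=1e-10, max_iter=10_000):
    """Synchronous value iteration for V(s) = min_a [c(s,a) + CVaR_alpha(V(X'))].

    `model[s][a]` is a pair (one_step_cost, [(next_state, probability), ...]).
    Returns the value function and a greedy deterministic stationary policy.
    """
    values = {s: 0.0 for s in model}
    policy = {}
    for _ in range(max_iter):
        new_values = {}
        for s, actions in model.items():
            best_action, best_value = None, float("inf")
            for a, (cost, transitions) in actions.items():
                next_vals = [values[t] for t, _ in transitions]
                next_probs = [p for _, p in transitions]
                q = cost + cvar(next_vals, next_probs, alpha)
                if q < best_value:
                    best_action, best_value = a, q
            new_values[s] = best_value
            policy[s] = best_action
        gap = max(abs(new_values[s] - values[s]) for s in model)
        values = new_values
        if gap < tol:
            break
    return values, policy


# Hypothetical absorbing model: state 2 is cost-free and absorbing.
TOY_MODEL = {
    0: {"safe": (2.0, [(2, 1.0)]), "risky": (1.0, [(2, 0.8), (1, 0.2)])},
    1: {"go": (4.0, [(2, 1.0)])},
    2: {"stay": (0.0, [(2, 1.0)])},
}

neutral_values, neutral_policy = risk_value_iteration(TOY_MODEL, alpha=1.0)
averse_values, averse_policy = risk_value_iteration(TOY_MODEL, alpha=0.2)
```

In this toy model the risk-neutral case (alpha = 1) and a risk-averse case (alpha = 0.2) can select different actions in state 0, which is the kind of effect a risk-sensitive criterion is meant to capture.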

Item Type: Thesis (PhD)
Additional Information: Date: 2015-07 (completed)
Depositing User: Symplectic Admin
Date Deposited: 01 Feb 2016 16:47
Last Modified: 17 Dec 2022 01:06
DOI: 10.17638/02038000