Ph.D. Dissertation, Graduate School of Informatics, Kyoto
University, Japan, Mar. 2005.
This thesis builds a novel bridge between reinforcement learning and
Reinforcement learning is a very interesting framework in the sense
that an empirical sequence composed of states, actions, and rewards
is generated from two different sources: one is the policy which an
agent decides and another is the probabilistic structure of the
unknown environment of the agent.
Accordingly, the empirical sequence is used as training data for
estimating the probabilistic structure of the environment, while the
empirical sequence itself is to be optimized for maximizing the
(discounted) sum of rewards by improving the policy.
Although information theory has established some effective tools for
the estimation and some comprehensive laws for analyses of such
complicated sources, it is somewhat surprising that this fruitful
theory had not been utilized before in the field of reinforcement
In information theory, the source coding methods estimate the
probabilistic structure of information source for efficiently
allotting codewords to each sequence.
This is essentially equivalent to that the agent estimates the
probabilistic structure of the environment.
Also, optimizing the policy means changing the probabilistic structure
of information source so that most likely outcomes asymptotically
include most preferred ones with probability one.
How close the relationship between the two fields is !
In this study, Shannon's theory is introduced based on the so-called
type method as a basic but powerful tool for the analysis of
reinforcement learning process.
Formulating return maximization in reinforcement learning in terms of
information theory, this study focuses on the asymptotic property
about the process of single/multi-agent reinforcement learning.
The analysis in this study sheds further light on general
stochastic processes of reinforcement learning such as
non-Markovian, non-stationary, and/or non-ergodic processes, which the
traditional theory of reinforcement learning cannot deal with.
Then, several experimental analyses and applications are given, based
on the derived theoretical results.
The topics treated in this study are important for considering the
theoretical and practical aspects of reinforcement learning.
- 1. Introduction
- 1.1 Standard Framework of Reinforcement Learning
- 1.2 Preview of This study
- 2. Analysis of Return Maximization Using the Asymptotic Equipartition
Property: Single Agent Case
- 2.1 Introduction
- 2.2 Type of Empirical Sequence
- 2.3 Asymptotic Equipartition Property
- 2.4 Role of Stochastic Complexity
- 2.5 Conclusions
- 2.6 Related Theorems
- 2.7 Proofs
- 3. Analysis of Return Maximization Using the Asymptotic Equipartition
Property: Multi-Agent Case
- 3.1 Introduction
- 3.2 Framework of Multi-Agent Reinforcement Learning
- 3.3 Type of Empirical Sequence
- 3.4 Main Results
- 3.5 Discussions and Conclusions
- 4. Analysis of Return Maximization Using the Asymptotic Equipartition
Property in General Processes
- 4.1 Introduction
- 4.2 Asymptotic Equipartition Property
- 4.3 Main Results
- 4.4 Conclusions
- 4.5 Proof
- 5. Effects of Domain Size and Complexity on Empirical Distribution
- 5.1 Introduction
- 5.2 Description as Information Source
- 5.3 Lempel-Ziv Coding
- 5.4 Experiments
- 5.5 Discussions and Conclusions
- 6. A New Criterion Using Infomation Gain for Action Selection Strategy
- 6.1 Introduction
- 6.2 Source Coding for Return Sequence
- 6.3 An Example of Applications
- 6.4 Discussions and Conclusions
- 6.5 Rough Proof of Information Gain Convergence
- 7. Conclusions
How to get this ?
Send me e-mail, please. I send the PDF file of this dissertation soon.
This is also available in Kyoto University Library.
[B A C K]
All Right Reserved by Kazunori Iwata