### An Information Theoretic Analysis of Reinforcement Learning

#### Kazunori Iwata

*Ph.D. Dissertation*, Graduate School of Informatics, Kyoto University, Japan, Mar. 2005.
#### Abstract

This thesis builds a novel bridge between reinforcement learning and information theory. Reinforcement learning is an interesting framework in the sense that an empirical sequence composed of states, actions, and rewards is generated from two different sources: one is the policy, which the agent decides, and the other is the probabilistic structure of the agent's unknown environment. Accordingly, the empirical sequence serves as training data for estimating the probabilistic structure of the environment, while the sequence itself is to be optimized, by improving the policy, so as to maximize the (discounted) sum of rewards. Although information theory has established effective tools for such estimation and comprehensive laws for analyzing such complicated sources, it is somewhat surprising that this fruitful theory had not previously been applied in the field of reinforcement learning.

In information theory, source coding methods estimate the probabilistic structure of an information source in order to allot codewords to sequences efficiently. This is essentially equivalent to the agent estimating the probabilistic structure of the environment. Likewise, optimizing the policy means changing the probabilistic structure of the information source so that the most likely outcomes asymptotically include the most preferred ones with probability one. How close the relationship between the two fields is!

In this study, Shannon's theory, based on the so-called method of types, is introduced as a basic but powerful tool for analyzing the reinforcement learning process. Formulating return maximization in reinforcement learning in information-theoretic terms, this study focuses on the asymptotic properties of the single- and multi-agent reinforcement learning process.
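The method of types mentioned above rests on a standard identity: under a finite-alphabet i.i.d. source Q, the probability of a sequence depends on the sequence only through its empirical distribution (its type) P, via (1/n) log Q(x^n) = -(H(P) + D(P||Q)). As a minimal illustration (not taken from the thesis; the sequence and distributions below are chosen purely for exposition), the following sketch computes a type and checks this identity numerically:

```python
import math
from collections import Counter

def empirical_type(seq):
    """Empirical distribution (the 'type') of a finite sequence."""
    n = len(seq)
    return {s: c / n for s, c in Counter(seq).items()}

def log_prob_iid(seq, q):
    """Log-probability of seq under an i.i.d. source with distribution q."""
    return sum(math.log(q[s]) for s in seq)

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(v * math.log(v) for v in p.values() if v > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(v * math.log(v / q[s]) for s, v in p.items() if v > 0)

# Key identity of the type method: (1/n) log Q(x^n) = -(H(P) + D(P || Q)),
# where P is the type of x^n.  The source probability of a sequence
# therefore depends on the sequence only through its type.
seq = "aababbaabb"                # illustrative sequence (5 a's, 5 b's)
q = {"a": 0.6, "b": 0.4}         # illustrative i.i.d. source
p = empirical_type(seq)          # -> {"a": 0.5, "b": 0.5}
lhs = log_prob_iid(seq, q) / len(seq)
rhs = -(entropy(p) + kl(p, q))
assert abs(lhs - rhs) < 1e-12
```

Grouping sequences by type in this way is what allows the asymptotic equipartition property, used throughout Chapters 2 through 4, to be stated and proved with elementary counting arguments.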
The analysis in this study sheds further light on general stochastic processes of reinforcement learning, such as non-Markovian, non-stationary, and/or non-ergodic processes, which the traditional theory of reinforcement learning cannot deal with. Several experimental analyses and applications are then given, based on the derived theoretical results. The topics treated in this study are important for both the theoretical and practical aspects of reinforcement learning.

#### Contents

- Preface
- Acknowledgments
- 1. Introduction
- 1.1 Standard Framework of Reinforcement Learning
- 1.2 Preview of This Study
- 2. Analysis of Return Maximization Using the Asymptotic Equipartition Property: Single Agent Case
- 2.1 Introduction
- 2.2 Type of Empirical Sequence
- 2.3 Asymptotic Equipartition Property
- 2.4 Role of Stochastic Complexity
- 2.5 Conclusions
- 2.6 Related Theorems
- 2.7 Proofs
- 3. Analysis of Return Maximization Using the Asymptotic Equipartition Property: Multi-Agent Case
- 3.1 Introduction
- 3.2 Framework of Multi-Agent Reinforcement Learning
- 3.3 Type of Empirical Sequence
- 3.4 Main Results
- 3.5 Discussions and Conclusions
- 4. Analysis of Return Maximization Using the Asymptotic Equipartition Property in General Processes
- 4.1 Introduction
- 4.2 Asymptotic Equipartition Property
- 4.3 Main Results
- 4.4 Conclusions
- 4.5 Proof
- 5. Effects of Domain Size and Complexity on Empirical Distribution
- 5.1 Introduction
- 5.2 Description as Information Source
- 5.3 Lempel-Ziv Coding
- 5.4 Experiments
- 5.5 Discussions and Conclusions
- 6. A New Criterion Using Information Gain for Action Selection Strategy
- 6.1 Introduction
- 6.2 Source Coding for Return Sequence
- 6.3 An Example of Applications
- 6.4 Discussions and Conclusions
- 6.5 Rough Proof of Information Gain Convergence
- 7. Conclusions
- Bibliography
- Index

#### How to get this?

Please send me an e-mail, and I will send you the PDF file of this dissertation shortly. It is also available in the Kyoto University Library.

All Rights Reserved by Kazunori Iwata