## Paper Abstracts

### Sampling Shape Contours Using Optimization over a Geometric Graph

#### Abstract

Consider selecting points on a contour in the x-y plane. In shape analysis, this is frequently referred to as contour sampling. It is important to select the points such that they effectively represent the shape of the contour. Generally, the stroke order and number of strokes are informative for that purpose. Several effective methods exist for sampling contours drawn with a certain stroke order and number of strokes, such as the English alphabet or Arabic figures. However, many contours entail an uncertain stroke order and number of strokes, such as pictures of symbols, and little research has focused on methods for sampling such contours. This is because selecting the points in this case typically requires a large computational cost to check all the possible choices. In this paper, we present a sampling method that is useful regardless of whether the contours are drawn with a certain stroke order and number of strokes or not. Our sampling method thereby expands the application possibilities of contour processing. We formulate contour sampling as a discrete optimization problem that can be solved using a type of direct search. Based on a geometric graph whose vertices are the points and whose edges form rectangles, we construct an effective objective function for the problem. Using different shape datasets, we demonstrate that our sampling method is effective with respect to shape representation and retrieval.

#### Keywords

contour sampling, shape representation, shape retrieval, geometric graph
[B A C K]

### Extending the Peak Bandwidth of Parameters for Softmax Selection in Reinforcement Learning

#### Abstract

Softmax selection is one of the most popular methods for action selection in reinforcement learning. Although various recently proposed methods may be more effective with full parameter tuning, implementing a complicated method that requires the tuning of many parameters can be difficult. Thus, softmax selection is still worth revisiting, considering the cost savings of its implementation and tuning. In fact, this method works adequately in practice with only one parameter appropriately set for the environment. The aim of this paper is to improve the variable setting of this method to extend the bandwidth of good parameters, thereby reducing the cost of implementation and parameter tuning. To achieve this, we take advantage of the asymptotic equipartition property in a Markov decision process to extend the peak bandwidth of softmax selection. Using a variety of episodic tasks, we show that our setting is effective in extending the bandwidth and that it yields a better policy in terms of stability. The bandwidth is quantitatively assessed in a series of statistical tests.

#### Keywords

reinforcement learning, softmax selection, parameter bandwidth, asymptotic equipartition property
[B A C K]

### A Spatially Correlated Mixture Model for Image Segmentation

#### Abstract

In image segmentation, finite mixture modeling has been widely used. In its simplest form, the spatial correlation among neighboring pixels is not taken into account, and its segmentation results can be largely deteriorated by noise in images. We propose a spatially correlated mixture model in which the mixing proportions of finite mixture models are governed by a set of underlying functions defined on the image space. The spatial correlation among pixels is introduced by putting a Gaussian process prior on the underlying functions. We can set the spatial correlation rather directly and flexibly by choosing the covariance function of the Gaussian process prior. The effectiveness of our model is demonstrated by experiments with synthetic and real images.

#### Keywords

image segmentation, Gaussian processes, mixture models
[B A C K]

### Marginalized Viterbi Algorithm for Hierarchical Hidden Markov Models

#### Abstract

The generalized Viterbi algorithm, a direct extension of the Viterbi algorithm for hidden Markov models (HMMs), has been used to find the most likely state sequence for hierarchical HMMs. However, the generalized Viterbi algorithm finds the most likely whole level state sequence rather than the most likely upper level state sequence. In this paper, we propose a marginalized Viterbi algorithm, which finds the most likely upper level state sequence by marginalizing lower level state sequences. We show experimentally that the marginalized Viterbi algorithm is more accurate than the generalized Viterbi algorithm in terms of upper level state sequence estimation.

#### Keywords

time series data, hierarchical HMM, finding the most likely state sequence, generalized Viterbi algorithm, marginalized Viterbi algorithm
[B A C K]

### Matching Handwritten Line Drawings with Von Mises Distributions

#### Abstract

A two-dimensional shape is generally represented with line drawings or object contours in a digital image. Shapes can be divided into two types, namely ordered and unordered shapes. An ordered shape is an ordered set of points, while an unordered shape is an unordered set. As a result, each type typically uses different attributes to define the local descriptors involved in representing the local distributions of points sampled from the shape. Throughout this paper, we focus on unordered shapes. Since most local descriptors of unordered shapes are not scale-invariant, we usually make the shapes in an image data set the same size through scale normalization, before applying shape matching procedures. Shapes obtained through scale normalization are suitable for such descriptors if the original whole shapes are similar. However, they are not suitable if parts of each original shape are drawn using different scales. Thus, in this paper, we present a scale-invariant descriptor constructed by von Mises distributions to deal with such shapes. Since this descriptor has the merits of being both scale-invariant and a probability distribution, it does not require scale normalization and can employ an arbitrary measure of probability distributions in matching shape points. In experiments on shape matching and retrieval, we show the effectiveness of our descriptor, compared to several conventional descriptors.

#### Keywords

shape descriptor, shape matching, shape retrieval, von Mises distribution, scale invariance
[B A C K]

### An Information-Theoretic Analysis of Return Maximization in Reinforcement Learning

#### Abstract

We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.

#### Keywords

reinforcement learning, stochastic sequential decision process, information theory, asymptotic equipartition property
[B A C K]

### A Redundancy-Based Measure of Dissimilarity among Probability Distributions for Hierarchical Clustering Criteria

#### Abstract

We introduce novel dissimilarity into a probabilistic clustering task to properly measure dissimilarity among multiple clusters when each cluster is characterized by a subpopulation in the mixture model. This measure of dissimilarity is called redundancy-based dissimilarity among probability distributions. From aspects of both source coding and a statistical hypothesis test, we shed light on several of the theoretical reasons for the redundancy-based dissimilarity among probability distributions being a reasonable measure of dissimilarity among clusters. We also elucidate a principle in common for the measures of redundancy-based dissimilarity and Ward's method in terms of hierarchical clustering criteria. Moreover, we show several related theorems that are significant for clustering tasks. In the experiments, properties of the measure of redundancy-based dissimilarity are examined in comparison with several other measures.

#### Keywords

clustering, mixture model, dissimilarity measure, information theory, Ward's method
[B A C K]

### A Statistical Property of Multiagent Learning Based on Markov Decision Process

#### Abstract

We exhibit an important property called the asymptotic equipartition property on empirical sequences in an ergodic multi-agent Markov decision process. Using the asymptotic equipartition property which facilitates the analysis of multi-agent learning, we give a statistical property of multi-agent learning, such as reinforcement learning, near the end of the learning process. We examine the effect of the conditions among the agents on the achievement of a cooperative policy in three different cases: blind, visible, and communicable. Also, we derive a bound on the speed with which the empirical sequence converges to the best sequence in probability, so that the multi-agent learning yields the best cooperative result.

#### Keywords

multi-agent system, reinforcement learning, Markov decision process, asymptotic equipartition property, stochastic complexity
[B A C K]

### The Asymptotic Equipartition Property in Reinforcement Learning and its Relation to Return Maximization

#### Abstract

We discuss an important property called the asymptotic equipartition property on empirical sequences in reinforcement learning. This states that the typical set of empirical sequences has probability nearly one, that all elements in the typical set are nearly equi-probable, and that the number of elements in the typical set is an exponential function of the sum of conditional entropies if the number of time steps is sufficiently large. The sum is referred to as stochastic complexity. Using the property we elucidate the fact that the return maximization depends on two factors, the stochastic complexity and a quantity depending on the parameters of environment. Here, the return maximization means that the best sequences in terms of expected return have probability one. We also examine the sensitivity of stochastic complexity, which is a qualitative guide in tuning the parameters of action-selection strategy, and show a sufficient condition for return maximization in probability.

#### Keywords

reinforcement learning, Markov decision process, information theory, asymptotic equipartition property, stochastic complexity, return maximization
[B A C K]

### On the Effects of Domain Size and Complexity in Empirical Distribution of Reinforcement Learning

#### Abstract

We regard the events of a Markov decision process as the outputs from a Markov information source in order to analyze the randomness of an empirical sequence by the codeword length of the sequence. The randomness is an important viewpoint in reinforcement learning since the learning is to eliminate the randomness and to find an optimal policy. The occurrence of optimal empirical sequence also depends on the randomness. We then introduce the Lempel-Ziv coding for measuring the randomness which consists of the domain size and the stochastic complexity. In experimental results, we confirm that the learning and the occurrence of optimal empirical sequence depend on the randomness and show the fact that in early stages the randomness is mainly characterized by the domain size and as the number of time steps increases the randomness depends greatly on the complexity of Markov decision processes.

#### Keywords

reinforcement learning, Markov decision process, Lempel-Ziv coding, domain size, stochastic complexity
[B A C K]

### A New Criterion Using Information Gain for Action Selection Strategy in Reinforcement Learning

#### Abstract

In this paper, we regard the sequence of returns as outputs from a parametric compound source. Utilizing the fact that the coding rate of the source shows the amount of information about the return, we describe $\ell$-learning algorithms based on the predictive coding idea for estimating an expected information gain concerning future information, and give a convergence proof of the information gain. Using the information gain, we propose the ratio $w$ of return loss to information gain as a new criterion to be used in probabilistic action selection strategies. In experimental results, we found that our $w$-based strategy performs well compared with the conventional Q-based strategy.

#### Keywords

reinforcement learning, predictive coding, information gain, probabilistic action selection strategy
[B A C K]

### A Generation Method of Initial Training Data on Active Learning of Multi-layer Perceptron

#### Abstract

Many active learnings in the training of a partially trained MLP have been proposed. We note any active learning performance depends on initial training data. The initial training data plays an important role for active learning performance, because any active learning algorithm generates additional training data, based on existing training data. Most of conventional methods have generated initial data at random using a pseudo-random number. However, in some practical cases, we can not prepare enough data by the limit of time and cost. Therefore, the bias of initial data becomes critical, especially in the case of a small initial data. In this paper, by using $L_{2}$-discrepancy, we verify the effect of the bias of initial data for the resulting classification accuracy. We consequently confirm an advantage of low-discrepancy sequence as a generation method of initial training data, compared with a pseudo-random number. Finally, we also discuss some advantages and disadvantages of low discrepancy sequences.

#### Keywords

active learning, multi-layer perceptron, initial training data, pseudo-random number, low-discrepancy sequence
[B A C K]

All Right Reserved by Kazunori Iwata