Classical reinforcement learning (Q-learning or Sarsa) can be extended with models of the environment. These models are usually transition tables that contain the probability of arriving at a particular state given the current state and an action.
In model-free learning, these transition probabilities are implicitly "incorporated" into the evaluation function. A single transition table as a model therefore offers no advantage in this respect over the model-free variants: its state predictions are just as limited by the Markov property as the evaluation predictions of the model-free variants are.
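For concreteness, here is a minimal sketch of the kind of single transition-table model I mean, estimated from experience counts (illustrative code of my own, not taken from any library; all names are made up):

```python
from collections import defaultdict

class TransitionTable:
    """Single tabular model: estimates P(next_state | state, action) from counts."""

    def __init__(self):
        # counts[(state, action)][next_state] = how often next_state followed (state, action)
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def prob(self, state, action, next_state):
        total = sum(self.counts[(state, action)].values())
        return self.counts[(state, action)][next_state] / total if total else 0.0
```

If the state signal is ambiguous, such a table mixes transitions that come from physically different situations, which is exactly the limitation described above.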
Are there machine learning methods that automatically generate such a model for an exploring agent with incomplete knowledge of its environment (e.g. an agent that does not know its absolute position), and that go beyond updating a single transition table?
Are there, for example, methods that generate several transition tables that "segment" the environment based on their predictive success?
Reviewing the literature, I could not find an answer. The Encyclopedia of Machine Learning provides a current overview of Hierarchical Reinforcement Learning, but the approaches described there differ from what I am asking about, for at least one of the following reasons:
- The model is already given; the problem addressed is learning an evaluation function with a hierarchical model.
- Learning the model is tied to external reward rather than being independent of it.
- These methods aim to make reinforcement learning more efficient by pooling different states that can be treated as one. I am looking for a method that does the opposite: differentiate apparently identical states to improve predictions. (This also distinguishes my question from this question.)
A simple illustration is a grid world in which the agent has to reach a particular goal position. In contrast to traditional reinforcement learning and the hierarchical methods mentioned above, however, the agent's state is not its absolute position in the environment but the contents of the four cells surrounding it. This introduces transition ambiguity: several distinct positions can correspond to the same state.
I am looking for a way to automatically resolve this ambiguity.
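To make "ambiguity" operational (again only a sketch of my own): assuming deterministic dynamics, any (state, action) pair for which the model has recorded more than one successor indicates that the state signal does not identify the true situation.

```python
def ambiguous_entries(counts):
    """Return the (state, action) pairs with more than one observed successor.

    `counts` is the nested mapping from the TransitionTable sketch above.
    Under deterministic dynamics such entries can only arise because the
    state signal is ambiguous, so they are candidates for "segmentation".
    """
    return [sa for sa, successors in counts.items() if len(successors) > 1]
```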
Edit:
xxxxxx
x....x
xxxxxx
An agent that moves in the above grid world and perceives only the four surrounding cells might, for example, model the environment with two separate transition tables: one for the left half and one for the right half of the environment. Individually, each table is unambiguous, whereas a single transition table for the whole environment would not be.
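The following throwaway script (grid layout from the edit above; helper names are mine, purely for illustration) enumerates the corridor and shows where the ambiguity comes from:

```python
# Corridor from the edit above: one row of four free cells surrounded by walls.
GRID = ["xxxxxx",
        "x....x",
        "xxxxxx"]

def local_state(row, col):
    """Wall/free pattern of the cells north, east, south, and west of (row, col)."""
    return "".join(GRID[row + dr][col + dc]
                   for dr, dc in [(-1, 0), (0, 1), (1, 0), (0, -1)])

free_cells = [(r, c) for r, line in enumerate(GRID)
              for c, ch in enumerate(line) if ch == "."]

for cell in free_cells:
    print(cell, local_state(*cell))
# (1, 1) x.xx    (1, 2) x.x.    (1, 3) x.x.    (1, 4) xxx.
# The two middle cells produce the same state, so a single transition
# table cannot predict, e.g., the result of moving east from "x.x.".
```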
Edit:
A single ambiguous transition table. The row labels give the current state (the cells north, east, south, and west of the agent, in that order) together with an action (a movement in one of these directions); the columns are the possible successor states, and a 1 marks a successor that can occur:

| state, action | x.xx | x.x. | xxx. |
|---------------|------|------|------|
| x.xx, n       | 1    | 0    | 0    |
| x.xx, e       | 0    | 1    | 0    |
| x.xx, s       | 1    | 0    | 0    |
| x.xx, w       | 1    | 0    | 0    |
| x.x., n       | 0    | 1    | 0    |
| x.x., e       | 0    | 1    | 1    |
| x.x., s       | 0    | 1    | 0    |
| x.x., w       | 1    | 1    | 0    |
| xxx., n       | 0    | 0    | 1    |
| xxx., e       | 0    | 0    | 1    |
| xxx., s       | 0    | 0    | 1    |
| xxx., w       | 0    | 1    | 0    |
Notice the ambiguity in the rows x.x., e and x.x., w. This ambiguity can be resolved by "segmenting" the environment, i.e. by splitting up the transition table as follows.
Two unambiguous transition tables, one for the left half and one for the right half of the environment:

Left half:

| state, action | x.xx | x.x. |
|---------------|------|------|
| x.xx, n       | 1    | 0    |
| x.xx, e       | 0    | 1    |
| x.xx, s       | 1    | 0    |
| x.xx, w       | 1    | 0    |
| x.x., n       | 0    | 1    |
| x.x., e       | 0    | 1    |
| x.x., s       | 0    | 1    |
| x.x., w       | 1    | 0    |

Right half:

| state, action | x.x. | xxx. |
|---------------|------|------|
| xxx., n       | 0    | 1    |
| xxx., e       | 0    | 1    |
| xxx., s       | 0    | 1    |
| xxx., w       | 1    | 0    |
| x.x., n       | 1    | 0    |
| x.x., e       | 0    | 1    |
| x.x., s       | 1    | 0    |
| x.x., w       | 1    | 0    |
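To illustrate the kind of mechanism I am asking about, here is a deliberately crude sketch (my own code, not a published algorithm; it assumes deterministic dynamics): keep several transition tables, stay with the current one while it agrees with experience, and switch to or create another table on a contradiction.

```python
class SegmentedModel:
    """Sketch: several transition tables, each covering a "segment" of the
    environment, grown on demand when the existing tables contradict experience."""

    def __init__(self):
        self.tables = [{}]   # each table maps (state, action) -> next_state
        self.active = 0      # index of the table currently in use

    def observe(self, state, action, next_state):
        key = (state, action)
        # Stay with the active table as long as it agrees with the observation.
        if self.tables[self.active].get(key, next_state) == next_state:
            self.tables[self.active][key] = next_state
            return
        # Otherwise switch to any existing table that agrees with the transition ...
        for i, table in enumerate(self.tables):
            if table.get(key, next_state) == next_state:
                self.active = i
                table[key] = next_state
                return
        # ... or open a new table: this is the "segmentation" step.
        self.tables.append({key: next_state})
        self.active = len(self.tables) - 1
```

On a simple back-and-forth walk along the corridor this sketch happens to settle into two tables much like the ones above, but it is clearly ad hoc. My question is whether principled methods along these lines exist.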
Edit: Related question