In reinforcement learning, there is a well-known example of an agent placed in a random room of a house, whose task is to find the exit. (Illustration here: http://mnemstudio.org/path-finding-q-learning-tutorial.htm) The house has multiple rooms, with some rooms connected to others.
The link above provides a simple solved example of how Q-learning works. It is very intuitive and fairly easy to follow.
It repeatedly updates the $Q(s,a)$ matrix until it converges. Then, for every row $s$, I select the element with the highest value and find the corresponding column $a$; in other words, $\arg\max_a Q(s,a)$ is the optimal action to take while in state $s$. This works well in the case of discrete states.
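To make it concrete, something like the following sketch is what I have in mind for the discrete case (the sizes, rewards, and parameters here are just placeholders, not the exact values from the tutorial):

```python
import numpy as np

n_states, n_actions = 6, 6
Q = np.zeros((n_states, n_actions))   # the Q(s,a) matrix
gamma, alpha = 0.8, 1.0               # discount factor and learning rate

def q_update(s, a, r, s_next):
    # Standard tabular Q-learning update:
    # move Q(s,a) toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# After convergence: for each row s, pick the column a with the highest value.
policy = Q.argmax(axis=1)
```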
In the case of continuous states, there is something called the feature-based representation: $Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + \cdots + w_n f_n(s,a)$, where the $f_i$ are the features and the $w_i$ are the weights.
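In code, I imagine the continuous case would look roughly like the semi-gradient sketch below, where the feature function `features(s, a)` is only an illustrative placeholder:

```python
import numpy as np

def features(s, a):
    # Hypothetical feature vector f(s, a); in practice these would be
    # hand-designed or learned descriptors of the state-action pair.
    return np.array([s, a, s * a, 1.0])

w = np.zeros(4)            # one weight w_i per feature f_i
gamma, alpha = 0.9, 0.01

def q_value(s, a):
    # Q(s,a) = w_1 f_1(s,a) + ... + w_n f_n(s,a)
    return np.dot(w, features(s, a))

def update(s, a, r, s_next, actions):
    # Same bootstrapped target as the tabular rule, but now the update
    # adjusts the weights w rather than a single table entry.
    global w
    target = r + gamma * max(q_value(s_next, b) for b in actions)
    td_error = target - q_value(s, a)
    w = w + alpha * td_error * features(s, a)
```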
My questions are:
What happens to this $Q(s,a)$ function in the continuous case? Is it also updated repeatedly, as in the discrete case?
If it does update and 'converge', how should we interpret the result?
Your insights will be very helpful.