In reinforcement learning, there is a well-known example of an agent placed in a random room of a house, whose task is to find the exit. (Illustration here: http://mnemstudio.org/path-finding-q-learning-tutorial.htm) The house has multiple rooms, with some rooms connected to others.
The link above provides a simple solved example of how Q-learning works. It is very intuitive and fairly easy to follow.
It repeatedly updates the $Q(s,a)$ matrix until it converges. Then, for every row $s$, I select the element with the highest value and find the corresponding column $a$; in other words, $\arg\max_a Q(s,a)$ is the optimal action to take while in state $s$. This works well in the case of discrete states.
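To make it concrete, something like the following sketch is what I have in mind for the discrete case (the sizes, rewards, and parameters here are just placeholders, not the exact values from the tutorial):

```python
import numpy as np

n_states, n_actions = 6, 6
Q = np.zeros((n_states, n_actions))   # the Q(s,a) matrix
gamma, alpha = 0.8, 1.0               # discount factor and learning rate

def q_update(s, a, r, s_next):
    # Standard tabular Q-learning update:
    # move Q(s,a) toward r + gamma * max_a' Q(s', a')
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# After convergence: for each row s, pick the column a with the highest value.
policy = Q.argmax(axis=1)
```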
In the case of continuous states, there is something called the feature-based representation: $Q(s,a) = w_1 f_1(s,a) + w_2 f_2(s,a) + \cdots + w_n f_n(s,a)$, where the $f_i$ are the features and the $w_i$ are the weights.
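In code, I imagine the continuous case would look roughly like the semi-gradient sketch below, where the feature function `features(s, a)` is only an illustrative placeholder:

```python
import numpy as np

def features(s, a):
    # Hypothetical feature vector f(s, a); in practice these would be
    # hand-designed or learned descriptors of the state-action pair.
    return np.array([s, a, s * a, 1.0])

w = np.zeros(4)            # one weight w_i per feature f_i
gamma, alpha = 0.9, 0.01

def q_value(s, a):
    # Q(s,a) = w_1 f_1(s,a) + ... + w_n f_n(s,a)
    return np.dot(w, features(s, a))

def update(s, a, r, s_next, actions):
    # Same bootstrapped target as the tabular rule, but now the update
    # adjusts the weights w rather than a single table entry.
    global w
    target = r + gamma * max(q_value(s_next, b) for b in actions)
    td_error = target - q_value(s, a)
    w = w + alpha * td_error * features(s, a)
```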
My questions are:
What happens to this $Q(s,a)$ function in the continuous case? Is it also updated repeatedly, as in the discrete case?
If it does update and 'converge', how should we interpret the result?
Your insights will be very helpful.