Why was the letter Q chosen in the name of Q-learning?
Most letters are chosen as abbreviations, such as $\pi$ for policy and $v$ for value. But I don't think Q is an abbreviation of any word.
I'm sorry to disappoint everyone, but Q doesn't stand for anything :)
Q-learning was proposed by Watkins in his 1989 PhD thesis; see p.96. The Q in the equation on that page is updated in a certain way at each step. Q is the expected return from taking an action in a given state; see the definition of Q on p.46. The return is meant in an economic or game-theoretic sense, i.e. discounted, probability-weighted rewards, not in the computer science sense of a return from a function.
Notice that he had already used P for probability and R for reward, so he grabbed Q for the return. That's it. There's no deeper meaning behind the choice of the letter Q.
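For readers who want the "expected return" spelled out: in modern notation (not necessarily the exact symbols Watkins uses on p.46), the quantity described above is usually written as follows, where the policy $\pi$, the discount factor $\gamma$, and the reward indexing $r_t$ are assumptions of this sketch:

$$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \;\middle|\; s_t = s,\ a_t = a\right]$$

Here $r_t$ is the reward received after taking action $a_t$ in state $s_t$, and the expectation is over the transitions and the actions chosen by $\pi$ afterwards.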
Q-learning is called so because it uses Q-values to form its estimates. The usual learning rule is $Q(s_t,a_t)\gets Q(s_t,a_t)+\alpha\left(r_t+\gamma \max_{a} Q(s_{t+1},a)-Q(s_t,a_t)\right)$, and from it, it should be clear why the method is called Q-learning.
But the actual question, in my view, is why the Q-value itself is called so. Though there does not seem to be a fully satisfactory answer, this link mentions that Andrew Barto, one of the founders of modern reinforcement learning, thinks that $Q$ stands for Quality, because it characterizes how good the result of pulling an arm would be.
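To make the learning rule quoted above concrete, here is a minimal tabular sketch in Python. The function name `q_update`, the dictionary-based table, and the state and action labels are all hypothetical choices made for illustration; they do not come from Watkins's thesis or from any particular library.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Q is a mapping from (state, action) pairs to estimated returns.
    All names here are illustrative assumptions, not a library API.
    """
    # Greedy estimate of the value of the next state: max over actions of Q(s', a')
    best_next = max(Q[(s_next, b)] for b in actions)
    # Target = immediate reward plus discounted value of the next state
    target = r + gamma * best_next
    # Move the current estimate a fraction alpha towards the target
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Usage: unseen (state, action) pairs default to 0.0
Q = defaultdict(float)
actions = ["left", "right"]
Q[("s1", "right")] = 2.0

new_value = q_update(Q, "s0", "right", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.5)
print(new_value)  # target = 1.0 + 0.5 * 2.0 = 2.0, so Q moves from 0.0 to 1.0
```

The only thing the rule ever reads or writes is the table of Q-values, which is the sense in which the method is "Q"-learning.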