16

Why was the letter Q chosen in the name of Q-learning?

Most letters are chosen as abbreviations, such as $\pi$ standing for policy and $v$ standing for value. But I don't think Q is an abbreviation of any word.

amoeba
draw
  • 1
    In my metaphorical understanding, Q is a function that associates a quantity (call it reward, cost or whatever else is being optimized) with an action in a given state. – knk May 16 '18 at 16:00
  • 1
    @sycorax the question as originally framed implied an understanding of Q-learning, and in order to offer an explanation it would help to add context. The OP would be lost with any explanation that does not first establish a grounding. – knk May 16 '18 at 16:06
  • Does the metaphorical Q = Quantity help? I think of it as a quantification of action given states – knk May 16 '18 at 16:18

2 Answers

35

I'm sorry to disappoint everyone, but Q doesn't stand for anything :)

Q-learning was proposed by Watkins in his PhD thesis in 1989, see p.96. The Q in the equation on that page is updated in a certain way at each step. The Q is the expected return from an action at a given state, see the definition of Q on p.46. The return is meant in an economic or game theory sense, i.e. discounted probability-weighted rewards, not in the computer science sense of a return from a function.

Notice how he had already used P for probability and R for reward, so he grabbed Q for the return. That's it. There's no deeper meaning behind the choice of the letter Q.
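
To make "expected return" concrete, here is one common modern way to write it. This is a sketch in textbook notation, not necessarily the notation used in the thesis; the policy symbol $\pi$, the discount factor $\gamma$ and the time-indexed rewards $r_{t+k}$ are assumptions on my part:

$$Q^{\pi}(s,a) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \,\middle|\, s_t = s,\ a_t = a\right]$$

The expectation is taken over the transition probabilities (and over the policy's later action choices), which is where the "probability-weighted" part comes from, while $\gamma^{k}$ supplies the discounting.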

Aksakal
  • 4
    No deeper meaning but it *is* a meaning (that Q fits with P and R in the alphabet) and stands for *something*. – Sextus Empiricus May 16 '18 at 18:29
  • 2
    @MartijnWeterings It's not a meaning at all. It's a purely syntactical choice of letter, with no semantic considerations whatsoever. – David Richerby May 17 '18 at 11:44
  • Sure, there may be few semantic considerations (and this might be debated, because differences between Latin or Greek letters, letters in different positions of the alphabet, or uppercase versus lowercase might form a gray area between syntax and semantics). I consider the choice of Q 'meaningful' because the form of the letter (which is somewhat arbitrary) does express to some extent the meaning of the variable/parameter. The meaning relates to the choice of letter. It would not have been a good choice if u or v had been chosen, or i,j,k or x,y,z or $\alpha, \beta, \gamma$. – Sextus Empiricus May 17 '18 at 12:29
  • @MartijnWeterings, Q also sounds like a *queue*, which brings somewhat relevant connotations too – Aksakal May 17 '18 at 12:38
  • @Aksakal, that might have reinforced the use of Q. But I don't think the effect is strong. I don't know much about this topic, but from a quick overview of that thesis it seems very plausible to me that the letter $Q$ has been used for a quantity like $\sum_i R_i P_i$ or $\sum_i V_i P_i$. Eventually 'some name' like 'action-value' was given to it, but the letters used in that thesis seem to stick much more to the alphabet, e.g. $f$ $g$ $h$ for functions, $x$ $y$ for variables, $V$ $U$ for the value function and its approximation, etc. – Sextus Empiricus May 17 '18 at 13:03
  • @MartijnWeterings, I meant similarly to "queue the music," you say "queue learning" – Aksakal May 17 '18 at 16:44
1

Q-Learning is called so because it uses Q values to form its estimates. The usual learning rule is $Q(s_t,a_t)\gets Q(s_t,a_t)+\alpha(r_t+\gamma \times \max_{a} Q(s_{t+1},a)-Q(s_t,a_t))$, and from it it should be clear why it is called Q-Learning.

But the actual question, in my view, is why the Q values are called so. Though there does not seem to be a satisfactory answer, this link mentions that Andrew Barto, who is one of the founders of modern reinforcement learning, thinks that $Q$ stands for Quality, called so because it characterizes how good the result of pulling an arm would be.
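
For concreteness, the learning rule quoted at the top of this answer can be sketched as a single tabular update. This is only a minimal illustration: the function name `q_learning_update`, the list-of-lists table, the default values of `alpha` and `gamma`, and the two-state toy example are all assumptions of mine, not something taken from the thesis.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update to Q in place.

    Q is a list of rows, one row per state, one entry per action.
    Returns the temporal-difference error used for the update.
    """
    # Target: immediate reward plus discounted best estimate in the next state.
    td_target = r + gamma * max(Q[s_next])
    # How far the current estimate is from that target.
    td_error = td_target - Q[s][a]
    # Move the estimate a fraction alpha of the way toward the target.
    Q[s][a] += alpha * td_error
    return td_error


# Hypothetical toy problem: 2 states, 2 actions, all estimates start at zero.
Q = [[0.0, 0.0], [0.0, 0.0]]
td_error = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Only the entry for the visited state-action pair changes; every other entry of the table is left as it was.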

  • 2
    Read that thesis and tell me how "quality" makes sense in the context of the expected return – Aksakal May 16 '18 at 18:33
  • Though I agree with you, the thesis was written after Watkins consulted Andy about a number of things. Andy may have had a better idea than you think he does. – Ameet Deshpande May 16 '18 at 18:37
  • Quality doesn't even exist as a distinct concept in learning. You can use the word in its usual sense from English, of course. The expected return, on the other hand, is very well defined in game theory, there's no need to dilute it by attaching vague concepts such as quality. You're not maximizing quality, you're maximizing discounted rewards under the suitable probability measure. If you want to be a little more broad, then you can maximize the utility. – Aksakal May 16 '18 at 18:58