Questions tagged [q-learning]

A popular reinforcement learning algorithm; an instance of TD (temporal difference) learning.

165 questions
24 votes, 4 answers
Why does Q-Learning use epsilon-greedy during testing?
In DeepMind's paper on Deep Q-Learning for Atari video games (here), they use an epsilon-greedy method for exploration during training. This means that when an action is selected in training, it is either chosen as the action with the highest…
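As a rough illustration of the selection rule the question refers to, here is a minimal epsilon-greedy sketch in Python; the Q-values and the two epsilon values below are placeholders, not numbers taken from the paper (which does keep a small nonzero epsilon at evaluation time).

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

q = np.array([0.1, 0.5, 0.2, 0.0])          # placeholder Q-values for one state
a_train = epsilon_greedy(q, epsilon=0.1)    # larger epsilon during training (exploration)
a_test = epsilon_greedy(q, epsilon=0.05)    # small but nonzero epsilon kept at test time
```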

Karnivaurus (5,909)
21 votes, 1 answer
What is the difference between an episode and an epoch in deep Q-learning?
I am trying to understand the famous paper "Playing Atari with Deep Reinforcement Learning" (pdf). I am unclear about the difference between an epoch and an episode. In Algorithm $1$, the outer loop is over episodes, while in Figure $2$ the x-axis is…
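As a sketch of the distinction: an episode is delimited by the environment terminating, whereas an epoch is typically a fixed budget of training steps used for periodic reporting, independent of episode boundaries. The toy termination rule and the reporting interval below are assumptions for illustration only.

```python
import random

random.seed(0)
STEPS_PER_EPOCH = 1_000        # hypothetical reporting interval, not the paper's value

total_steps = 0
for episode in range(200):                 # outer loop: one iteration per episode
    done = False
    while not done:                        # inner loop: one time step of the environment
        total_steps += 1
        # ... epsilon-greedy action, env.step(), replay-buffer sampling would go here ...
        done = random.random() < 0.01      # toy termination so the sketch actually runs
        if total_steps % STEPS_PER_EPOCH == 0:
            print(f"epoch {total_steps // STEPS_PER_EPOCH} completed during episode {episode}")
```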

A.D (2,114)
19 votes, 2 answers
How exactly to compute the Deep Q-Learning loss function?
I have a doubt about how exactly the loss function of a Deep Q-Learning network is computed during training. I am using a 2-layer feedforward network with a linear output layer and ReLU hidden layers.
Let's suppose I have 4 possible actions. Thus, the output of…
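A minimal sketch of how such a loss is commonly computed for a batch of transitions, assuming PyTorch and a 4-action network with ReLU hidden layers and a linear output, as described above; all sizes and the random data are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical network matching the question's setup: ReLU hidden layers,
# a linear output layer, and one output unit per action.
class QNet(nn.Module):
    def __init__(self, n_inputs, n_hidden, n_actions):
        super().__init__()
        self.h1 = nn.Linear(n_inputs, n_hidden)
        self.h2 = nn.Linear(n_hidden, n_hidden)
        self.out = nn.Linear(n_hidden, n_actions)

    def forward(self, x):
        x = torch.relu(self.h1(x))
        x = torch.relu(self.h2(x))
        return self.out(x)                      # shape: (batch, n_actions)

q_net = QNet(n_inputs=8, n_hidden=64, n_actions=4)

# One minibatch of (s, a, r, s', done) transitions; random placeholders stand in
# for samples drawn from a replay buffer.
batch = 32
s = torch.randn(batch, 8)
a = torch.randint(0, 4, (batch, 1))
r = torch.randn(batch)
s_next = torch.randn(batch, 8)
done = torch.zeros(batch)

gamma = 0.99
with torch.no_grad():
    # TD target: r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states.
    target = r + gamma * (1 - done) * q_net(s_next).max(dim=1).values

# Only the Q-value of the action that was actually taken enters the loss.
q_taken = q_net(s).gather(1, a).squeeze(1)
loss = nn.functional.mse_loss(q_taken, target)
loss.backward()
```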

A.D (2,114)
16 votes, 2 answers
Why was the letter Q chosen in Q-learning?
Why was the letter Q chosen in the name of Q-learning?
Most letters are chosen as abbreviations, such as $\pi$ standing for policy and $v$ standing for value. But I don't think Q is an abbreviation of any word.

draw (261)
14 votes, 4 answers
Why don't we use importance sampling for one step Q-learning?
Why don't we use importance sampling for 1-step Q-learning?
Q-learning is off-policy, which means that we generate samples with a different policy than the one we try to optimize. Thus it should be impossible to estimate the expectation of the return for…
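One way the standard answer is usually phrased: the one-step Q-learning target is
$$ y_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'), $$
and the only sampled quantities in it, $r_{t+1}$ and $s_{t+1}$, come from the environment given $(s_t, a_t)$, so their distribution does not depend on which policy generated the data. The next action is never sampled from the behaviour policy (the $\max$ is computed directly), so there is no distribution mismatch for an importance ratio $\pi(a'|s')/\mu(a'|s')$ to correct; such ratios only become necessary for multi-step or full-trajectory off-policy targets.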

siva (451)
12 votes, 2 answers
Is planning in Dyna-Q a form of experience replay?
In Richard Sutton's book on RL (2nd edition), he presents the Dyna-Q algorithm, which combines planning and learning.
In the planning part of the algorithm, the Dyna agent randomly samples $n$ state-action pairs $(s, a)$ previously seen by the agent,…
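For context, a compact version of the planning loop in question, assuming the deterministic tabular model used in the book; the variable names, the number of planning steps, and the single fake transition are illustrative only.

```python
import random

model = {}                 # (s, a) -> (r, s'), filled during real experience
Q = {}                     # (s, a) -> value
alpha, gamma, n = 0.1, 0.95, 10
actions = [0, 1, 2, 3]

def q(s, a):
    return Q.get((s, a), 0.0)

def planning_updates():
    """Replay n previously seen (s, a) pairs using the learned model."""
    if not model:
        return
    for _ in range(n):
        s, a = random.choice(list(model))          # uniformly over observed pairs
        r, s_next = model[(s, a)]                  # the model "simulates" the outcome
        target = r + gamma * max(q(s_next, b) for b in actions)
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

# During real experience one would store model[(s, a)] = (r, s_next), do the direct
# Q-update on the real transition, and then call planning_updates().
model[(0, 1)] = (1.0, 2)   # a fake observed transition so the sketch runs
planning_updates()
print(Q)
```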

Julep (485)
12 votes, 5 answers
Epsilon-greedy policy improvement?
I am learning reinforcement learning from David Silver's open course and Richard Sutton's book. While I enjoy the course and the book very much, I am currently confused about $\epsilon$-greedy policy improvement.
Both the book and the open course have a…
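For reference, the result that section of the book establishes is roughly the following (notation as in Sutton and Barto). The $\epsilon$-greedy policy with respect to $q_\pi$ is
$$
\pi'(a \mid s) =
\begin{cases}
1 - \epsilon + \dfrac{\epsilon}{|\mathcal{A}(s)|} & \text{if } a = \arg\max_{a'} q_\pi(s, a'), \\
\dfrac{\epsilon}{|\mathcal{A}(s)|} & \text{otherwise,}
\end{cases}
$$
and the policy-improvement argument shows that, provided the original policy $\pi$ is itself $\epsilon$-soft,
$$
\sum_a \pi'(a \mid s)\, q_\pi(s, a) \;\ge\; v_\pi(s) \quad \text{for every state } s,
$$
so the $\epsilon$-greedy policy $\pi'$ with respect to $q_\pi$ is at least as good as $\pi$.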

Mou (638)
11 votes, 2 answers
Reinforcement learning in a non-stationary environment
Q1: Are there common or accepted methods for dealing with non-stationary environments in reinforcement learning in general?
Q2: In my gridworld, I have the reward function changing when a state is visited. Every episode the rewards reset to the…
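As an aside on Q1, one commonly cited tool for non-stationary problems (discussed in Sutton and Barto's bandit chapter) is a constant step size, so recent experience outweighs old experience. The sketch below is generic and makes no claim about the asker's gridworld or reward schedule.

```python
# Exponential recency-weighted averaging with a constant step size.
def running_estimate(old_value, new_sample, alpha=0.1):
    """Constant-alpha update: recent samples are weighted more than old ones."""
    return old_value + alpha * (new_sample - old_value)

v = 0.0
for sample in [1, 1, 1, 5, 5, 5]:   # the underlying signal shifts halfway through
    v = running_estimate(v, sample)
print(v)                             # the estimate is pulled toward the recent samples
```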

Voltronika (213)
10 votes, 2 answers
Overview of Reinforcement Learning Algorithms
I'm currently searching for an overview of reinforcement learning algorithms and maybe a classification of them. But besides SARSA, Q-learning, and Deep Q-learning, I can't really find any popular algorithms.
Wikipedia gives me an overview of…

greece57 (201)
10 votes, 1 answer
How efficient is Q-learning with Neural Networks when there is one output unit per action?
Background:
I am using neural network Q-value approximation in my reinforcement learning task. The approach is exactly the same as the one described in this question; however, the question itself is different.
In this approach, the number of outputs is…
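As a sketch of the two architectures being compared, in plain NumPy with arbitrary layer sizes (everything here is illustrative, not the asker's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_hidden, n_actions = 8, 32, 4

# Architecture A (as in the question): the state goes in, one output unit per action.
W1 = rng.normal(size=(n_state, n_hidden))
W2 = rng.normal(size=(n_hidden, n_actions))

def q_all_actions(s):
    h = np.maximum(0.0, s @ W1)              # ReLU hidden layer
    return h @ W2                            # all Q(s, a) from a single forward pass

# Architecture B (the alternative): (state, one-hot action) in, a single Q-value out.
V1 = rng.normal(size=(n_state + n_actions, n_hidden))
V2 = rng.normal(size=(n_hidden, 1))

def q_single(s, a):
    x = np.concatenate([s, np.eye(n_actions)[a]])
    h = np.maximum(0.0, x @ V1)
    return (h @ V2).item()

s = rng.normal(size=n_state)
print(q_all_actions(s))                            # one pass yields 4 values
print([q_single(s, a) for a in range(n_actions)])  # needs |A| = 4 separate passes
```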

Serhiy (959)
9 votes, 1 answer
Is Deep Q-learning inherently unstable?
I'm reading Barto and Sutton's Reinforcement Learning and in it (chapter 11) they present the "deadly triad":
Function approximation
Bootstrapping
Off-policy training
And they state that an algorithm which uses all 3 of these is unstable and…

enumaris (1,075)
8 votes, 1 answer
Proof of Convergence for SARSA/Q-Learning Algorithm
I would like to ask if someone can refer me to the paper containing the proof of convergence of $Q$-learning/SARSA (either/both), one of the learning algorithms in reinforcement learning.
The iterative algorithm for SARSA is as follows:
$$ Q(s_t,…
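The excerpt above is cut off; for reference, the standard one-step update rules whose convergence is usually cited (commonly attributed to Watkins and Dayan (1992) for Q-learning and Singh et al. (2000) for SARSA) are:
$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \quad \text{(SARSA)} $$
$$ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \quad \text{(Q-learning)} $$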

cgo (7,445)
8 votes, 3 answers
Why is there no transition probability in Q-learning (reinforcement learning)?
In reinforcement learning, our goal is to optimize the state-value function or the action-value function, which are defined as follows:
$V^{\pi}(s) = \sum p(s'|s,\pi(s))[r(s'|s,\pi(s))+\gamma V^{\pi}(s')]=E_{\pi}[r(s'|s,a)+\gamma…
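The point usually made in answers is that the Q-learning update is sample-based: the expectation over $s'$ is replaced by the single next state the environment actually produced, so $p(s'|s,a)$ never has to be known by the learner. A toy sketch (the environment and reward rule below are entirely made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    """Made-up environment: it samples the next state internally and returns a reward."""
    s_next = int(rng.integers(n_states))       # p(s'|s,a) lives inside the environment
    reward = 1.0 if (s_next == 1 and a == 1) else 0.0
    return s_next, reward

s = 0
for t in range(5_000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # Sample-based update: the expectation over s' is replaced by this single sample,
    # so the transition probabilities never appear in the learner's code.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)
```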

whatsname (113)
7 votes, 1 answer
MDP and State Value Finding?
I have a complex MDP (I think) as follows. Can anyone describe simply how the value $V^*(A)$ for state $A$ is found?
First update: for this solved question I really need a canonical answer, a step-by-step solution if any, for learning purposes.
Second…

Maryam Panahi (29)
7 votes, 1 answer
Q-learning: when to stop training?
I'm using Q-learning for my side project. After a few million episodes, I found that the cumulative reward seems to stabilize. I'm wondering if there is a scientific way to determine when to stop training, rather than just observing the cumulative rewards.
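There is no single standard criterion, but one informal heuristic (an assumption here, not something from the question) is to snapshot the value estimates every so many episodes and stop once the largest change stays below a tolerance, in addition to watching the reward curve:

```python
import numpy as np

def converged(q_before, q_after, tol=1e-4):
    """True if no Q-table entry moved by more than tol over the last window of episodes."""
    return float(np.max(np.abs(q_after - q_before))) < tol

snapshot = np.zeros((4, 2))
current = snapshot + 1e-5            # pretend the table barely changed over the window
print(converged(snapshot, current))  # True -> a candidate point at which to stop
```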

user2131907 (173)