Questions tagged [sarsa]
14 questions
2 votes · 1 answer
Policy improvement in SARSA and Q-learning
I have a rather trivial doubt about SARSA and Q-learning. Looking at the pseudocode of the two algorithms in the Sutton & Barto book, I see that the policy improvement step is missing.
How do I get the optimal policy from the two algorithms? Are they used to find…

Jor_El · 391
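For context: neither algorithm needs an explicit improvement step, because acting ε-greedily with respect to the current Q is itself a (soft) policy improvement, and the final policy is read off greedily from the learned Q. A minimal tabular sketch in Python (all names hypothetical):

```python
import numpy as np

# Hypothetical tabular setup: Q[s, a] as learned by SARSA or Q-learning.
n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(Q, s, epsilon=0.1):
    """Acting epsilon-greedily w.r.t. the current Q is the implicit
    (soft) policy improvement step; no separate step is needed."""
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])   # explore
    return int(np.argmax(Q[s]))                # exploit current estimate

# After learning, the (estimated) optimal policy is read off greedily:
greedy_policy = Q.argmax(axis=1)
```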
2 votes · 1 answer
SARSA when the policy is not epsilon-greedy
I would like to clarify a doubt that I have regarding SARSA. SARSA can be used for optimal control when the policy to take action $a$ is epsilon-greedy. Suppose that the policy to take action $a$ is not an epsilon-greedy one, but some other policy,…

calveeen · 746
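For reference, the SARSA update itself makes no assumption that the behavior policy is ε-greedy; $A_{t+1}$ is simply drawn from whatever policy $\pi$ is being followed:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\,\bigl[R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\bigr], \qquad A_{t+1} \sim \pi(\cdot \mid S_{t+1}).$$

What ε-greedy action selection (with ε decaying, the GLIE condition) adds is convergence toward the optimal policy rather than mere evaluation of the fixed policy $\pi$.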
2 votes · 1 answer
Can Q-learning or SARSA be used to find a stochastic policy?
If the optimal policy is known to be stochastic (e.g., as in the rock-paper-scissors game), can this stochastic policy be found using SARSA or Q-learning, or is it only possible with policy gradient approaches?

aorj · 33
1 vote · 0 answers
Can ε-greedy Q-learning / SARSA have a stochastic policy?
Hello, I'm currently studying Q-learning and SARSA with ε-greedy and softmax strategies, and I have a question about my readings.
According to my readings, when SARSA is used with ε-greedy, it causes value-function oscillations in the case of stochastic policies. But I think…

BE LEO · 11
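Worth noting next to this question: an ε-greedy policy is itself stochastic by construction. A minimal sketch (hypothetical names) of the action distribution it induces:

```python
import numpy as np

def epsilon_greedy_probs(q_values, epsilon=0.1):
    """Action distribution induced by epsilon-greedy: every action gets
    epsilon/|A|, and the greedy action gets the remaining 1 - epsilon."""
    n = len(q_values)
    probs = np.full(n, epsilon / n)
    probs[np.argmax(q_values)] += 1.0 - epsilon
    return probs

print(epsilon_greedy_probs(np.array([0.5, 2.0, 1.0])))
# -> [0.0333... 0.9333... 0.0333...]
```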
1 vote · 1 answer
In the SARSA and Q-learning algorithms in RL, is the policy updated during the iteration for Q-value learning?
In the video by Prof. Brunskill, "Stanford CS234 winter 2019 lecture 4" on model-free control (https://www.youtube.com/watch?v=j080VBVGkfQ), at 57:49/1:17:45, the pseudocode for SARSA includes line 8 for the ε-greedy update of the current policy π. It…

Ruye · 11
1 vote · 0 answers
Expected SARSA, SARSA and Q-learning
I would much appreciate it if you could point me in the right direction regarding this question about the targets for the approximate q-function in SARSA, Expected SARSA, and Q-learning (notation: S is the current state, A is the current action, R is the reward,…

Novak · 111
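Using this question's notation, with weights $\mathbf{w}$ and next state $S'$, the three one-step targets for the approximate q-function $\hat q$ differ only in how the successor action is summarized:

$$
\begin{aligned}
\text{SARSA:} \quad & R + \gamma\, \hat q(S', A', \mathbf{w}) \\
\text{Expected SARSA:} \quad & R + \gamma \sum_{a} \pi(a \mid S')\, \hat q(S', a, \mathbf{w}) \\
\text{Q-learning:} \quad & R + \gamma \max_{a} \hat q(S', a, \mathbf{w})
\end{aligned}
$$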
1 vote · 0 answers
How sensitive is reinforcement learning to the neural network structure?
I am trying out deep SARSA reinforcement learning on the OpenAI Gym CartPole-v0 problem. The state has 4 continuous features and the action is binary, either 0 or 1. The state-action vector is then fed to a neural network to output the state-action…

Le Hoang Long · 11
1 vote · 1 answer
Differences between Sarsa and Q-learning control procedural algorithms
I am referring to pages 130-131 of the Sutton and Barto book on Reinforcement Learning, available here: book
I don't understand the slight difference between the two procedural algorithms described respectively at page 130 for Sarsa and at…

hardhu · 133
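The difference the question asks about sits in a single line of each algorithm; a minimal tabular sketch (hypothetical names, terminal-state handling omitted) contrasting the two updates:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstraps from the action a_next that will actually be
    executed, so a_next must be selected before this update runs."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstraps from the greedy action, regardless of which
    action the behavior policy goes on to execute."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

The procedural consequence is the ordering in the two pseudocodes: Sarsa must choose $A'$ before it can update, whereas Q-learning updates first and chooses the next action afterwards.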
1 vote · 0 answers
Convergence criterion for R-learning algorithm
I'm trying to find a policy for a simple game using the R-learning algorithm. I have a field with values (the agent can move in 4 directions) and the goal is to get from the starting point to the finish point with the highest score.
The final policy gives me…

Most Wanted · 255
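For reference, R-learning (Schwartz, 1993) is the average-reward relative of Q-learning, and a common convergence check is that both the Q-table and the average-reward estimate $\rho$ stop changing appreciably. A sketch of the standard updates (where $\beta$ is the step size for $\rho$, and the $\rho$ update is applied only when a greedy action was taken):

$$Q(s,a) \leftarrow Q(s,a) + \alpha\bigl[r - \rho + \max_{a'} Q(s',a') - Q(s,a)\bigr]$$
$$\rho \leftarrow \rho + \beta\bigl[r - \rho + \max_{a'} Q(s',a') - \max_{a} Q(s,a)\bigr]$$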
0 votes · 0 answers
Building a simulator for continuous state, discrete action reinforcement learning
I am trying to build a simulator that optimizes the performance and temperature of a device. I want the device to perform well, but without making the device too hot. If the device becomes too hot, I want the internal circuitry to push down the…
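One common way to frame such a device is a gym-style environment with a continuous state (performance, temperature) and a small discrete action set. A minimal sketch under assumed names and a made-up thermal model (everything below is hypothetical, not the asker's device):

```python
import numpy as np

class DeviceEnv:
    """Hypothetical simulator: continuous state = (performance, temperature),
    discrete actions = {0: clock down, 1: hold, 2: clock up}."""

    def reset(self):
        self.perf, self.temp = 0.5, 40.0
        return np.array([self.perf, self.temp])

    def step(self, action):
        self.perf = float(np.clip(self.perf + 0.1 * (action - 1), 0.0, 1.0))
        self.temp += 5.0 * self.perf - 2.0               # crude heat model
        reward = self.perf - max(0.0, self.temp - 80.0)  # penalize overheating
        done = self.temp > 100.0                         # thermal shutdown
        return np.array([self.perf, self.temp]), reward, done, {}
```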
0 votes · 1 answer
Deduce the Bellman equation from the Value and Q functions
I am trying to derive/deduce the Bellman equation using the value and Q-functions.
I have only got so far in understanding it, and tried it myself in LaTeX:
Why is $V^*$ suddenly in the $Q^\pi$ function? Why not $Q^\pi = r + \gamma Q^\pi(s_{t+1},…

johnny_1010 · 1
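The step this question usually stumbles on is the identity connecting $V$ and $Q$. For a fixed policy $\pi$,

$$Q^\pi(s_t, a_t) = \mathbb{E}\bigl[r_t + \gamma\, V^\pi(s_{t+1})\bigr], \qquad V^\pi(s) = \mathbb{E}_{a \sim \pi}\bigl[Q^\pi(s, a)\bigr],$$

while for the optimal functions $V^*(s) = \max_a Q^*(s, a)$, so

$$Q^*(s_t, a_t) = \mathbb{E}\bigl[r_t + \gamma \max_{a'} Q^*(s_{t+1}, a')\bigr] = \mathbb{E}\bigl[r_t + \gamma\, V^*(s_{t+1})\bigr].$$

One cannot write simply $r + \gamma Q(s_{t+1}, \cdot)$ because $Q$ needs a next action as an argument; $V$ (equivalently, the expectation or max over next actions) is what summarizes it.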
0 votes · 1 answer
Purpose of trace-decay parameter in eligibility traces
In TD(λ)/SARSA(λ), eligibility traces are decayed after each step by multiplying by the discount rate and the trace-decay parameter.
I understand that:
The discount rate is used to reduce the value of future actions relative to a state.
An…

Levi Botelho · 103
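A minimal sketch of one step of tabular SARSA(λ) with accumulating traces (hypothetical names), showing that the discount rate and the trace-decay parameter are two distinct knobs that only meet when the traces are decayed:

```python
import numpy as np

def sarsa_lambda_step(Q, z, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """gamma discounts future reward in the TD target; lam (trace decay)
    controls how far back along the trajectory credit is assigned."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error
    z[s, a] += 1.0            # accumulating trace for the visited pair
    Q += alpha * delta * z    # update every still-eligible pair
    z *= gamma * lam          # decay all traces by both factors
```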
0 votes · 1 answer
Episodic semi-gradient Q-learning for estimating an approximation of the optimal action-value function
On page 244 of the Sutton and Barto book on Reinforcement Learning (book), the pseudocode for episodic semi-gradient Sarsa is given, while no pseudocode is ever given for the corresponding episodic semi-gradient Q-learning.
I am aware of the…

hardhu · 133
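The book indeed never spells this variant out; a sketch of what the episodic semi-gradient Q-learning update could look like with linear function approximation (the `features` function and all names here are hypothetical), obtained by replacing the Sarsa target's $\hat q(S', A', \mathbf{w})$ with a max over actions:

```python
import numpy as np

def semi_gradient_q_update(w, features, s, a, r, s_next, actions, done,
                           alpha=0.01, gamma=0.99):
    """Semi-gradient: the target's dependence on w is ignored, so only the
    gradient of q_hat(s, a, w) = features(s, a) . w enters the update."""
    x = features(s, a)
    if done:
        target = r                      # no bootstrapping at terminal states
    else:
        target = r + gamma * max(w @ features(s_next, b) for b in actions)
    w += alpha * (target - w @ x) * x   # gradient of the linear q_hat is x
    return w
```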
0 votes · 2 answers
Categorizing the different algorithms in reinforcement learning
For some time I have been going through reinforcement learning, and have found a lot of diverse information, especially in the area of policies (algorithms).
I figured out that policies can be classified as on-policy vs. off-policy and model-based vs. model-free. Also, these are…

Sandeep Bhutani · 101