Questions tagged [stochastic-policy]
10 questions
8
votes
2 answers
Is a policy always deterministic in reinforcement learning?
In reinforcement learning, is a policy always deterministic, or is it a probability distribution over actions (from which we sample)? If the policy is deterministic, why isn't the value function, which is defined at a given state for a given policy…

MiloMinderbinder
- 1,622
- 2
- 15
- 31
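The distinction this question asks about can be made concrete in a few lines. A minimal sketch (the softmax parameterization and the action values here are illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, 0.4])  # action values for a single state

# Deterministic policy: the state maps to exactly one action.
deterministic_action = int(np.argmax(q_values))

# Stochastic policy: a probability distribution over actions (softmax here),
# from which an action is sampled.
probs = np.exp(q_values) / np.exp(q_values).sum()
stochastic_action = int(rng.choice(len(q_values), p=probs))
```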
3
votes
1 answer
Proof that any $\epsilon$-greedy policy is an improvement over any $\epsilon$-soft policy
In the book by Richard Sutton and Andrew Barto, "Reinforcement Learning: An Introduction", 2nd edition, on page 101 there is a proof, and I don't understand one passage of it.
We want to prove that any $\epsilon$-greedy policy with respect to an…

robertspierre
- 1,358
- 6
- 21
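For context, the passage readers most often ask about is the inequality in which the max over actions is bounded below by a particular weighted average of the action values. In the book's notation, with $\pi'$ the $\epsilon$-greedy policy and $\pi$ any $\epsilon$-soft policy:

$$
\begin{aligned}
q_\pi(s, \pi'(s)) &= \sum_a \pi'(a \mid s)\, q_\pi(s, a) \\
&= \frac{\epsilon}{|\mathcal{A}(s)|} \sum_a q_\pi(s, a) + (1 - \epsilon) \max_a q_\pi(s, a) \\
&\ge \frac{\epsilon}{|\mathcal{A}(s)|} \sum_a q_\pi(s, a) + (1 - \epsilon) \sum_a \frac{\pi(a \mid s) - \frac{\epsilon}{|\mathcal{A}(s)|}}{1 - \epsilon}\, q_\pi(s, a) \\
&= \sum_a \pi(a \mid s)\, q_\pi(s, a) = v_\pi(s),
\end{aligned}
$$

where the inequality holds because the weights $\bigl(\pi(a \mid s) - \epsilon/|\mathcal{A}(s)|\bigr)/(1-\epsilon)$ are nonnegative (since $\pi$ is $\epsilon$-soft) and sum to one, so no weighted average of the $q_\pi(s,a)$ can exceed their max.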
2
votes
1 answer
Policy improvement in SARSA and Q-learning
I have a rather basic question about SARSA and Q-learning. Looking at the pseudocode of the two algorithms in Sutton & Barto's book, I see that the policy improvement step is missing.
How will I get the optimal policy from the two algorithms? Are they used to find…

Jor_El
- 391
- 3
- 9
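One way to see where the improvement step hides: in those pseudocodes it is implicit in the $\epsilon$-greedy action selection itself, since each action is chosen greedily with respect to the current Q. A minimal sketch of that selection (the tabular Q indexed by state is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_action(Q, state, epsilon=0.1):
    """Choosing actions (eps-)greedily w.r.t. the current Q is itself
    the implicit policy-improvement step in SARSA and Q-learning."""
    n_actions = Q[state].shape[0]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore
    return int(np.argmax(Q[state]))          # exploit: the greedy (improved) choice
```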
1
vote
1 answer
How do I measure how different two policies are?
I have two agents that both follow a baseline behavioral policy $\pi(a|s)$. If I then modify the state-action distribution for the two agents (resulting in two new policies), is there a standard measure I can use to tell how "far" the policies are from…

Dirk
- 111
- 2
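A standard choice for a single state is an f-divergence such as the KL divergence between the two action distributions, averaged over states in practice. A minimal sketch (the distributions below are hypothetical):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two action distributions at a single state."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical action distributions for the same state under two policies.
pi_a = [0.7, 0.2, 0.1]
pi_b = [0.5, 0.3, 0.2]
print(kl_divergence(pi_a, pi_b))
```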
1
vote
0 answers
Can ε-greedy with Q-learning / SARSA have a stochastic policy?
Hello, I'm now studying Q-learning and SARSA with ε-greedy and softmax strategies, and I have a question about my readings.
In my readings, SARSA with ε-greedy causes value-function oscillations in the case of stochastic policies, but I think…

BE LEO
- 11
- 1
1
vote
1 answer
Discrete and continuous actions in the same environment
I am working on an RL environment that requires both discrete and continuous actions as input from the agent. I currently have a fine implementation of DDPG, which I would like to use for the continuous part. But what about the discrete actions? Can…

franyx
- 11
- 1
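One common way to express such a hybrid action space, sketched here under the assumption of a Gymnasium-style environment, is a Dict space combining a discrete and a continuous component:

```python
from gymnasium import spaces

# A sketch of a hybrid action space: one discrete component plus one
# continuous (Box) component, sampled together.
action_space = spaces.Dict({
    "discrete": spaces.Discrete(3),
    "continuous": spaces.Box(low=-1.0, high=1.0, shape=(2,)),
})
print(action_space.sample())
```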
1
vote
0 answers
Why is the Monte Carlo control algorithm written this way?
I am having trouble understanding this algorithm, since this is not how I would have written it.
To me, we should first fix a policy. Then, we evaluate the Q values associated with this policy by doing exploration and reducing the…

Hugo Laurençon
- 51
- 4
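For reference, a minimal sketch of the on-policy Monte Carlo control loop the question refers to (every-visit, $\epsilon$-greedy; the old-style Gym interface is an assumption), showing that evaluation and improvement are interleaved episode by episode rather than run to convergence separately:

```python
import numpy as np
from collections import defaultdict

def mc_control(env, n_episodes, epsilon=0.1, gamma=1.0):
    """Sketch of on-policy every-visit Monte Carlo control (assumes an
    old-style Gym env with discrete observations and actions)."""
    nA = env.action_space.n
    Q = defaultdict(lambda: np.zeros(nA))
    N = defaultdict(lambda: np.zeros(nA))
    rng = np.random.default_rng(0)

    for _ in range(n_episodes):
        # Generate one episode with the current eps-greedy policy.
        episode, state, done = [], env.reset(), False
        while not done:
            probs = np.full(nA, epsilon / nA)
            probs[int(np.argmax(Q[state]))] += 1.0 - epsilon
            action = int(rng.choice(nA, p=probs))
            next_state, reward, done, _ = env.step(action)
            episode.append((state, action, reward))
            state = next_state

        # Evaluation and improvement are interleaved: Q is updated from
        # this episode's returns, and the next episode already acts
        # eps-greedily w.r.t. the updated Q.
        G = 0.0
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            N[state][action] += 1.0
            Q[state][action] += (G - Q[state][action]) / N[state][action]
    return Q
```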
0
votes
0 answers
What should be the policy for online reinforcement learning with intrinsic reward?
An agent receives an extrinsic reward $r_{ext}$ and an intrinsic reward $r_{int}$, and a Q-function approximation is trained using TD learning such that $Q(s,a)$ approximates the expected return of $r_{ext} + \beta r_{int}$, where $\beta$ is a…

Kevin
- 1
- 1
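The update described in the question can be sketched as a single TD step (the tabular Q and the hyperparameter values are assumptions for illustration):

```python
import numpy as np

def td_update(Q, s, a, r_ext, r_int, s_next, alpha=0.1, beta=0.5, gamma=0.99):
    """One Q-learning-style TD step on the combined reward r_ext + beta * r_int.
    Q is a 2-D array indexed by (state, action)."""
    target = (r_ext + beta * r_int) + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```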
0
votes
0 answers
How to prove that stochastic policy iteration converges?
I was reading Sutton's book Reinforcement Learning: An Introduction, especially the policy iteration part.
There was a proof of convergence of policy iteration for a deterministic policy.
So I tried to find the proof for the case of a stochastic policy,…

nawab
- 1
0
votes
1 answer
Are the two $\epsilon$-greedy policies different?
I found two different versions of the $\epsilon$-greedy policy for Monte Carlo and Q-learning:
For Monte Carlo:
$\pi(a|s) = \epsilon/m + 1 - \epsilon$ for the best action and $\pi(a|s) = \epsilon/m$ for the other actions
For Q-learning:
$\pi(a|s) = 1 - \epsilon$ to…

abcd
- 1
- 1
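A sketch of the two parameterizations side by side; the first matches the Monte Carlo form quoted above, while the second is an assumption about the truncated Q-learning form (the greedy action gets exactly $1-\epsilon$ and the remaining $\epsilon$ is split over the other $m-1$ actions):

```python
import numpy as np

def eps_greedy_mc(q, epsilon):
    """Monte Carlo version: every action gets eps/m, and the greedy action
    additionally gets 1 - eps, so its total is 1 - eps + eps/m."""
    m = len(q)
    probs = np.full(m, epsilon / m)
    probs[int(np.argmax(q))] += 1.0 - epsilon
    return probs

def eps_greedy_alt(q, epsilon):
    """Alternative version (an assumption, since the question text is cut
    off): the greedy action gets exactly 1 - eps, and eps is split over
    the other m - 1 actions."""
    m = len(q)
    probs = np.full(m, epsilon / (m - 1))
    probs[int(np.argmax(q))] = 1.0 - epsilon
    return probs

q = np.array([0.1, 0.5, 0.4])
print(eps_greedy_mc(q, 0.1))   # greedy action: 1 - 0.1 + 0.1/3 ≈ 0.933
print(eps_greedy_alt(q, 0.1))  # greedy action: exactly 0.9
```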