Highest Voted 'deterministic-policy' Questions - Statistical Analysis Stack Exchange

8

votes

2 answers

Is a policy always deterministic in reinforcement learning?

In reinforcement learning, is a policy always deterministic, or is it a probability distribution over actions (from which we sample)? If the policy is deterministic, why isn't the value function, which is defined at a given state for a given policy…

reinforcement-learning deterministic-policy stochastic-policy

asked Dec 01 '17 at 19:47

MiloMinderbinder

1,622
2
15
31

3

votes

1 answer

Proof that any $\epsilon$-greedy policy is an improvement over any $\epsilon$-soft policy

In the book by Richard Sutton and Andrew Barto, "Reinforcement Learning - An Introduction", 2nd edition, on page 101 there is a proof, and I don't understand one passage of it. We want to prove that any $\epsilon$-greedy policy with respect to an…

monte-carlo reinforcement-learning deterministic-policy stochastic-policy

asked Jul 16 '19 at 07:31

robertspierre

1,358
6
21

1

vote

1 answer

Deep deterministic policy gradient: which network do I have to use for testing?

We know that deep deterministic policy gradient (henceforth DDPG) is characterized by two kinds of neural networks: one for the critic $Q$ and one for the actor $\mu$, with parameters $\theta^Q$ and $\theta^\mu$ respectively. For stability…

machine-learning reinforcement-learning actor-critic deterministic-policy

asked Sep 10 '21 at 20:56

Siderius

11
4

1

vote

1 answer

Discrete and continuous actions in the same environment

I am working on an RL environment that requires both discrete and continuous actions as input from the agent. I currently have a working implementation of DDPG, which I would like to use for the continuous part. But what about the discrete actions? Can…

machine-learning neural-networks reinforcement-learning deterministic-policy stochastic-policy

asked Oct 06 '20 at 09:16

franyx

11
1

1

vote

1 answer

Q-learning shows worse results than value iteration

I'm trying to solve the same problem with different algorithms (travel the maximum possible distance with a car). With value iteration and policy iteration I was able to get the best possible results, but with Q-learning it doesn't seem to go well. My…

reinforcement-learning q-learning value-iteration policy-iteration deterministic-policy

asked Mar 11 '19 at 10:28

Most Wanted

255
1
13

0

votes

2 answers

Greedy policy definition

I've always seen the greedy policy defined as the one that maximizes the action-value function $q_{\pi}(s,a)$ over the actions $a$. How is this equivalent to the following definition, which I found in my professor's lecture notes? The greedy policy is…

machine-learning reinforcement-learning deterministic-policy

asked Jun 12 '21 at 13:52

Damuna

19
3

0

votes

0 answers

MDP optimal policy inverse problem

Given a map $\pi: S \to A$, is there an MDP with state and action spaces $S,A$ such that it has $\pi$ as an optimal policy if we suppose the MDP is over an infinite time horizon and the optimality criterion is the expected discounted total…

decision-theory markov-decision-process inverse-problem deterministic-policy

asked Apr 15 '21 at 18:01

Vincent L.

101
2

0

votes

1 answer

Policy gradient for non-differentiable policy

Is it possible to apply policy gradient if the policy is not differentiable with respect to its parameters? If not, is there any other algorithm for optimizing this type of policy? One example I'm thinking about is a hard boundary: if $W^T x > 0$ then take…

reinforcement-learning policy-gradient deterministic-policy

asked Feb 17 '20 at 03:58

DiveIntoML

1,583
1
11
21

0

votes

1 answer

Policy evaluation in contextual bandit setting

I am currently reading a paper whose link is (Exploration Scavenging)…

reinforcement-learning estimators multiarmed-bandit contextual-bandit deterministic-policy

asked Aug 04 '19 at 22:09

Hunnam

155
5

0

votes

1 answer

Quick questions about a contextual bandit problem

I am currently reading the paper "Learning from Logged Implicit Exploration Data" https://arxiv.org/pdf/1003.0120.pdf. But I believe the questions I have can be answered without reading the whole paper, so I would greatly appreciate it if you help…

reinforcement-learning estimators multiarmed-bandit contextual-bandit deterministic-policy

asked Jul 29 '19 at 23:32

Hunnam

155
5

0

votes

0 answers

Neural Network and equally good predictions

There is a two-player game (discrete, deterministic, perfect information, and so on) where, in some but not all states, a few moves may be equally good; i.e., they are symmetric and an expert player will expect the same outcome from any of those moves.…

neural-networks games game-theory deterministic-policy

asked Apr 09 '19 at 14:10

tomash

101
1

0

votes

2 answers

Different algorithms categorized in reinforcement learning

For some time I have been studying reinforcement learning, and I have found a lot of diverse information, especially in the area of policies (algorithms). I figured out that policies can be classified as on-policy vs. off-policy and model-based vs. model-free. Also, these are…

reinforcement-learning q-learning sarsa deterministic-policy

asked Apr 07 '19 at 07:57

Sandeep Bhutani

101
4

Questions tagged [deterministic-policy]