Questions tagged [deterministic-policy]
12 questions
8
votes
2 answers
Is a policy always deterministic in reinforcement learning?
In reinforcement learning, is a policy always deterministic, or is it a probability distribution over actions (from which we sample)? If the policy is deterministic, why isn't the value function, which is defined at a given state for a given policy…

MiloMinderbinder
- 1,622
- 2
- 15
- 31
3
votes
1 answer
Proof that any $\epsilon$-greedy policy is an improvement over any $\epsilon$-soft policy
In the book by Richard Sutton and Andrew Barto, "Reinforcement Learning - An Introduction", 2nd edition, at page 101 there is a proof, and I don't understand one passage of it.
We want to prove that any $\epsilon$-greedy policy with respect to an…

robertspierre
- 1,358
- 6
- 21
1
vote
1 answer
Deep deterministic policy gradient: which network do I have to use for testing?
We know that deep deterministic policy gradient (henceforth DDPG) is characterized by two kinds of neural networks: one for the critic $Q$ and one for the actor $\mu$, with parameters $\theta^Q$ and $\theta^\mu$ respectively. For stability…

Siderius
- 11
- 4
1
vote
1 answer
Discrete and continuous actions in the same environment
I am working on an RL environment that requires both discrete and continuous actions as input from the agent. I currently have a fine implementation of DDPG which I would like to use for the continuous part. But what about the discrete actions? Can…

franyx
- 11
- 1
1
vote
1 answer
Q-learning shows worse results than value iteration
I'm trying to solve the same problem with different algorithms (travel the maximum possible distance with a car). With value iteration and policy iteration I was able to get the best results possible, but with Q-learning it doesn't seem to go well.
My…

Most Wanted
- 255
- 1
- 13
0
votes
2 answers
Greedy policy definition
I've always seen the greedy policy defined as the one that maximizes the action-value function
$q_{\pi} (s,a)$ over the actions $a$.
How is this equivalent to the following definition that I found in my professor's lecture notes?
The greedy policy is…

Damuna
- 19
- 3
0
votes
0 answers
MDP optimal policy inverse problem
Given a map $\pi: S \to A$, is there an MDP with state and action spaces $S,A$ such that it has $\pi$ as an optimal policy if we suppose the MDP is over an infinite time horizon and the optimality criterion is the expected discounted total…

Vincent L.
- 101
- 2
0
votes
1 answer
policy gradient for non-differentiable policy
Is it possible to apply policy gradient if the policy is not differentiable with respect to its parameters? If not, is there any other algorithm for optimizing this type of policy?
One example I'm thinking about is a hard boundary: if $W^T x > 0$ then take…

DiveIntoML
- 1,583
- 1
- 11
- 21
0
votes
1 answer
Policy evaluation in contextual bandit setting
I am currently reading a paper whose link is (Exploration Scavenging)…

Hunnam
- 155
- 5
0
votes
1 answer
quick questions about a contextual bandit problem
I am currently reading the paper "Learning from Logged Implicit Exploration Data" https://arxiv.org/pdf/1003.0120.pdf. But I believe the questions I have can be answered without reading the whole paper, so I would greatly appreciate it if you help…

Hunnam
- 155
- 5
0
votes
0 answers
Neural Network and equally good predictions
There is a two-player game (discrete, deterministic, perfect information and so on) where, in some but not all states, a few moves may be equally good; i.e. they are symmetric and an expert player will expect the same outcome from any of those moves.…

tomash
- 101
- 1
0
votes
2 answers
Different algorithms categorized in reinforcement learning
I have been studying reinforcement learning for some time and have found a lot of diverse information, especially in the area of policies (algorithms).
I figured out that policies can be classified as on-policy vs. off-policy and model-based vs. model-free. Also, these are…

Sandeep Bhutani
- 101
- 4