Questions tagged [markov-decision-process]

38 questions
3
votes
1 answer

What is the difference between Reinforcement Learning (RL) and a Markov Decision Process (MDP)?

What is the difference between Reinforcement Learning (RL) and a Markov Decision Process (MDP)? I believed I understood the principles of both, but now that I need to compare the two I feel lost. They mean almost the same to me. Surely they are…
Pluviophile
  • 2,381
  • 8
  • 18
  • 45
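One way to see the difference concretely: an MDP is the model (states, actions, transitions, rewards), and with that model in hand you can plan, e.g. by value iteration; RL methods such as Q-learning recover the same values from sampled experience alone. A minimal sketch on a hypothetical two-state, two-action MDP (all numbers invented for illustration):

```python
import numpy as np

# P[a, s, s2] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.1, 0.9]]])   # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Planning (the MDP is fully known): value iteration on the Bellman equation.
V = np.zeros(2)
for _ in range(1000):
    V = (R.T + gamma * P @ V).max(axis=0)   # max over actions

# RL (the MDP is treated as unknown): Q-learning estimates the same values
# purely from sampled transitions, with epsilon-greedy exploration.
rng = np.random.default_rng(0)
Q_hat = np.zeros((2, 2))
s = 0
for _ in range(100000):
    a = rng.integers(2) if rng.random() < 0.1 else int(Q_hat[s].argmax())
    s2 = rng.choice(2, p=P[a, s])
    Q_hat[s, a] += 0.05 * (R[s, a] + gamma * Q_hat[s2].max() - Q_hat[s, a])
    s = s2
```

The planner needs `P` and `R` explicitly; the Q-learner only ever touches them through sampled `(s, a, r, s2)` tuples, which is the practical boundary between the two notions.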
3
votes
1 answer

States in Bandit Problems

I am wondering if there is an interpretation of the bandit problem with more than one state. I know that there are versions which view each slot machine as an independent Markovian machine, and as such the states evolve when an arm is pulled.…
3
votes
1 answer

UCB Exploration in Reinforcement Learning

I have two questions regarding upper confidence bound (UCB) exploration in reinforcement learning: UCB exploration is derived from Hoeffding's inequality, which assumes that the reward is bounded in the interval [0,1]. If the rewards are not…
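On the first question, the usual fix when rewards live in a known interval [lo, hi] is to rescale them to [0, 1] before applying the Hoeffding-based bonus. A sketch of UCB1 with that rescaling (the two arms and their payouts below are hypothetical):

```python
import math
import random

def ucb1(pull, n_arms, horizon, lo=0.0, hi=1.0):
    """UCB1 with rewards rescaled from [lo, hi] to [0, 1], so the
    Hoeffding-based bonus sqrt(2 ln t / n_i) remains valid."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                      # play each arm once first
        else:
            arm = max(range(n_arms), key=lambda i:
                      means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        r = (pull(arm) - lo) / (hi - lo)     # rescale reward to [0, 1]
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return counts, means

# Two hypothetical arms paying 0 or 10, with success rates 0.3 and 0.7.
random.seed(0)
counts, means = ucb1(lambda a: 10.0 * (random.random() < (0.3, 0.7)[a]),
                     n_arms=2, horizon=5000, lo=0.0, hi=10.0)
```

If the rewards are unbounded (e.g. sub-Gaussian), the same structure works but the bonus must come from a concentration inequality matching the tail, not from Hoeffding.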
3
votes
0 answers

Model or State Uncertainty in Queueing Model due to uncertain arrival rate

$\textbf{Introduction}$ I am currently modelling a scenario where two queues need to be served by a single server under a non-preemptive discipline. I am quite sorted on generating the optimal policy via Value or Policy Iteration when given the arrival…
3
votes
2 answers

Uniqueness of the optimal value function for an MDP

Suppose we have a Markov decision process with a finite state set and a finite action set. We calculate the expected reward with a discount factor $\gamma \in [0,1]$. In Chapter 3.8 of the book "Reinforcement Learning: An Introduction" (by Andrew Barto…
jakab922
  • 181
  • 1
  • 9
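A sketch of the standard uniqueness argument, assuming $\gamma < 1$ (the case $\gamma = 1$ needs extra conditions): the Bellman optimality operator $T^*$ is a $\gamma$-contraction in the sup norm, so by Banach's fixed-point theorem it has exactly one fixed point, the optimal value function $V^*$:

```latex
\|T^*V - T^*W\|_\infty
  \le \gamma \max_{s,a} \sum_{s'} p(s' \mid s, a)\,\lvert V(s') - W(s') \rvert
  \le \gamma \,\|V - W\|_\infty .
```

The first inequality uses $\lvert \max_a f(a) - \max_a g(a) \rvert \le \max_a \lvert f(a) - g(a) \rvert$.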
2
votes
1 answer

Is a policy $\pi(s)$ on Markov decision process a random variable?

Citing Wikipedia: The goal in a Markov decision process is to find a good "policy" for the decision maker: a function $\pi$ that specifies the action $\pi(s)$ that the decision maker will choose when in state $s$. Once a Markov decision process…
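For reference, the Wikipedia definition quoted above uses a deterministic policy; a stochastic policy generalizes it, and in neither case is $\pi(s)$ itself a random variable: the policy is a fixed function, and randomness enters only when actions are sampled from it.

```latex
\pi : S \to A \ \text{(deterministic)}, \qquad
\pi : S \to \Delta(A), \quad \pi(a \mid s) = \Pr(A_t = a \mid S_t = s) \ \text{(stochastic)}.
```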
2
votes
1 answer

How to solve a Markov Decision Problem with State Transition Matrix and Reward Matrix

I'm stuck solving a simple dynamic probabilistic model. I have three states {Sunny, Cloudy, Rainy}. I have the transition probability matrix for the states transitioning to one another (e.g. Sunny -> Cloudy or Sunny -> Sunny). For the action…
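Given per-action transition matrices and a reward matrix, policy iteration solves such a model directly. A sketch with hypothetical placeholder numbers standing in for the question's own matrices:

```python
import numpy as np

states = ["Sunny", "Cloudy", "Rainy"]
# Hypothetical matrices (substitute the question's own numbers):
# P[a, s, s2] = transition probability under action a, R[s, a] = reward.
P = np.array([
    [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]],   # action 0
    [[0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],   # action 1
])
R = np.array([[4.0, 3.0], [2.0, 1.0], [-1.0, 0.0]])
gamma = 0.95

policy = np.zeros(3, dtype=int)
while True:
    # Policy evaluation: solve the linear system (I - gamma P_pi) V = R_pi.
    P_pi = P[policy, np.arange(3)]          # rows of P under the policy
    R_pi = R[np.arange(3), policy]
    V = np.linalg.solve(np.eye(3) - gamma * P_pi, R_pi)
    # Policy improvement: act greedily w.r.t. the one-step lookahead.
    improved = (R.T + gamma * P @ V).argmax(axis=0)
    if np.array_equal(improved, policy):
        break
    policy = improved
```

At termination `policy[s]` is the optimal action index for each weather state and `V` the corresponding optimal values; value iteration on the same matrices would reach the same answer without the linear solve.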
2
votes
1 answer

Dyna-Q Algorithm Reinforcement Learning

In step (f) of the Dyna-Q algorithm we plan by taking random samples from the experience/model for some number of steps. Wouldn't it be more efficient to construct an MDP from experience by computing the state transition probabilities and reward…
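For reference, a minimal sketch of the tabular Dyna-Q update, with step (f) as random replay from the learned model; the state/action encoding is hypothetical:

```python
import random

def dyna_q_update(Q, model, s, a, r, s2, alpha=0.1, gamma=0.95, n_planning=10):
    """One Dyna-Q step on tabular Q (dict: state -> list of action values)."""
    # (d) direct Q-learning update from the real transition
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    # (e) model learning: remember the observed outcome
    # (deterministic-world assumption, as in tabular Dyna-Q)
    model[(s, a)] = (r, s2)
    # (f) planning: replay randomly chosen previously observed (s, a) pairs
    for _ in range(n_planning):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])

# Toy usage: one real transition from state 0 to state 1 with reward 1.
random.seed(0)
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
model = {}
dyna_q_update(Q, model, 0, 1, 1.0, 1)
```

Replacing the last-outcome table with estimated transition probabilities (the question's suggestion) is essentially certainty-equivalence planning; it can be more sample-efficient but costs more memory and compute per step, which is the trade-off Dyna-Q's cheap replay avoids.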
1
vote
0 answers

Fixed point of the Bellman operator for suboptimal policies

Consider an MDP and let the Bellman operator be defined as follows, $$ (T^\pi_\gamma V)(s) = \sum_{a\in A}\pi(a\mid s)\big(r(s,a) + \gamma \sum_{s' \in S} p(s'\mid s,a) V(s')\big) $$ where $\pi:S\to \Delta(A)$ is a policy, i.e., a function that maps…
Erik M
  • 111
  • 3
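For any fixed policy $\pi$, optimal or not, the operator above is a $\gamma$-contraction in the sup norm, so it has a unique fixed point, namely the value function of that policy:

```latex
\|T^\pi_\gamma V - T^\pi_\gamma W\|_\infty \le \gamma \,\|V - W\|_\infty ,
\qquad T^\pi_\gamma V^\pi = V^\pi .
```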
1
vote
0 answers

Bellman Optimality Operator fixed point

I'm reading Szepesvári's book on RL. My question concerns the proof of Theorem A.10 (p. 71). Theorem: Let $V$ be the fixed point of $T^*$ and assume that there is a policy $\pi$ which is greedy w.r.t. $V$: $T^\pi V = T^* V$. Then $V = V^*$ and $\pi$ is an…
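A sketch of the two halves of that argument (notation may differ slightly from Szepesvári's): greediness makes $V$ a fixed point of $T^\pi$, so $V$ is attained by $\pi$; monotonicity of $T^*$ then bounds every other policy from above by $V$:

```latex
V = T^* V = T^\pi V \;\Rightarrow\; V = V^\pi \le V^* ,
\qquad
V^{\pi'} = T^{\pi'} V^{\pi'} \le T^* V^{\pi'}
  \;\Rightarrow\; V^{\pi'} \le \lim_{n\to\infty} (T^*)^n V^{\pi'} = V
  \;\;\text{for every } \pi', \ \text{so } V^* \le V .
```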
1
vote
0 answers

Is a random policy a stochastic policy?

I'm a student starting to study RL. When I studied MDPs and looked at the gridworld example, I had one question. In the gridworld, we usually assume that we can take four actions in any state, e.g. up, down, left, right. In this case, if we have a…
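In the sense usually meant: yes, a uniform random policy is the stochastic policy that spreads probability evenly over the actions available in each state, e.g. $1/4$ each for up, down, left, right in the gridworld:

```latex
\pi(a \mid s) = \frac{1}{\lvert \mathcal{A}(s) \rvert}
\quad \text{for all } a \in \mathcal{A}(s).
```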
1
vote
0 answers

OpenAI Gym for the TSP problem?

In a previous question I asked about the use of OpenAI Gym as a vehicle for modeling business problems as MDPs. A comment suggested that I start a new question with a more refined scope. In general, I'm interested in RL for combinatorial optimization. As…
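Absent a ready-made Gym environment, the TSP maps naturally onto the reset/step interface. A dependency-free sketch following the classic Gym `(obs, reward, done, info)` convention, with all class and method names hypothetical:

```python
import math

class TSPEnv:
    """Minimal TSP environment in the classic Gym reset/step style,
    without depending on the gym package. State: current city plus the
    frozen set of unvisited cities; action: index of the next city."""

    def __init__(self, coords):
        self.coords = coords
        self.n = len(coords)

    def _dist(self, i, j):
        (x1, y1), (x2, y2) = self.coords[i], self.coords[j]
        return math.hypot(x1 - x2, y1 - y2)

    def reset(self):
        self.current = 0
        self.unvisited = set(range(1, self.n))
        return (self.current, frozenset(self.unvisited))

    def step(self, action):
        assert action in self.unvisited, "must move to an unvisited city"
        reward = -self._dist(self.current, action)   # negative travel cost
        self.current = action
        self.unvisited.discard(action)
        done = not self.unvisited
        if done:                                     # close the tour
            reward -= self._dist(self.current, 0)
        return (self.current, frozenset(self.unvisited)), reward, done, {}

# Toy instance: four cities on a unit square, visited in order.
env = TSPEnv([(0, 0), (0, 1), (1, 1), (1, 0)])
obs = env.reset()
total = 0.0
for a in (1, 2, 3):
    obs, reward, done, info = env.step(a)
    total += reward
```

Because the unvisited set is part of the observation, the state space is exponential in the number of cities, which is exactly the scaling obstacle the question is circling.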
1
vote
1 answer

What kind of model can optimize the allocation of a resource in the context of a time-to-event outcome?

I have a list of N patients that are competing for one treatment at each time point. A treatment becomes available at times t=1,...,T. I want to build a model that can take the time-varying characteristics of all the patients at time t, when a…
1
vote
0 answers

Optimal action-value as function of optimal value. Proof

Currently reading through Algorithms for Reinforcement Learning, I think these notes are good, but there are bits that are a bit unclear, and I have a few questions that I think are quite basic: Definition of the optimal value function…
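The relation the notes build toward, in the usual notation: the optimal action value is one step of lookahead on the optimal state value, and the optimal state value is its maximum over actions:

```latex
Q^*(s, a) = r(s, a) + \gamma \sum_{s' \in S} p(s' \mid s, a)\, V^*(s'),
\qquad
V^*(s) = \max_{a \in A} Q^*(s, a).
```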
1
vote
1 answer

Equivalent definitions of Markov Decision Process

I'm currently reading Sutton's Reinforcement Learning, where Chapter 3 defines the notion of an MDP. The author seems to be saying that an MDP is completely defined by means of the probability $p(s_{t+1},r_t | s_t,…
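In Sutton and Barto's notation, the four-argument distribution does determine everything else: both the state-transition kernel and the expected reward are marginals of it, which is why the two common definitions of an MDP are equivalent:

```latex
p(s_{t+1} \mid s_t, a_t) = \sum_{r} p(s_{t+1}, r \mid s_t, a_t),
\qquad
r(s_t, a_t) = \sum_{r} r \sum_{s_{t+1}} p(s_{t+1}, r \mid s_t, a_t).
```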