In the Deep Deterministic Policy Gradient (DDPG) method, we use two neural networks: an Actor and a Critic.
The actor network directly maps states to actions (the network's output is the action itself) instead of outputting a probability distribution over a discrete action space. This is especially advantageous for continuous action spaces, which is why most examples I've found use a sigmoid as the output activation of the actor network and multiply it by the maximum action bound.
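This is roughly the pattern I mean, as a minimal sketch (layer sizes, names, and the `action_max` parameter are just placeholders, not from any particular example):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Continuous-action actor: state -> action scaled to [0, action_max]."""
    def __init__(self, state_dim, action_dim, action_max):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        self.action_max = action_max

    def forward(self, state):
        # sigmoid squashes the output to (0, 1), then scale by the action bound
        return torch.sigmoid(self.net(state)) * self.action_max
```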
However, my model has discrete actions (e.g. an integer index in [0, 125]). In this case, how should I build the output layer of the actor network? Should I also use a sigmoid and just convert its output to an integer by brute force?
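To make the question concrete, this is what I mean by converting "by brute force" (a sketch only; `actor` is a placeholder for the network above, and I assume its sigmoid output lies in (0, 1)):

```python
import torch

num_actions = 126                                  # integer indices 0..125
raw = actor(state)                                 # sigmoid output in (0, 1)
action_index = torch.round(raw * (num_actions - 1)).long()  # integer in [0, 125]
```

Is this a reasonable way to handle a discrete action space with DDPG, or is there a better way to design the actor's output layer?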