In the Deep Deterministic Policy Gradient (DDPG) method, we use two neural networks: an Actor and a Critic.
The actor network directly maps states to actions (the network's output is the action itself) instead of outputting a probability distribution over a discrete action space. This is especially advantageous for continuous action spaces, which is why most examples I've found use a sigmoid as the output activation of the actor network and multiply it by the maximum action bound.
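This is roughly the pattern I mean, as a minimal sketch (layer sizes, names, and the `action_max` parameter are just placeholders, not from any particular example):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Continuous-action actor: state -> action scaled to [0, action_max]."""
    def __init__(self, state_dim, action_dim, action_max):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        self.action_max = action_max

    def forward(self, state):
        # sigmoid squashes the output to (0, 1), then scale by the action bound
        return torch.sigmoid(self.net(state)) * self.action_max
```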
However, my model has discrete actions (e.g. an integer index in [0, 125]). In this case, how should I build the output layer of the actor network? Should I also use a sigmoid and just convert its output to an integer by brute force?
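To make the question concrete, this is what I mean by converting "by brute force" (a sketch only; `actor` is a placeholder for the network above, and I assume its sigmoid output lies in (0, 1)):

```python
import torch

num_actions = 126                                  # integer indices 0..125
raw = actor(state)                                 # sigmoid output in (0, 1)
action_index = torch.round(raw * (num_actions - 1)).long()  # integer in [0, 125]
```

Is this a reasonable way to handle a discrete action space with DDPG, or is there a better way to design the actor's output layer?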