I am trying out deep SARSA reinforcement learning on the OpenAI Gym CartPole-v0 problem. The state has 4 continuous features and the action is binary, either 0 or 1. The state-action vector is fed to a neural network that outputs the state-action value, and the action with the highest value is then selected, following SARSA.
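Roughly, the setup looks like this (a simplified sketch, not my exact code; the class and function names are just for illustration):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a) network: 4 state features + 1 action bit -> scalar value."""
    def __init__(self, hidden=(128, 256, 128)):  # the layer widths I vary
        super().__init__()
        layers, in_dim = [], 5  # 4 state features + 1 action
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # scalar state-action value
        self.net = nn.Sequential(*layers)

    def forward(self, state, action):
        # Concatenate the state with the action to form the input vector.
        x = torch.cat([state, action.float().unsqueeze(-1)], dim=-1)
        return self.net(x)

def greedy_action(qnet, state):
    # Evaluate Q(s, 0) and Q(s, 1) and pick the larger one.
    qs = torch.stack([qnet(state, torch.tensor(a)) for a in (0, 1)])
    return int(qs.argmax())
```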
When the network has shape 128-256-128, I could achieve up to 100 points, although the score was quite volatile and mostly hovered around 30. However, if I choose 128-256-256-128, the network does not learn at all and always chooses the same action, even after I have trained it for 300 episodes.
So my question is: is this expected behavior in reinforcement learning? Is it really that sensitive to the network architecture, or have I made some mistake in my implementation?