The post about how to choose the number of hidden layers and neurons was extremely helpful. The rules of thumb often gave me a good starting point. However, I'm now thinking about varying the number of neurons per layer.
How should one choose a different number of neurons per layer? Are there any rules of thumb? And is there a good explanation of how, for example, a bottleneck or an enlarged layer in the middle influences the network?
EDIT 1:
Some more background information about the problem: I'm working on some reinforcement learning projects. I have used Q-learning so far and am now trying some DQN approaches. I started with OpenAI Gym's CartPole environment to test the basics. The input vector has length 4 and the output vector has length 2. Following the rule of thumb, I chose 2 hidden layers; 7 neurons per layer has worked best so far. The loss is MSE and I train with the Adam optimizer. ReLU is used as the activation function.
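To make the setup concrete, here is a minimal NumPy sketch of the architecture described above (a 4-7-7-2 MLP with ReLU hidden layers and a linear output for the two Q-values). The weight initialization and function names are my own assumptions, not code from my actual project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture as described: 4 inputs -> two hidden layers of 7 -> 2 Q-values.
layer_sizes = [4, 7, 7, 2]

# He-style initialization (an assumption; reasonable default for ReLU layers).
weights = [rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def q_values(state):
    """Forward pass: ReLU on hidden layers, linear output (one Q-value per action)."""
    a = state
    for i, (W, b) in enumerate(zip(weights, biases)):
        a = a @ W + b
        if i < len(weights) - 1:  # apply ReLU only to hidden layers
            a = np.maximum(a, 0.0)
    return a

# Example CartPole observation: cart position, cart velocity, pole angle, pole angular velocity.
state = np.array([0.0, 0.1, -0.05, 0.2])
print(q_values(state).shape)  # (2,) -- one Q-value per action (left/right)
```

The question is essentially whether replacing the two equal-width hidden layers here (7 and 7) with unequal widths (e.g. wider then narrower, or the reverse) follows any useful rule.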
However, some more general answers would be great too, since this is only a small test project for later work.