Use of stack of tanh layers

Question

I am currently reading this paper, and the following is the model used in it. I haven't been able to completely understand the purpose of using three tanh layers. I have read about the tanh activation function and how it can reduce learning time compared to the sigmoid activation function, but I don't understand why three tanh layers are needed.

To explain, the model takes two sentences, sums the embeddings of the words in each sentence to create two 100d vectors, concatenates these two vectors to form a 200d vector, and feeds it to the stack of tanh layers followed by a 3-way softmax, since there are three classes to classify into.
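The pipeline described above can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the paper's implementation: it assumes each "tanh layer" is a fully connected layer (weights plus bias) followed by tanh, and the toy vocabulary, the random initialisation, and the names `init_dense`, `dense`, `sum_embeddings`, and `classify` are all made up for the example.

```python
import math
import random

EMBED_DIM = 100   # size of each sentence vector, as described above
HIDDEN_DIM = 200  # width of each tanh layer
NUM_CLASSES = 3   # 3-way softmax


def init_dense(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map W x + b for one layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sum_embeddings(tokens, table):
    """Sentence vector = element-wise sum of its word embeddings."""
    vec = [0.0] * EMBED_DIM
    for tok in tokens:
        for i, v in enumerate(table[tok]):
            vec[i] += v
    return vec


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_a, sentence_b, table, tanh_layers, output_layer):
    # List concatenation of two 100d vectors gives the 200d input.
    x = sum_embeddings(sentence_a, table) + sum_embeddings(sentence_b, table)
    # Assumption: each "tanh layer" is an affine map followed by tanh.
    for layer in tanh_layers:
        x = [math.tanh(z) for z in dense(x, layer)]
    return softmax(dense(x, output_layer))


rng = random.Random(0)
vocab = ["a", "dog", "runs", "an", "animal", "moves"]  # hypothetical toy vocabulary
table = {w: [rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for w in vocab}
tanh_layers = [init_dense(2 * EMBED_DIM, HIDDEN_DIM, rng)]
tanh_layers += [init_dense(HIDDEN_DIM, HIDDEN_DIM, rng) for _ in range(2)]
output_layer = init_dense(HIDDEN_DIM, NUM_CLASSES, rng)

probs = classify(["a", "dog", "runs"], ["an", "animal", "moves"],
                 table, tanh_layers, output_layer)
print(probs)  # three class probabilities that sum to 1
```

With untrained random weights the three probabilities carry no meaning; the point is only to show where the three tanh layers sit between the 200d input and the softmax.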

That does not seem like a duplicate at all to me. Could you expand more on why you think so? — kbrose, Oct 02 '18 at 17:27
@kbrose yes, the question asks why there are 3 tanh layers as opposed to (what I infer is the implied alternative) 0 or 1 layers. — shimao, Oct 02 '18 at 19:04
Ah, that makes sense. I interpreted the question as the OP thought that the layer _only_ included the tanh function (i.e. no fully connected weight matrix + bias stuff, literally only the activation function). This would be worthy of a question because why would anyone just stack plain tanh's together? Maybe the OP could weigh in? — kbrose, Oct 02 '18 at 20:27

score 1 · Answer 1 · answered Oct 02 '18 at 17:26

In your linked paper (you should provide the full citation to avoid link rot) we see the following:

Our neural network classifier, depicted in Figure 3 (and based on a one-layer model in Bowman et al. 2015), is simply a stack of three 200d tanh layers

Following the reference through

Samuel R. Bowman, Christopher Potts, and Christopher D. Manning. 2015. Recursive neural networks can learn logical semantics. In Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality.

it seems like the authors are using "tanh layer" to mean a fully connected layer with a tanh activation. This is not said outright but it seems heavily implied by the following section:
