
I have trained an artificial neural network in MATLAB with one input layer, one hidden layer, and one output layer (my output is values between zero and one, which I turn into 0 or 1 according to a threshold of 0.5).

I have noticed that, by default, MATLAB used the 'tansig' transfer function for the hidden layer and the 'logsig' transfer function for the output layer. Can anyone give me an explanation for this?

Thank you in advance!

Johanna
  • What is logsig? What is tansig? – Sycorax Aug 19 '20 at 15:28
  • 'logsig' is the standard sigmoid transfer function, and 'tansig' is the hyperbolic tangent sigmoid transfer function – Johanna Aug 19 '20 at 15:31
  • You need to clarify which function you used to build the ANN. For instance, narnet would use tanh (aka tansig), but I'm not sure whether the deep learning functions would do the same. – Aksakal Aug 19 '20 at 16:30

1 Answer


The big idea is that there's no particular requirement that all layers of a neural network use the same activation function. You can mix and match as you wish (see the sketch after this list). That said, there are some reasons to prefer using $\tanh$ as the activation function of a hidden layer and $\sigma$ as the output function.

  • The $\tanh(x)=\frac{\exp(x)-\exp(-x)}{\exp(x)+\exp(-x)}$ function is a standard activation function. Using it in a neural network is no more surprising than using least squares as an objective function for a regression task.

  • The function $\sigma(x)=\frac{1}{1+\exp(-x)}$ is a standard way to map real numbers to values in $(0,1)$, so it's commonly used to model probabilities. Since your task is to predict 0 or 1, using $\sigma$ in the output layer amounts to modeling the probability that the sample is labeled 1.

  • Using $\tanh$ in the last layer would be an implausible choice, because it has no clear relationship to the probability that a sample is labeled 1. The function $\tanh$ returns values between $-1$ and $1$, so its output cannot be interpreted as a probability.

  • If you wished, you could use $\sigma(x)$ as the hidden-layer activation function instead. But $\tanh$ is generally preferred there because its stronger gradient and its symmetric positive-and-negative outputs make optimization easier. See: tanh activation function vs sigmoid activation function

  • But also note that ReLU and similar functions are generally preferred as activation functions in hidden layers. See: What are the advantages of ReLU over sigmoid function in deep neural networks?

  • The choice to use $\tanh$ as a default is likely more about software development practices than mathematical principles: changing the default behavior of software can break legacy code and cause unexpected behavior. ReLU units only became popular recently, relative to the age of MATLAB. The Neural Network Toolbox add-on was first published in 1992 (it has since been rebranded as the "Deep Learning Toolbox"). In 1992, building a neural network was almost synonymous with building a network with a single hidden layer and $\tanh$ or $\sigma$ activation functions.

    But there's unlikely to be any definitive explanation for why MATLAB chose this default unless they happened to publish a justification for this choice (e.g. release notes or documentation).
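
As a rough illustration of the mix-and-match point above, here is a minimal sketch using the shallow-network API from the Deep Learning Toolbox. The layer size, the toy data, and the 0.5 threshold are placeholders chosen for the example, not anything MATLAB requires:

```matlab
% Minimal sketch: shallow feedforward network with a tanh ('tansig') hidden
% layer and a logistic-sigmoid ('logsig') output layer, on made-up data.
x = rand(4, 200);                 % 4 features, 200 samples (toy data)
t = double(sum(x, 1) > 2);        % toy binary targets in {0, 1}

net = feedforwardnet(10);         % one hidden layer with 10 units
net.layers{1}.transferFcn = 'tansig';  % hidden layer: hyperbolic tangent
net.layers{2}.transferFcn = 'logsig';  % output layer: logistic sigmoid

net = train(net, x, t);
yhat = net(x) > 0.5;              % threshold the (0,1) outputs at 0.5

% A ReLU-style hidden layer can be tried by swapping in 'poslin':
% net.layers{1}.transferFcn = 'poslin';
```

Setting `transferFcn` explicitly makes the choice of activation visible in the code, rather than relying on whatever defaults the constructor happens to use.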

Sycorax
  • Thank you very much! Is there a difference between 'tanh' and 'tansig'? – Johanna Aug 19 '20 at 15:48
  • You tell me. When I asked what "tansig" meant, you said "'tansig' is the hyperbolic tangent sigmoid transfer function". I took this to mean you were using a nonstandard term for $\tanh$. What is the mathematical expression for "tansig"? – Sycorax Aug 19 '20 at 15:49
  • According to MATLAB, tansig(n) = 2/(1+exp(-2*n))-1. However, checking the MATLAB code was the first time I came across the 'tansig' function; when I studied this I only learned about the logarithmic sigmoid and the tanh! But I am very new at machine learning, all I know about it is self-taught, so I may be missing some information :) – Johanna Aug 19 '20 at 15:52
  • The next sentence in [the documentation](https://www.mathworks.com/help/deeplearning/ref/tansig.html) is "This is mathematically equivalent to tanh(N)." I don't know why they use `n` for one expression and `N` for another. – Sycorax Aug 19 '20 at 15:53
  • Yes, I don't really understand why they say the result can have small differences but I guess it's not relevant for my work anyway. Thank you very much! – Johanna Aug 19 '20 at 15:54
  • The note about computation makes some sense -- it's cheaper to compute $\exp(x)$ once than it is to do it twice and then divide. And it raises another point that is important for new students of machine learning, which is that doing numerical computation on a machine is its own specialization of mathematics. Floating-point arithmetic is a bit different from math. – Sycorax Aug 19 '20 at 16:01
  • I don't think you answered why MATLAB uses tanh as the default. Why not ReLU? – Aksakal Aug 19 '20 at 16:29
  • My assumption is that changing default behavior breaks legacy code. ReLU didn't become popular until recently, relative to the age of MATLAB. – Sycorax Aug 19 '20 at 16:55
  • +1 @Sycorax I can put money on exactly that. Rightly, MATLAB does not change default arguments often. Given that they probably have thousands of lines of test code that work fine with tanh, changing the default to ReLU (or GELU, or the next hot thing) would defeat the point of doing code quality assurance. They just need to check that ReLU works correctly in a small number of general cases and potentially write some specific unit tests for particular cases. – usεr11852 Aug 19 '20 at 17:26
  • @usεr11852 As someone who is working on a major compatibility-breaking code refactor right now, I feel the pain of revising unit tests and verifying logic quite acutely. – Sycorax Aug 19 '20 at 18:11
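
As a quick numerical check of the equivalence discussed in the comments above, here is a minimal MATLAB sketch. Only the commented-out line needs the Deep Learning Toolbox's `tansig`; everything else is base MATLAB:

```matlab
% tansig(n) = 2/(1 + exp(-2*n)) - 1 is algebraically identical to tanh(n);
% any discrepancy is floating-point rounding, not a different function.
n = linspace(-5, 5, 1001);
a = 2 ./ (1 + exp(-2 * n)) - 1;   % the formula from the tansig documentation
b = tanh(n);                      % built-in hyperbolic tangent
max(abs(a - b))                   % tiny (on the order of machine epsilon)

% With the toolbox installed, tansig can be compared directly:
% max(abs(tansig(n) - tanh(n)))
```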