Conceptual question on MLP error calculation

Question

Consider a neural network with 2 input neurons, one hidden layer of 4 neurons, and one output node. The task is to predict the next sample given two input samples at a time. Let $m_1$ be the output of the hidden layer and $m_2$ the output of the output layer (the weighted sum $W_2 m_1 + b_2$ in the code below). Should the error at the output layer be calculated as $e = target - f(m_2)$ or as $e = target - m_2$, where $f(\cdot)$ is the activation function of the output layer? The activation function of the hidden layer ($m_1$) is the sigmoid, and $b_1, b_2$ denote the biases. I am confused about whether, in general, the error is calculated after passing the result through the activation function or not. For a perceptron, I have studied that the error is calculated from the output of the activation function, which is typically a hard limit (step) function.

(Question 1) In general, the error is always calculated between the target and the output. By output, I understand the value obtained by passing the weighted sum of the inputs plus the bias through an activation function. Is this correct?
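For a single perceptron, the "output" in Question 1 is the activation applied to the weighted sum plus bias. Below is a minimal Python sketch (the question's code is MATLAB) of that computation, assuming a hard-limit (step) activation as in the perceptron case. The helper names `hardlim` and `perceptron_error` are hypothetical, and the weights are chosen by hand for illustration.

```python
def hardlim(v: float) -> float:
    """Hard-limit (step) activation: 1 if v >= 0, else 0."""
    return 1.0 if v >= 0 else 0.0


def perceptron_error(weights, bias, x, target):
    """Return (output, error) where output = hardlim(w . x + b)."""
    v = sum(w * xi for w, xi in zip(weights, x)) + bias  # weighted sum plus bias
    y = hardlim(v)                                       # output AFTER the activation
    return y, target - y                                 # error = target - output


# Example: w = [1, 1], b = -1.5 implements logical AND.
y, e = perceptron_error([1.0, 1.0], -1.5, [1.0, 0.0], target=0.0)
print(y, e)  # 0.0 0.0
```

The error is formed only after `hardlim` has been applied, which is the convention the question describes for perceptrons.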

Reasons for confusion and questions:

(Confusion) For the case of an MLP, I am not sure. The textbook I am following is "MATLAB Deep Learning: With Machine Learning, Neural Networks and Artificial Intelligence" by Phil Kim (https://www.mathworks.com/support/books/matlab-deep-learning-kim.html). In Chapter 4, an example of multiclass classification is given. I ran the code for digit classification, but after training it classifies only the first example correctly, so it does not reproduce the results shown in the book. In many other examples, especially for prediction, I have seen error = target - actual output with no activation: the actual output is not passed through any activation function, i.e. the activation function is just the identity.
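When the output activation is the identity $f(v) = v$, the two error formulas in the question coincide: $target - f(m_2) = target - m_2$. A small Python check of that, using a hypothetical `output_error` helper that takes the activation as a parameter (the numbers are arbitrary):

```python
import math


def identity(v: float) -> float:
    """Identity (linear) activation: f(v) = v."""
    return v


def sigmoid(v: float) -> float:
    """Logistic sigmoid: f(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def output_error(target: float, v: float, f) -> float:
    """Error computed on the activated output: target - f(v)."""
    return target - f(v)


v, target = 0.8, 1.0  # pre-activation m2 and target (arbitrary values)
print(output_error(target, v, identity))  # same as target - v
print(output_error(target, v, sigmoid))   # differs from target - v
```

So "no activation at the output" and "identity activation at the output" describe the same network.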

(Question 2) So, does this mean that it is not necessary to have a transfer function at the output node?

(Question 3) Do prediction tasks normally have no transfer function at the output, with only classification tasks using one?

Therefore, I am wondering whether I have misunderstood something.

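```matlab
sig = @(z) 1 ./ (1 + exp(-z));          % logistic sigmoid for the hidden layer
m1 = sig(W1*[X(1,k+1) X(1,k)]' + b1);   % hidden-layer output (4x1)
m2 = W2*m1 + b2;                        % output node: identity (linear) activation
err(k) = X(1,k+2) - m2;                 % error = target - network output
```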

where $X$ is the data and $k$ is the iteration number. Can somebody please help clear up these concepts and my confusion? Thank you.
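The MATLAB snippet above can be mirrored in Python to make its structure explicit: a sigmoid hidden layer of 4 neurons and a linear (identity) output node, so the error is simply the target minus $m_2$. This is a sketch under stated assumptions: the weights are random and untrained, the sine series is a hypothetical stand-in for the questioner's data `X`, `forward` is a hypothetical helper name, indices are 0-based, and no training step is shown.

```python
import math
import random


def sig(z: float) -> float:
    """Logistic sigmoid used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, b1, W2, b2, x):
    """2-4-1 network: sigmoid hidden layer, identity (linear) output node."""
    m1 = [sig(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    m2 = sum(w * h for w, h in zip(W2, m1)) + b2  # no squashing at the output
    return m1, m2


random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]  # 4x2
b1 = [random.uniform(-1, 1) for _ in range(4)]                      # 4 hidden biases
W2 = [random.uniform(-1, 1) for _ in range(4)]                      # 1x4
b2 = random.uniform(-1, 1)

X = [math.sin(0.3 * n) for n in range(20)]  # hypothetical time series
err = []
for k in range(len(X) - 2):
    _, m2 = forward(W1, b1, W2, b2, [X[k + 1], X[k]])  # two past samples
    err.append(X[k + 2] - m2)                           # target - output
print(len(err))  # 18
```

Because the output node is linear, `target - m2` here is exactly `target - f(m2)` with `f` the identity.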

COOLBEANS · Accepted Answer · 2018-04-17T17:02:20.020


Networks typically do have an activation on the final layer, although different types are used depending on the task:

For the output layer:

1) Regression problems often use a linear activation function.

2) Classification problems often use a softmax, a sigmoid, or similar.
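To make the two cases concrete, here is a short Python sketch of the output activations listed above, applied to the same arbitrary pre-activation vector. The function names are illustrative and not taken from any particular library; in practice a framework would provide these.

```python
import math


def linear(v):
    """Regression output: identity activation, returns raw values."""
    return list(v)


def sigmoid(v):
    """Binary / multi-label output: each value squashed into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def softmax(v):
    """Multiclass output: non-negative values that sum to 1."""
    m = max(v)                           # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


v = [2.0, 1.0, 0.1]  # arbitrary pre-activation outputs of the final layer
for name, f in [("linear", linear), ("sigmoid", sigmoid), ("softmax", softmax)]:
    print(name, [round(y, 3) for y in f(v)])
```

A linear output leaves the range unbounded, which suits regression targets such as the next sample of a time series; sigmoid and softmax map to probabilities that can be compared with class labels.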

You are correct that the output you compare to your label in the error function should have gone through your chosen activation function first.
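A small numeric illustration of why the comparison should use the activated output, assuming a single sigmoid output unit for a binary label (the values are arbitrary and `errors` is a hypothetical helper): a confident, correct prediction gives a small error on the sigmoid output but a large one if the raw value is compared with the label.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic sigmoid output for a binary classifier."""
    return 1.0 / (1.0 + math.exp(-v))


def errors(target: float, v: float):
    """Return (error after activation, 'error' on the raw value)."""
    y = sigmoid(v)              # network output: activation applied
    return target - y, target - v


e_out, e_raw = errors(target=1.0, v=3.0)
print(round(e_out, 4))  # small: the output is already close to the label
print(e_raw)            # -2.0: penalises a correct, confident prediction
```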

The ReLU activation function, which is widely used in deep neural nets, is actually just the identity function $f(x) = x$ for $x > 0$ (and $0$ otherwise). This may seem odd, since the point of an activation function is to add non-linearity. This kind of question is answered here.
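A minimal Python sketch of ReLU, showing that it matches the identity only on the positive side. The clipping at zero is what makes it non-linear (additivity fails, as the last line shows), which is why it still works as an activation.

```python
def relu(x: float) -> float:
    """ReLU: identity for x > 0, zero otherwise, i.e. max(0, x)."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in inputs])            # [0.0, 0.0, 0.0, 0.5, 2.0]

# A linear function would satisfy f(a) + f(b) == f(a + b); ReLU does not.
print(relu(-1.0) + relu(1.0), relu(-1.0 + 1.0))  # 1.0 0.0
```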

edited Apr 17 '18 at 17:02

answered Apr 16 '18 at 23:02


Thanks for your answer. I want to clarify a few things based on your reply. According to https://en.wikipedia.org/wiki/Activation_function, the activation function can also be the identity. So does this mean that, with an identity activation, the error is calculated as `target - actual output`, where the `actual output` is not passed through any activation function (it is basically the identity), so it looks as if there is no activation function at all? Is this how an identity activation function works? – Srishti M Apr 17 '18 at 16:44
An identity function simply returns its input as output, $f(x) = x$. So in the case you mention, that does appear to be what they are doing. However, remember that ReLU, the activation function commonly used in deep nets, is an identity function in the range $x > 0$! – COOLBEANS Apr 17 '18 at 16:59
