Consider a neural network with 2 input neurons, one hidden layer of 4 neurons, and one output node. The task is to predict the next sample given two input samples at a time. Let $m_1$ be the output of the hidden layer and $m_2$ the output of the output layer before any activation is applied. Should the error at the output layer be calculated as $e = target - f(m_2)$ or as $e = target - m_2$, where $f(\cdot)$ is the activation function of the output layer? The activation function for the hidden layer output $m_1$ is the sigmoid, and $b_1, b_2$ denote the biases. I am confused about whether the error is, in general, calculated after passing the result through the activation function or before. For the perceptron, I have learned that the error is calculated from the output of the activation function, which is typically a hard limit (step) function.
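Written out, the forward pass and the two candidate errors look as follows. The notation is an assumption chosen to match the code further down: $x_k$ is the $k$-th sample, $W_1, W_2$ are the weight matrices of the hidden and output layers, and $\sigma$ is the sigmoid.

$$
m_1 = \sigma\!\left(W_1 \begin{bmatrix} x_{k+1} \\ x_k \end{bmatrix} + b_1\right), \qquad m_2 = W_2\, m_1 + b_2
$$

$$
e = x_{k+2} - f(m_2) \quad \text{or} \quad e = x_{k+2} - m_2
$$

The second form is the first with $f$ taken to be the identity, $f(m_2) = m_2$.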
(Question 1) In general, the error is always calculated between the target and the output. By output, I understand the value obtained after passing the weighted inputs plus the bias term through an activation function. Is this correct?
Reasons for confusion and questions:
(Confusion) For the case of an MLP, I am not sure. The textbook I am following is "MATLAB Deep Learning: With Machine Learning, Neural Networks and Artificial Intelligence" by Phil Kim (https://www.mathworks.com/support/books/matlab-deep-learning-kim.html). Chapter 4 gives an example of multiclass classification. I ran the code for digit classification, but after training it classifies only the first example correctly. The code does not reproduce the results shown in the book.
In many other examples, especially for prediction, I have seen error = target - actual output with no activation at the output. That is, the actual output is not passed through any activation function, or equivalently the activation function is just the identity.
(Question 2) So, does this mean that it is not necessary to have a transfer function at the output node?
(Question 3) Do prediction tasks normally have no transfer function at the output, and only classification tasks have one?
Therefore, I was wondering whether I have misunderstood something. The code in question is:
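m1 = sig(W1*[X(1,k+1) X(1,k)]' + b1);   % hidden layer: sigmoid of weighted inputs plus bias
m2 = W2*m1 + b2;                        % output node: weighted sum only, no activation
err(k) = X(1,k+2) - m2;                 % error = target (next sample) - output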
where $X$ is the data and $k$ is the iteration number. Could somebody please help clear up these concepts? Thank you.
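To make the two alternatives concrete, here is a minimal self-contained sketch of the snippet above. The toy sine data, the random initial weights, and the names `sig`, `err_lin`, and `err_sig` are assumptions for illustration only, not the book's code, and no training step is included.

```matlab
% Hypothetical toy setup: the data, layer sizes, and random weights are assumptions.
X  = sin(0.1*(1:50));                    % 1-by-50 toy series standing in for the data
W1 = 0.1*randn(4,2);  b1 = zeros(4,1);   % hidden layer: 4 neurons, 2 inputs
W2 = 0.1*randn(1,4);  b2 = 0;            % output layer: 1 node

sig = @(v) 1./(1 + exp(-v));             % logistic sigmoid, applied element-wise

N = numel(X) - 2;                        % number of (input pair, target) examples
err_lin = zeros(1,N);                    % error with identity output: target - m2
err_sig = zeros(1,N);                    % error with sigmoid output:  target - f(m2)
for k = 1:N
    m1 = sig(W1*[X(1,k+1) X(1,k)]' + b1);   % hidden-layer output, 4-by-1
    m2 = W2*m1 + b2;                        % weighted sum at the output node
    err_lin(k) = X(1,k+2) - m2;             % e = target - m2
    err_sig(k) = X(1,k+2) - sig(m2);        % e = target - f(m2), with f = sigmoid here
end
```

The two error vectors differ only in whether `m2` is passed through `f` before the subtraction, which is exactly the choice being asked about.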