
According to the answer here: How to choose the number of hidden layers and nodes in a feedforward neural network?

How many hidden layers? Well, if your data is linearly separable (which you often know by the time you begin coding a NN) then you don't need any hidden layers at all.
  1. Why is this true?

  2. If the data is linearly separable:

    2.1 Do we only need an input layer and an output layer?

    2.2 Will the activation function on the output layer handle the separation by itself (i.e., is it enough)?

user3668129

1 Answer


Yes, hidden layers are not needed for linearly separable data, because the output layer already computes a linear combination of the features and outputs a number with discriminative power: $f(\sum_i w_ix_i + b)$, where the $w_i$ are the output neuron's weights, $b$ is the bias, and $f$ is the activation function. Linear separability means there exists a hyperplane $\sum_i w_ix_i + b = 0$ separating the classes in feature space. The activation function matters too, but as long as it translates into a decision rule of the form $\sum_i w_ix_i + b > \tau$ (as monotonic activations such as tanh and the sigmoid do), its exact form is not important. So you don't need hidden layers to discover a more complex decision boundary; the linear one found by the output layer alone already suffices. And you have the input layer as always, which just represents your features.
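As an illustration (a minimal sketch, not taken from the answer; the toy dataset, learning rate, and iteration count are my own choices): a network with no hidden layer is just logistic regression, and plain gradient descent on it classifies linearly separable data perfectly.

```python
import numpy as np

# Toy data, linearly separable with a margin around the line x1 + x2 = 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=(400, 2))
X = X[np.abs(X[:, 0] + X[:, 1] - 1.0) > 0.1]   # enforce a small margin
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)

# "Network" with no hidden layer: a single output neuron with a sigmoid.
w = np.zeros(2)                                # output neuron's weights w_i
b = 0.0                                        # bias b
lr = 0.5

for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # f(sum_i w_i x_i + b), sigmoid
    grad = p - y                               # gradient of the log-loss w.r.t. the pre-activation
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

pred = (X @ w + b > 0).astype(float)           # decision rule: sum_i w_i x_i + b > 0
print("training accuracy:", (pred == y).mean())  # reaches 1.0: the data is separable
```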

gunes
  • Thanks. How does the activation function influence the output in this case? – user3668129 Jul 05 '21 at 12:32
  • For example, during learning it lets you define a proper loss function like log-loss instead of MSE, with which the problem can turn out to be non-convex – gunes Jul 05 '21 at 12:33
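To see the point in that last comment concretely (a minimal sketch, not from the thread; the one-parameter toy model is my own setup): with a sigmoid output, squared error is non-convex in the weight even for a single training point, while log-loss stays convex.

```python
import numpy as np

# One-parameter sigmoid model p(w) = sigmoid(w * x), fit to the single
# point (x = 1, y = 1); compare the two losses as functions of w.
w = np.linspace(-6, 6, 200)
p = 1.0 / (1.0 + np.exp(-w))        # x = 1, so the pre-activation is w

mse = (p - 1.0) ** 2                # squared error: plateaus for w << 0
logloss = -np.log(p)                # log-loss: convex in w (softplus of -w)

# Second differences approximate curvature; negative values mean non-convex.
print("MSE min curvature:     ", np.diff(mse, 2).min())      # < 0 (non-convex)
print("log-loss min curvature:", np.diff(logloss, 2).min())  # >= 0 (convex)
```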