Recently, a project I'm involved in used a linear perceptron for multiple regression (21 predictors), trained with stochastic gradient descent. How is this different from OLS linear regression?
-
The `Perceptron` class you link to is for a classifier (binary output) rather than a regressor (continuous output). Is that the actual code you used? If so, that's the difference. :) – Danica Mar 30 '15 at 03:03
-
@Dougal, it still counts among the GLMs though: http://scikit-learn.org/stable/supervised_learning.html#supervised-learning – Simon Kuang Mar 30 '15 at 03:30
-
@Dougal: suppose you had a (G)LM whose squared (L2) loss you minimized with [`SGDRegressor`](http://goo.gl/QmI6bM); would this be equivalent to linear regression? – Simon Kuang Mar 30 '15 at 03:34
-
Yes, some GLMs are classifiers. If you used `SGDRegressor(loss='squared_loss', penalty='none')`, that is OLS. – Danica Mar 30 '15 at 03:57
1 Answer
scikit-learn's `Perceptron` class (equivalent to `SGDClassifier(loss="perceptron", penalty=None, learning_rate="constant", eta0=1)`) uses the following objective function:
$$\frac{1}{N} \sum_{i=1}^N \max(0, - y_i w^T x_i).$$
In this case, $y_i \in \{-1, 1\}$. If $w^T x_i$ has the same sign as $y_i$, the sample incurs no loss; otherwise, the loss grows linearly in $|w^T x_i|$. The perceptron in particular uses a fixed learning rate, which can lead to some optimization weirdness as well.
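To make that objective concrete, here's a minimal NumPy sketch that evaluates the loss directly; the data, dimensions, and weight vector `w` are all made up for illustration:

```python
import numpy as np

# Made-up data: y holds labels in {-1, +1}, X is (N, d),
# and w is a hypothetical weight vector.
rng = np.random.default_rng(0)
N, d = 100, 21
X = rng.normal(size=(N, d))
y = np.sign(rng.normal(size=N))   # labels in {-1, +1}
w = rng.normal(size=d)

margins = y * (X @ w)                               # y_i * w^T x_i per sample
perceptron_loss = np.maximum(0.0, -margins).mean()  # the objective above
print(perceptron_loss)  # 0 only when every sample is on the correct side
```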
Least squares regression, by contrast, uses
$$\frac{1}{N} \sum_{i=1}^N (y_i - w^T x_i)^2.$$
Here $y_i$ can be any real number; you can give it classification targets in $\{-1, 1\}$ if you want, but it's not going to give you a very good model.
You can optimize this with `SGDRegressor(loss="squared_loss", penalty=None)` if you'd like.
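As a rough sketch of the comment thread's point that this setup is just OLS fit iteratively, the following compares it against the closed-form solution on toy data. It assumes a recent scikit-learn, where the spellings are `loss="squared_error"` and `penalty=None`; older releases (like the one current when this was written) used `loss="squared_loss"` and `penalty="none"`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# Toy regression data, made up for the example.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
y = X @ rng.normal(size=21) + 0.1 * rng.normal(size=500)

ols = LinearRegression().fit(X, y)   # closed-form least squares
sgd = SGDRegressor(loss="squared_error", penalty=None,
                   max_iter=10_000, tol=1e-8, random_state=0).fit(X, y)

# Close but not identical: SGD iteratively approximates the exact OLS fit.
print(np.max(np.abs(ols.coef_ - sgd.coef_)))
```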
The two define fundamentally different models: the perceptron predicts a binary class label with $\mathrm{sign}(w^T x_i)$, whereas linear regression predicts a real value with $w^T x_i$. This answer talks some about why trying to solve a classification problem with a regression algorithm can be problematic.
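A tiny illustration of the two prediction rules, with a hypothetical weight vector and input made up for the example; both models compute the same linear score but use it differently:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # hypothetical weights
x = np.array([1.0, 3.0, 0.25])   # hypothetical input

score = w @ x             # w^T x, shared by both models
print(score)              # linear regression predicts this real value
print(np.sign(score))     # the perceptron predicts this class label
```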