I'm trying to get the same linear SVM classifier model by using Scikit-Learn's SVC, LinearSVC and SGDClassifier classes. I managed to do so (see the code below), but only by manually tweaking the alpha hyperparameter for the SGDClassifier class.
Both SVC and LinearSVC have the regularization hyperparameter C, but SGDClassifier has the regularization hyperparameter alpha. The documentation says that C = n_samples / alpha, so I set alpha = n_samples / C, but when I use this value, the SGDClassifier ends up being a very different model from the SVC and LinearSVC models. If I manually tweak the value of alpha, I can get all the models to be approximately the same, but there should be a simple equation to find alpha given C. What is it?
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

C = 5
alpha = len(X) / C  # alpha == 20, following the documented C = n_samples / alpha

# Note: n_iter was renamed max_iter in scikit-learn 0.19+.
sgd_clf1 = SGDClassifier(loss="hinge", alpha=alpha, n_iter=10000, random_state=42)
# Manually tweaked alpha that makes SGDClassifier match the other two models:
sgd_clf2 = SGDClassifier(loss="hinge", alpha=0.0007, n_iter=10000, random_state=42)
svm_clf = SVC(kernel="linear", C=C)
lin_clf = LinearSVC(loss="hinge", C=C)

# Center and scale the inputs; see the note below about LinearSVC and the bias term.
X_scaled = StandardScaler().fit_transform(X)
sgd_clf1.fit(X_scaled, y)
sgd_clf2.fit(X_scaled, y)
svm_clf.fit(X_scaled, y)
lin_clf.fit(X_scaled, y)

print("SGDClassifier(alpha=20): ", sgd_clf1.intercept_, sgd_clf1.coef_)
print("SGDClassifier(alpha=0.0007): ", sgd_clf2.intercept_, sgd_clf2.coef_)
print("SVC: ", svm_clf.intercept_, svm_clf.coef_)
print("LinearSVC: ", lin_clf.intercept_, lin_clf.coef_)
This code outputs:
SGDClassifier(alpha=20): [-0.46597258] [[ 0.0283698 -0.03634389]]
SGDClassifier(alpha=0.0007): [ 0.0422716] [[ 0.79608868 -1.48847539]]
SVC: [ 0.04569242] [[ 0.79788013 -1.48716383]]
LinearSVC: [ 0.04556911] [[ 0.79762806 -1.4866854 ]]
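While debugging, I also tried to rule out the optimizer itself (sketch below, results omitted): SGDClassifier's default learning_rate="optimal" schedule derives its step size from alpha, so a huge alpha like 20 changes how the optimizer steps, not just the regularization strength. learning_rate="constant" and eta0 are standard SGDClassifier parameters, but the eta0 value here is an arbitrary choice of mine:

# Pin the learning rate so the step size no longer depends on alpha:
sgd_clf3 = SGDClassifier(loss="hinge", alpha=alpha, learning_rate="constant",
                         eta0=0.001, n_iter=10000, random_state=42)
sgd_clf3.fit(X_scaled, y)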
Note: to make the LinearSVC class output the same result as the SVC class, you have to center the inputs (e.g. using the StandardScaler) since LinearSVC regularizes the bias term (weird). You also need to set loss="hinge", since the default is "squared_hinge" (weird again).
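As far as I can tell from the docs, an alternative to centering is LinearSVC's intercept_scaling parameter, which reduces how much the bias term gets regularized; the value 10 below is just an example, not a recommendation:

# Alternative to centering: increase intercept_scaling so the bias term
# is regularized less (intercept_scaling is a standard LinearSVC parameter).
lin_clf2 = LinearSVC(loss="hinge", C=C, intercept_scaling=10)
lin_clf2.fit(X, y)  # raw, uncentered inputs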
So my question is: how does alpha really relate to C in Scikit-Learn? Looking at the equations, the documentation should be right, but in practice it is not. What's going on?