
I need to implement a custom kernel for the sklearn.svm.SVC learner. My custom kernel consists of multiplying every element of the kernel matrix except the main diagonal by a fixed constant (the "correction" factor). According to the scikit-learn docs, I can use a standard SVC with kernel='precomputed' and fit it by passing the Gram matrix and the target variable.

Adapting their example from the docs:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn import svm
X, y = make_classification(n_samples=130000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = svm.SVC(kernel='precomputed')
# custom kernel computation: scale the whole linear Gram matrix, then
# restore the original (unscaled) values on the main diagonal
mu = 0.25
sigma_squared = mu * (1.0 - mu)
correction = 1.0 - 4 * sigma_squared
linear_gm = np.dot(X_train, X_train.T)        # linear-kernel Gram matrix
diag_values = np.diag(linear_gm)
corrected_gram_matrix = linear_gm * correction
np.fill_diagonal(corrected_gram_matrix, diag_values)

clf.fit(corrected_gram_matrix, y_train)

# predict on the test examples: kernel between test and training samples
gram_test = np.dot(X_test, X_train.T)
clf.predict(gram_test)

The problem is that precomputing the Gram matrix in this way takes up too much memory and the Google Colab instance I am using crashes with the following:

tcmalloc: large alloc 76050006016 bytes

That's about 76 GB of RAM, so running it on my own machine is not an option either. Removing my correction part and just changing the number of samples in their unmodified example produces the same result. What's worse, this number (130000) is a toy example: the actual dataset I will use has nearly 1 million samples and 120 features.
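A rough estimate (assuming float64 entries and train_test_split's default 75% train split) reproduces that allocation and shows how it scales to the real dataset:

n_train = int(130_000 * 0.75)        # 97,500 training samples after the default split
print(n_train ** 2 * 8 / 1e9)        # ~76 GB for a dense float64 Gram matrix
print(1_000_000 ** 2 * 8 / 1e12)     # ~8 TB for the ~1M-sample dataset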

I know the other possible way to use a custom kernel is to define a function and pass the callable as the kernel parameter in the SVC constructor, but such a function has the signature f(X, Y) and computes kernel values from the sample values alone, without knowing whether a given entry lies on the main diagonal of the training Gram matrix, which would in turn make my correction impossible.

Is there any other (or rather, any feasible) way to implement my custom kernel?

df342

1 Answer


By definition, the diagonal is the case where you're evaluating $f(x,x)$; thus, you should provide your own function $f(x,y)$ along the lines of:

def corrected_kernel(x, y):
    unscaled = k(x, y)  # k is your base kernel, e.g. the linear kernel
    return unscaled if np.array_equal(x, y) else scaling_factor * unscaled

Note that this behaviour is correct: if you have multiple identical points, the kernel should not depend on whether the entry happens to sit at the same index in the matrix, only on the values of the operands.
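As a minimal sketch of how this could be plugged into SVC (assuming the linear kernel and the correction constant from the question; the function name is illustrative), note that scikit-learn passes full data matrices to a callable kernel, so the value-based check is vectorised here:

import numpy as np
from sklearn import svm

def corrected_linear_kernel(X, Y):
    # scikit-learn calls this with full data matrices and expects a
    # kernel matrix of shape (n_samples_X, n_samples_Y)
    K = np.dot(X, Y.T)
    # value-based check: True wherever the two samples are identical
    identical = np.all(X[:, None, :] == Y[None, :, :], axis=-1)
    return np.where(identical, K, correction * K)

clf = svm.SVC(kernel=corrected_linear_kernel)
clf.fit(X_train, y_train)
clf.predict(X_test)

As far as I can tell, SVC still builds the full kernel matrix internally when the kernel is a callable, so this sketch addresses the diagonal question rather than the memory one.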

MotiNK
  • Thanks for your reply. I think there's a problem with your code though: if two samples $x_1$ and $x_2$ are identical, the corresponding value in the Gram matrix (not on the main diagonal) won't get the correction even if it should. Am I right? That's why I can't rely on the sample values alone; I think I also need the indices. – df342 Jan 15 '22 at 17:28
  • It is not a problem: if two samples are identical, they must have identical behavior. Consider *classifying* a point $x$ which is equal to $x_1$ and $x_2$; you can't tell whether it is $x_1$, $x_2$, or some completely new point. Otherwise, you could incorporate an additional dimension for the index, but then you could only be in a transductive setting. – MotiNK Jan 16 '22 at 08:15
  • I'm not sure I have the knowledge to follow you. My experiment consists of correcting a generic kernel matrix by multiplying each value, excluding those on the main diagonal, by a constant. Hence the need to know whether a pair of samples $x_1$ and $x_2$ lies on the main diagonal or not, which seems impossible without resorting to the indices. I hope I was clearer, and I hope to be proven wrong. – df342 Jan 18 '22 at 16:30
  • The only way to do what you want is to include indices. Note that for new points (not known at training time) to be classified you would then have to create a unique index. – MotiNK Jan 23 '22 at 08:20
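
For completeness, a minimal sketch of the index-based workaround described in the comments above (the names are illustrative and the correction value is the one from the question): prepend each sample's index as an extra column, strip it inside the kernel callable, and give new points indices that cannot collide with the training ones.

import numpy as np
from sklearn import svm

correction = 0.25  # 1 - 4 * mu * (1 - mu) with mu = 0.25, as in the question

def indexed_corrected_kernel(A, B):
    # the first column carries the sample index, the remaining columns the features
    idx_a, Xa = A[:, :1], A[:, 1:]
    idx_b, Xb = B[:, :1], B[:, 1:]
    K = np.dot(Xa, Xb.T)
    same_index = idx_a == idx_b.T  # True exactly where the two indices match
    return np.where(same_index, K, correction * K)

# prepend a unique index column to the training data
X_train_idx = np.hstack([np.arange(len(X_train))[:, None], X_train])
clf = svm.SVC(kernel=indexed_corrected_kernel)
clf.fit(X_train_idx, y_train)

# new points get indices that never collide with the training ones
X_test_idx = np.hstack([np.arange(len(X_test))[:, None] + len(X_train), X_test])
clf.predict(X_test_idx)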