Is there a measure to describe the degree of linear separability?

Question

I know that given two sets of points, one can use linear programming to see if there is a solution/hyperplane that linearly separates the two data sets. But this holds for completely linearly separability. I wonder in case that the two sets can only be linearly separated by neglecting some points in both sets, then is there a measure to describe the degree of linear separability?

In other ways, the problem may be transformed to

At least how many points we need to kick out to have linear separability? (and the degree may be measured by the number of points to be kicked out divided by sample size)

But is this computationally expensive to implement using computers? Or is there any better way to measure the degree of linear separability?

use linear programming to see if it is separable? Could you give me a reference for that? — Haitao Du, Oct 05 '16 at 02:29
See [here](http://www.joyofdata.de/blog/testing-linear-separability-linear-programming-r-glpk/)@hxd1011 — Nicholas, Oct 05 '16 at 02:32
@hxd1011: See also [Konis (2007), "Linear programming algorithms for detecting separated data in binary logistic regression models", DPhil., U. Oxf.](http://ora.ouls.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a). — Scortchi - Reinstate Monica, Oct 05 '16 at 12:46
@Scortchi thanks for great reference. It is good to see it is a PHD dissertation, I thought it is trivial and everyone knows, but I do not know.... — Haitao Du, Oct 05 '16 at 13:39
For linear programming, also see [this answer](http://stats.stackexchange.com/a/94537/127790). For the original question, you might be able to formulate using a "soft margin" classifier (e.g. [SVM](https://en.wikipedia.org/wiki/Support_vector_machine#Computing_the_SVM_classifier)), and then use the lagrange multipliers to diagnose the "degree of linear separability". — GeoMatt22, Oct 05 '16 at 15:44

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

As noted in the question and comments, a dataset of $m$ points $(\boldsymbol{x}_i,y_i)$ with $\boldsymbol{x}_i\in\mathbb{R}^n$ and $y_i\in\{-1,+1\}$ is linearly separable if we can find a normal vector $\boldsymbol{a}$ and scalar bias $b$ such that the linear inequality $$y_i(\boldsymbol{a}^T\boldsymbol{x}_i+b)\geq 1$$ is satisfied for all $i=1,\ldots,m$. Physically this says that for each point $\boldsymbol{x}_i$, the signed distance $d_i$ to the separating plane has sign $y_i$ and magnitude at least $\|\boldsymbol{a}\|$.

This can be written also as $$\boldsymbol{w}^T\boldsymbol{z}_i \geq 1$$ where $$\boldsymbol{z}_i = \begin{bmatrix}\boldsymbol{x}_i \\ 1\end{bmatrix} y_i \text{ , } \boldsymbol{w}=\begin{bmatrix}\boldsymbol{a} \\ b\end{bmatrix}$$

As noted in the question, this is essentially a linear program with a "0" objective function.

The proposed "degree of separability" measure $S_\min$ can be expressed as $$ \min_{\boldsymbol{\sigma},\boldsymbol{w}} S[\boldsymbol{\sigma}] $$ where $\boldsymbol{\sigma}\in\{0,1\}^m$, the function $S$ is defined by $$ S[\boldsymbol{\sigma}] = \sum_i\sigma_i = \boldsymbol{1}^T\boldsymbol{\sigma} $$ and the minimization is subject to the constraint $$ \sigma_i(\boldsymbol{w}^T\boldsymbol{z}_i - 1) \geq 0 \text{ , } i=1,\ldots,m$$

This is a (slightly nonlinear) "mixed integer programming" problem, a problem class which is NP hard (even in the linear & binary case).

Now a more feasible alternative could be to solve a "soft-margin" classification problem using a standard approach like SVM*. The tolerance and/or lagrange multiplier variables could then be used to quantify "degree of separability".

(*For $\lambda\approx 0$ the SVM will essentially reduce to a "softened" version of the linear program above. However to get meaningful distances, you will have to normalize tolerances by $\|\boldsymbol{a}\|$.)

Update: Using something like soft-margin SVM can give you an idea of which points are "hard" to separate, but it would not necessarily address your question of "how many points must be removed to allow a hard-margin split?"

However, I think that if you pre-process the $\boldsymbol{X}$ matrix by whitening, then it should be possible to get some more consistent results. (In terms of numerical offset-scales and directional clustering of the problem points.)

Is there a measure to describe the degree of linear separability?

1 Answers1