I am having trouble understanding how one-class SVMs work. They were introduced in a paper by Schölkopf et al. (and can be found here).
One-class SVMs perform "novelty detection", where a point is classified as either normal or abnormal, but you can only train your model on normal points.
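For concreteness, here is a minimal sketch of this train-on-normal-only workflow. I am using scikit-learn's `OneClassSVM` here; the library, the RBF kernel, and the particular parameter values are my choices for illustration, not something taken from the paper:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Train ONLY on "normal" points: a Gaussian blob around the origin.
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu upper-bounds the fraction of training points treated as outliers
# and lower-bounds the fraction of support vectors.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1)
clf.fit(X_train)

# predict() returns +1 for "normal" and -1 for "abnormal".
X_test = np.array([[0.0, 0.0],     # near the training data
                   [10.0, 10.0]])  # far from it
print(clf.predict(X_test))
```

The point is only that the model never sees an abnormal example during training; abnormality is inferred from where the learned decision function is negative.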
In the paper linked above, the authors perform novelty detection by separating the training examples from the origin in feature space. They further claim that, for certain values of their parameter $\nu$, the method is equivalent to a thresholded Parzen window density estimator, provided the kernel integrates to 1 over $\mathbb{R}^n$. Mathematically this makes sense (equations 5 and 6 in the paper reduce to Parzen windows when $\nu = 1$), but I am having trouble understanding it intuitively.
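To make sure I understand the estimator that the SVM is supposed to reduce to, here is my own plain-NumPy sketch of thresholded Parzen windows (the bandwidth, the quantile threshold, and the data are all my illustrative assumptions):

```python
import numpy as np

def parzen_scores(X_train, X_query, h=1.0):
    """Parzen-window density estimate with a normalized Gaussian kernel.

    Each kernel integrates to 1 over R^n, so the average over the
    training points is itself a probability density.
    """
    n = X_train.shape[1]
    norm = (2 * np.pi * h**2) ** (-n / 2)  # Gaussian normalizing constant
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return norm * np.exp(-d2 / (2 * h**2)).mean(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))

# "Thresholded" Parzen windows: flag as abnormal any point whose
# estimated density falls below (say) the 10th percentile of the
# densities at the training points.
threshold = np.quantile(parzen_scores(X_train, X_train), 0.10)

def is_normal(X):
    return parzen_scores(X_train, X) >= threshold

print(is_normal(np.array([[0.0, 0.0], [10.0, 10.0]])))
```

My reading of the equivalence claim is that, in that parameter regime, the SVM decision function becomes an equally weighted sum of kernels minus a constant, i.e., exactly this kind of density-above-a-threshold rule.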
So here is my question: As I see it, separating points from the origin in feature space necessarily means there is some $n$-dimensional ball of radius $r$ centered at the origin such that every point outside the ball is classified as normal. Wouldn't that leave a large portion of $\mathbb{R}^n$ classified as normal? Yet the paper claims the solution region has small volume.
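To make the worry quantitative, here is a rough experiment of mine (again using scikit-learn's RBF-kernel `OneClassSVM`, which is my own setup rather than the paper's) that measures how much of a bounding box in *input* space actually gets labeled normal:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))  # "normal" data near the origin

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

# Evaluate the classifier on a dense grid over [-5, 5]^2 and estimate
# the fraction of the box's area that is labeled "normal" (+1).
xs = np.linspace(-5.0, 5.0, 101)
grid = np.array([[x, y] for x in xs for y in xs])
frac_normal = (clf.predict(grid) == 1).mean()
print(f"fraction of the box labeled normal: {frac_normal:.2f}")
```

If my ball picture above were right, I would expect this fraction to be large, so checking it directly seems like a good sanity test of my intuition.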