
I am reading the book "An Elementary Introduction to Statistical Learning Theory" and there is a sketch of a proof (Section 8.4) for the universal consistency of kernel rules for binary classification.

In this sketch the authors write: "with very high probability, the feature vector x will fall where the probability density p(x) is positive"

Since a probability density is non-negative, this wording seems to suggest that there is a non-zero probability that x falls where the probability density p(x) is zero. I would like to know whether this is possible and, if so, how.

mtedwards

1 Answer


I agree with you, this quote is confusing. A probability is bounded between zero and one and a probability density is non-negative, so "positive probability", taken literally, just means non-zero. My guess would be that by "positive probability" they mean something like "high probability" and are stating the tautology "for $x$ such that the probability density $p(x)$ is high, the probability of observing a value in close proximity to $x$ is high". Otherwise they would be suggesting that probability of observing some $x$ value does not correspond to $p(x)$, which is nonsense.

Another interpretation might be that the authors are trying to define probability density here. For a continuous variable, $\Pr(X=x) = 0$ for any $x$, so we use probability densities instead. Probability densities $p(x)$ are "probabilities per foot", so $\int_a^b p(x) \, dx = \Pr(a < X < b)$ tells us the probability of observing $X$ within the range $(a,b)$. You could say something similar to what the authors wrote: "if the probability density $p(x)$ is large, then with high probability the feature vector will fall near $x$".
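To make the "probabilities per foot" reading concrete, here is a minimal numerical sketch. It is not taken from the book: the density $p(x) = 2x$ on $[0, 1]$ (zero elsewhere) and the helper names `density` and `interval_probability` are assumptions chosen only for illustration. The sketch integrates the density over small intervals to show that a neighbourhood of a high-density point carries more probability than an equally wide neighbourhood of a low-density point, that a single point carries none, and that a region where the density is zero carries none either.

```python
def density(x: float) -> float:
    """Hypothetical density for illustration: p(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def interval_probability(a: float, b: float, n: int = 10_000) -> float:
    """Approximate Pr(a < X < b) as the integral of the density (midpoint rule)."""
    width = (b - a) / n
    return sum(density(a + (i + 0.5) * width) for i in range(n)) * width


# Equal-width neighbourhoods around a high-density and a low-density point.
near_high = interval_probability(0.85, 0.95)  # p(0.9) = 1.8, probability about 0.18
near_low = interval_probability(0.05, 0.15)   # p(0.1) = 0.2, probability about 0.02

# A single point has zero width, so it carries zero probability
# even though the density there is positive.
at_point = interval_probability(0.9, 0.9)

# A region where the density is zero carries zero probability.
outside = interval_probability(1.5, 3.0)

print(near_high, near_low, at_point, outside)
```

Note that the density at 0.9 is 1.8, which is larger than one, so it cannot itself be a probability; only its integral over an interval is.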

Tim
  • While I generally agree with your idea, when talking about probability densities rather than probability masses, *probability of observing some $x$ value can differ from $p(x)$* is correct. The probability is zero, while $p(x)$ may be anything nonnegative. – Richard Hardy Jun 23 '20 at 10:10
  • @RichardHardy agree, this was badly written, corrected. – Tim Jun 23 '20 at 10:15
  • I think the edit is insufficient to prevent confusion. The "tautology" part is also fraught with the same problem. – Richard Hardy Jun 23 '20 at 10:43
  • @RichardHardy those are good points, made some clarifications, & our discussion gave me one more hypothesis what might have been meant by the quote. – Tim Jun 23 '20 at 12:39
  • We are probably half-way there. The remaining half concerns *Otherwise they would be suggesting that probability of observing some $x$ value does not correspond to $p(x)$*. As mentioned before, the probability $P(X=x)$ is zero for a continuous random variable with density $p_X(\cdot)$, so there is no correspondence between $P(X=x)$ and $p_X(x)$. – Richard Hardy Jun 23 '20 at 14:02