
Assume $X$ is a data set represented as a matrix. Each row is an instance, consisting of attribute values $x_1,...,x_n$ and a class label $y$ (the possible class labels are $y_1$ and $y_2$).

Given a set of unlabeled instances, i.e. instances consisting only of values for $x_1,...,x_n$, each instance should be assigned to its most probable class.

To do this, I have fit a Kernel Density Estimator with a Gaussian kernel on the subsets of $X$ labeled $y_1$ and $y_2$, written as $X_{y_1}$ and $X_{y_2}$, and on the whole data set, written as $X_{y}$.

To get the posterior probability that an instance $\overrightarrow{x}$ belongs to class $y_1$, I would compute

$p(y_1|\overrightarrow{x}) = \frac{KDE_{X_{y_1}}.score(\overrightarrow{x})}{KDE_{X_{y}}.score(\overrightarrow{x})}$

where 'score' refers to the scoring method of scikit-learn's KernelDensity estimator (sklearn.neighbors.KernelDensity).
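
For concreteness, here is a minimal, self-contained sketch of this setup on toy data (variable names are illustrative; note that KernelDensity's score/score_samples return log-densities, so the ratio is formed by exponentiating a difference of logs):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_y1 = rng.normal(0.0, 1.0, size=(50, 2))   # rows of X labeled y_1
X_y2 = rng.normal(3.0, 1.0, size=(50, 2))   # rows of X labeled y_2
X_all = np.vstack([X_y1, X_y2])             # the pooled data X_y

# One Gaussian KDE per subset, as described above.
kde_y1 = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_y1)
kde_all = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_all)

x = np.array([[0.0, 0.0]])  # an unlabeled instance

# score_samples returns log-densities, so the ratio above is
# exp(log p(x|y_1) - log p(x)). Nothing forces this value to stay <= 1.
posterior = np.exp(kde_y1.score_samples(x) - kde_all.score_samples(x))
print(posterior)  # prints a value > 1 here, reproducing the problem
```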


However, this approach doesn't seem to be right, since some of the posterior estimates I get from it are $>1$.

Where did I go wrong with this approach? What would be the correct approach to probabilistic classification with KDEs?

Note: If there's an error in the question or additional information is required, please leave a comment and I will try to edit my question.

  • See http://stats.stackexchange.com/questions/82797/how-to-draw-random-samples-from-a-non-parametric-estimated-distribution – Sean Easter Dec 11 '15 at 16:24

1 Answer


I'm not quite sure, but it seems you made a mistake in the computation of the posterior probability.

Adapting your notation, the posterior should be computed as

$p(y_1|\overrightarrow{x}) = \frac{KDE_{X_{y_1}}.score(\overrightarrow{x})}{\sum_{i=1}^{|Y|}KDE_{X_{y_i}}.score(\overrightarrow{x})}$

By Bayes' rule, $p(y_1|\overrightarrow{x}) = \frac{p(\overrightarrow{x}|y_1)\,p(y_1)}{p(\overrightarrow{x})}$: your numerator is the class-conditional density without the prior $p(y_1)$, so its ratio against the pooled-data KDE is not constrained to stay below $1$. The formula above normalizes over the classes instead; it assumes equal class priors, so with imbalanced classes each term should additionally be weighted by $p(y_i)$, e.g. the class frequencies. I found this pdf while looking into the problem; see page 27.
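
As a sketch of that computation (hypothetical helper name; it uses score_samples, which returns log-densities, and includes class priors for the general case, with equal priors recovering the formula above):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def kde_posteriors(kdes, priors, X_new):
    """Posterior p(y_i | x) for each row of X_new.

    kdes:   one fitted KernelDensity per class
    priors: class priors p(y_i); equal priors match the formula above
    X_new:  array of shape (n_instances, n_attributes)
    """
    # log p(x|y_i) + log p(y_i), shape (n_instances, n_classes)
    log_joint = np.stack(
        [kde.score_samples(X_new) + np.log(p) for kde, p in zip(kdes, priors)],
        axis=1,
    )
    # Normalize in log space for numerical stability, then exponentiate.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)  # rows sum to 1

# e.g.: kde_posteriors([kde_y1, kde_y2], priors=[0.5, 0.5], X_new=x)
```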
