
I'm using the scipy.stats.gaussian_kde function to generate a KDE from a set of $N$ points in a 2D space: $A = \{(x_1,y_1), (x_2,y_2), (x_3,y_3), ..., (x_N,y_N)\}$

Each of these points has an associated error: for example, the point $(x_1,y_1)$ has errors $(e_{x_1},e_{y_1})$, and so on. I can assume the errors are normally distributed along both axes.

The Python function I use to generate the KDE has no way to incorporate these errors into the calculation, and I'm not sure how I would do it even if I computed the KDE manually.

That is: what is the statistically correct way to generate a KDE that accounts for the errors in the data used?
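(For anyone finding this later: one standard approach, which `scipy.stats.gaussian_kde` does not support directly, is to treat the estimate as a mixture of Gaussians in which each point's kernel covariance is the KDE bandwidth *plus* that point's measurement-error variances. This is a sketch, not a definitive implementation: the function name, the fixed scalar bandwidth `bw`, and the assumption of independent per-axis errors are all my own choices.)

```python
import numpy as np

def kde_with_errors(points, errors, eval_pts, bw=0.3):
    """2D Gaussian KDE where each point's kernel covariance is the
    (diagonal) bandwidth plus that point's error variances.

    points, errors : arrays of shape (N, 2) -- data and 1-sigma errors
    eval_pts       : array of shape (M, 2) -- where to evaluate the KDE
    """
    density = np.zeros(len(eval_pts))
    for (x, y), (ex, ey) in zip(points, errors):
        # Per-point variance along each axis: bandwidth^2 + error^2
        # (errors assumed independent and Gaussian on each axis).
        vx = bw**2 + ex**2
        vy = bw**2 + ey**2
        dx = eval_pts[:, 0] - x
        dy = eval_pts[:, 1] - y
        norm = 2.0 * np.pi * np.sqrt(vx * vy)
        density += np.exp(-0.5 * (dx**2 / vx + dy**2 / vy)) / norm
    return density / len(points)
```

Points with large errors then contribute broader, flatter kernels, which is the intuitive behavior one would want; the result is still a properly normalized density.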

Gabriel
  • Did you find a solution to your problem? I now have a similar case, but with 1D data instead of 2D. I have an error associated with each value and would like to generate a new array with these errors – Srivatsan Mar 11 '15 at 10:34
  • @ThePredator see this question: http://stackoverflow.com/questions/28330959/kernel-estimation-using-one-bandwidth-value-per-point/28487390 – Gabriel Mar 11 '15 at 11:55
  • @ThePredator You're welcome :) And if you come up with some way to improve the answer in that question, please share it over there. Cheers! – Gabriel Mar 11 '15 at 13:33

1 Answer


You will need a robust loss function in the kernel estimation model. However, this topic can become quite advanced very quickly. :) For a good start, I would suggest the one-class SVM from sklearn: http://scikit-learn.org/stable/modules/svm.html#density-estimation-novelty-detection

mojovski
  • No idea how I should apply such a method to my issue, sorry. Upvote for pointing me to `scikit-learn`, I hadn't heard about that package, thanks. – Gabriel Oct 15 '13 at 10:57
  • Well, here is an example: http://scikit-learn.org/stable/auto_examples/svm/plot_oneclass.html#example-svm-plot-oneclass-py – mojovski Oct 15 '13 at 14:42