12

Let $\{x_1,\ldots,x_N\}$ be observations drawn from an unknown (but certainly asymmetric) probability distribution.

I would like to estimate the density using the KDE approach: $$ \hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\left(\frac{x-x_i}{h}\right) $$ I tried a Gaussian kernel, but it performed badly, which I attribute to its symmetry. I have seen that some work on Gamma and Beta kernels has been published, but I did not understand how to use them.

My question is: how can I handle this asymmetric case, given that the support of the underlying distribution is not the interval $[0,1]$?

Danica
Eleanore
  • 4
    In the case of densities that are close to lognormal (which I encounter a lot in some particular applications), I simply transform (by taking logs) and then do KDE, and then transform the KDE back (you need to remember the Jacobian when transforming the estimate back). It works quite well in that case. – Glen_b Feb 15 '13 at 15:27
  • @Glen_b do you have any reference or material where this method is described? (Calculating the KDE on a transformation of the original variable and then transforming the KDE back) – boscovich May 25 '13 at 08:19
  • Not that I know of - I'm sure they exist, since it's a rather trivial idea, and easily implemented. It's the sort of thing I'd expect a stats undergrad to be able to derive. In practice it works very well. – Glen_b May 25 '13 at 09:20
  • @glen_b thanks. So if I was to use it in a technical report/publication do you think it would be ok to not give any references? – boscovich May 25 '13 at 14:00
  • @Glen_b I've gotten some very goofy looking results doing that before, partially because the bandwidth is no longer playing the same sort of role in the transformed KDE. Lots of artifacts (I've used it with a logistic transformation). – guy May 25 '13 at 14:51
  • Your problem is not that the kernel function has to be symmetric, but rather that it has to be bounded. [KernSmooth](http://stat.ethz.ch/CRAN/web/packages/KernSmooth/) can do KDE with a choice of kernel functions, including bounded ones such as the beta(2,2). – user603 May 25 '13 at 15:02
  • andrea - well, I'd have no qualms about doing it or justifying it, but it kind of depends on where you're trying to do it – Glen_b May 25 '13 at 15:13
  • 1
    @guy It's certainly possible to have problems, especially with some transformations and some kinds of data. The situations I've used it in tend to be pretty close to lognormal, and there the change in bandwidth that you see as a problem is exactly what's needed; it does a great deal better than KDE on the raw data does. From the OP's description it sounded pretty similar, but it's not like I was suggesting it was a *panacea*. – Glen_b May 25 '13 at 15:18
  • Some useful solutions for certain kinds of asymmetric distributions are presented in the closely related thread about [density plots of non-negative variables](http://stats.stackexchange.com/questions/65866/good-methods-for-density-plots-of-non-negative-variables-in-r). – whuber Nov 21 '13 at 16:39

2 Answers

5

First of all, KDE with symmetric kernels can also work very well when your data is asymmetric. If it could not, it would be nearly useless in practice, since real data is rarely symmetric.

Secondly, have you considered transforming your data to reduce the asymmetry, if you believe this is causing the problem? For example, it may be a good idea to try working with $\log(x)$, as this is known to help in many problems.
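To make the $\log(x)$ suggestion concrete, here is a minimal sketch, assuming the data are strictly positive. It runs a Gaussian KDE on $y = \log x$ and maps the estimate back with the Jacobian mentioned in the comments, $\hat{f}_X(x) = \hat{f}_Y(\log x)/x$. The function names and the normal-reference bandwidth $1.06\,\hat\sigma N^{-1/5}$ are illustrative choices of mine, not something prescribed in this thread, and the code uses only the standard library so that it stays self-contained; a real analysis would more likely use an existing KDE routine.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(points):
    """Normal-reference rule of thumb: 1.06 * sd * N^(-1/5) (an assumed default)."""
    return 1.06 * statistics.stdev(points) * len(points) ** (-0.2)


def gaussian_kde(points, h):
    """Return f(y) = 1/(N h) * sum_i K((y - y_i) / h) with a Gaussian kernel K."""
    norm = 1.0 / (len(points) * h * math.sqrt(2.0 * math.pi))

    def f(y):
        return norm * sum(math.exp(-0.5 * ((y - p) / h) ** 2) for p in points)

    return f


def log_transform_kde(samples, h=None):
    """Estimate a density on (0, inf) by doing KDE on log(x) and transforming back."""
    if any(x <= 0 for x in samples):
        raise ValueError("the log transform requires strictly positive data")
    logs = [math.log(x) for x in samples]
    if h is None:
        h = rule_of_thumb_bandwidth(logs)  # bandwidth is chosen on the log scale
    f_log = gaussian_kde(logs, h)

    def density(x):
        if x <= 0:
            return 0.0  # the back-transformed estimate has no mass at or below zero
        # Change of variables y = log(x): f_X(x) = f_Y(log x) * |dy/dx| = f_Y(log x) / x
        return f_log(math.log(x)) / x

    return density


# Illustrative usage on synthetic right-skewed data (not data from the question).
random.seed(0)
data = [random.lognormvariate(0.0, 1.0) for _ in range(300)]
f_hat = log_transform_kde(data)
print(f_hat(0.5), f_hat(5.0))
```

Because the bandwidth is fixed on the log scale, the effective smoothing on the original scale grows with $x$. That is what helps for near-lognormal data, and it is also the source of the artifacts reported in the comments for other transformations.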

Has QUIT--Anony-Mousse
0

Hmm. You might want a kernel width that changes as a function of location.
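One way to sketch a location-dependent width is the sample-point adaptive estimator: fit a fixed-width pilot KDE, then give each observation its own width $h\lambda_i$ with $\lambda_i = (\tilde f(x_i)/g)^{-\alpha}$, where $g$ is the geometric mean of the pilot values. The choice $\alpha = 1/2$, the rule-of-thumb pilot bandwidth, and all function names below are my assumptions for illustration, not something taken from this thread.

```python
import math
import random
import statistics


def gauss(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(points, h):
    """Ordinary KDE with one global bandwidth h."""
    n = len(points)

    def f(x):
        return sum(gauss((x - p) / h) for p in points) / (n * h)

    return f


def adaptive_kde(points, h=None, alpha=0.5):
    """Sample-point adaptive KDE: wider kernels where the pilot density is low."""
    n = len(points)
    if h is None:
        # Assumed default: normal-reference rule of thumb for the pilot bandwidth.
        h = 1.06 * statistics.stdev(points) * n ** (-0.2)
    pilot = fixed_kde(points, h)
    pilot_vals = [pilot(p) for p in points]  # O(N^2); fine for a small sketch
    # Geometric mean of the pilot values, so the widths are centred around h.
    g = math.exp(sum(math.log(v) for v in pilot_vals) / n)
    widths = [h * (v / g) ** (-alpha) for v in pilot_vals]

    def f(x):
        # Each observation contributes a kernel with its own width w_i.
        return sum(gauss((x - p) / w) / w for p, w in zip(points, widths)) / n

    return f, widths


# Illustrative usage on synthetic right-skewed data (not data from the question).
random.seed(0)
data = [random.expovariate(1.0) for _ in range(300)]
f_hat, widths = adaptive_kde(data)
print(min(widths), max(widths))
```

Each term is still a properly normalised kernel, so the estimate integrates to one; the long tail simply gets broader kernels than the dense region near the mode.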

If I were looking at the problem in terms of the eCDF, I might try to tie the kernel width to a numerical estimate of the eCDF's slope.

I think that if you are going to do a coordinate transform, then you need to have a pretty good idea of the start and end points. If you know the target distribution that well, then you don't need the kernel approximation.

EngrStudent