5

Do you know how I could get the inflection points from an R density plot (e.g. looking like the one generated by the code below)?

Thanks!

set.seed(1453)
distro<-c(rnorm(500,mean=0.01,sd=0.001),rnorm(200,mean=0.9,sd=0.01),sample(c(0:100)/100))

distro<-distro[distro>0]
distro<-distro[distro<1]

plot(density(distro))
Federico Giorgi
  • 189
  • 1
  • 11
  • 2
    @BabakP perhaps the user was referring to the change of sign in the second derivatives, which is also referred to as curvature (imprecisely)? – Gavin Simpson Aug 26 '13 at 21:30
  • 1
    A kernel density estimate might have many local points of inflexion. – Glen_b Aug 26 '13 at 23:02
  • 1
    @Glen_b Although that is true, the adaptive selection of bandwidth in `density` tends to keep the number of inflection points low. – whuber Aug 26 '13 at 23:04
  • 1
    Federico, you may be tempted to take second differences of density to identify where those cross the x-axis, but that's not very stable. You'd at least need to smooth it; there are better ways than that using kernels, though I'd actually suggest using logspline density estimation here. – Glen_b Aug 26 '13 at 23:12
  • Glen_b: thanks for the comment. So what would you think of my solution (posted below), assuming a strong smoothing for the density function? – Federico Giorgi Aug 26 '13 at 23:15
  • In some applications that may be sufficient. You can use KDE more directly for derivative estimation, but the optimal bandwidth for estimating second derivatives tends to be much wider than for the density. See also the discussion I linked elsewhere. – Glen_b Aug 26 '13 at 23:21

1 Answers1

2

Thanks for the comments. We achieved a solution by simply calculating the first and second derivatives, and seeing where the second switches sign. Below, the code (credit goes to my colleague Yishai Shimoni):

dens<-density(distro)
# dy/dx first derivative
first<-diff(dens$y)/diff(dens$x)
# Second derivative
second<-diff(first)/diff(dens$x[1:511])
# Condition for inflection point
flections<-c()
for(i in 2:length(second)){
    if(sign(second[i])!=sign(second[i-1])){
         flections<-c(flections,i)
    }
}
plot(density(distro))

abline(v=dens$x[flections],lty=2)
Federico Giorgi
  • 189
  • 1
  • 11
  • 2
    You did exactly what I feared. Take care, this is not a robust approach. [This answer makes some relevant points](http://stats.stackexchange.com/a/33958/805) – Glen_b Aug 26 '13 at 23:13
  • 2
    This is the right idea, but may need some improvements. (1) you should consider smoothing the second derivative to avoid the potential for a large number of spurious results. (2) There is no need to divide by the differences of the x-values because they are constant and therefore will not change the signs. (3) It's faster (although still somewhat imperfect) to identify locations of sign changes with an expression like `which(second[-1]*second[-length(second)]<=0`. – whuber Aug 26 '13 at 23:17
  • 2
    [Illustration of the non-robustness](http://i.imgur.com/kYygKeU.png) of the 2nd-difference estimate of the second derivative. Many articles discuss optimal bandwidth for derivative estimation. There are some notes [here](http://www.ssc.wisc.edu/~bhansen/718/NonParametrics1.pdf) that discuss the issue briefly on pages 12-15. (edited my last two comments together since I needed to make a small correction to one) – Glen_b Aug 27 '13 at 02:09
  • @Glen_b I believe that plot is showing up artifacts in the `density` calculation itself, most likely resulting from approximations used to speed up the calculation. (Its manual page mentions several procedures that will cause these artifacts.) When a Gaussian kernel is used and *accurately* calculated (which can really slow things down), the second differences should still look smooth. Nevertheless, the point is clear: to defend against such things, introducing smoothing into the second derivative computation (or computing the second derivative directly from the data) is essential. – whuber Aug 27 '13 at 15:50
  • 1
    @whuber I agree with your points, but on the other hand the dangers associated with naive numerical approximation of derivatives aren't entirely a function of the particular choices in the implementation of `density` (though plainly they greatly exacerbate them and so nearly all of what can be seen in that plot I gave can be avoided with some care). I wouldn't say 'don't try to do this' it's more 'go into this with an understanding of the potential numerical issues'. – Glen_b Aug 27 '13 at 22:56