I was looking for a way to deal with the boundary bias of KDE on the unit interval. One example is the use of Chen estimators (also called beta kernel estimators; an example can be seen here: http://stats-www.open.ac.uk/TechnicalReports/mcjdah.pdf, p. 4). Instead of the typical kernel density estimator $$ \hat{f}(x) =\frac{1}{n} \sum_{i=1}^{n}K(x,X_{i};h) $$ we obtain $$ \hat{f}_{C1}(x)=\frac{1}{n\,B\!\left(\frac{x}{h^2}+1,\frac{1-x}{h^2}+1\right)}\sum_{i=1}^{n}X_{i}^{x / h^2}(1-X_{i})^{(1-x) / h^2}, $$ where $B(\cdot,\cdot)$ is the beta function.
The difficulty I encountered is an underflow problem when evaluating the beta function at large parameter values. For example, in R:
data <- runif(10000)
Chen_kde <- function(x, input, h = 1/length(input)^(0.9)) {
  # here h plays the role of h^2 in the formula above
  p <- x / h + 1
  q <- (1 - x) / h + 1
  output <- mean(input^(p - 1) * (1 - input)^(q - 1) / beta(p, q))
  return(output)
}
Chen_kde(0.1,data)
Warning message:
In beta(p, q) : underflow occurred in 'beta'
I found that one way to tackle this problem is to approximate the beta distribution with a normal density having the same mean and standard deviation. However, each element of the above sum is only "similar" to a beta density, since $x$ appears in the exponent, not in the base. My question is whether, in this example, I can somehow approximate each element to get rid of the underflow, or whether there are other successful methods to correct the boundary bias of KDE on the unit interval.
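(For what it's worth, one workaround I am aware of, sketched below under the same parametrization as the code above, is to evaluate each summand entirely in log space with R's `lbeta()`, so the beta function value itself is never formed; the hypothetical `Chen_kde_log` name is mine. This sidesteps the underflow rather than approximating the terms.)

    # Sketch: log-space evaluation of the Chen estimator using lbeta().
    # Terms for observations far from x underflow silently to 0 in exp(),
    # which is harmless since they contribute nothing to the mean.
    Chen_kde_log <- function(x, input, h = 1/length(input)^(0.9)) {
      p <- x / h + 1
      q <- (1 - x) / h + 1
      log_terms <- (p - 1) * log(input) + (q - 1) * log(1 - input) - lbeta(p, q)
      mean(exp(log_terms))
    }

    set.seed(1)
    data <- runif(10000)
    Chen_kde_log(0.1, data)  # finite, no underflow warning

For uniform data the estimate at any interior point should come out close to 1.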