Is a kernel density estimate meaningful if > 25% of my data are duplicates?

Question

The title pretty much says it all. I have data that consists of 80 samples, but there are always at least four samples that have exactly the same value. I want to assess whether the data is unimodal. The plot of the kernel density estimate clearly shows that the data has two "hooks".

However, data like this could never have been generated by a continuous probability distribution, so I am not sure whether it is appropriate to use a kernel density estimate at all.
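Mathematically, ties do not break the estimator. A KDE places one kernel on every observation, so a value that occurs m times simply contributes m times the mass of a single point. The sketch below assumes a Gaussian kernel and a small hypothetical tied sample. It checks that the ordinary estimate equals a weighted estimate over the distinct values and that it still integrates to 1. Whether a smooth curve is a sensible model for discretized data is the separate question raised here.

```python
import math
from collections import Counter

def gauss(u):
    # standard normal kernel K(u)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    # ordinary Gaussian KDE: every observation, tied or not, carries mass 1/n
    return sum(gauss((x - xi) / h) for xi in data) / (len(data) * h)

def kde_weighted(x, data, h):
    # the same estimate written over distinct values, weighted by tie counts
    counts = Counter(data)
    return sum(m * gauss((x - v) / h) for v, m in counts.items()) / (len(data) * h)

def integrate(f, lo, hi, steps=20000):
    # midpoint rule, accurate enough for a smooth density
    dx = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * dx) for k in range(steps)) * dx

# hypothetical sample in which every value appears at least four times
data = [1.0] * 4 + [2.5] * 4 + [4.0] * 5
h = 0.5
print(kde(3.1, data, h), kde_weighted(3.1, data, h))  # equal up to rounding
print(integrate(lambda x: kde(x, data, h), -5.0, 10.0))  # close to 1.0
```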

If the duplicates are due to rounding or similar forms of discretization, ties in the data apparently cause issues in [bandwidth selection using cross validation](http://www.ism.ac.jp/editsec/aism/pdf/060_1_0021.pdf), and there seem to be [posts](https://stats.stackexchange.com/questions/88297/kernel-density-estimation-incorporating-uncertainties) about KDE with uncertainties in the data. Apart from these, I'd guess that the error introduced by ties largely depends on the context; understandably, with wide bandwidths, a KDE of rounded data looks similar to a KDE of unrounded data. — adityar, Nov 06 '18 at 13:48
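One concrete way ties break cross-validation: in leave-one-out likelihood cross-validation, removing x_i still leaves its tied copies at exactly the same location. The held-out density at x_i therefore contains a term K(0)/h that grows without bound as h shrinks, so the score keeps increasing toward h = 0 and the "optimal" bandwidth collapses to the smallest candidate. This sketch assumes a Gaussian kernel and two hypothetical toy samples. It illustrates the mechanism and is not the analysis from the linked paper.

```python
import math

def gauss(u):
    # standard normal kernel K(u)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def lcv_score(data, h):
    # leave-one-out likelihood CV: sum_i log f_{-i}(x_i),
    # where f_{-i} is the Gaussian KDE built from all points except x_i
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        s = sum(gauss((xi - xj) / h) for j, xj in enumerate(data) if j != i)
        total += math.log(s / ((n - 1) * h))
    return total

def best_bandwidth(data, grid):
    # candidate bandwidth that maximises the LCV score
    return max(grid, key=lambda h: lcv_score(data, h))

grid = [0.02, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0]
distinct = [1.0, 1.3, 2.1, 2.4, 3.0, 3.2]  # hypothetical untied sample
tied = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]      # hypothetical rounded sample with ties

print(best_bandwidth(distinct, grid))  # not the smallest candidate
print(best_bandwidth(tied, grid))      # 0.02: the smallest candidate wins
for bw in [0.3, 0.1, 0.01, 0.001]:
    print(bw, lcv_score(tied, bw))     # keeps increasing as the bandwidth shrinks
```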
One idea in the second link is to use wider kernels on points with ties. — adityar, Nov 06 '18 at 13:53
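A minimal sketch of that idea under a hypothetical rule: each value that occurs more than once is treated as rounded to a resolution `delta`, so its Gaussian kernel variance is inflated by `delta**2 / 12`, the variance of a uniform rounding error on an interval of width `delta`. The function name, the `delta` parameter, and the choice to widen only tied values are illustrative assumptions, not the method from the linked post. The result remains a proper density because each component integrates to 1 and the weights sum to 1.

```python
import math
from collections import Counter

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def tie_aware_kde(x, data, h, delta):
    # Hypothetical rule: a value that occurs more than once is assumed to be
    # rounded to resolution `delta`, so its kernel variance is h**2 + delta**2 / 12
    # (variance of a uniform rounding error of width delta); singletons keep h.
    counts = Counter(data)
    wide = math.sqrt(h * h + delta * delta / 12.0)
    total = 0.0
    for v, m in counts.items():
        sd = wide if m > 1 else h
        total += m * normal_pdf(x, v, sd)
    return total / len(data)

def integrate(f, lo, hi, steps=20000):
    # midpoint rule
    dx = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * dx) for k in range(steps)) * dx

data = [0.0] * 4 + [2.0, 3.0, 5.0]  # hypothetical: one tied value, three singletons
print(tie_aware_kde(0.0, data, 0.3, 1.0))  # lower, wider peak at the tied value
print(tie_aware_kde(0.0, data, 0.3, 0.0))  # delta = 0 recovers the ordinary KDE
print(integrate(lambda x: tie_aware_kde(x, data, 0.3, 1.0), -10.0, 15.0))  # close to 1.0
```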

score 0 · Answer 1 · answered Sep 16 '19 at 19:56

If these duplicates are drawn from the same random variable, then yes: an experiment can sometimes yield the same observation more than once, and that is normal. That said, it would be better to phrase the answer in terms of the probability that the random variable is unimodal under some condition (the kernel variance used for the KDE), perhaps as the probability of unimodality expressed as a function of that variance.
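One established way to make "unimodality as a function of the kernel variance" precise is Silverman's (1981) critical-bandwidth test. The answer does not name it, so treat it as a swapped-in technique. For a Gaussian kernel, the number of KDE modes is non-increasing in the bandwidth h, so there is a smallest h_crit at which the estimate becomes unimodal; an h_crit that is large relative to the spread of the data is evidence against unimodality. A smoothed bootstrap from the KDE at h_crit then estimates how often a unimodal population would yield a sample with more than one mode at h_crit, which serves as a p-value. This sketch counts modes on a finite grid (an approximation), and the grid size, tolerance, and number of bootstrap draws are arbitrary choices.

```python
import math
import random

def count_modes(data, h, grid_size=400):
    # number of local maxima of the Gaussian KDE, evaluated on a finite grid
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    c = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    f = [c * sum(math.exp(-0.5 * ((lo + k * step - x) / h) ** 2) for x in data)
         for k in range(grid_size)]
    return sum(1 for k in range(1, grid_size - 1) if f[k - 1] < f[k] >= f[k + 1])

def critical_bandwidth(data, tol=1e-3):
    # smallest h with a unimodal KDE, found by bisection; this relies on the
    # mode count of a Gaussian KDE being non-increasing in h (Silverman, 1981)
    hi = max(max(data) - min(data), tol)
    while count_modes(data, hi) > 1:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_modes(data, mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

def silverman_pvalue(data, n_boot=200, seed=0):
    # smoothed bootstrap from the KDE at h_crit, rescaled so the resamples keep
    # the sample variance; p = share of resamples with more than one mode at h_crit
    rng = random.Random(seed)
    h_crit = critical_bandwidth(data)
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    shrink = 1.0 / math.sqrt(1.0 + h_crit ** 2 / var)
    exceed = 0
    for _ in range(n_boot):
        y = [mean + shrink * (rng.choice(data) - mean + h_crit * rng.gauss(0.0, 1.0))
             for _ in range(n)]
        if count_modes(y, h_crit) > 1:
            exceed += 1
    return h_crit, exceed / n_boot

two = [0.0] * 5 + [10.0] * 5  # hypothetical, clearly bimodal tied sample
print(count_modes(two, 1.0), count_modes(two, 10.0))  # 2 1
print(critical_bandwidth(two))  # close to 5.0, half the gap between the clusters
```

For the 80 values in the question, a call such as `silverman_pvalue(values)` (with `values` a hypothetical list holding the data) would return `(h_crit, p)`, where a small p suggests more than one mode. Because the bootstrap adds Gaussian noise, the resampled data contain no ties even when the original values do.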

Alternatively, and I think this is the better solution, you can compare your data to some unimodal distribution and apply a rank test.
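A minimal sketch of that suggestion, under assumptions the answer leaves open: the reference is a normal distribution fitted by mean and standard deviation (a hypothetical choice of unimodal distribution), a large simulated sample is drawn from it, and the two samples are compared with a Mann-Whitney (Wilcoxon rank-sum) test. Ties are handled with midranks and the usual tie-corrected variance, which matters when more than a quarter of the values are duplicates. The rank-sum test is mainly sensitive to one sample tending to be larger than the other, so a non-significant result does not by itself establish unimodality.

```python
import math
import random

def midranks(values):
    # 1-based ranks; tied values share the average of the ranks they span
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(x, y):
    # U statistic for x, tie-corrected normal approximation, two-sided p-value
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    r = midranks(pooled)
    u1 = sum(r[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mu) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, z, p

def compare_to_normal(data, m=1000, seed=0):
    # hypothetical unimodal reference: a normal with the sample mean and sd
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    ref = [rng.gauss(mean, sd) for _ in range(m)]
    return mann_whitney(data, ref)

data = [1.0] * 4 + [2.0] * 6 + [3.0] * 4 + [5.0] * 5 + [6.0] * 4  # hypothetical tied sample
print(compare_to_normal(data))  # (U, z, two-sided p)
```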
