
I have some proportion data looking like this:

[Figure: histogram of my data]

This data is bounded between 0 and 1, with most values being either 0 or 1.

I would like to find an appropriate distribution to model this data.

I thought I would use a beta distribution (with parameters below 1 to get the right shape), but its density is not finite at 0 and 1 when the parameters are below 1, so it cannot accommodate the exact 0s and 1s in my data. I also had a look at related distributions such as the arcsine, but it is symmetric while my data are not.

Any suggestions? Thanks!
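To make the boundary problem concrete, here is a small sketch in base R. The shape values of 0.5 and the three-value sample are illustrative assumptions, not estimates from the data in this post.

```r
# Density of a U-shaped beta (both shape parameters below 1) at the bounds and the midpoint
d <- dbeta(c(0, 0.5, 1), shape1 = 0.5, shape2 = 0.5)
print(d)

# Any sample containing exact 0s or 1s then has a non-finite log-likelihood
x_demo <- c(0, 0.3, 1)  # illustrative values only
ll <- sum(dbeta(x_demo, shape1 = 0.5, shape2 = 0.5, log = TRUE))
print(is.finite(ll))
```

Because the density diverges at both bounds, a maximum-likelihood fit of a plain beta breaks down as soon as the sample contains exact 0s or 1s.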

Data extract:

c(1, 0.834873928492229, 0.83487387774498, 0.832912251212133, 
0.263146420579504, 1, 0.999973747392683, 1, 0.834874115370994, 
0, 0, 0.589727145106873, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0, 0.523252687858625, 1, 1, 0.77417229246272, 0.715053944817417, 
0.429564600400542, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 0, 0, 1, 0.434319348458047, 0, 0, 0, 1, 1, 1, 0, 
1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0.723801924032693, 
0.72380206435232, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 
1, 0, 1, 0, 0.720429440699491, 0.72380206435232, 0.723802085309152, 
0, 0.742684754684826, 0.50351343422981, 0, 1, 0, 0.318017023094, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0.274568218944055, 
0.769911022662505, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 0.671171423217696, 0.72380206435232, 0.72380206435232, 
0.72380206435232, 0.539495715072002, 0, 0, 1, 1, 0, 0.560603050501015, 
0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
0, 0, 0.579301514160636, 1, 1, 0, 0.207897072302231, 0.207897072302231, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0.519829707163405, 
0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
)
asac
  • Do either of these http://stats.stackexchange.com/questions/108918/seeking-a-continuous-parametric-bimodal-sampling-distribution-for-proportions?rq=1 and http://stats.stackexchange.com/questions/90642/appropriate-distribution-for-bounded-data-set?rq=1 help? – mdewey Dec 12 '16 at 15:30
  • Not really, I am looking for a bounded distribution quite different from those described in these questions. – asac Dec 12 '16 at 18:47

1 Answer


Instead of a single distribution, why not use a mixture model? The mixture could be informed by theory and your observation that the non-0/1 values are not uniformly or symmetrically distributed.
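One possible concrete form of such a mixture, offered as a sketch rather than the only option, is a zero-one-inflated beta: a point mass at 0, a point mass at 1, and a beta distribution for the values strictly between them. The function name `fit_zoib` is hypothetical, the choice of a beta for the interior component is an assumption, and the example vector is a rounded subset of the first values in the data extract.

```r
# Hypothetical helper: fit a zero-one-inflated beta mixture to proportions in [0, 1]
fit_zoib <- function(x) {
  stopifnot(all(x >= 0 & x <= 1))
  p0 <- mean(x == 0)          # estimated point mass at 0
  p1 <- mean(x == 1)          # estimated point mass at 1
  inner <- x[x > 0 & x < 1]   # values strictly inside (0, 1)

  # Method-of-moments starting values for the beta component
  m <- mean(inner)
  v <- var(inner)
  k <- m * (1 - m) / v - 1
  stopifnot(k > 0)            # otherwise the moment estimates are not usable
  start <- log(c(m * k, (1 - m) * k))

  # Minimise the negative beta log-likelihood; the log scale keeps both shapes positive
  nll <- function(par) -sum(dbeta(inner, exp(par[1]), exp(par[2]), log = TRUE))
  opt <- optim(start, nll)

  list(p0 = p0, p1 = p1,
       shape1 = exp(opt$par[1]), shape2 = exp(opt$par[2]))
}

# Rounded subset of the data extract, for illustration only
x <- c(1, 0.8349, 0.8329, 0.2631, 1, 0, 0, 0.5897, 1, 0,
       0.5233, 0.7742, 0.7151, 0.4296)
fit <- fit_zoib(x)
print(fit)
```

The three parts can be estimated separately because the likelihood factorises: the proportions of 0s and 1s are just sample frequencies, and only the interior values inform the beta shapes. If the interior values are themselves clustered (the extract has many values near 0.72), a different continuous component may fit better.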

Wayne