
How do you fit or estimate a probability distribution using only the information from a boxplot, i.e., the max, min, 1st quartile, 3rd quartile and median? I don't have access to the raw data; only the boxplot information is given.
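One crude way to turn the five numbers into a usable distribution, assuming nothing beyond them, is to treat them as the 0%, 25%, 50%, 75% and 100% quantiles and interpolate linearly in between. That spreads probability uniformly inside each whisker and box segment. This is a minimal sketch, not an established method for this problem. The summary values are hypothetical, the uniform-within-segment assumption is arbitrary, and discreteness is handled only by an optional rounding step.

```python
import bisect
import random

# Hypothetical five-number summary (min, Q1, median, Q3, max); replace with real values.
SUMMARY = (2.0, 5.0, 7.0, 10.0, 18.0)
# Cumulative probabilities assumed at those five points.
PROBS = (0.0, 0.25, 0.5, 0.75, 1.0)


def piecewise_cdf(x, summary=SUMMARY):
    """CDF that is linear between adjacent summary points (assumption: uniform mass per segment)."""
    if x <= summary[0]:
        return 0.0
    if x >= summary[-1]:
        return 1.0
    # Index i of the segment with summary[i] <= x < summary[i + 1].
    i = bisect.bisect_right(summary, x) - 1
    x0, x1 = summary[i], summary[i + 1]
    return PROBS[i] + (PROBS[i + 1] - PROBS[i]) * (x - x0) / (x1 - x0)


def piecewise_quantile(p, summary=SUMMARY):
    """Inverse of piecewise_cdf: linear interpolation between the summary points."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    i = min(int(p / 0.25), 3)  # segment index 0..3
    x0, x1 = summary[i], summary[i + 1]
    return x0 + (x1 - x0) * (p - PROBS[i]) / 0.25


def draw_samples(n, seed=None, rounded=False, summary=SUMMARY):
    """Inverse-transform sampling from the piecewise-linear model."""
    rng = random.Random(seed)
    draws = [piecewise_quantile(rng.random(), summary) for _ in range(n)]
    # Crude nod to discrete data: round each draw to the nearest integer.
    return [round(d) for d in draws] if rounded else draws
```

For example, `draw_samples(1000, seed=1, rounded=True)` gives integer draws between the min and the max. Any other distribution with the same five quantiles matches the boxplot equally well, so this is only one arbitrary choice among many.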

bninopaul
  • Can't be done, although you might try to create some sort of empirical density function based on that data. – user2974951 Sep 12 '18 at 13:06
  • Do you have any other information that would allow you to hypothesise what the expected distribution would be? – ReneBt Sep 12 '18 at 13:26
  • @ReneBt, the only information given are the boxplot values, we don't have access to the raw data/values, but we know that the observations are discrete. – bninopaul Sep 12 '18 at 14:09
  • What can or cannot be advised depends on what you are doing with it, what exactly you know, and how confident you need to be in the outcome. As @user2974951 says, you can't create a distribution based on what you have told us. Is this a study question, so that you expect there to be a solution? If so, you need to share the full context. Or do you have no idea whether this should work at all? – ReneBt Sep 12 '18 at 18:46
  • You need to be more specific about the source of the boxplots and your objectives. Perhaps looking at this somewhat similar [Q & A](https://stats.stackexchange.com/questions/365865/build-a-normal-distribution-from-n-quartiles-and-mean/366284#366284) will help you formulate a more specific question. // Do you anticipate the distributions may be normal and just need to find $\mu$ and $\sigma$? Or do you have no clue about the data? History test scores? Times to failure of electronic devices? Liver enzyme data on hepatitis patients? // If you knew the exact answer, how would you use it? – BruceET Sep 12 '18 at 22:16
  • The difference between that other question and this one is that in that question the aim is to fit a specified distribution rather than to choose or estimate one. Given that specification, reasonable efficiency would be the main aim (along with potential secondary aims like simplicity), and reasonable estimators would not be difficult to arrive at. But here we know the distribution is discrete (and, seemingly, nothing else but the five-number summary), so it's a considerably harder problem, since a model must first be chosen (and those five values don't pin it down very well). – Glen_b Sep 13 '18 at 03:26
  • The example toward the end of [this answer](https://stats.stackexchange.com/questions/137965/box-and-whisker-plot-for-multimodal-distribution/137982#137982) gives some sense of the potential difficulty - 4 sets of data with very different shapes but identical boxplots, and [this answer](https://stats.stackexchange.com/questions/135737/will-two-distributions-with-identical-5-number-summaries-always-have-the-same-sh/135760#135760) clarifies what the constraints are (and makes it clear exactly what the available 'wiggle room' is). The last might almost count as a duplicate. – Glen_b Sep 13 '18 at 03:36
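Two points from the comments can be made concrete in a few lines of Python. If a normal model is assumed (the case BruceET asks about), a quick, not especially efficient, estimate takes $\hat\mu$ as the median and $\hat\sigma = \mathrm{IQR}/(2 z_{0.75}) \approx \mathrm{IQR}/1.349$, since the normal quartiles sit at $\mu \pm z_{0.75}\sigma$. Glen_b's point that very different data can share a boxplot is illustrated with two made-up integer data sets whose five-number summaries coincide. The data and function names are illustrative only, and computing quartiles with `statistics.quantiles(..., method="inclusive")` is an assumption; plotting software may use a different quartile rule.

```python
import statistics
from statistics import NormalDist


def five_number_summary(data):
    """(min, Q1, median, Q3, max), using the 'inclusive' quartile rule (an assumption)."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (xs[0], q1, med, q3, xs[-1])


def normal_from_quartiles(q1, median, q3):
    """Rough normal fit: mu = median, sigma = IQR / (2 * z_0.75). Ignores min and max."""
    if q3 <= q1:
        raise ValueError("need Q3 > Q1 to estimate sigma")
    z = NormalDist().inv_cdf(0.75)  # about 0.6745
    return NormalDist(mu=median, sigma=(q3 - q1) / (2 * z))


# Made-up discrete data: one spread evenly, one piled up at the ends and the middle.
DATA_A = [0, 1, 2, 3, 5, 7, 8, 9, 10]
DATA_B = [0, 0, 2, 5, 5, 5, 8, 10, 10]

if __name__ == "__main__":
    print(five_number_summary(DATA_A))
    print(five_number_summary(DATA_B))  # same summary as DATA_A, different shape
    fit = normal_from_quartiles(2.0, 5.0, 8.0)
    print(fit.mean, fit.stdev)
```

Both data sets produce the same boxplot, so a quartile-based normal fit is identical for them even though their shapes differ; the five values constrain a model only after one has been chosen.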
