
Is there a general methodology for dealing with "greater than" or "less than" values when calculating descriptive statistics?

For example, suppose I have a series of measurements from a biological assay that cannot measure below 5, and I want to calculate descriptive statistics for a dataset derived from a group of patients.

e.g.:

    P1  <5 
    P2  <5
    P3  6
    P4  7
    P5  7
    P6  6
    P7  6
    P8  <5
    P9  8
    P10 9

Is my only option to calculate the median and interquartile range, or is it still possible to calculate the mean and standard deviation?

whuber

1 Answer


If you make some distributional assumptions, you can certainly estimate the population mean and standard deviation of the "underlying" uncensored variable; this is just inference under censoring, a standard statistical problem.
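As a minimal sketch of that approach, suppose we assume that the underlying values are normally distributed and that each "<5" result means the true value lies somewhere below 5. The maximum-likelihood estimates of the mean and standard deviation can then be found with the EM algorithm for a left-censored normal sample. This is only one way to fit the model; a Tobit-type routine or a general-purpose optimizer on the same likelihood would also work. The function names are illustrative and do not come from any library.

```python
import math

DETECTION_LIMIT = 5.0  # assay cannot measure below this value

# Data from the question; None marks a "<5" (left-censored) result.
values = [None, None, 6, 7, 7, 6, 6, None, 8, 9]


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_loglik(mu, sigma, observed, n_censored, limit):
    """Log-likelihood of a normal sample left-censored at `limit`."""
    ll = sum(math.log(norm_pdf((x - mu) / sigma) / sigma) for x in observed)
    ll += n_censored * math.log(norm_cdf((limit - mu) / sigma))
    return ll


def fit_censored_normal(observed, n_censored, limit, tol=1e-10, max_iter=10_000):
    """EM estimate of (mu, sigma) for a normal sample left-censored at `limit`.

    Assumes at least two distinct observed values (so the start sigma > 0).
    """
    n = len(observed) + n_censored
    sum_x = sum(observed)
    sum_x2 = sum(x * x for x in observed)
    # Start from the naive estimates based on the measured values only.
    mu = sum_x / len(observed)
    sigma = math.sqrt(sum_x2 / len(observed) - mu**2)
    for _ in range(max_iter):
        # E-step: moments of X given X < limit under the current parameters.
        a = (limit - mu) / sigma
        lam = norm_pdf(a) / norm_cdf(a)             # inverse Mills ratio
        m1 = mu - sigma * lam                        # E[X | X < limit]
        var = sigma**2 * (1.0 - a * lam - lam**2)    # Var[X | X < limit]
        m2 = var + m1**2                             # E[X^2 | X < limit]
        # M-step: normal MLE from the expected complete-data statistics.
        new_mu = (sum_x + n_censored * m1) / n
        new_sigma = math.sqrt((sum_x2 + n_censored * m2) / n - new_mu**2)
        if abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol:
            return new_mu, new_sigma
        mu, sigma = new_mu, new_sigma
    return mu, sigma


observed = [v for v in values if v is not None]
n_censored = values.count(None)
mu_hat, sigma_hat = fit_censored_normal(observed, n_censored, DETECTION_LIMIT)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The estimated mean is necessarily below 7, the mean of the seven measured values, because the three censored patients are known to lie below 5. If normality is implausible (assay data are often right-skewed), the same idea can be applied on the log scale with the limit transformed accordingly.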

If those "less than" numbers are also bounded below by $0$ (which I assume is the case), then you can also bound the uncensored sample quantities without a distributional assumption (i.e. you can state an interval within which the mean must lie, and similarly for the standard deviation).
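Under the answer's assumption that each censored value lies in [0, 5), these distribution-free bounds can be computed directly. A minimal sketch: the sample mean increases with each censored value, so its extremes come from setting them all to 0 or all to 5. The sample sum of squared deviations is convex in the data, so its maximum over the allowed range is attained at a corner (each censored value 0 or 5), and by convexity and symmetry its minimum is attained with all censored values equal to the mean of the measured values, clipped into [0, 5].

```python
import itertools
import statistics

LOWER, LIMIT = 0.0, 5.0  # assumption: censored values lie in [0, 5)

# Data from the question; None marks a "<5" result.
values = [None, None, 6, 7, 7, 6, 6, None, 8, 9]
observed = [v for v in values if v is not None]
k = values.count(None)


def fill(censored):
    """Complete sample with the given stand-ins for the censored values."""
    return observed + list(censored)


# Mean: increasing in every censored value, so the extremes are at the ends.
mean_lo = statistics.mean(fill([LOWER] * k))
mean_hi = statistics.mean(fill([LIMIT] * k))

# SD maximum: convex function over a box, so check every corner.
sd_hi = max(
    statistics.stdev(fill(corner))
    for corner in itertools.product([LOWER, LIMIT], repeat=k)
)

# SD minimum: all censored values equal to the observed mean, clipped to range.
c = min(max(statistics.mean(observed), LOWER), LIMIT)
sd_lo = statistics.stdev(fill([c] * k))

print(f"sample mean in [{mean_lo:.2f}, {mean_hi:.2f}]")
print(f"sample SD   in [{sd_lo:.2f}, {sd_hi:.2f}]")
```

For the question's data this gives a sample mean between 4.9 and 6.4 (the upper end is approached but not reached, since the censored values are strictly below 5) and a sample standard deviation between about 1.35 and 3.51.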

Glen_b