
I have only the following summary information, from which I have to approximate the integral $\int_x x f(x)\,dx$.

Interval Frequency
 20-40      12
 40-60      18
  60+       14 

Because the last interval is open-ended, I think I cannot find its midpoint and use $\sum_x x f(x)$ to approximate the integral. So, what is the usual practice in this case?

kjetil b halvorsen
Blain Waan
  • This question has been answered in several places, including http://stats.stackexchange.com/questions/60256, http://stats.stackexchange.com/questions/145777, http://stats.stackexchange.com/questions/214576, http://stats.stackexchange.com/questions/71169, *etc.* – whuber Jun 08 '16 at 14:15
  • So, since the last interval here is open-ended, how can I assign a bin to it? Should I just assume where the midpoint of the bin might be? For example, for an age distribution, an open interval of 60+ might have a midpoint around 80 if I assumed the highest age to be 100. But the distribution within that interval may be right-skewed rather than uniform, so maybe it's hard to assume an appropriate bin in this case. – Blain Waan Jun 09 '16 at 04:43
  • That's right. And, given no context or additional information, we would have to say that the mean of the last interval could literally be any number $60$ or larger. – whuber Jun 09 '16 at 12:57
  • One standard, conservative heuristic for assigning a numeric value to the last, open-ended interval would be to fix it at 60 and *not* to assume a value greater than that. The reason for this is that '60' is the only numeric value for the interval about which you have any information. Obviously, this will underestimate the true ages in that interval, but the whole approach is approximate and imprecise anyway. – Mike Hunter Nov 05 '17 at 14:30
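
To make the comments above concrete, here is a minimal Python sketch of the grouped-mean approximation using the frequencies from the question. It uses midpoints for the two closed intervals and treats the representative value of the 60+ interval as a free parameter. The function name `grouped_mean` and the parameter `open_value` are hypothetical, and the two values tried (60 and 80) are just the assumptions mentioned in the comments, not estimates supported by the data.

```python
# Grouped data from the question: (lower bound, upper bound, frequency).
# An upper bound of None marks the open-ended 60+ interval.
bins = [(20, 40, 12), (40, 60, 18), (60, None, 14)]


def grouped_mean(bins, open_value):
    """Approximate the mean from binned counts.

    Closed bins are represented by their midpoints. The open-ended bin is
    represented by `open_value`, which is an assumption, not data.
    """
    total = sum(n for _, _, n in bins)
    weighted = 0.0
    for lo, hi, n in bins:
        rep = open_value if hi is None else (lo + hi) / 2
        weighted += rep * n
    return weighted / total


# Conservative choice: use 60, the only known number for the last interval.
conservative = grouped_mean(bins, 60)

# Alternative assumption: a maximum age of 100, so the midpoint is 80.
assumed_80 = grouped_mean(bins, 80)

print(round(conservative, 2))  # 47.73
print(round(assumed_80, 2))    # 54.09
```

The gap between the two results shows how strongly the approximation depends on the value assumed for the open interval, which is why no single answer can be given without extra information.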
