
I'm trying something out in R and I'm curious how one would go about it. Let's say I have a sample of Americans and their incomes, and I know that they are in the 90th-99th percentile of all American earners. If I assume that ALL income is normally distributed, how would I fit a normal distribution to the information I am given?

I have been playing around in R using the fitdistr function (from the MASS package) and some YouTube videos. However, the only examples I've seen are ones where fitdistr fits the entire distribution, treating the input data as the whole population instead of only as a part, if that makes sense.

So fitdistr will give me a mean and a standard deviation for my data, but in doing so it appears to treat this small slice of earners as the entire population of Americans (obviously I'm doing something wrong) rather than as the portion falling in the 90th-99th percentile. Does that make sense?
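
To see why fitting the slice directly goes wrong, here is a minimal simulation sketch. It assumes, purely for illustration, a population that is normal with mean 50,000 and standard deviation 25,000 (the example values mentioned below), draws a sample restricted to its 90th-99th percentiles, and then computes the ordinary mean and standard deviation of that slice, which is essentially what a plain normal fit does. It uses Python's standard library only so it runs on its own; in R a similar sample could be drawn with `qnorm(runif(n, 0.90, 0.99), 50000, 25000)`. The helper name `sample_slice` is hypothetical.

```python
import random
import statistics
from statistics import NormalDist

# Assumed "true" population for illustration only: N(50,000, 25,000).
POP = NormalDist(mu=50_000, sigma=25_000)

def sample_slice(n, lo=0.90, hi=0.99, seed=1):
    """Hypothetical helper: draw n incomes from POP restricted to its lo..hi
    percentiles, via inverse-CDF sampling of a uniform on [lo, hi]."""
    rng = random.Random(seed)
    return [POP.inv_cdf(rng.uniform(lo, hi)) for _ in range(n)]

slice_data = sample_slice(5_000)

# Naive fit: treat the slice as if it were the whole population.
naive_mu = statistics.fmean(slice_data)
naive_sigma = statistics.stdev(slice_data)
print(f"naive mean = {naive_mu:,.0f}, naive sd = {naive_sigma:,.0f}")
```

Because the slice covers only the upper tail, its mean should land far above 50,000 (roughly 91,000 in theory) and its standard deviation far below 25,000 (roughly 7,000), so the naive fit describes the slice, not the population.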

Is there a name for what I'm trying to do beyond just fitting distributions? I'm just looking for a jumping-off point or any resources someone could point me to.

I definitely feel like this is possible. I have been playing around with different means and standard deviations (for instance, a mean of 50k and a standard deviation of 25k) and then seeing how those distributions line up with the data I have, but I'm only doing this visually. I'm sure there is a more rigorous way to do it in R that I just don't know; I'm pretty lost because I can't get past the part where I tell R that the input data is only a SLICE of the curve.
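
One way to replace the visual trial-and-error with a calculation is to build the "slice" information into the fit. The sketch below is a quantile-matching (probability-plot regression) approach, not maximum likelihood, and it assumes the sample is a random draw from the population's 90th-99th percentile range. Each sorted observation is paired with the population quantile it should sit at, with plotting positions spread over [0.90, 0.99] instead of [0, 1]; because a normal quantile is `mu + sigma * z`, a least-squares line through those pairs estimates sigma (slope) and mu (intercept). The helper `fit_normal_to_slice` and the simulated values are hypothetical, and Python's standard library is used only to keep it self-contained.

```python
import random
import statistics
from statistics import NormalDist

def fit_normal_to_slice(data, lo=0.90, hi=0.99):
    """Hypothetical helper: estimate (mu, sigma) of a normal population from a
    sample assumed to lie between its lo and hi percentiles."""
    xs = sorted(data)
    n = len(xs)
    std = NormalDist()  # standard normal, used only for its quantile function
    # Match the i-th smallest value to the population quantile it should sit at,
    # spreading plotting positions evenly over [lo, hi] instead of [0, 1].
    zs = [std.inv_cdf(lo + (hi - lo) * (i - 0.5) / n) for i in range(1, n + 1)]
    # For a normal population x = mu + sigma * z, so fit a straight line by
    # ordinary least squares: slope estimates sigma, intercept estimates mu.
    z_bar, x_bar = statistics.fmean(zs), statistics.fmean(xs)
    s_zx = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    sigma = s_zx / s_zz
    mu = x_bar - sigma * z_bar
    return mu, sigma

# Simulated check with assumed true values mu = 50,000 and sigma = 25,000.
population = NormalDist(mu=50_000, sigma=25_000)
rng = random.Random(42)
slice_data = [population.inv_cdf(rng.uniform(0.90, 0.99)) for _ in range(2_000)]
mu_hat, sigma_hat = fit_normal_to_slice(slice_data)
print(f"estimated mean = {mu_hat:,.0f}, estimated sd = {sigma_hat:,.0f}")
```

In R the same idea should translate roughly to `z <- qnorm(0.90 + 0.09 * ppoints(length(x)))` followed by `lm(sort(x) ~ z)`. A truncated-normal likelihood would be the more formal alternative.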

Jed Bartlet
  • It's possible your problem concerns percentiles *as estimated from data,* in which case the thread at https://stats.stackexchange.com/questions/207403/estimating-a-normal-distribution-from-three-order-statistics/276322#276322 gives one solution. – whuber Aug 29 '18 at 15:52
  • See also https://stats.stackexchange.com/questions/263829/i-have-three-probabilities-of-a-value-falling-within-the-following-ranges-5/263837#263837 – Sycorax Aug 29 '18 at 15:56
  • @whuber Sorry about that, I had seen those questions before but couldn't really make sense of the answers. I saw your answer specifically but couldn't quite follow it; I'll try looking at it again. Sorry for the dupe. – Jed Bartlet Aug 29 '18 at 18:02
  • Just change "15" to "90" and "50" to "99" throughout. The point is that your two percentiles give you two equations for the two unknown parameters and all you have to do is solve them, usually for a unique solution. – whuber Aug 29 '18 at 19:32
  • @Whuber I've re-read it and I'll try again. For a normal distribution I think I can understand it intuitively because I understand its building blocks; is it a similar process for other distributions like the gamma distribution? I think it's partly a notation thing: I'm used to working in spreadsheets and not so much with formal mathematical notation. I really appreciate your help, though; I desperately want to learn how to do this stuff. Are there any resources you could recommend that focus on practical applications of statistics? – Jed Bartlet Aug 29 '18 at 20:39
  • @Whuber I have an idea; let me know if you think this would be appropriate. I can use R to generate a sequence of random numbers that are normally distributed with mean 0 and standard deviation 1, sort them, and pick the first 25%. Then I'll try to back out the normal distribution parameters (already knowing the answer) and post my work here if I can't figure it out. Would that count as a dupe or would it be a legitimate question? – Jed Bartlet Aug 29 '18 at 20:54
  • That's not the same thing as having percentiles: it would be equivalent to having the order statistics. Because percentiles are fixed numbers but order statistics are random, the analysis is quite different, as indicated in the previous link I gave. – whuber Aug 29 '18 at 22:04
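
whuber's point about two percentiles giving two equations can be sketched directly. If the 90th and 99th percentile values x90 and x99 are known fixed numbers, and z90, z99 are the standard normal quantiles of 0.90 and 0.99, then mu + z90 * sigma = x90 and mu + z99 * sigma = x99, which solve to sigma = (x99 - x90) / (z99 - z90) and mu = x90 - z90 * sigma. This applies only to known percentile values, not to a sample of incomes drawn from that range (which, per the last comment, is a different problem). The helper name `normal_from_two_percentiles` and the round-trip values are hypothetical; in R, `qnorm(c(0.90, 0.99))` gives the two z values.

```python
from statistics import NormalDist

def normal_from_two_percentiles(x_lo, x_hi, p_lo=0.90, p_hi=0.99):
    """Hypothetical helper: solve mu + z_lo*sigma = x_lo and
    mu + z_hi*sigma = x_hi for (mu, sigma), where z_p is the standard
    normal quantile of p."""
    std = NormalDist()
    z_lo, z_hi = std.inv_cdf(p_lo), std.inv_cdf(p_hi)
    sigma = (x_hi - x_lo) / (z_hi - z_lo)  # slope between the two quantiles
    mu = x_lo - z_lo * sigma               # back out the center
    return mu, sigma

# Round-trip check with assumed values mu = 50,000 and sigma = 25,000.
population = NormalDist(mu=50_000, sigma=25_000)
x90, x99 = population.inv_cdf(0.90), population.inv_cdf(0.99)
mu, sigma = normal_from_two_percentiles(x90, x99)
print(f"mu = {mu:,.0f}, sigma = {sigma:,.0f}")
```

The same two-equation idea carries over to other two-parameter families such as the gamma, but there the quantiles are not linear in the parameters, so the equations usually have to be solved numerically.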
