I am using income data from the Current Population Survey for a small undergrad economics paper.

In economics, there is evidence that the incomes of the bottom 97%–99% of the population are approximately log-normally distributed, while the incomes of the highest earners follow a Pareto distribution.

I used kernel density estimation to plot the lower 99%, and the graph does appear to be log-normal. Now I would like to estimate mu and sigma. How do I go about this?

I have been reading about maximum likelihood estimation, but I'm not sure how to carry it out when I have 200,000 rows of data. Do I have to write my own algorithm to sum over all of my x's, or is there a built-in function I could use?
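For reference, the "sum over all of my x's" is the log-likelihood itself. For a lognormal sample x_1, …, x_n with parameters mu and sigma (the mean and standard deviation of ln x), the log-likelihood is the first line below; setting its derivatives with respect to mu and sigma to zero gives the standard closed-form maximizers on the second line, so for this particular distribution no numerical optimization is required:

```latex
\ell(\mu,\sigma) = \sum_{i=1}^{n}\left[-\ln x_i - \ln\sigma - \tfrac{1}{2}\ln(2\pi) - \frac{(\ln x_i-\mu)^2}{2\sigma^2}\right]

\hat\mu = \frac{1}{n}\sum_{i=1}^{n}\ln x_i,
\qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\ln x_i-\hat\mu\right)^2
```

Note the 1/n divisor in the variance estimate, rather than the 1/(n-1) used by a sample standard deviation.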

I would ideally like to do this in R or Stata.

Tim
amofo

1 Answer

I am not sure whether this question belongs on stats.stackexchange. Anyhow, you don't need to write any function yourself! Here is how to generate a random sample from a lognormal distribution and then estimate its parameters in R.

> # Generate 200,000 draws from a lognormal distribution whose log has mean 0.5 and s.d. 2
> x <- rlnorm(200000, meanlog = .5, sdlog = 2)
> # Load the MASS package
> library(MASS)
> # Maximum likelihood estimates of meanlog and sdlog (standard errors in parentheses)
> fitdistr(x, densfun = "log-normal")
     meanlog        sdlog   
  0.491560746   1.999413446 
 (0.004470824) (0.003161350)
> 
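For the log-normal case, the maximum likelihood estimates have a closed form: the mean of log(x) and the standard deviation of log(x) with a 1/n divisor. The fit can therefore be reproduced by hand, which is one way to see what `fitdistr` is doing. The following is a minimal sketch; the seed is an arbitrary choice, so the numbers will not match the run above exactly.

```r
library(MASS)

set.seed(42)  # arbitrary seed, only to make this sketch reproducible
x <- rlnorm(200000, meanlog = 0.5, sdlog = 2)
n <- length(x)

# Closed-form lognormal MLE: fit a normal distribution to log(x).
# The variance uses a 1/n divisor, not the 1/(n - 1) used by sd().
logx <- log(x)
mu_hat <- mean(logx)
sigma_hat <- sqrt(mean((logx - mu_hat)^2))

fit <- fitdistr(x, densfun = "log-normal")
print(fit$estimate)                               # MASS estimates
print(c(meanlog = mu_hat, sdlog = sigma_hat))     # hand-computed estimates
print(c(sigma_hat / sqrt(n), sigma_hat / sqrt(2 * n)))  # compare with fit$sd
```

The standard errors in the transcript above follow the same pattern: 1.9994/√200000 ≈ 0.00447 and 1.9994/√400000 ≈ 0.00316.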
Stat
  • Thanks Stat! Is this function manually doing maximum likelihood estimation? I understand Cross Validated isn't a place to ask coding questions, but I was not even sure whether MLE was the correct way to estimate the parameters (I'm still very new to statistics), so I wanted to ask here to get statistical input. Thanks again. – amofo Mar 15 '15 at 16:42
  • @amofo maybe look at this thread to learn more on MLE: http://stats.stackexchange.com/questions/112451/maximum-likelihood-estimation-mle-in-layman-terms – Tim Mar 15 '15 at 16:47
  • It is maximum likelihood estimation (I am not sure what you mean by "manually"). To see the help file for `fitdistr` in R, type `?fitdistr` after loading the MASS package. – Stat Mar 15 '15 at 16:47
  • This doesn't seem to respond to the question, which asserts that the upper tail follows a Pareto distribution rather than a lognormal. By including all the data in the analysis you would be fitting a different model. – whuber Mar 15 '15 at 19:20
  • Thanks for pointing this out, @whuber. So I must remove the values corresponding to the top 1% and then run fitdistr, correct? – amofo Mar 15 '15 at 23:28
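A caveat on trimming: running `fitdistr` on the remaining 99% fits an ordinary lognormal to data that have been cut off from above, which pulls both estimates down (sdlog in particular). One possible adjustment, sketched here under the assumption that the cutoff is treated as known and fixed, is to maximize the likelihood of a lognormal truncated at that cutoff, i.e. the density f(x) / F(cutoff). `fit_truncated_lnorm` is a hypothetical helper written only for this illustration (it is not part of MASS); it uses base R's `dlnorm`, `plnorm` and `optim`. The example uses simulated data, not the CPS.

```r
# Hypothetical helper (not part of MASS): ML fit of a lognormal truncated
# above at a known cutoff, i.e. only values x <= cutoff are observed.
fit_truncated_lnorm <- function(x, cutoff) {
  x <- x[x > 0 & x <= cutoff]  # log() requires positive values
  # Average negative log-likelihood of the truncated density f(x) / F(cutoff).
  negloglik <- function(par) {
    mu <- par[1]
    sigma <- exp(par[2])  # optimize log(sigma) so that sigma stays positive
    -mean(dlnorm(x, meanlog = mu, sdlog = sigma, log = TRUE)) +
      plnorm(cutoff, meanlog = mu, sdlog = sigma, log.p = TRUE)
  }
  # Start from the naive (untruncated) estimates on the log scale.
  start <- c(mean(log(x)), log(sd(log(x))))
  fit <- optim(start, negloglik, method = "BFGS")
  if (fit$convergence != 0) warning("optim did not report convergence")
  c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
}

# Simulated example: drop the top 1%, then compare naive and truncated fits.
set.seed(1)
x <- rlnorm(200000, meanlog = 0.5, sdlog = 2)
cutoff <- unname(quantile(x, 0.99))
kept <- x[x <= cutoff]

naive <- c(meanlog = mean(log(kept)),
           sdlog = sqrt(mean((log(kept) - mean(log(kept)))^2)))
truncated <- fit_truncated_lnorm(kept, cutoff)
print(rbind(naive, truncated))
```

With real income data the 99th percentile is itself estimated, so treating it as a fixed cutoff is an approximation.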