
Suppose I have some data points (0.6695, 0.5968, 0.7641, 0.7252, and 0.7779) and want to fit different distributions to this dataset, say, a lognormal and a log-t (with various degrees of freedom), and then compare the tails of these distributions graphically. How can this be done in R?

user9292
  • The tails of the lognormal and log-t are going to be different independently of the sample. What is the goal of your comparison? –  Jul 11 '12 at 14:32
  • Do you really only have five data points? – jbowman Jul 11 '12 at 14:56
  • @Procrastinator The main goal is to compare the tails of these distributions. Then how can this be done using simulation? Also, after fitting these distributions on provided data, I want to compare the tails… as you mentioned, the tails are going to be different depending on the data… – user9292 Jul 11 '12 at 15:15
  • @jbowman. Yes, I only have five data points. – user9292 Jul 11 '12 at 15:15
  • 2
    @act00 What I mentioned is that the tails of those distributions are different *independently* of the sample. The tails of the log-t are heavier than those of the lognormal and they become more similar as the degrees of freedom parameter goes to $\infty$. If you have estimators of both distributions, say $\hat F$ and $\hat G$, then an informative graphical tool is the graph of $\dfrac{1-{\hat F}(x)}{1-{\hat G(x)}}$. –  Jul 11 '12 at 15:25
  • 2
    That's a great answer, @Procrastinator. Why don't you make it official? – whuber Jul 11 '12 at 15:38
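A sketch of the survival-ratio plot suggested in the comments, $\dfrac{1-\hat F(x)}{1-\hat G(x)}$ for a lognormal $\hat F$ and a log-$t$ $\hat G$. The parameter values below are illustrative, not fitted; a small $\nu$ is used so the difference in the tails is visible.

```r
# Graphical tool from the comments: plot (1 - F(x)) / (1 - G(x)) for a
# lognormal F and a log-t G with common location and scale.
# Parameter values are illustrative only, not estimates from the data.
mu <- -0.35; sigma <- 0.097; nu <- 5   # small nu makes the heavier t tail visible
x  <- seq(0.8, 3, length.out = 200)

S_lnorm <- plnorm(x, mu, sigma, lower.tail = FALSE)              # 1 - F(x)
S_logt  <- pt((log(x) - mu)/sigma, df = nu, lower.tail = FALSE)  # 1 - G(x)

plot(x, S_lnorm/S_logt, type = "l", log = "y",
     xlab = "x", ylab = "(1 - F)/(1 - G)")
# The ratio falls towards 0: the log-t tail dominates the lognormal tail.
```

`lower.tail = FALSE` is used instead of `1 - plnorm(...)` so the far-tail survival probabilities do not round to zero in floating point.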

2 Answers


Fitting a log-$t$ distribution is not recommended for such a small sample. Nevertheless, below is R code for estimating the parameters $(\mu,\sigma,\nu)$ by maximum likelihood.

# Your data
data <- c(0.6695, 0.5968, 0.7641, 0.7252, 0.7779)
n <- length(data)

# Negative log-likelihood of (mu, sigma, nu) for the log-t model:
# if log(X) ~ t_nu with location mu and scale sigma, the density of X is
# dt((log(x) - mu)/sigma, nu) / (sigma * x).
# Using dt(..., log = TRUE) avoids the overflow of gamma() for large nu.
lt <- function(par) {
  mu <- par[1]; sigma <- par[2]; nu <- par[3]
  if (sigma > 0 && nu > 0)
    -sum(dt((log(data) - mu)/sigma, df = nu, log = TRUE) -
         log(sigma) - log(data))
  else Inf
}

# MLE obtained numerically (Nelder-Mead)
optim(c(0, 0.1, 10), lt)

The estimators are $(\hat\mu,\hat\sigma,\hat\nu)=(-0.35162295 ,0.09719402,342.22742778)$. As you can see, $(\hat\mu,\hat\sigma)$ are very similar to those obtained for the lognormal in your previous question. In addition, the estimator of the degrees of freedom is very large $(342.22742778)$, suggesting that the lognormal model is reasonable in this case.

The tails of the log-$t$ are always heavier than those of the lognormal, so there is not much point in comparing them graphically. But, again, the MLE suggests that the lognormal is reasonable.
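A quick numerical check of this claim at the fitted values: even with $\hat\nu\approx 342$ the log-$t$ survival probability dominates far in the right tail. `lower.tail = FALSE` is essential here, since `1 - pt(...)` would round to 0 at these magnitudes.

```r
# Compare survival probabilities far in the right tail (at x = 2),
# using the MLEs reported above.
muh <- -0.35162295; sigmah <- 0.09719402; nuh <- 342.22742778
z <- (log(2) - muh)/sigmah                      # standardised point for x = 2

S_logt  <- pt(z, df = nuh, lower.tail = FALSE)  # log-t survival at x = 2
S_lnorm <- pnorm(z, lower.tail = FALSE)         # lognormal survival at x = 2
S_logt > S_lnorm                                # TRUE: the log-t tail is heavier
```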

In order to compare these models you can use the AIC (which, again, may be unreliable given the small sample size); in this case it favours the lognormal model.

# Negative log-likelihood for the lognormal model
lln <- function(par) {
  if (par[2] > 0) -sum(dlnorm(data, par[1], par[2], log = TRUE))
  else Inf
}

# AIC = 2*(number of parameters) - 2*log-likelihood,
# and optim(...)$value is the minimised negative log-likelihood

# AIC for the lognormal model (2 parameters)
2*optim(c(0, 0.1), lln)$value + 2*2

# AIC for the log-t model (3 parameters)
2*optim(c(0, 0.1, 10), lt)$value + 2*3
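As a cross-check (a sketch, not part of the original answer): `MASS::fitdistr` fits the lognormal directly, and `AIC` works on the fitted object, so the hand-written likelihood can be verified.

```r
library(MASS)  # for fitdistr

data <- c(0.6695, 0.5968, 0.7641, 0.7252, 0.7779)

# Lognormal MLE via fitdistr; for the lognormal this reduces to the
# sample mean and (ML) standard deviation of log(data)
fit <- fitdistr(data, "lognormal")
fit$estimate   # meanlog, sdlog
AIC(fit)       # should match the lognormal AIC above up to optimisation error
```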

Best regards.

  • Many thanks, @Procrastinator. I'm going over it to understand everything... a quick question: how can i find $P(X<0.80)$ using log-t? Thanks again!!! – user9292 Jul 11 '12 at 17:39
  • @act00 This can be calculated using the estimated parameters $(\hat\mu,\hat\sigma,\hat\nu)=(−0.35162295,0.09719402,342.22742778)$ as ${\mathbb P}(X<0.8)=$`pt((log(0.8)-muh)/sigmah, df=nuh)`. Note that I am using a log transformation in order to use the command `pt`. The resulting value is $0.9064$ which is very close to the value obtained in the lognormal case in your previous question. –  Jul 11 '12 at 18:18
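The probability from the last comment, spelled out as runnable code (the parameter values are the MLEs reported in the answer above):

```r
# P(X < 0.8) under the fitted log-t: if log(X) ~ t_nu(mu, sigma), then
# P(X < 0.8) = P(T < (log(0.8) - mu)/sigma) for T ~ t_nu
muh <- -0.35162295; sigmah <- 0.09719402; nuh <- 342.22742778

pt((log(0.8) - muh)/sigmah, df = nuh)   # approximately 0.906
plnorm(0.8, muh, sigmah)                # lognormal with the same mu, sigma: nearly identical
```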

If you fit a distribution to a small sample, a good fit really only measures how well the body of the distribution fits the data, because you won't see much of the tail of a distribution unless you have a very large dataset. Distributions can look very similar in the body and yet very different in the tails. As Procrastinator pointed out, comparing the tails of the candidate distributions can pretty much be done just from knowing which distributions were selected, without knowing anything about the data; and in a small sample, what you learn from the data tells you little if anything about the tails. So I don't see where comparing the tails of competing distributions makes any sense in data analysis.

Michael R. Chernick