I have data that, in theory, should be normally distributed. However, there are additional sources of noise, and I want to evaluate the likelihood of the data under a Student's t-distribution so that outliers are not penalised too strictly.

How can I do this? I thought that standardising the data as $(X - \mu) / \sigma$ and computing the likelihood with the standard t-distribution (i.e. with dt() in R) would work, but it gives obviously wrong likelihood values.

It can be illustrated in R as:

vect <- rnorm(1000, sd=0.05)
likelik <- sum(log(dt(scale(vect, center=0, scale=0.05) , df=1000)))
likelik1 <- sum(log(dnorm(vect, mean=0, sd=0.05)))

Since the df parameter is large, the t-distribution should be close to normal. But, as you can see, the two log-likelihoods are very different.

UPD: actually, you can normalise the density by $1/\sigma$. So likelik <- sum(log(dt(scale(vect, center=0, scale=0.05), df=1000) * 1 / 0.05)) seems to be a solution. I think it can be proven rigorously by a change of variables in the integral. Sorry for bothering. If you have something to add or correct, please write it and I will accept the answer.

UPD1: Do not use Student's t penalisation to deal with outliers in a mixture of normals. Just use trimmed BIC instead.

German Demidov
  • What exactly makes them "obviously wrong"? – Tim Sep 20 '16 at 13:20
  • @Tim thank you for the answer. I calculate the log-likelihood as `log(dt())`, or it can even be `dnorm`. The area under the density curve should equal 1, so when I normalise by the scale, the maximum likelihood value that I can get changes. I will add an example in `R` to the body of the question to clarify the problem. – German Demidov Sep 20 '16 at 13:28

1 Answer

Your understanding of probability density functions seems to be incorrect. Recall that density is probability "per foot". You are correct that the total area under the curve must equal one, but it follows that if the scale of the variable changes, the density function needs to be rescaled so that it still integrates to one. The most basic example is the continuous uniform distribution, which changes its height when its parameters change.
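A minimal sketch of the uniform case, using base R's dunif(); the interval endpoints are arbitrary values chosen for illustration. The wider the interval, the lower the density has to be, and a narrow interval forces the density above 1:

```r
# Density height of a continuous uniform distribution is 1 / (max - min)
h_unit   <- dunif(0.5,  min = 0, max = 1)    # width 1   -> height 1
h_wide   <- dunif(0.5,  min = 0, max = 2)    # width 2   -> height 0.5
h_narrow <- dunif(0.05, min = 0, max = 0.1)  # width 0.1 -> height 10

c(unit = h_unit, wide = h_wide, narrow = h_narrow)
```

In every case height times width equals 1, which is the normalisation constraint at work.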

Take a look at the following plot. If $X$ follows a standard normal distribution, then $1.5X$ follows a normal distribution with $\sigma=1.5$, yet the dnorm(x/1.5) and dnorm(x, sd = 1.5) densities differ in height.

[Plot: the densities dnorm(x/1.5) and dnorm(x, sd = 1.5) plotted against x]
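A comparison of this kind can be redrawn with a few lines of base R. This is only a sketch: the grid range, line types and variable names are my own choices, not taken from the original figure. It also computes the area under each curve, which shows that only the second one is a proper density:

```r
# Grid of x values (range chosen arbitrarily for illustration)
x <- seq(-6, 6, length.out = 501)
unscaled <- dnorm(x / 1.5)      # standard normal density evaluated at x/1.5
scaled   <- dnorm(x, sd = 1.5)  # proper N(0, 1.5^2) density

plot(x, unscaled, type = "l", lty = 2, ylab = "density")
lines(x, scaled, lty = 1)
legend("topright", legend = c("dnorm(x/1.5)", "dnorm(x, sd = 1.5)"), lty = c(2, 1))

# Area under each curve: the unscaled version integrates to 1.5, not 1
area_unscaled <- integrate(function(z) dnorm(z / 1.5), -Inf, Inf)$value
area_scaled   <- integrate(function(z) dnorm(z, sd = 1.5), -Inf, Inf)$value
c(unscaled = area_unscaled, scaled = area_scaled)
```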

On the other hand, if you rescale the density, both return the same output:

set.seed(123)
x <- rnorm(1e5, sd = 1.5)
all.equal(dnorm(x, sd = 1.5), dnorm(x/1.5)/1.5)
## [1] TRUE

The same applies to your example: if you used the rescaled data in both cases, the $t$-distribution and the normal distribution would give similar densities and hence similar likelihoods:

set.seed(123)
vect <- rnorm(1000, sd=0.05)
sum(log(dt(scale(vect, center=0, scale=0.05) , df=1000)))
## [1] -1410.342
sum(log(dnorm(scale(vect, center=0, scale=0.05))))
## [1] -1410.306

which is the same as the rescaled density:

sum(log(dnorm(vect, sd = 0.05) * 0.05))
## [1] -1410.306
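Going the other way, the $t$ log-likelihood can be brought onto the scale of the original data by dividing the density by $\sigma$, which is what the question was after. A sketch reusing the values from the question (the names `sigma`, `ll_t` and `ll_norm` are mine):

```r
set.seed(123)
sigma <- 0.05
vect <- rnorm(1000, sd = sigma)

# t density of the standardised data divided by sigma,
# i.e. log-density minus log(sigma) for each observation
ll_t <- sum(dt(vect / sigma, df = 1000, log = TRUE) - log(sigma))

# Normal log-likelihood on the original scale, for comparison
ll_norm <- sum(dnorm(vect, mean = 0, sd = sigma, log = TRUE))

c(t = ll_t, normal = ll_norm)
```

With df = 1000 the two values should be close to each other. Using `log = TRUE` instead of wrapping the density in log() is also numerically safer when densities get very small.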

Basically, if $f(x)$ is a probability density function and $X \sim f$, then when a scale parameter $\sigma$ is introduced, the probability density function of $\sigma X$ is $f(x/\sigma)/\sigma$.
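The same rule can be wrapped in a small helper for the $t$-distribution. Note that `dt_ls` is a hypothetical name, not a base R function, and the parameter values below are arbitrary; this is a sketch of the location-scale idea, with a location shift $\mu$ added:

```r
# Hypothetical helper: density of mu + sigma * T, where T ~ Student t with df degrees of freedom
dt_ls <- function(x, df, mu = 0, sigma = 1, log = FALSE) {
  stopifnot(sigma > 0)
  z <- (x - mu) / sigma
  out <- dt(z, df = df, log = TRUE) - log(sigma)  # f(z) / sigma on the log scale
  if (log) out else exp(out)
}

# The rescaled density still integrates to 1
total <- integrate(function(z) dt_ls(z, df = 5, mu = 1, sigma = 2), -Inf, Inf)$value

# For very large df it is practically the normal density with the same mu and sigma
grid <- seq(-5, 7, length.out = 200)
max_diff <- max(abs(dt_ls(grid, df = 1e6, mu = 1, sigma = 2) - dnorm(grid, mean = 1, sd = 2)))

c(total = total, max_diff = max_diff)
```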

For more details, see the following thread, which introduces probability densities: Can a probability distribution value exceeding 1 be OK? You may also be interested in reading about location-scale distributions, e.g. on Wikipedia or in the online Engineering Statistics Handbook.

Tim
  • Thank you. The problem is that I know this, but when the question arose within a more complex problem, I somehow blanked and forgot the simple things... I hope this question will help somebody in the future not to spend 3 hours on such a simple thing =) I will award the bounty tomorrow; it does not let me do it yet. Another question: is it useful to use a robust estimate of $\sigma$ and the Student t-distribution as a likelihood measure? – German Demidov Sep 20 '16 at 15:25
  • @GermanDemidov it is hard to say whether the t-distribution is appropriate without knowing more context but, for example, regression with t-distributed errors is used as a more robust alternative to standard regression. – Tim Sep 20 '16 at 16:20
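As a rough illustration of the robustness mentioned in the last comment, location and scale can be estimated by maximising a $t$ likelihood. Everything in this sketch is an assumption made for illustration: the simulated data, the fixed df = 4, and the name `negloglik`. It is not a recommendation for any particular problem:

```r
set.seed(1)
# Simulated data (assumption): N(0, 0.05^2) plus a few gross outliers at 2
x <- c(rnorm(1000, mean = 0, sd = 0.05), rep(2, 20))

# Negative log-likelihood of a location-scale t with fixed df (df = 4 is arbitrary)
negloglik <- function(par, x, df = 4) {
  mu <- par[1]
  sigma <- exp(par[2])  # optimise log(sigma) so that sigma stays positive
  -sum(dt((x - mu) / sigma, df = df, log = TRUE) - log(sigma))
}

# Start from robust summaries; optim() uses Nelder-Mead by default
fit <- optim(c(median(x), log(mad(x))), negloglik, x = x)
mu_hat <- fit$par[1]
sigma_hat <- exp(fit$par[2])

c(mean = mean(x), sd = sd(x), t_mu = mu_hat, t_sigma = sigma_hat)
```

The sample mean and standard deviation are pulled towards the outliers, while the $t$-based estimates stay close to the bulk of the data. Keep in mind that the $t$ scale parameter is not a standard deviation, so `t_sigma` is not directly comparable to 0.05.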