3

I want to construct my likelihood.
General case:
If my data do come from a line of the form $y = mx + b$ and the uncertainties are normally distributed with mean zero and known variance $\sigma_y^2$, then the likelihood would be: $$p(y|x, \sigma_y, m, b) = \frac{1}{\sqrt{2\pi \sigma_y^2}} \exp\left(-\frac{(y - mx - b)^2}{2\sigma_y^2}\right) $$
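For concreteness, here is a minimal sketch of evaluating this Gaussian log-likelihood in Python (the array values, the trial parameters, and the function name `gaussian_loglike` are just hypothetical placeholders):

```python
import numpy as np

def gaussian_loglike(m, b, x, y, sigma_y):
    """Sum over data points of log N(y_i | m*x_i + b, sigma_{y,i}^2)."""
    resid = y - (m * x + b)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma_y**2) - resid**2 / (2.0 * sigma_y**2))

# Hypothetical example data and trial parameters
x = np.array([-0.5, 0.2])
y = np.array([0.14, 0.5])
sigma_y = np.array([0.004, 0.002])
print(gaussian_loglike(0.5, 0.4, x, y, sigma_y))
```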

However, in my case, the errors follow a gamma distribution. How can I construct my likelihood?


Edit

To make things clearer, I have a set of data points represented by $x_i$ and $y_i$ that have measurement errors denoted by $\sigma_{x,i}$ and $\sigma_{y,i}$.
For example, my data points look like the following:

| $x$ | $\sigma_x$ | $y$ | $\sigma_y$ |
| --- | --- | --- | --- |
| -0.5 | $\pm$ 0.02 | 0.14 | $\pm$ 0.004 |
| 0.2 | $\pm$ 0.03 | 0.5 | $\pm$ 0.002 |
| ... | ... | ... | ... |
I want to calculate the likelihood $p(y|x, \sigma_y, m, b)$. To check whether I can use the formula written above, I had to test whether the measurement uncertainties ($\sigma_y$) are Gaussian. A QQ plot showed that they are not; they appear to follow a Gamma distribution instead.
My question is: how can I model the likelihood if the measurement uncertainties come from a Gamma distribution?
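The check I did looks roughly like the following sketch (the values in `sigma_y` are placeholders for my actual uncertainties, and the Gamma parameters come from a simple fit):

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# Hypothetical measurement uncertainties (placeholders for the real data)
sigma_y = np.array([0.004, 0.002, 0.006, 0.003, 0.005, 0.007])

# Fit a Gamma distribution to the uncertainties, with location fixed at zero
k, loc, theta = st.gamma.fit(sigma_y, floc=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
st.probplot(sigma_y, dist="norm", plot=ax1)                             # QQ plot against a normal
ax1.set_title("QQ plot vs. normal")
st.probplot(sigma_y, dist=st.gamma, sparams=(k, loc, theta), plot=ax2)  # QQ plot against the fitted Gamma
ax2.set_title("QQ plot vs. fitted Gamma")
plt.tight_layout()
plt.show()
```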

aloha
  • How did you arrive at this equation to begin with? What is the definition of likelihood? – jlimahaverford Mar 18 '15 at 12:57
  • @jlimahaverford I explained in more details the equation of the likelihood in this [question](http://stats.stackexchange.com/questions/142097/how-to-construct-the-likelihood-if-my-errors-are-not-gaussian) – aloha Mar 18 '15 at 13:05
  • This question needs clarification. Normally, we would think of something as an "error" only if its conditional mean is zero. Thus it *appears* you are assuming that the distribution of $y$ conditional on $x$ has some kind of Gamma distribution whose mean is $mx+b$. Is that your intention? And how many parameters do you include in your Gammas: one (for the shape), two (for shape and scale), or even three (for shape, scale, and an additive locational offset)? For the three-parameter Gamma an alternative interpretation is that $mx+b$ is the third parameter. – whuber Mar 18 '15 at 21:39
  • Rather than trying to do it with errors, define the conditional distribution of your response given your predictors. – Glen_b Mar 19 '15 at 05:52
  • Do you have a compelling rationale for using a Gamma distribution of your errors? I would agree with @Glen_b since what you're really interested in is the conditional model of your outcome variable given explanatory variables. In Econometrics you should be able to give some sort of interpretation of why exactly gamma distributed errors and not say normal, which is more conventional. Also you can always use the model equation to solve for your errors as outlined in the answer below. Cool question though! – Hirek Mar 20 '15 at 12:32
  • @Hirek Thank you for your answer. I'm still reading the theory behind modeling the distribution of my errors. You're right, usually the normal distribution is used, however, when I plot a histogram of my errors, they do not seem to come from a normal distribution. Furthermore, I did a QQ plot test and indeed, they do not follow a normal distribution. I would like to know if I'm 'thinking' correctly. Is this the right way to solve my problem? – aloha Mar 20 '15 at 12:37
  • @po6 the short answer is no, although your analysis up to now is correct. If you fit a linear model using OLS or a normal likelihood (which are equivalent in the linear case) but your errors are not normal, then your model does not fit. This is simply a chi-square test of over-identifying restrictions and your model is rejected. What you should do is use a better model until your errors are normal. Even better, you should think about what's going on, i.e. what does your model describe, and see whether theory gives you any guidance. So write down a new model and try to fit that. – Hirek Mar 20 '15 at 15:42
  • @Hirek I guess I didn't make things clear; please check my updated question. What I meant by errors is the measurement uncertainties I have on my original data. In astronomy, there are always uncertainties present in the data. I'm trying to fit these uncertainties to a distribution. I hope things are clearer now. – aloha Mar 20 '15 at 20:28
  • It is not the case that the "errors form a Gamma distribution." The edit describes a situation where the recorded "measurement uncertainties" can be described approximately with a Gamma distribution. This says *nothing* whatsoever about the distributional form of the errors themselves! How to proceed depends on how you know the "$\pm$" terms, as well as on the nature of the measurement process. Where do these numbers $\pm 0.004$ *etc* come from? Repeated measurements? An earlier calibration? Some model of the measurement device? – whuber Mar 20 '15 at 23:41
  • @whuber, these measurement uncertainties come from fitting theoretical models to observations. My data is astronomical data and I downloaded them from the Kepler database. If you check my first comment, I linked this question to another one. It has more details about how I'll be constructing my likelihood, as I'm following the method described by Kelly 2007. – aloha Mar 21 '15 at 07:29
  • @po6 I am not sure you're doing this right. Uncertainty is a different matter. I would recommend Introduction to Error Analysis by John Taylor because depending on how you measure things, your uncertainties need to be formed differently. – Hirek Mar 21 '15 at 14:47
  • Unfortunately, neither I nor the commenters to your other question have been able to make sense of it. It does not seem to describe either the data used to estimate your error distributions, nor the method used to make those estimates. – whuber Mar 21 '15 at 16:36

1 Answer

3

If we use the probability density function (pdf) of the Gamma distribution given here, then the pdf of the uncertainties $b$ is $$f(b;k,\theta)=\dfrac{b^{k-1}e^{-\frac{b}{\theta}}}{\theta^k\Gamma(k)} \quad (1)$$ To write the likelihood of $y$ you need to find the pdf of $y$. The easiest way is to work with the distribution function: $$P(y\leq z)=F_{y}(z)=P(mx+b\leq z)=P(b\leq z-mx)=F_{b}(z-mx).$$ Now take the derivative with respect to $z$ on both sides to find the pdf of $y$ evaluated at $z$, i.e. $$f_{y}(z)=\dfrac{dF_{y}(z)}{dz}=\dfrac{dF_{b}(z-mx)}{dz}=f_{b}(z-mx)\cdot \dfrac{d (z-mx) }{dz}=f_{b}(z-mx) \quad (2).$$ Next, in (1) replace $b$ with $z-mx$ to get $$f_{y}(z;x,m,k,\theta)=\dfrac{(z-mx)^{k-1}e^{-\frac{(z-mx)}{\theta}}}{\theta^k\Gamma(k)}; \quad z-mx>0,\ k>0,\ \theta >0 \quad (3).$$ Eq. (3) is the likelihood of $y$ when $b$ has a Gamma distribution with parameters $k$ and $\theta$.
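As an illustration (not part of the derivation above), here is a minimal sketch of evaluating the log of the likelihood in Eq. (3) over a set of points; the arrays `x`, `y`, the trial values of $m$, $k$, $\theta$, and the function name `gamma_loglike` are hypothetical:

```python
import numpy as np
from scipy.stats import gamma

def gamma_loglike(params, x, y):
    """Log-likelihood of y_i = m*x_i + b_i with b_i ~ Gamma(k, theta), i.e. the log of Eq. (3)."""
    m, k, theta = params
    resid = y - m * x                      # plays the role of b = z - m*x
    if np.any(resid <= 0) or k <= 0 or theta <= 0:
        return -np.inf                     # outside the support of the Gamma density
    return np.sum(gamma.logpdf(resid, a=k, scale=theta))

# Hypothetical data and trial parameters (m, k, theta)
x = np.array([-0.5, 0.2, 0.7, 1.1])
y = np.array([0.14, 0.5, 0.9, 1.3])
print(gamma_loglike((1.0, 2.0, 0.5), x, y))
```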

Stat