
My professor has this slide up here:

[Slide: the observation model $\mathbf{y} = H\mathbf{f} + \mathbf{w}$, with $\mathbf{w} = \mathbf{y} - H\mathbf{f}$ annotated in red, and the multivariate Gaussian noise density $p(\mathbf{w})$ written as equal to the likelihood $p(\mathbf{y} \mid \mathbf{f})$]

Here, $y$ is an observed signal. $H$ is a deterministic transformation, which is assumed known. $f$ is the original signal (which we don't know), and $w$ is random Gaussian noise. We are trying to recover $f$.

I understand everything, except for why $p(\mathbf{w}) = p(\mathbf{y} \mid \mathbf{f})$.

That is, I understand that the multidimensional noise PDF is given by the above expression.

But why is that expression ALSO equal to the likelihood function of $\mathbf{y}$ given $\mathbf{f}$? I'm not seeing this...
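For reference, the expression in question is presumably the standard multivariate Gaussian density with noise covariance $C_{ww}$ (notation taken from the comments below; $N$ is the dimension of $\mathbf{y}$):

$$p(\mathbf{w}) \;=\; \frac{1}{(2\pi)^{N/2}\,|C_{ww}|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,\mathbf{w}^{T} C_{ww}^{-1}\,\mathbf{w}\right)$$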

Creatron
  • Where does it say $p(w)$ = likelihood function? The function $p(y|f)$ is a conditional probability density. – emcor Jun 25 '14 at 17:38
  • @emcor What do you mean? p(w) = p(y|f) as seen above, and p(y|f) is the likelihood function as given [here](https://www.dropbox.com/s/zyxxv56v3niy9m5/Screenshot%202014-06-25%2013.40.25.png) – Creatron Jun 25 '14 at 17:41
  • The likelihood is [a function of the parameters,](http://stats.stackexchange.com/questions/2641) so notation like "$p(w)$" clearly does not refer to a likelihood. Unless a distribution is assumed for $f$, though, "$p(y|f)$" is not a conditional probability density, either: it merely refers to the probability density of $y$ as it depends on the parameters $f$. By assuming $W=Y-Hf$ has a Gaussian distribution, all you have to do is plug $y-Hf$ into the formula for a (multivariate) Gaussian density. Fixing $y$, $H$, and $C_{ww}$, it becomes a function of $f$: in *that* sense it's a likelihood. – whuber Jun 25 '14 at 17:55
  • @whuber Yes, he is assuming that $y$, $H$ and $C_{ww}$ are known. ....soooo are you saying that ONLY if we assume those three entities are known and fixed, can we say that $P(y|f)$ is a likelihood function?... I find myself getting more confused. He is nonchalantly saying p(y|f) is a likelihood function. Are we saying he is mistaken? – Creatron Jun 25 '14 at 18:04
  • @whuber What if I rephrase: Forget the nouns: Why is p(w) = p(y|f) in the above expression? I understand the expression itself, etc. But I do not get by what right, we can equate p(y|f) to p(w). – Creatron Jun 25 '14 at 18:05
  • We have to do some careful interpreting because the notation is sloppy. Apparently the model is $Y=Hf+W$ where $W$ is a random vector-valued variable. This makes $Y$ a random variable, too. Given any value of $f$, any *realization* of $Y$, which is written $y$, corresponds to a realization $y-Hf$ of $W$. The probability density of that realization is given by the equation. The right hand side $\Lambda$ is a function of $(y,H,C_{ww},f)$. If you *assume* values for $H$ and $C_{ww}$, and are *given* the data $y$, $f$ remains the only variable and you can study how $\Lambda$ depends on $f$. – whuber Jun 25 '14 at 18:10
  • @whuber I have edited the question with further details from the background setup. Does that help us decipher what is meant here? (I am still digesting what you wrote, all the same). – Creatron Jun 25 '14 at 18:28
  • Your edits support my suppositions about how to read the slide. The $f$ you are trying to recover plays the role of unknown parameters; everything else is either known or assumed. Thus the likelihood will be considered a function of $f$ and you will later find values of $f$ that make the likelihood as large as possible. You might go even further and deduce confidence limits for your estimates of $f$ by studying how the likelihood varies as you vary $f$ around its maximizing value. You might possibly even adopt a "prior distribution" for $f$, but that would not alter the present interpretation. – whuber Jun 25 '14 at 18:44
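Putting the reading suggested in these comments into one display (a sketch in the same notation, with $N$ the dimension of $\mathbf{y}$): substituting the realization $\mathbf{w} = \mathbf{y} - H\mathbf{f}$ into the Gaussian density above gives

$$\Lambda(\mathbf{f}) \;=\; p(\mathbf{y} \mid \mathbf{f}) \;=\; \frac{1}{(2\pi)^{N/2}\,|C_{ww}|^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,(\mathbf{y}-H\mathbf{f})^{T} C_{ww}^{-1}\,(\mathbf{y}-H\mathbf{f})\right),$$

which, once $\mathbf{y}$, $H$, and $C_{ww}$ are fixed, is a function of $\mathbf{f}$ alone; in *that* sense it is a likelihood.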

1 Answer


$p(w)=p(y|f)$:

It is because of what is annotated in red on the slide: $w$ and $y$ are linked as

$w=y-Hf$

so $p(w) = p(y - Hf)$ as well.

If $H$ and $f$ are held constant, $y$ is the only random variable which determines the probability:

$p(w)=p(y-Hf|f,H)=p(y|f)$.

I assume he omits $H$ from the conditioning because it is defined as a known constant anyway, so the density does not need to be written as conditional on $H$.

He then correctly substitutes $w = y - Hf$ into the Gaussian density of $w$.
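As a quick numerical check of this substitution (a minimal sketch, not from the slide; the dimensions, $H$, $f$, and $C_{ww}$ below are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Toy setup: f has 3 components, y and w have 4.
H = rng.standard_normal((4, 3))        # known deterministic transformation
f = rng.standard_normal(3)             # "true" signal (unknown in practice)
C_ww = np.diag([0.5, 0.4, 0.3, 0.2])   # known noise covariance

# One realization of the model y = Hf + w.
w = rng.multivariate_normal(np.zeros(4), C_ww)
y = H @ f + w

# Density of the noise realization w ~ N(0, C_ww) ...
p_w = multivariate_normal.pdf(w, mean=np.zeros(4), cov=C_ww)

# ... equals the density of y given f, since y | f ~ N(Hf, C_ww).
p_y_given_f = multivariate_normal.pdf(y, mean=H @ f, cov=C_ww)

print(p_w, p_y_given_f)  # agree up to floating point
```

Evaluating the second density at the observed $y$ for varying candidate $f$ (with $y$ held fixed) is exactly how the likelihood would be scanned or maximized.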

emcor
  • One point worth addressing is the potential confusion of data with random variables. If the right hand side is called a "likelihood," then "$y$" must refer to data (that is, a realization), not to a random variable. Furthermore, if we accept that calling the RHS a likelihood was intentional, then we must emphasize its dependence on $f$ rather than dismiss it. – whuber Jun 25 '14 at 18:24
  • Thank you Emcor. FYI, I edited my question to give more details on the background setup. That said, unfortunately I still find myself somewhat puzzled by why, exactly, p(w) = p(y|f). Specifically, I am not sure why you are saying that f is held constant, when we are in fact trying to find it... – Creatron Jun 25 '14 at 18:26
  • "Likelihood" is the density evaluated at a datapoint, which we have in p(w) as already clear. As we also have p(w)=p(y-Hf), it is a notational convention to write the parameter to be maximized in the Maximum Likelihood Function as $f(x|\theta)$. This convention might be confusing here. – emcor Jun 25 '14 at 18:32
  • Creatron, I believe you may be overthinking this. In a formula like $2-x^2$, $x$ has some definite but unknown value. Your circumstance is no different conceptually. You will estimate $f$ by maximizing the likelihood, just as you would consider varying the unknown $x$ to maximize $2-x^2$, even though the quantity $x$ refers to is whatever it is and that doesn't vary at all. For more about how the likelihood works I will refer you once more to the link I gave in an earlier comment: please read the thread at http://stats.stackexchange.com/questions/2641. – whuber Jun 25 '14 at 18:35
  • @whuber Thanks whuber, let me study that link and digest. – Creatron Jun 25 '14 at 19:55
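Following up on the maximization whuber describes: for this Gaussian model, maximizing $\Lambda(\mathbf{f})$ is equivalent to minimizing the quadratic form in the exponent, which has a closed-form (generalized least squares) solution. This is a standard result, not stated in the thread, and it assumes $H$ has full column rank:

$$\hat{\mathbf{f}} \;=\; \arg\max_{\mathbf{f}} \Lambda(\mathbf{f}) \;=\; \left(H^{T} C_{ww}^{-1} H\right)^{-1} H^{T} C_{ww}^{-1}\,\mathbf{y}.$$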