31

What is the difference in meaning between the notation $P(z;d,w)$ and $P(z|d,w)$ which are commonly used in many books and papers?

Alexis
Learner
  • 13
    $f(x;\theta)$ is the same as $f(x|\theta)$, simply meaning that $\theta$ is a fixed parameter and the function $f$ is a function of $x$. $f(x,\Theta)$, OTOH, is an element of a family (set) of functions, where the elements are indexed by $\Theta$. A subtle distinction, perhaps, but an important one, esp. when it comes time to estimate an unknown parameter $\theta$ on the basis of known data $x$; at that time, $\theta$ varies and $x$ is fixed, resulting in the "likelihood function". Usage of "$|$" is more common among statisticians, "$;$" among mathematicians. – jbowman Jun 20 '12 at 19:20
  • Yes jbowman is correct. We sometimes call it the density of X given Θ. – Michael R. Chernick Jun 20 '12 at 19:51
  • @jbowman why not post that as an answer? My only question is - why would they use both, but I assume that it has something to do with the context (the "|" is used with "P" and the ";" with "$f$"). – Abe Jun 21 '12 at 15:11
  • Good thinking, Abe; that's probably it. $f$ is more generic, I suppose. – jbowman Jun 21 '12 at 15:13

4 Answers

27

$f(x;\theta)$ is the density of the random variable $X$ at the point $x$, with $\theta$ being the parameter of the distribution. $f(x,\theta)$ is the joint density of $X$ and $\Theta$ at the point $(x,\theta)$ and only makes sense if $\Theta$ is a random variable. $f(x|\theta)$ is the conditional distribution of $X$ given $\Theta$, and again, only makes sense if $\Theta$ is a random variable. This will become much clearer when you get further into the book and look at Bayesian analysis.
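The three notations can be illustrated with a toy numerical sketch (all values here are hypothetical, not from the thread): assume $X \mid \theta \sim \text{Normal}(\theta, 1)$, and, for the joint and conditional readings to make sense, put a $\text{Normal}(0,1)$ prior on $\Theta$.

```python
# Toy sketch of the three notations. Assume X | theta ~ Normal(theta, 1);
# for f(x, theta) and f(x | theta) to make sense, also assume Theta is a
# random variable with prior Theta ~ Normal(0, 1). All numbers hypothetical.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of Normal(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

x, theta = 0.5, 1.0

# f(x; theta): theta is a fixed parameter indexing the density of X
f_semicolon = normal_pdf(x, mu=theta)

# f(x | theta): the same number, but read as "density of X given that the
# random variable Theta takes the value theta"
f_bar = normal_pdf(x, mu=theta)

# f(x, theta): joint density of (X, Theta) = f(x | theta) * f(theta),
# which requires the prior on Theta
f_comma = f_bar * normal_pdf(theta)

print(f_semicolon, f_bar, f_comma)  # f_semicolon == f_bar, approx. 0.35207
```

Note that $f(x;\theta)$ and $f(x|\theta)$ evaluate to the same number here; the distinction is in whether $\Theta$ is treated as random, which only the joint and conditional readings require.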

COOLSerdash
PeterR
  • Uhhhh... $f(x|\theta)$ is the conditional distribution of $x$ given $\theta$ makes perfect sense even if $\theta$ is not a random variable. It's pretty much standard notation in classical statistics, where $\theta$ is not a random variable. – jbowman Jun 21 '12 at 17:10
  • Uhhhh....if you interpret that to mean that P[Θ=θ]=1 (left Θ is a random variable, right θ is a constant) then I agree. Otherwise I do not...for what then would P[Θ=θ] mean in the denominator of the definition of conditional distribution? – PeterR Jun 21 '12 at 17:30
  • Denominator? I can write $x \sim f(x | \mu, \sigma)$ where $f$ is a Normal distribution without reference to Bayes' Rule. $\mu$ and $\sigma$ are fixed. Others do too, for example, http://www.ll.mit.edu/mission/communications/ist/publications/0802_Reynolds_Biometrics-GMM.pdf. – jbowman Jun 21 '12 at 18:28
  • jbowman, so what is the definition of your f(x|μ,σ) as a conditional density when μ and σ are fixed numbers (i.e. not random variables)? – PeterR Jun 21 '12 at 18:41
  • $x$ is distributed according to, e.g., a Normal law with mean $\mu$ and standard deviation $\sigma$. When, for example, $\mu=0, \sigma=1$, $f(0) = 0.3989...$. When $\mu=1, \sigma=1$, $f(0) = 0.2419...$. The value of $f(x)$ is conditional upon the values of $\mu$ and $\sigma$. I think, BTW, we are using the word "conditional" in two slightly different ways; you are limiting it to "conditional upon some random event occurring", and I am using it to mean that or just "given", as in "$f(x)$ given (specific values of) $\mu$ and $\sigma$". – jbowman Jun 21 '12 at 19:24
  • 2
    The word "conditional", associated with the notation f(X|Y), is defined to be "conditional upon some random event occurring". If you are using it to mean something else, such as just "given", as in "f(x) given (specific values of) μ and σ", well then that is what the notation f(x;μ,σ) is for. Since the OP was asking about what the notation means, we should be precise about the notation in the answer. – PeterR Jun 21 '12 at 19:49
  • Perhaps you could explain the notation in the paper I linked to? And why they didn't bother to define it, if it's really nonstandard? I have many, many others if you're interested! Also some books, e.g., Rao's Linear Statistical Inference and its Applications. Maybe you could provide a link to your source... which of course may have been defining notation for itself, not for the profession in general. – jbowman Jun 22 '12 at 00:15
  • Conditional pdf - page 67 – PeterR Jun 22 '12 at 12:13
  • Oops - hit return too fast: I don't have Hogg & Tanis (the OP's book) but I do have Hogg & Craig, Introduction to Mathematical Statistics, 4th edition: conditional pdf $f(x|y)$ - page 67; $f(x;\theta)$ - page 201. When you see $f(x|y)$ it refers to conditional probability, which is defined with respect to a random variable. If you have something that is not a random variable in the second slot, then either you are explicitly saying it is a random variable that has a specific value with probability one (which is perfectly OK), or you are implicitly saying the same thing. – PeterR Jun 22 '12 at 12:28
18

$f(x;\theta)$ is the same as $f(x|\theta)$, simply meaning that $\theta$ is a fixed parameter and the function $f$ is a function of $x$. $f(x,\Theta)$, OTOH, is an element of a family (or set) of functions, where the elements are indexed by $\Theta$. A subtle distinction, perhaps, but an important one, esp. when it comes time to estimate an unknown parameter $\theta$ on the basis of known data $x$; at that time, $\theta$ varies and $x$ is fixed, resulting in the "likelihood function". Usage of $\mid$ is more common among statisticians, and $;$ among mathematicians.
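A quick numerical sketch of the "$\theta$ varies, $x$ is fixed" reading (the data values and the search grid below are hypothetical; assume i.i.d. $X_i \sim \text{Normal}(\theta, 1)$):

```python
# Sketch: the same formula read two ways. Data values are hypothetical;
# assume i.i.d. X_i ~ Normal(theta, 1).
import math

def normal_pdf(x, mu):
    """Density of Normal(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

data = [1.2, 0.7, 1.9]  # known data x: fixed from here on

# f(x; theta): theta fixed, a function of x
density = lambda x: normal_pdf(x, mu=1.0)

# L(theta; x): x fixed, a function of theta -- the likelihood function
likelihood = lambda theta: math.prod(normal_pdf(xi, theta) for xi in data)

# For this model the likelihood peaks at the sample mean
grid = [i / 1000.0 for i in range(-2000, 4001)]
theta_hat = max(grid, key=likelihood)
print(theta_hat)  # 1.267 on this grid; the exact MLE is mean(data) = 1.2666...
```

Same expression, different argument held fixed: `density` and `likelihood` both just evaluate the Normal pdf, which is exactly the point of the answer.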

jbowman
  • 1
    How is $f(x;θ)$ spoken verbally? Do you say " f of x given θ"? – stackoverflowuser2010 Oct 23 '14 at 22:09
  • @stackoverflowuser2010 - yes, exactly so. – jbowman Oct 27 '14 at 20:47
  • 2
    I found in some Coursera videos that Stanford professor Andrew Ng verbalizes the semicolon as "parameterized by." See: https://class.coursera.org/ml-005/lecture/34 . So the example would be spoken as "f of x parameterized by theta". – stackoverflowuser2010 Nov 10 '14 at 02:50
  • 5
    Saying "given" or "conditional" is very different (in general) from "parameterized." I'd hate if someone saw this and thought the two were equivalent. Saying "parameterized" is only appropriate when the quantity being conditioned on is a parameter indexing the pdf of the variable in the first term. For two variables (e.g., f(x;y)), using that term would be wrong. – ATJ Jun 17 '16 at 17:07
  • @ATJ Can you elaborate what you mean in that comment? I am interpreting it as such: $f\left(x;y\right)$ is "f of x parameterized by y", which is a *function* **defined in terms of y**. E.g., $f\left(x;y\right)$ could be $x^{y-1}$. On the other hand, $f\left(x\mid y\right)$ is "f of x given y", which does depend upon y in some fashion, but is not a parameter, per se. E.g., $f\left(x\mid y\right)$ could be simply $x$ (as long as $y$="red", 12, "female", or set to whichever of the random options $y$ is allowed). Am I on the right track? – Mike Williamson Aug 18 '18 at 00:02
  • Also, @ATJ if I am understanding you correctly, then this answer by jbowman is wrong. Yeah? – Mike Williamson Aug 18 '18 at 00:04
  • @MikeWilliamson - My answer is straight from George Shanthikumar, one of the greats. Furthermore, in Bayesian statistics, it is quite usual to treat model parameters as random variables, e.g., $f(x;\mu, \sigma^2)$ and $\mu \sim \text{N}(\theta, \delta^2)$; you can write the joint distribution of $x, \mu$ or the conditional distribution of $x|\mu$ along with the distribution of $\mu$, so the distinction between "conditional upon" and "parameterized by" in this case is nonexistent. Just because you put a distribution on a parameter doesn't mean it's not a parameter! – jbowman Aug 18 '18 at 14:46
  • @jbowman Thanks for the reply! But, is any of what I'm saying wrong? Specifically, I'm asking/saying, the semicolon in that case **specifically means** "parameterized by". Whereas the $\mid$ **specifically means** "given (that)". Yes, you're right that "just because you put a distribution on a parameter doesn't mean that it's not a parameter". However, $x\mid\mu$ means that $\mu$ **must be** a random variable with some distribution, say $M$, that we are fixing at some value. This is not true of $x;\mu$, where $\mu$ needn't be a random variable with a distribution. – Mike Williamson Aug 18 '18 at 18:09
  • @MikeWilliamson - no, $x | \mu$ doesn't mean $\mu$ is a random variable, it's, as noted above, just a notation that stats people tend to use whereas math people tend to use $;$. I am looking in Bickel and Doksum and observe that they use "$,$", as in, "Let $\{P_{\theta}\}$ be a one-parameter exponential family... such that the density functions $p(x,\theta)$ of the $P_{\theta}$ may be written...". Lehmann and Casella, OTOH, use $|$, as in, "Frequently it is more convenient to use the $\eta_i$ as parameters and write the density... $p(x|\eta) = \dots$". There is no one universal notation. – jbowman Aug 18 '18 at 18:19
  • Hmmm... if that's right, then that is sad! These subtleties (conditional vs. marginal vs. joint distribution) are what make the otherwise-not-overly-onerous math of statistics challenging. To think that well-published statisticians do not remain consistent even among themselves is a shame. Regardless, I've seen the comma most often mean *joint distribution*, $\mid$ most often mean *conditional distribution*, and the semicolon most often mean *parameterized by*, so I'm gonna stick with that notation. – Mike Williamson Aug 18 '18 at 18:33
  • 2
    @MikeWilliamson - Sure, pick a notation where you know what everything means and stick with it! That way when you go back to something you did earlier, like 4 hours earlier in my experience, you don't have to figure out what you meant when you used that "|". I agree, it is annoying, but after a while you just observe the first use of the notation and remember it for the rest of the paper / book; the distinctions are not usually what's important, anyway. – jbowman Aug 18 '18 at 18:42
15

I believe the origin of this is the likelihood paradigm (though I have not checked the actual historical record, this is a reasonable way of understanding how the notation came to be).

Let's say in a regression setting, you would have a distribution:

$$ p(Y | x, \beta) $$

Which means: the distribution of $Y$ if you know (conditional on) the $x$ and $\beta$ values.

If you want to estimate the betas, you want to maximize the likelihood:

$$ L(\beta; y,x) = p(Y | x, \beta) $$

Essentially, you are now looking at the expression $p(Y | x, \beta)$ as a function of the betas, but apart from that, there is no difference (for mathematically correct expressions that you can properly derive, this distinction is a necessity, although in practice nobody bothers).

Then, in Bayesian settings, the difference between parameters and other variables soon fades, so people started to use both notations interchangeably.

So, in essence: there is no actual difference: they both indicate the conditional distribution of the thing on the left, conditional on the thing(s) on the right.
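The regression reading above can be sketched numerically (all data values below are hypothetical; assume a single slope $\beta$ and $\text{Normal}(0,1)$ errors):

```python
# Toy sketch: p(Y | x, beta) and L(beta; y, x) are the same expression,
# just read as functions of different arguments. Assume y = beta * x + eps
# with eps ~ Normal(0, 1); the data below are hypothetical.
import math

def normal_pdf(v):
    """Density of Normal(0, 1) at v."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

xs = [1.0, 2.0, 3.0]
ys = [1.1, 1.9, 3.2]

def p_y_given_x_beta(ys, xs, beta):
    """p(y | x, beta): density of the observed y's given the x's and beta."""
    return math.prod(normal_pdf(y - beta * x) for x, y in zip(xs, ys))

# Reading the same expression as L(beta; y, x) and maximizing over beta
# gives the MLE, which for this model is the least-squares slope
# sum(x*y) / sum(x*x)
grid = [i / 1000.0 for i in range(0, 2001)]
beta_hat = max(grid, key=lambda b: p_y_given_x_beta(ys, xs, b))
print(beta_hat)  # 1.036 on this grid; the exact MLE is 14.5/14 = 1.0357...
```

No new function had to be defined to go from the sampling density to the likelihood; only which argument is held fixed changed.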

Jake Tae
Nick Sabbe
11

Although it hasn't always been this way, these days $P(z; d, w)$ is generally used when $d,w$ are not random variables (which isn't to say that they're known, necessarily). $P(z | d, w)$ indicates conditioning on values of $d,w$. Conditioning is an operation on random variables and as such using this notation when $d, w$ aren't random variables is confusing (and tragically common).

As @Nick Sabbe points out $p(y|X, \Theta)$ is a common notation for the sampling distribution of observed data $y$. Some frequentists will use this notation but insist that $\Theta$ isn't a random variable, which is an abuse IMO. But they have no monopoly there; I've seen Bayesians do it too, tacking fixed hyperparameters on at the end of the conditionals.

JMS
  • 2
    Re your 2nd paragraph, it's worth pointing out that in typical statistical situations (say, fitting a regression model), $X$ isn't considered a random variable either, but a set of known constants. – gung - Reinstate Monica Aug 25 '13 at 13:49