13

I was thinking about the meaning of a location-scale family. My understanding is that if $X$ is a member of a location-scale family with location parameter $a$ and scale parameter $b$, then the distribution of $Z =(X-a)/b$ does not depend on any parameters and is the same for every $X$ belonging to that family.

So my question is: could you provide an example where two random variables from the same distribution family are standardized but that does not result in random variables with the same distribution?

Say $X$ and $Y$ come from the same distribution family (where with family I mean for example both Normal or both Gamma and so on ..). Define:

$Z_1 = \dfrac{X-\mu}{\sigma}$

$Z_2 = \dfrac{Y-\mu}{\sigma}$

We know that both $Z_1$ and $Z_2$ have the same expectation and variance, $\mu_Z =0, \sigma^2_Z =1$.

But can they have different higher moments?

My attempt to answer this question is that if the distribution of $X$ and $Y$ depends on more than two parameters, then it could be. I am thinking of the generalized Student's $t$, which has three parameters.

But if the number of parameters is $\le 2$ and $X$ and $Y$ come from the same distribution family with the same expectation and variance, does that mean that $Z_1$ and $Z_2$ have the same distribution (and hence the same higher moments)?

kjetil b halvorsen
gioxc88
  • 4
    Yes, they can. But, you would need at least 3 parameters in a generalized distribution. – Carl Dec 23 '17 at 15:24
  • 5
    @Carl One parameter will suffice. – whuber Dec 23 '17 at 17:46
  • @whuber The OP is asking about the same distribution, not different ones. – Carl Dec 23 '17 at 18:03
  • If $Z$ is a standard Cauchy random variable with density $\frac{1}{\pi(1+x^2)}, -\infty < x < \infty$, then $bZ+a = X$ is a scaled and displaced Cauchy random variable. The family $$\mathcal C(a,b) = \{X\colon X=a+bZ, Z~\text{standard Cauchy}, b\neq 0\}$$ is a location-scale family in the sense that you have defined it (the distribution of $\frac{X-a}{b}$ does not depend on $a$ or $b$ and is the same for all $a$ and $b$), but none of the random variables enjoys a mean or a variance. – Dilip Sarwate Dec 23 '17 at 18:09
  • 1
    @DilipSarwate you are right, I did not specify it but mean both $\mu$ and $\sigma$ finite – gioxc88 Dec 23 '17 at 18:16
  • 1
    If you require two or fewer parameters, and are specifying a location-scale family, the two parameters will of necessity be the location and the scale. Since you are requiring the same location and scale between the two distributions, that means the parameter values will be the same; if the distributions also have the same functional form, it must be that they are identical, since they have the same functional form and parameter values. Since they are identical, all the higher moments will be the same as well. – jbowman Dec 23 '17 at 19:25
  • 5
    @Carl It's unclear what you mean by "same distribution." Literally, that would refer to a unique distribution, with one law and therefore a unique expectation, unique variance, and unique moments (to the extent they are defined). If you mean "same distribution *family*," then your remark is meaningless, because the family is whatever you define it to be. – whuber Dec 23 '17 at 19:29
  • @whuber To clarify, I mean distributions of the same form, before their parameters take values. For example, a normal distribution would have the form $\mathcal{N}(\mu,\sigma^2)$. – Carl Dec 23 '17 at 21:09
  • 1
    The answer depends on the interpretation of "... the distribution of $Z=(X−a)/b$ does not depend of any parameters and it's the same for every $X$ belonging to that family." whether the "it's" means "for _all_ $X$ in the family, the distribution of $(X-a)/b$ does not depend on $a$ and $b$" **or** "for all $X$ in the family, $(X-a)/b$ has the same distribution." yyzz chose the latter interpretation but everyone else, including Moderator @whuber, prefers the former. Both interpretations have easy, but diametrically opposed, answers. It's unfair to call yyzz's answer incorrect; it is NOT. – Dilip Sarwate Dec 24 '17 at 21:56
  • 3
    @HardCore Since it seems you feel your question has been answered, please see [What should I do when someone answers my question?](https://stats.stackexchange.com/help/someone-answers) – Glen_b Dec 24 '17 at 23:17
  • @DilipSarwate Agreed. Moreover, I feel that there is a more natural usage herein than "distribution family" and that is "distributions of the same form". For example, there are multiple distribution forms that are in the exponential family, so it can be too confusing to be natural. My answer relates to that, and is also not wrong despite downvotes. I have a problem with downvotes; they sometimes tend toward uninspired narrowness. – Carl Dec 25 '17 at 15:38
  • 2
    @Carl I did upvote your answer too. The OP's usage seems to support the notion of $Z=(X-a)/b$ as having the same standard distribution for all choices of $X$ in the family. Let's see which answer the OP accepts (if the OP ever reads Glen_b's comment and acts on it). – Dilip Sarwate Dec 25 '17 at 16:17
  • @DilipSarwate I very much appreciate your most broadminded contribution, would that it be shared. – Carl Dec 25 '17 at 16:57
  • @HardCore $Z_1$ and $Z_2$ will not be the same when $X$ and $Y$ are both Gamma (but with different shape parameter $k$). – Sextus Empiricus Aug 29 '18 at 08:52
  • @MartijnWeterings Indeed, $\mu$ is generally a location parameter, and gamma distributions, without generalization, have no location parameter and are defined only on $[0,\infty)$. As per my answer below, the question, without generalization, only pertains to symmetric distributions. – Carl Aug 30 '18 at 00:03
  • 1
    @Carl, I do not get what your point is about the symmetric distributions (does that make it different?). I was pointing out to HC that already one of his examples (both normal, **both gamma** and so on) is a 2 parameter case the question was looking for. In your answer I neither see your point. To me it is difficult to read (e.g. it starts with a page long comment about downvoters) and I see not what your conclusion is about the distribution. For fixed $\mu$ we have just another example 2 parameter family: $$\dfrac{\beta}{2\alpha\Gamma\Big(1/\beta\Big)} \; e^{-\Big(|x|/\alpha\Big)^\beta}$$ – Sextus Empiricus Aug 30 '18 at 07:28
  • @MartijnWeterings Your post shows a shape and scale parameter distribution without a location parameter. If you want to convert a generalization of the gamma distribution to be a generalized, symmetric normal distribution with shape, scale and location parameters you would do it as [follows](https://stats.stackexchange.com/a/331966/99274). How does that relate to the question, which is about location and scale parameter distributions? The gamma distribution without generalization does not fit in with the rest of the question as it is a scale and shape parameter containing distribution. – Carl Aug 30 '18 at 19:00
  • 1
    @Carl, the gamma distribution is explicitly mentioned in the question *" (where with family I mean .. both Gamma and so on ..)"*. That is why I mention it. But certainly there are other flavors of this question and the family of generalized normal distributions with a specific fixed mean does the job if you think about symmetric distributions (the question is ambiguous about the fact whether it is about location-scale families ). Personally, I find the question about two parameter location-scale families the most interesting because that flavor of the question is not at all trivial. – Sextus Empiricus Aug 30 '18 at 19:12

5 Answers

17

If you want an example which is an "officially" named parameterized distribution family, you can look into the generalized gamma distribution, https://en.wikipedia.org/wiki/Generalized_gamma_distribution. This distribution family has three parameters, so you can fix the mean and variance and still have freedom to vary higher moments. From the wiki page, the algebra does not look inviting; I would rather do it numerically. For statistical applications, search this site for gamlss, which is an extension of gam (generalized additive models, itself a generalization of GLMs) and which has parameters for "location, scale and shape".

Another example is the $t$-distributions, extended to be a location-scale family. Then the third parameter will be the degrees of freedom, which will vary the shape for a fixed location and scale.
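As a rough numerical illustration of the generalized gamma suggestion, here is a minimal Python sketch. It assumes the parameterization used on the linked Wikipedia page (scale $a$, shape parameters $d$ and $p$), for which $E[X^r]=a^r\,\Gamma((d+r)/p)/\Gamma(d/p)$; the function names are made up for this example. Standardizing removes the mean and variance, so any remaining difference in skewness comes from the shape parameters alone.

```python
import math


def gen_gamma_raw_moment(r, a, d, p):
    """E[X^r] for the generalized gamma density
    (p / a**d) * x**(d - 1) * exp(-(x / a)**p) / Gamma(d / p), x > 0.
    (Assumed parameterization: a = scale, d and p = shape.)"""
    return a**r * math.exp(math.lgamma((d + r) / p) - math.lgamma(d / p))


def standardized_skewness(a, d, p):
    """Third moment of Z = (X - mean) / sd, computed from raw moments."""
    m1, m2, m3 = (gen_gamma_raw_moment(r, a, d, p) for r in (1, 2, 3))
    variance = m2 - m1**2
    third_central = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    return third_central / variance**1.5


if __name__ == "__main__":
    # Every standardized member has mean 0 and variance 1,
    # yet the skewness still changes with the shape parameters (d, p).
    for d, p in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 2.0)]:
        print(f"d={d}, p={p}: skewness of Z = {standardized_skewness(1.0, d, p):.4f}")
```

The scale $a$ drops out of the result, so only the two shape parameters matter after standardization.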

kjetil b halvorsen
  • 1
    Although the [generalized error distribution](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.542.879&rep=rep1&type=pdf) may have been a better choice. – Carl Dec 23 '17 at 21:55
  • 2
    Thank you very much for your answer!! I choose Carl's one because it was more detailed but this was fine too .. thank you very much !!! – gioxc88 Dec 26 '17 at 00:37
14

There are infinitely many distributions with mean zero and variance one. Take $\epsilon_1$ distributed from one of these distributions, say the $\mathcal{N}(0,1)$, and $\epsilon_2$ from another of these distributions, say the Student's $t$ with 54 degrees of freedom rescaled by $\sqrt{\frac{52}{54}}$ so that its variance is one (a Student's $t$ with $\nu>2$ degrees of freedom has variance $\nu/(\nu-2)$). Then $$X=\mu+\sigma\epsilon_1\qquad\text{and}\qquad Y=\mu+\sigma\epsilon_2$$ enjoy the properties you mention. The "number" of parameters is irrelevant to the property.

Obviously, if you set further rules to the definition of this family, like stating for instance that there exists a fixed density $f$ such that the density of $X$ is $$\frac{1}{\sigma^d} f(\{x-\mu\}/\sigma)$$ you may end up with a single possible distribution.
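A quick numerical check of this construction, as a sketch only: the code below integrates the two standardized densities with a composite Simpson rule and compares their moments. The helper names, the integration range and the grid size are choices made for this illustration, and the $t$ is rescaled by $\sqrt{(\nu-2)/\nu}$, the standard factor that gives unit variance for $\nu>2$.

```python
import math


def normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def unit_variance_t_pdf(x, nu):
    """Density of s*T, where T is Student's t with nu > 2 degrees of freedom
    and s = sqrt((nu - 2) / nu), so that the variance equals one."""
    s = math.sqrt((nu - 2.0) / nu)
    t = x / s
    log_const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - 0.5 * (nu + 1.0) * math.log1p(t * t / nu)) / s


def moment(pdf, k, lo=-60.0, hi=60.0, n=20000):
    """Composite Simpson approximation of the integral of x**k * pdf(x)."""
    h = (hi - lo) / n  # n must be even
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * x**k * pdf(x)
    return total * h / 3.0


if __name__ == "__main__":
    t54 = lambda x: unit_variance_t_pdf(x, 54.0)
    for name, pdf in [("normal", normal_pdf), ("rescaled t(54)", t54)]:
        print(name, [round(moment(pdf, k), 4) for k in (1, 2, 4)])
```

Both densities give mean 0 and variance 1, but the fourth moments differ, so $Z_1$ and $Z_2$ do not share a distribution.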

Xi'an
  • thank you for the answer but I think that this is not what I asked – gioxc88 Dec 23 '17 at 17:46
  • 6
    I think it does because if the family of distributions is defined as the union of both the distributions of the $X$'s and the $Y$'s, then you have a contradiction to the property. A "family" of distributions is quite a vague notion. – Xi'an Dec 23 '17 at 17:48
  • yes in fact is quite vague but if you read my question I wrote that in this context with family I mean for example both Normal or both Gamma and so on .. You made an example with one normal and one t student – gioxc88 Dec 23 '17 at 18:13
  • 4
    Hard Core, you seem to confuse the *name* of a family with its *concept*. This answer is a fine one and nicely illustrates the concept. Your question doesn't ask that the solution be a location-scale family. If you need it to be one, you can always take this answer--or any other answer--and prolong it to a location-scale family by allowing arbitrary translations and rescalings. Xi'an's point about the number of parameters still holds. – whuber Dec 23 '17 at 19:32
  • @whuber I think it is confused as an answer. Student's-t by itself would be a better answer, rather than use the extreme answer of $df=3,\infty$ and not specify it. Indeed, it is $df$ which is the third parameter. – Carl Dec 23 '17 at 21:59
  • Why are you using a normal distribution when that is Student's-t with $df=\infty$. Then you are comparing that with $df=54$, which is almost indistinguishable from normal. Not especially good example. – Carl Dec 23 '17 at 22:17
  • @whuber I disagree because a normal distribution is Student's-t with $df=\infty$, or in more formal language a normal distribution is a limiting case of a Student's-t. Alternatively, Student's-t is a generalization of a normal distribution. So, either we are comparing the same Student's-t with the shape parameter, i.e., $df$, having different values, or we are comparing two different distributions. In either case, we are not satisfying the OP as this is not an answer to the question he asked. – Carl Dec 25 '17 at 01:24
8

There is apparently some confusion as to what a family of distributions is and how to count free parameters versus free plus fixed (assigned) parameters. Those questions are an aside that is unrelated to the intent of the OP, and of this answer. I do not use the word *family* herein because it is confusing. For example, a family according to one source is the result of varying the shape parameter. @whuber states that 'A "parameterization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family.' I will use the word *form*, which covers both the intended usage of the word family and parameter identification and counting. For example, the formula $x^2-2x+4$ has the form of a quadratic formula, i.e., $a_2x^2+a_1x+a_0$, and if $a_1=0$ the formula is still of quadratic form. However, when $a_2=0$ the formula is linear and the form is no longer complete enough to contain a quadratic shape term. Those who wish to use the word family in a proper statistical context are encouraged to contribute to that separate question.

Let us answer the question "Can they have different higher moments?". There are many such examples. We note in passing that the question appears to be about symmetric PDFs, which are the ones that tend to have location and scale in the simple bi-parameter case. The logic: Suppose there are two density functions with different shapes having two identical (location, scale) parameters. Then there is either a shape parameter that adjusts shape, or, the density functions have no common shape parameter and are thus density functions of no common form.

Here is an example of how the shape parameter figures into it: the generalized error density function. And here is an answer that appears to have a freely selectable kurtosis.

[Figure: density curves of the generalized error distribution]

By Skbkekas - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6057753

The PDF (A.K.A. "probability" density function, note that the word "probability" is superfluous) is $$\dfrac{\beta}{2\alpha\Gamma\Big(\dfrac{1}{\beta}\Big)} \; e^{-\Big(\dfrac{|x-\mu|}{\alpha}\Big)^\beta}$$

The mean and location is $\mu$, the scale is $\alpha$, and $\beta$ is the shape. Note that it is easier to present symmetric PDFs, because those PDFs often have location and scale as the simplest two-parameter cases, whereas asymmetric PDFs, like the gamma PDF, tend to have shape and scale as their simplest-case parameters. Continuing with the error density function, the variance is $\dfrac{\alpha^2\Gamma\Big(\dfrac{3}{\beta}\Big)}{\Gamma\Big(\dfrac{1}{\beta}\Big)}$, the skewness is $0$, and the excess kurtosis is $\dfrac{\Gamma\Big(\dfrac{5}{\beta}\Big)\Gamma\Big(\dfrac{1}{\beta}\Big)}{\Gamma\Big(\dfrac{3}{\beta}\Big)^2}-3$. Thus, if we set the variance to be 1, then we assign the value of $\alpha$ from $\alpha ^2=\dfrac{\Gamma \left(\dfrac{1}{\beta }\right)}{\Gamma \left(\dfrac{3}{\beta }\right)}$ while varying $\beta>0$, so that the excess kurtosis is selectable anywhere in the open range from $-1.2$ (approached as $\beta\to\infty$, where the density tends to a uniform one) to $\infty$ (as $\beta\to0$).
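The two formulas above are easy to evaluate numerically. The following is a minimal Python sketch; the function names are invented here, and log-gamma is used only to keep the ratios stable for small or large $\beta$.

```python
import math


def alpha_for_unit_variance(beta):
    """Scale alpha such that alpha**2 * Gamma(3/beta) / Gamma(1/beta) = 1."""
    return math.exp(0.5 * (math.lgamma(1.0 / beta) - math.lgamma(3.0 / beta)))


def excess_kurtosis(beta):
    """Gamma(5/beta) * Gamma(1/beta) / Gamma(3/beta)**2 - 3 (does not depend on alpha)."""
    log_ratio = (math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta)
                 - 2.0 * math.lgamma(3.0 / beta))
    return math.exp(log_ratio) - 3.0


if __name__ == "__main__":
    # Mean 0 and variance 1 are held fixed; only the shape beta is varied.
    for beta in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"beta={beta}: alpha={alpha_for_unit_variance(beta):.4f}, "
              f"excess kurtosis={excess_kurtosis(beta):.4f}")
```

At $\beta=2$ this gives $\alpha=\sqrt{2}$ and zero excess kurtosis (the normal case), and at $\beta=1$ an excess kurtosis of 3 (the Laplace case).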

That is, if we want to vary higher order moments, and if we want to maintain a mean of zero and a variance of 1, we need to vary the shape. This implies three parameters, which in general are 1) the mean or otherwise the appropriate measure of location, 2) the scale to adjust the variance or other measure of variability, and 3) the shape. IT TAKES at least THREE PARAMETERS TO DO IT.

Note that if we make the substitutions $\beta=2$, $\alpha=\sqrt{2}\sigma$ in the PDF above, we obtain $$\frac{e^{-\frac{(x-\mu )^2}{2 \sigma ^2}}}{\sqrt{2 \pi } \sigma }\;,$$

which is a normal distribution's density function. Thus, the generalized error density function is a generalization of the normal distribution's density function. There are many ways to generalize a normal distribution's density function. Another example, but with the normal distribution's density function only as a limiting value, and not with mid-range substitution values like the generalized error density function, is the Student's $t$ density function. Using the Student's $t$ density function, we would have a rather more restricted selection of kurtosis, and $\textit{df}>2$ is the shape parameter, because the variance is not finite for $\textit{df}\le2$. Moreover, $\textit{df}$ is not actually limited to positive integer values; it is in general any real number $>0$. The Student's $t$ only becomes normal in the limit as $\textit{df}\rightarrow\infty$, which is why I did not choose it as an example. It is neither a good example nor is it a counterexample, and in this I disagree with @Xi'an and @whuber.
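Both remarks can be checked with a few lines of Python. This is only a sketch with made-up function names: the first part compares the generalized error density at $\beta=2$, $\alpha=\sqrt{2}\sigma$ with the normal density, and the second uses the standard closed form $6/(\textit{df}-4)$ for the excess kurtosis of a Student's $t$, which holds for $\textit{df}>4$.

```python
import math


def generalized_error_pdf(x, mu, alpha, beta):
    """beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha)**beta)."""
    return (beta / (2.0 * alpha * math.gamma(1.0 / beta))
            * math.exp(-(abs(x - mu) / alpha) ** beta))


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def student_t_excess_kurtosis(df):
    """Excess kurtosis of Student's t; finite only for df > 4."""
    if df <= 4:
        raise ValueError("excess kurtosis is not finite for df <= 4")
    return 6.0 / (df - 4.0)


if __name__ == "__main__":
    mu, sigma = 1.0, 2.0
    for x in (-3.0, 0.0, 1.0, 2.5):
        ge = generalized_error_pdf(x, mu, math.sqrt(2.0) * sigma, 2.0)
        print(x, ge, normal_pdf(x, mu, sigma))  # the two columns should agree
    # Student's t only reaches positive excess kurtosis, shrinking to 0 as df grows.
    print([round(student_t_excess_kurtosis(df), 4) for df in (4.5, 5, 10, 54, 1000)])
```

The $t$ family therefore covers only positive excess kurtosis, whereas the generalized error density can also go below the normal value.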

Let me explain this further. One can choose two of many arbitrary density functions of two parameters to have, as an example, a mean of zero and a variance of one. However, they will not all be of the same form. The question however, relates to density functions of the SAME form, not different forms. The claim has been made that which density functions have the same form is an arbitrary assignment as this is a matter of definition, and in that my opinion differs. I do not agree that this is arbitrary because one can either make a substitution to convert one density function to be another, or one cannot. In the first case, the density functions are similar, and if by substitution we can show that the density functions are not equivalent, then those density functions are of different form.

Thus, using the example of the Student's $t$ PDF, the choices are to either consider it to be a generalization of a normal PDF, in which case a normal PDF has a permissible form for a Student's $t$ PDF, or not, in which case the Student's $t$ PDF is of a different form from the normal PDF and thus is irrelevant to the question posed.

We can argue this many ways. My opinion is that a normal PDF is a sub-selected form of a Student's $t$ PDF, but that a normal PDF is not a sub-selection of a gamma PDF even though a limiting value of a gamma PDF can be shown to be a normal PDF. My reason for this is that in the normal/Student's $t$ case the support is the same, but in the normal/gamma case the support is infinite versus semi-infinite, which is the required incompatibility.

Carl
  • 1
    ok thank you very much for looking into this. So your answer confirms my initial guess that it depends on the number of parameters!!! thank you – gioxc88 Dec 24 '17 at 02:48
  • 6
    (-1) As has been stated in other comments, the issue is "what does a distribution family mean?". I can easily define a new "family" of distributions that are simply rescaled t-distributions to have mean = 0, sd = 1, with a single parameter: df. Then the 1st and 2nd moments are equal for all df, but for different values of df, they have different higher moments. – Cliff AB Dec 25 '17 at 04:26
  • 1
    I believe your objection is that I just had to make up a new "family" of distributions for this example. But that's just a side effect of people trying to parameterize distributions using useful notation: most of the time, we care first about 1st moments, then about 2nd moments, etc. That's why most of the classic families require 3 parameters before we can fix the 1st and 2nd moment while having different higher moments. But that's a matter of coming up with convenient notation rather than a fundamental property of families of distributions. – Cliff AB Dec 25 '17 at 04:32
  • 2
    @CliffAB Just because you assign a value to a parameter does not mean that I will stop counting it as a parameter. You stopped counting it, I did not. I grant you that in the way you are counting, free parameters only, that you are correct. However, I strongly suggest that a distribution has the form of that distribution no matter what parameter values are assigned. You may choose to misunderstand what I am saying, that does not change what I have said. – Carl Dec 25 '17 at 13:21
  • 2
    @CliffAB For example, your definition of counting free parameters, only, would suggest that a standard normal distribution is not of the same form as an ordinary normal distribution. My method of counting parameters would suggest that a standard normal distribution has the form of a normal distribution. Which definition is more correct is not actually in question, I choose the latter and my exposé should be understood in that context. – Carl Dec 25 '17 at 13:35
  • @Carl: by that reasoning, the exponential distribution is (at least) a three parameter distribution, as it is a special case of the generalized gamma distribution. – Cliff AB Dec 25 '17 at 17:46
  • @CliffAB By that reasoning, it would not be possible to generalize an exponential distribution. In fact, two, three, or four parameter gamma distributions are a subset of possible generalizations of an exponential distribution. There is no contradiction here. All information is context dependent. – Carl Dec 25 '17 at 23:17
  • 1
    Listen guys ... FAMILY of distribution does not mean anything ... FAMILY has no precise meaning in statistics, hence it is pointless that you keep saying: "i can define a family in this o that way ..." ... Because I was very well aware that the word FAMILY could have been misinterpreted I specified in my question that "with family I mean for example both Normal or both Gamma and so on .." So Carl is the only one who gave an answer to my question ...The other ones answered the obvious question(not mine though) "does same expectation and variance mean same distribution?" .. that's what i think – gioxc88 Dec 26 '17 at 00:28
  • Thank-you very much. I would not have answered myself if I thought someone else was actually paying attention to what you asked. The excess downvotes I got were, I agree, because of confusing the meaning of the question with how it needed to be answered. – Carl Dec 26 '17 at 01:36
  • 5
    Hard Core, that comment is difficult to fathom, given that your title itself contains the word "family"! Moreover, if you deny that a family is meaningful, then the question makes no sense. Please clarify by editing your question to reflect your intentions. – whuber Dec 26 '17 at 19:29
  • @whuber Obviously, there are some language difficulties here. They do not seem to bother me, or HardCore. I know what he means, he knows that I know, but lots of other people apparently do not. Can you suggest language that explains the question and does not incite? I used the expression "of the same form" which is more math-like. Got any suggestions, please? I do not like downvotes that I cannot address that do not appear to go to substance. – Carl Dec 26 '17 at 19:39
  • 5
    -1 because you start by saying "The answer is NO." and then proceed to give an example that effectively answers Yes (another example is given in kjetilbhalvorsen's answer that you favourably mention). This does not make sense to me. I think the math here is clear to all of us, so my downvote is only for the lack of consistency in presentation. – amoeba Dec 26 '17 at 19:51
  • 3
    Carl, there is a stark inconsistency between the question and Hard Core's comments. The question is explicit: to "provide an example where two random [variables] from the same distribution family are standardized but that does not result in ... Random Variable[s] with the same distribution." Obviously some meaning of "family" is intended. The usual meaning is clear, despite there being various technical variants around, and the (easily demonstrated) correct answer is "yes, there are many such examples." – whuber Dec 26 '17 at 19:52
  • @amoeba I had another question in mind. However, I changed it to be the question you have in mind. You are not a mind reader, I suppose. – Carl Dec 26 '17 at 19:56
  • @whuber see above. Also, I changed my answer to what you suggested. Thanks for the suggestion. – Carl Dec 26 '17 at 19:57
  • 4
    Thank you. Clearly you have a good conception of what you're writing about, but unfortunately your post propagates quite a bit of confusion about what the meanings of "distribution," "shape," "form," and "parameter" might be. As one example of the subtleties, consider a family of distributions created by any distribution law $F$ that has nonzero third central moment. The family is indexed by two real numbers $(\mu,\sigma\ne 0)$ and consists of all laws $x\to F(\sigma x+\mu)$. It is a location-scale family, but the shapes of these laws differ depending on the sign of $\sigma$. – whuber Dec 26 '17 at 20:05
  • 1
    I see. I have removed my downvote. I suspect that other downvotes were also provoked by the same sentence. – amoeba Dec 26 '17 at 20:06
  • @whuber As it would be for most asymmetric distribution I should think. However, a shape parameter, as is present for example in a gamma distribution, would not be present in a lognormal distribution. The two parameters μ and σ are not location and scale parameters for a lognormally distributed random variable X, but they are respectively location and scale parameters for the normally distributed logarithm ln(X). So yes, it gets confusing. However, is this confusion mine, or does it come with the territory? – Carl Dec 26 '17 at 20:18
  • 2
    I think there are sufficiently clear definitions available that confusion is not inevitable. A little Googling suggests there are several variant meanings of many of these terms. As an example, some people limit "location-scale" families to those with *only* a location and a scale parameter, whereas others (at least implicitly) allow for other parameters to be present too. I don't think that reflects any deep-seated confusion: most likely it is determined by the applications people have in mind. Similarly, differences arise constituting just what a valid "family" of distributions is. – whuber Dec 26 '17 at 21:14
  • @whuber thanks for your comment ... I deny that the word family is meaningful because it is not .. And because of that I specify what I mean by saying family ..Honestly I don't know how to refer to a set of random variables e.g. all lognormal but with different parameters. Hence I used the expression "same distribution family" specifying what I mean. If you know a more formal way to state my answer I will gladly modify it. I did not mean to argue I just wanted to discuss it with all of you guys. Thank you. – gioxc88 Dec 27 '17 at 07:06
  • @whuber When I use the terms location, scale and shape, I use them as per the linked references. The answer I have given is either flawed or it isn't. If it is flawed, then I would solicit help in rectifying it, as Amoeba did, thank you, Amoeba. In that it is not, it would require, IMHO, a much more lengthy answer than I have already given, and that would be off topic. Consider, for example, MVUE location, and how complicated that is. Scale is even worse, as it is generally not one-one to variance, it is however, a metric. Shape is the worst, defining shape is very tricky.... – Carl Dec 27 '17 at 15:36
  • @whuber When we use the phrase "distribution family" the word "family" I take to be redundant. For example, when we say gamma distribution, we are generally referring to an $f(x|\alpha,\beta)$ and not a form with substituted parameters like $f(x|\sqrt{2},\pi)$. I use the concept of form of a formula to specify an $f(x|\dots)$ because it is neither redundant nor ambiguous. Need your help, here. Tell me a bit more explicitly what the problem is, please. – Carl Dec 27 '17 at 15:47
  • 2
    There is a pervasive abuse of terminology. A *distribution* is the probability law of a random variable, say $X$. The *distribution function* of $X$ is given by $F_X(x)=\Pr(X\le x)$ for all $x\in\mathbb{R}$. A *statistical model* for $X$ posits that $F_X$ is some element of a specified mathematical set of distributions: this is the broadest sense in which "family" of distributions is understood. Many statistical authors (ab)use the word "distribution" to refer to such a family, such as "the [*sic*] normal distribution." Parameters are subtler. Technically, there is a natural topology on ... – whuber Dec 27 '17 at 17:53
  • 2
    ... the set of all distributions. A "parameterization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family. One and the same family can have different parameterizations. Locally, a parameterization should be one-to-one (but it might not be so globally). The upshot is that "family" is not a redundancy and one needs to be clear about the meaning and use of a parameterization. – whuber Dec 27 '17 at 17:56
  • @whuber I do not fully understand 'A "parameterization" of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family. ' so I cannot use it. See text. – Carl Dec 27 '17 at 23:47
  • 1
    I think you're missing the point of @Xi'an's answer, which has nothing to do with convergence of Student's t-distributions to normal distributions with increasing degrees of freedom; but rests on the basis that you can define a family to include any two distributions you like. A parametric family to boot: a mixture of the two parametrized by weight does the job. – Scortchi - Reinstate Monica Sep 02 '18 at 09:08
  • @Scortchi OP used the word family to imply a restricted context better relating to form than more complicated functional substitutions. Xi'an's answer is correct for families having different forms. People tend to either ferret out intended meaning or literal context. I.m.h.o. the former make for better tutors and reviewers, and the latter for better teachers and coauthors. – Carl Sep 02 '18 at 16:04
  • (1) Nevertheless, the discussion of convergence appears to be countering a claim that no-one's made. (2) How do you decide whether two density functions are of the same form? Why hasn't $f_X(x) =\frac{1}{\sqrt{2\pi}}\exp\left(\frac{-x^2}{2}\right)$ the same form as $f_Y(y) =\frac{\Gamma\left(\frac{55}{2}\right)}{\sqrt{\frac{54\pi}{3}}\cdot\Gamma(27)}\left(1+\frac{3y^2}{54}\right)^\frac{-55}{2}$? Has $f_X(x) = 1$ for $0\leq x \leq 1$ the same form as $f_Y(y) = \frac{1}{\pi\sqrt{y(1-y)}}$ for $0\leq y \leq 1$? You've introduced the term "form" but not defined it. – Scortchi - Reinstate Monica Sep 02 '18 at 21:48
  • @Scortchi When you "transform" you change forms. Incorrect usage would be "$4x+2\sqrt{x}+1$ has the form of a quadratic $4z^2+2z+1$," and correct usage would be "We can write $4x+2\sqrt{x}+1$ in quadratic form by making the substitution (read as transformative) $x=z^2$. – Carl Sep 03 '18 at 00:12
  • I can't see how that leads to a definition of "form", or how you're applying the concept to density functions. Could you perhaps illustrate it with the examples I provided? – Scortchi - Reinstate Monica Sep 04 '18 at 14:06
  • @Scortchi In my opinion, the two examples you have mentioned do not have the same form. Consider them as infinite series expansions and observe the complicated lack of agreement between those series. Indeed, there is no transform that that equates them, and even transforms, when available, do not, except in the trivial case or in isolated cases, relate forms, they relate transforms. For example, the substitution $x=z$ is the identity transform, the one that maintains form. The substitution $x=\frac{z-\mu}{\sigma}$ may conserve form, or maybe not, depending. – Carl Sep 05 '18 at 04:36
  • @Scortchi For example, $x=z-1$ as a transform is conservative of quadratic form, e.g., $x^2+x+1\to z^2 -z +1$, that is, although coefficients have changed, the variable powers have not changed. Counter example, $x=z+\frac{\pi}{2}$ would not conserve form for $\sin(x)\to\cos(z)$ because the infinite series expansion of sine is odd and that of cosine is even. – Carl Sep 05 '18 at 04:50
  • Your definition of 'form' is very indirectly relating to location-scale family. You say that a substitution like $x = z + a$ must preserve the form (e.g. a translated quadratic function is still a translated quadratic function, but a translated sin function is not always another sin function). It is a bit arbitrary, and in your answer very unclear, that a 'form' is defined in this way or why a 'form' needs to have this property. – Sextus Empiricus Nov 16 '18 at 12:29
  • You seem to mean that a 'form' is any (single) shape and all of its scaled and translated variants. In this way it becomes a bit of a tautology that a 'form' is only defined by two parameters since you put it explicitly in the definition. But the question, mentioning for instance 'Gamma', seems to be more ambiguous about this definition of family and it differs from your form. If you substitute $x = z + \pi/2$ in a Gamma distributed variable then you do *not* obtain another Gamma distributed variable. – Sextus Empiricus Nov 16 '18 at 12:32
  • Would the following pdf not be a 'form' according to your definitions? (since the variables distributed according to it are closed under scaling and translation) $$f(x;a,b) = \begin{cases} \vert b \vert e^{-b(x-a)} & \text{if $(b>0$ and $x \geq a)$ or $(b<0$ and $x \leq a)$} \\ 0 & \text{if $(b>0$ and $x<a)$ or $(b<0$ and $x>a)$} \end{cases}$$ For any $X$ that is distributed according to the above distribution, a shifted and scaled $(X+s_1)/s_2$ would also be distributed according to the above distribution (but with other parameters $a$ and $b$). – Sextus Empiricus Nov 16 '18 at 12:45
  • @MartijnWeterings *One* uses the third person for politeness. A gamma distribution has no location parameter. Its numerical location depends on form of the generalization of the gamma distribution one invents that [does have](https://reference.wolfram.com/language/ref/GammaDistribution.html) a location parameter, e.g., for proportional to $(x-\mu )^{\alpha \gamma -1} e^{-\left(\frac{x-\mu }{\beta }\right)^{\gamma }}$ for $x>\mu$ and otherwise zero, the ordinary gamma distribution would be viewed as having a location particular to that generalization. – Carl Nov 16 '18 at 20:40
  • @MartijnWeterings In other words, the location for a gamma distribution is ambiguous until one of an unknown number of generalizations is chosen the simplification of which is a gamma distribution and whose location is inherited from that particular generalized gamma distribution as a property assigned by the form of that generalization. – Carl Nov 16 '18 at 21:16
  • @MartijnWeterings The same applies to the exponential distribution which has at least two published generalizations with location parameters where each is of different form and would imply different location values (i.e., different constant values of location and not location parameters) for each respective simplification to an exponential distribution, where an exponential distribution has no location parameter *per se*. – Carl Nov 17 '18 at 04:55
  • Sure, the gamma distribution doesn't have a location parameter unless you speak about a generalized gamma distribution with additional location parameter. I don't believe that I said anything against that. So @Carl, how is that a reply to my comments? It doesn't define 'form' further, or explain why you use a vague esoteric term instead of the OP's term family. – Sextus Empiricus Nov 17 '18 at 09:32
  • @MartijnWeterings "form of an equation" as an exact phrase appears on Google 7,220,000 times. To call "form" exotic would require that the word "formula" be super exotic. The examples furnished show how a single form(ula) lacking a location parameter does not imply a unique family of curves of a more complicated form(ula) having a location parameter and that one cannot change location by substitution of a location implying form(ula) into a form(ula) lacking a location parameter with uniqueness, which was posited as trick questions for some indecipherable reason. – Carl Nov 17 '18 at 20:21
  • 1
    @Carl, why are you avoiding to clarify? You point to google for the definition of 'form' and add a sentence that makes no sense. If it is so simple then why not add a definition? It is mathematics, not art, being more precise is not a luxury. – Sextus Empiricus Nov 17 '18 at 20:34
  • @MartijnWeterings "[Forma](http://latindictionary.wikidot.com/noun:forma)" is the Latin root for [form](https://www.merriam-webster.com/dictionary/form). It has a meaning that is rather general and becomes more specific only in specific contexts. – Carl Nov 17 '18 at 20:40
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/85895/discussion-between-carl-and-martijn-weterings). – Carl Nov 17 '18 at 20:42
6

I think you are asking whether two random variables coming from the same location-scale family can have the same mean and variance, but at least one different higher moment. The answer is no.

Proof: Let $X_1$ and $X_2$ be two such random variables. Since $X_1$ and $X_2$ are in the same location-scale family, there exist a random variable $X$ and real numbers $a_1>0, a_2>0, b_1, b_2$ such that $X_1 \stackrel{d}{=} a_1 X + b_1$ and $X_2 \stackrel{d}{=} a_2 X + b_2$. Since $X_1$ and $X_2$ have the same mean and variance, we have:

  1. $E[X_1] = E[X_2] \implies a_1 E[X] + b_1 = a_2 E[X] + b_2$.
  2. $\operatorname{Var}[X_1] = \operatorname{Var}[X_2] \implies a_1^2 \operatorname{Var}[X] = a_2^2 \operatorname{Var}[X]$.

If $\operatorname{Var}[X] = 0$, then $X_1=E[X_1]=X_2=E[X_2]$ with probability $1$, and hence the higher moments of $X_1$ and $X_2$ are all equal. So we may assume that $\operatorname{Var}[X] \neq 0$. Using this, (2) implies that $|a_1|=|a_2|$. Since $a_1>0$ and $a_2>0$, we have in fact that $a_1=a_2$. In turn, (1) above now implies that $b_1=b_2$. We therefore have that: $$ E[X_1^k] = E[(a_1X+b_1)^k] = E[(a_2X+b_2)^k] = E[X_2^k], $$ for any $k$, i.e., all moments of $X_1$ and $X_2$ are equal.
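
The same conclusion can also be read off from the standardization itself, without first equating the parameters. A short worked version, assuming $0 < \operatorname{Var}[X] < \infty$ and writing $\mu_X$ and $\sigma_X$ (notation introduced here, not used above) for the mean and standard deviation of $X$:

```latex
Z_i = \frac{X_i - E[X_i]}{\sqrt{\operatorname{Var}[X_i]}}
    \stackrel{d}{=} \frac{(a_i X + b_i) - (a_i \mu_X + b_i)}{a_i \sigma_X}
    = \frac{X - \mu_X}{\sigma_X}, \qquad i = 1, 2.
```

The denominator uses $\sqrt{a_i^2 \sigma_X^2} = a_i \sigma_X$, which is where $a_i > 0$ enters; with a negative $a_i$ the right-hand side would pick up a minus sign.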

whuber
  • 281,159
  • 54
  • 637
  • 1,101
yyzz
  • 69
  • 1
  • 1
    (+1) I cannot find fault with this answer. Apparently someone does, and they also find fault with mine. I do not understand this unexplained behaviour. – Carl Dec 24 '17 at 18:28
  • 5
    @Carl This answer is incorrect--that's why it's being downvoted. Xi'an has already provided a counterexample. – whuber Dec 24 '17 at 20:22
  • 1
    @whuber Please see my comments under Xi'an's answer. I do not agree with him but did not downvote because both he and you have a right to your opinion, even if I consider it to be incorrect. – Carl Dec 25 '17 at 01:31
  • 8
    @Carl After re-reading this answer, I need to retract my original assessment: this answer is correct (and +1 for that), *and it is correct because it clearly explains how it is interpreting the original question.* (Specifically, there is a common yet narrow concept of a "location-scale family" as consisting of just a *single* standard distribution along with all its translates and positive rescalings.) I believe the original question was intended to ask something a little different; the basis of that belief is the reference to more than two parameters in the post. – whuber Dec 25 '17 at 23:30
  • @whuber I congratulate you on your largess over a solitary interpretative approach. I wish this were more common. – Carl Dec 26 '17 at 00:13
  • 2
    I am sorry if I have not been very clear and I thank you for the time you have spent for looking into this but that is not what I asked. – gioxc88 Dec 26 '17 at 00:33
  • 1
    @Martijn We're suffering from a badly phrased question, so making one's interpretation clear is important. I am focusing on the final statement, "But if the number of parameters is ≤2 and X and Y come from the same distribution family with the same expectation and variance, then does it mean that Z1 and Z2 has the same distribution (higher moments)?" That makes no reference to location-scale families. The answer to it is a definite no. If it were to be modified to stipulate that it is a location-scale family, then the answer is yes, *assuming* a standard meaning of "parameter." – whuber Aug 27 '18 at 16:08
1

Since the question can be interpreted in multiple ways, I will split this answer into two parts.

  • A: distribution families.
  • B: location-scale distribution families.

The problem with case A can be easily answered/demonstrated by many families with a shape parameter.

The problem with case B is more difficult, since one and a half parameters seem to be sufficient to specify location and scale (location in $\mathbb{R}$ and scale in $\mathbb{R}_{>0}$), and the problem becomes whether two parameters can also be used to encode (multiple) shapes. This is not so trivial. We can easily come up with specific two parameter location-scale families and demonstrate that they do not contain different shapes, but that does not prove that this is a fixed rule for every two parameter location-scale family.

A: Can two different distributions from the same 2 parameter distribution family have the same mean and variance?

The answer is yes and it can already be shown using one of the explicitly mentioned examples: the normalized Gamma distribution

Family of normalized gamma distributions

Let $Z = \frac{X-\mu}{\sigma}$ with $X$ a Gamma distributed variable with shape parameter $k$, mean $\mu$ and standard deviation $\sigma$. The (cumulative) distribution of $Z$ is as below:

$$F_Z(z;k) = \begin{cases} 0 & \quad \text{if} & z < -\sqrt{k}\\ \frac{1}{\Gamma(k)} \gamma(k, {z\sqrt{k}+k}) & \quad \text{if} & z \geq -\sqrt{k} \end{cases} $$

where $\gamma$ is the incomplete gamma function.

So here it is clearly the case that different $Z_1$ and $Z_2$ (distributions from the family of normalized gamma distributions) can have the same mean and variance (namely $\mu=0$ and $\sigma=1$) but differ in the parameter $k$ (often called the 'shape' parameter). This is closely linked to the fact that the family of gamma distributions is not a location-scale family.
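
To make the dependence on $k$ concrete, here is a minimal sketch that computes the third moment of the standardized variable $Z$ from the raw moments $E[X^n] = \theta^n\,\Gamma(k+n)/\Gamma(k)$ of a Gamma variable. The function names and the `scale` argument (the Gamma scale $\theta$) are my own choices for illustration.

```python
import math

def gamma_raw_moment(k, n, scale=1.0):
    """E[X^n] for X ~ Gamma(shape=k, scale): scale^n * Gamma(k+n) / Gamma(k)."""
    return scale**n * math.gamma(k + n) / math.gamma(k)

def standardized_skewness(k, scale=1.0):
    """Third moment of Z = (X - mu) / sigma for X ~ Gamma(shape=k, scale)."""
    m1, m2, m3 = (gamma_raw_moment(k, n, scale) for n in (1, 2, 3))
    var = m2 - m1**2
    central3 = m3 - 3 * m1 * m2 + 2 * m1**3   # E[(X - mu)^3] from raw moments
    return central3 / var**1.5

for k in (1.0, 4.0, 25.0):
    # Z always has mean 0 and variance 1, yet its skewness changes with k.
    print(k, standardized_skewness(k))
```

The printed values follow $2/\sqrt{k}$ up to rounding, and changing `scale` leaves them unchanged: the scale is removed by standardizing, the shape is not.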

B: Can two different distributions from the same 2 parameter location-scale distribution family have the same mean and variance?

I believe that the answer is no if we consider only smooth families (smooth: a small change in the parameters results in a small change of the distribution/function/curve). But that answer is not trivial, and if we allow more general (non-smooth) families then the answer is yes, although these families exist only in theory and have no practical relevance.

Generating a location-scale family from a single distribution by translation and scaling

From any particular single distribution we can generate a location-scale family by translation and scaling. If $f(x)$ is the probability density function of the single distribution, then the probability density function for a member of the family will be

$$f(x;\mu,\sigma) = \frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right)$$

For a location-scale family that can be generated in such way we have:

  • for any two members $f(x;\mu_1,\sigma_1)$ and $f(x;\mu_2,\sigma_2)$ if their means and variances are equal, then $f(x;\mu_1,\sigma_1) = f(x;\mu_2,\sigma_2)$
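
A minimal numerical sketch of this construction, assuming the standard logistic density as the (arbitrarily chosen) base distribution and a plain midpoint rule for the integrals. All function names are mine. Here the two members have different means and variances, and the check is that their standardized fourth moments still agree.

```python
import math

def standard_logistic_pdf(x):
    """Base density f(x); the logistic is an arbitrary choice for illustration."""
    e = math.exp(-abs(x))            # symmetric form avoids overflow for large |x|
    return e / (1.0 + e) ** 2

def location_scale_pdf(f, mu, sigma):
    """Return x -> f((x - mu) / sigma) / sigma, a member of the family generated by f."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma

def integrate(g, lo, hi, n=20_000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def standardized_moment(pdf, order, lo, hi):
    """E[((X - mean) / sd)^order] for the density pdf, integrated over [lo, hi]."""
    mean = integrate(lambda x: x * pdf(x), lo, hi)
    var = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)
    sd = math.sqrt(var)
    return integrate(lambda x: ((x - mean) / sd) ** order * pdf(x), lo, hi)

member_a = location_scale_pdf(standard_logistic_pdf, mu=0.0, sigma=1.0)
member_b = location_scale_pdf(standard_logistic_pdf, mu=3.0, sigma=2.0)

kurt_a = standardized_moment(member_a, 4, -60.0, 60.0)
kurt_b = standardized_moment(member_b, 4, -120.0, 126.0)
print(kurt_a, kurt_b)   # the two standardized fourth moments agree
```

Translation and positive scaling change the mean and variance but nothing else, so once both are fixed no freedom is left within such a family.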

Can every two parameter location-scale family be generated from a single member distribution by translation and scaling?

So translation and scaling can convert a single distribution into a location-scale family. The question is whether the reverse is true and whether every two parameter location-scale family (where the parameters $\theta_1$ and $\theta_2$ do not necessarily need to coincide with the location $\mu$ and scale $\sigma$) can be described by a translation and scaling of a single member from that family.

For particular two parameter location-scale families, like the family of normal distributions, it is not too difficult to show that they can be generated by the process above (scaling and translating a single example member).

One may wonder whether every two parameter location-scale family can be generated from a single member by translation and scaling. The opposing question is: "Can a two parameter location-scale family contain two different member distributions with the same mean and variance?" For that to happen, the family must be a union of multiple subfamilies that are each generated by translation and scaling.

Case 1: Family of generalized Students' t-distributions, parameterized by two variables

A contrived example occurs when we make some mapping from $\mathbb{R}^2$ into $\mathbb{R}^3$ (compare the question on the cardinality of $\mathbb{R}$ and $\mathbb{R}^2$), which allows the freedom to use two parameters $\theta_1$ and $\theta_2$ to describe a union of multiple subfamilies that are each generated by translation and scaling.

Let's use the (three parameter) generalized Student's t-distribution:

$f(x;\nu,\mu,\sigma) = \frac{\Gamma \left( \frac{\nu + 1}{2} \right) }{\Gamma \left( \frac{\nu}{2} \right) \sqrt{\pi\nu}\sigma} \left(1 + \frac{1}{\nu} \left( \frac{x-\mu}{\sigma} \right)^2 \right)^{-\frac{\nu+1}{2}}$

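with the three parameters changed as follows (taking $\theta_2 > 0$ and $\theta_1 > \pi/2$ with $\theta_1 \neq \pi/2 + n\pi$, so that $\nu \geq 1$ and $\tan(\theta_1)$ is defined) $$\begin{array}{rcl} \mu &=& \tan (\theta_1)\\ \sigma &=& \theta_2\\ \nu &=& \lfloor 0.5+\theta_1/\pi \rfloor \end{array}$$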

then we have

$f(x;\theta_1,\theta_2) = \frac{\Gamma \left( \frac{\lfloor 0.5+\theta_1/\pi \rfloor + 1}{2} \right) }{\Gamma \left( \frac{\lfloor 0.5+\theta_1/\pi \rfloor}{2} \right) \sqrt{\pi\lfloor 0.5+\theta_1/\pi \rfloor}\theta_2} \left(1 + \frac{1}{\lfloor 0.5+\theta_1/\pi \rfloor} \left( \frac{x-\tan(\theta_1)}{\theta_2} \right)^2 \right)^{-\frac{\lfloor 0.5+\theta_1/\pi \rfloor+1}{2}}$

which may be considered a two parameter location-scale family (albeit not very useful) that can not be generated by translation and scaling of only a single member.
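
As a sketch of how this plays out, the snippet below maps $(\theta_1, \theta_2)$ to $(\nu, \mu, \sigma)$ and picks two parameter pairs that give the same mean and variance but different kurtosis. It relies on the standard facts that the scaled t-distribution has variance $\sigma^2 \nu/(\nu-2)$ for $\nu>2$ and excess kurtosis $6/(\nu-4)$ for $\nu>4$. The helper `theta_for` is hypothetical, introduced only to construct the example, and I assume $\theta_1$ is restricted so that $\nu \geq 1$.

```python
import math

def theta_to_t_params(theta1, theta2):
    """Map (theta1, theta2) to (nu, mu, sigma) following the reparameterization above."""
    nu = math.floor(0.5 + theta1 / math.pi)
    if nu < 1 or theta2 <= 0:
        raise ValueError("need floor(0.5 + theta1/pi) >= 1 and theta2 > 0")
    return nu, math.tan(theta1), theta2

def t_variance(nu, sigma):
    """Variance of the location-scale t distribution (finite only for nu > 2)."""
    return sigma**2 * nu / (nu - 2)

def t_excess_kurtosis(nu):
    """Excess kurtosis of the t distribution (finite only for nu > 4)."""
    return 6.0 / (nu - 4)

def theta_for(nu, offset, variance=1.0):
    """Hypothetical helper: (theta1, theta2) giving this nu, mu = tan(offset), and the given variance."""
    theta1 = nu * math.pi + offset               # |offset| < pi/2 keeps floor(...) == nu
    theta2 = math.sqrt(variance * (nu - 2) / nu)
    return theta1, theta2

a = theta_to_t_params(*theta_for(5, 0.3))
b = theta_to_t_params(*theta_for(6, 0.3))

for nu, mu, sigma in (a, b):
    print(nu, mu, t_variance(nu, sigma), t_excess_kurtosis(nu))
```

Both members have mean $\tan(0.3)$ and variance 1 (up to rounding), while the excess kurtosis is 6 for $\nu=5$ and 3 for $\nu=6$.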

Case 2: Location-scale families generated by negative scaling of a single distribution with nonzero skew

A less contrived example than the one using the tan function is given by whuber in the comments under Carl's answer. We can have a family $x \mapsto f(x/b + a)$ in which flipping the sign of $b$ (and adjusting $a$ to compensate) keeps the mean and variance unchanged but can change the odd higher moments. So this gives, somewhat more easily, a two parameter location-scale family in which members with the same mean and variance can have different higher order moments. This example from whuber can be split into two subfamilies, each of which can be generated from a single member by translation and scaling.
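
A small exact check of this sign-flip effect, written in terms of random variables $Y = a + bX$ rather than densities. I assume $X \sim \text{Exponential}(1)$ purely for illustration, using $E[X^n] = n!$. The function names are mine.

```python
from math import comb, factorial

def raw_moments_affine(raw, a, b, max_order=3):
    """Raw moments of Y = a + b*X from raw moments of X (raw[n] = E[X^n], raw[0] = 1)."""
    return [
        sum(comb(n, j) * a ** (n - j) * b**j * raw[j] for j in range(n + 1))
        for n in range(max_order + 1)
    ]

def mean_var_skew(raw):
    """Mean, variance and skewness from raw moments up to order 3."""
    m1, m2, m3 = raw[1], raw[2], raw[3]
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * m2 + 2 * m1**3) / var**1.5
    return m1, var, skew

exp_raw = [factorial(n) for n in range(4)]    # E[X^n] = n! for X ~ Exponential(1)

original = mean_var_skew(raw_moments_affine(exp_raw, a=0, b=1))    # Y = X
flipped = mean_var_skew(raw_moments_affine(exp_raw, a=2, b=-1))    # Y = 2 - X

print(original)   # mean 1, variance 1, skewness 2
print(flipped)    # mean 1, variance 1, skewness -2
```

The shift $a=2$ is what restores the mean after the sign flip; the variance is unaffected by the sign of $b$, and the third central moment scales with $b^3$, so its sign reverses.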

Smooth families

If we try to make a single smooth two parameter distribution family (smooth: a small change in the parameters results in a small change of the distribution/function/curve) by somehow composing two or more families that are each generated by translation and scaling, then we run into the problem that the two parameters have to cover the variation of 'mean' and 'variance' as well as a third parameter, 'shape'. A formal proof would have to go along the same lines as the answer to the question: Is there a smooth surjective function $f:\mathbb{R}^2 \to \mathbb{R}^3$? (The answer is no for smooth, i.e. infinitely differentiable, functions, although there are continuous functions that would do the job, such as Peano curves.)

Intuition: Imagine there were some parameters $\theta_1$, $\theta_2$ that describe the distributions in some location-scale distribution family and by which we can change the mean and variance as well as some other moments. Then we should be able to express $\theta_1$, $\theta_2$ in terms of the mean $\mu$ and standard deviation $\sigma$:

$$\begin{array}{rcl} \theta_1 &= &f_{\theta_1}(\mu,\sigma) \\ \theta_2 &=& f_{\theta_2}(\mu,\sigma)\end{array}$$

but these would need to be multiple-valued functions, and such functions cannot make continuous transitions: the different values of $f_{\theta_1}(\mu,\sigma)$ for a particular $\mu$ and $\sigma$ are not continuously connected, so they cannot model a continuous shape parameter.

I am actually not so sure about this final part. We could possibly use a space-filling curve (such as the Peano curve, if only we knew how to map coordinates on the curve to coordinates of the hypercube) to have a single parameter $\theta_1$ completely model multiple features like mean and variance, without giving up the property that a small change of the parameter $\theta_1$ is equivalent to a small change of the function $f(x;\theta_1)$ at every $x$.

Sextus Empiricus
  • 43,080
  • 1
  • 72
  • 161
  • 1
    I stopped reading after the initial definitions because they are so unclear and contradictory. By "integrate" you of course mean integration *over $x$ only.* By "$f,$" though, you *must* mean the CDF and not the PDF, because the division by $b\ne 1$ changes the integral. By not imposing any restrictions on how $f$ can vary with $\theta$ you also adopt a much broader concept of "family" than is usual. Only that allows you to discuss a "map from $R^2$ to $R^3.$" The problem with these "maps" is they cannot be continuous and will have no statistical meaning. – whuber Aug 27 '18 at 11:50
  • 2
    I'm not objecting to simplicity or the language, but to the confusion that is being sown. The problem with your $R^2\to R^3$ map points out why you need to impose additional mathematical structure--a suitable topology--on the family. Allowing the distributions to change in such a (violently) discontinuous manner with $\theta$ is not only impractical and meaningless, it would likely invalidate useful methods and theorems for no good reason. For instance, MLE is almost always performed under the assumption that the distribution varies with $\theta$ in a piecewise differentiable manner. – whuber Aug 27 '18 at 13:32
  • You might consider moving some of this discussion about families to the [family oriented question](https://stats.stackexchange.com/q/320746/99274) that arose because of the 'out of context' use of that term in the question above. I would appreciate that because I am still not certain that the word has a very useful meaning for two reasons 1) it is ambiguous and flexible 2) and as a mapping, such a broad term that almost any PDF is related to any other one somehow. – Carl Aug 27 '18 at 23:42
  • 1
    The second bullet is incorrect: it neither follows from any of the assumptions nor is it part of the definition of a location-scale family. – whuber Aug 28 '18 at 14:11
  • That's not what it says, though: by using "$\theta_i$" and "$\theta_j$" you are explicitly indicating the possibility of *different* values of the additional parameters. At a minimum, the notation needs to be clarified. – whuber Aug 28 '18 at 14:24
  • 1
    It is tremendously confusing because now all references to the $\theta_i$ are superfluous. I believe the quantifiers now in your statement might not convey correctly the idea you have. Why not just drop the $\theta_i$ and simply state that the family consists of the set of distributions $x \to F(bx + a)$ for one given $F$ and all $(a,b)\in\mathbb{R}^2$ with $b\gt 0$? There's no need to refer to means and variances, either--that's just a distraction from the essential idea, which does not require $F$ to have any moments at all. – whuber Aug 28 '18 at 14:52
  • 1
    @whuber if you are generating location-scale family from one single example then indeed it would seem like it is much easier to use $\mu$ and $\sigma$. Here I am however imagining that we already have a family of curves parameterized by some alternative $\theta_1$ and $\theta_2$ and I wonder whether it could be possible that such a family contains *more* curves than just the curves created by scaling one member with $\mu$ and $\sigma$ (as in the transformation with the tangent). I will see if I can change the formulation somehow again (do you disagree with the idea or with the formulation?). – Sextus Empiricus Aug 30 '18 at 07:51
  • 1
    I think that's a good way to frame the question. BTW, there are standard mathematical ways to describe the situation: you are considering the orbits of a mathematical structure (the set of distributions) under the action of a topological group. "Contain more curves" means the quotient space consists of more than one point. – whuber Aug 30 '18 at 12:26
  • (+1) Some thought-provoking examples there. In both Cases 1 & 2 the trick involves squeezing more information into a real number than it needs to do its job as a scale parameter. Doesn't it then become a matter of *encoding* rather than just *parametrizing*? – Scortchi - Reinstate Monica Nov 01 '18 at 08:59
  • @Scortchi, I am not sure how to accurately define the difference between *encoding* and *parameterizing* (the difference is that a *parameter* is more useful or has some physical intuitive meaning, whereas an encoding variable might make little sense?). Indeed, among all 'regular' two parameter location-scale families (by regular I mean anything that people actually use) there might be no example. However, case 2 is not so strange that one should call it encoding instead of parameterizing. It just has a scale parameter that can become negative (but zero is not included and it is not a 'smooth' family). – Sextus Empiricus Nov 17 '18 at 22:17