
Does a family of a distribution have a different definition for statistics than in other disciplines?

In general, a family of curves is a set of curves, each of which is given by a function or parametrization in which one or more of the parameters is varied. Such families are used, for example, to characterize electronic components.

For statistics, a family according to one source is the result of varying the shape parameter. How then can we understand that the gamma distribution has a shape and scale parameter and only the generalized gamma distribution has, in addition, a location parameter? Does that make the family the result of varying the location parameter? According to @whuber, the meaning of a family is implicit in this statement: "A 'parameterization' of a family is a continuous map from a subset of $\mathbb{R}^n$, with its usual topology, into the space of distributions, whose image is that family."

What, in simple language, is a family for statistical distributions?

A question about relations among the statistical properties of distributions from the same family has already generated considerable controversy in a different question, so it seems worthwhile to explore the meaning.

That this is not necessarily a simple question is borne out by its use in the phrase "exponential family", which has nothing to do with a family of curves but is related to changing the form of the PDF of a distribution by reparameterization, not only of parameters but also by substitution of functions of independent random variables.

gung - Reinstate Monica
Carl
  • By the phrasing "family of a distribution", do you mean something other than "a family of distributions"? An exponential family is a family of distributions (with certain properties), and interpreting the pdf of each distribution as a curve, it even corresponds to a family of curves, so the last paragraph seems confused. – Juho Kokkala Dec 29 '17 at 19:16
  • @JuhoKokkala It seems confusing because the meaning of "family" is context dependent. For example, a [normal distribution](https://en.wikipedia.org/wiki/Exponential_family#Normal_distribution:_unknown_mean,_known_variance) of unknown mean and known variance is in the exponential family. A normal distribution has infinite support, $(-\infty,+\infty)$, and an exponential distribution has semi-infinite support, $[0,+\infty)$, so there is no family of curves for an exponential distribution that covers the range of a normal distribution, they never have the same shape... – Carl Dec 29 '17 at 19:36
  • @JuhoKokkala ...and an exponential PDF does not even have a location parameter, whereas a normal distribution cannot do without one. See the link above for the substitutions needed, and the context in which a normal pdf is in the exponential family. – Carl Dec 29 '17 at 19:40
  • https://stats.stackexchange.com/questions/129990/definition-of-exponential-family may be relevant. "normal distribution of unknown mean and known variance is in the exponential family" is, to my knowledge, abuse of terminology (although somewhat common). To be exact, an exponential family is a family of distributions with certain properties. The family of normal distributions with unknown mean and known variance is _an_ exponential family; the family of exponential distributions is another exponential family, etc. – Juho Kokkala Dec 30 '17 at 20:40
  • @JuhoKokkala The quote "normal distribution of unknown mean..." is from [Wikipedia](https://en.wikipedia.org/wiki/Exponential_family#Normal_distribution:_unknown_mean,_known_variance). What I asked for is a simple explanation of what a family means, what I am getting is topology theory that is hard to follow. Simple, anyone? – Carl Dec 30 '17 at 22:11
  • @JuhoKokkala: That "family" is so commonly (ab)used, in a special case, to mean "set of families" is perhaps worth pulling out into another answer. (I can't think of other cases - for some reason it seems no-one's prone to talking of "*the* location-scale family".) – Scortchi - Reinstate Monica Feb 26 '18 at 11:07
  • @Carl: The explanation you link to in ["Families of Distributions"](https://www.itl.nist.gov/div898/handbook/eda/section3/eda363.htm) from NIST's Engineering Statistics Handbook - "Many probability distributions are not a single distribution, but are in fact a family of distributions. This is due to the distribution having one or more shape parameters." - is, I believe, wholly idiosyncratic. – Scortchi - Reinstate Monica Sep 02 '18 at 15:57
  • @Scortchi And yet, NIST is a recognized authority. There are many things in any field that are idiosyncratic. I think perhaps that electrical engineers use the term "family" differently than statisticians. E.g., why say "delta method" when "error propagation" is probably better documented? I am not criticizing here, just lamenting the linguistic gymnastics I have experienced publishing in slightly different insular fields. – Carl Sep 02 '18 at 16:29
  • @Carl: I meant specific to the individual author. You're right that different areas of applied Statistics have their terminological quirks, but I don't think this is one of them. (Partly because I used to work in Engineering & never came across it.) In general I'd not trust applied handbooks to give good accounts of theoretical concepts. – Scortchi - Reinstate Monica Sep 02 '18 at 17:23
  • @Scortchi I first came across it for component characteristics 60 years ago. We have the same thing in ordinary English, nuclear families and extended families. As long as we signal what we mean when we say it, there is no ambiguity. – Carl Sep 02 '18 at 17:54

3 Answers


The statistical and mathematical concepts are exactly the same, understanding that "family" is a generic mathematical term with technical variations adapted to different circumstances:

A parametric family is a curve (or surface or other finite-dimensional generalization thereof) in the space of all distributions.

The rest of this post explains what that means. As an aside, I don't think any of this is controversial, either mathematically or statistically (apart from one minor issue which is noted below). In support of this opinion I have supplied many references (mostly to Wikipedia articles).


This terminology of "families" tends to be used when studying classes $\mathcal C_Y$ of functions into a set $Y$ or "maps." Given a domain $X$, a family $\mathcal F$ of maps on $X$ parameterized by some set $\Theta$ (the "parameters") is a function

$$\mathcal F : X\times \Theta\to Y$$

for which (1) for each $\theta\in\Theta$, the function $\mathcal{F}_\theta:X\to Y$ given by $\mathcal{F}_\theta(x)=\mathcal{F}(x,\theta)$ is in $\mathcal{C}_Y$ and (2) $\mathcal F$ itself has certain "nice" properties.

The idea is that we want to vary functions from $X$ to $Y$ in a "smooth" or controlled manner. Property (1) means that each $\theta$ designates such a function, while the details of property (2) will capture the sense in which a "small" change in $\theta$ induces a sufficiently "small" change in $\mathcal{F}_\theta$.

A standard mathematical example, close to the one mentioned in the question, is a homotopy. In this case $\mathcal{C}_Y$ is the category of continuous maps from topological spaces $X$ into the topological space $Y$; $\Theta=[0,1]\subset\mathbb{R}$ is the unit interval with its usual topology, and we require that $\mathcal{F}$ be a continuous map from the topological product $X \times \Theta$ into $Y$. It can be thought of as a "continuous deformation of the map $\mathcal{F}_0$ to $\mathcal{F}_1$." When $X=[0,1]$ is itself an interval, such maps are curves in $Y$ and the homotopy is a smooth deformation from one curve to another.

For statistical applications, $\mathcal{C}_Y$ is the set of all distributions on $\mathbb{R}$ (or, in practice, on $\mathbb{R}^n$ for some $n$, but to keep the exposition simple I will focus on $n=1$). We may identify it with the set of all non-decreasing càdlàg functions $\mathbb{R}\to [0,1]$ where the closure of their range includes both $0$ and $1$: these are the cumulative distribution functions, or simply distribution functions. Thus, $X=\mathbb R$ and $Y=[0,1]$.

A family of distributions is any subset of $\mathcal{C}_Y$. Another name for a family is statistical model. It consists of all distributions that we suppose govern our observations, but we do not otherwise know which distribution is the actual one.

  • A family can be empty.
  • $\mathcal{C}_Y$ itself is a family.
  • A family may consist of a single distribution or just a finite number of them.

These abstract set-theoretic characteristics are of relatively little interest or utility. It is only when we consider additional (relevant) mathematical structure on $\mathcal{C}_Y$ that this concept becomes useful. But what properties of $\mathcal{C}_Y$ are of statistical interest? Some that show up frequently are:

  1. $\mathcal{C}_Y$ is a convex set: given any two distributions ${F}, {G}\in \mathcal{C}_Y$, we may form the mixture distribution $(1-t){F}+t{G}\in \mathcal{C}_Y$ for all $t\in[0,1]$. This is a kind of "homotopy" from $F$ to $G$.

  2. Large parts of $\mathcal{C}_Y$ support various pseudo metrics, such as the Kullback-Leibler divergence or the closely related Fisher Information metric.

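  3. $\mathcal{C}_Y$ has an additive structure: corresponding to any two distributions $F$ and $G$ is their sum, ${F}\star {G}$, the distribution of the sum of two independent random variables with distributions $F$ and $G$ (their convolution).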

  4. $\mathcal{C}_Y$ supports many useful, natural functions, often termed "properties." These include any fixed quantile (such as the median) as well as the cumulants.

  5. $\mathcal{C}_Y$ is a subset of a function space. As such, it inherits many useful metrics, such as the sup norm ($L^\infty$ norm) given by $$||F-G||_\infty = \sup_{x\in\mathbb{R}}|F(x)-G(x)|.$$

  6. Natural group actions on $\mathbb R$ induce actions on $\mathcal{C}_Y$. The commonest actions are translations $T_\mu:x \to x+\mu$ and scalings $S_\sigma:x\to x\sigma$ for $\sigma\gt 0$. The effect these have on a distribution is to send $F$ to the distribution given by $F^{\mu,\sigma}(x) = F((x-\mu)/\sigma)$. These lead to the concepts of location-scale families and their generalizations. (I don't supply a reference, because extensive Web searches turn up a variety of different definitions: here, at least, may be a tiny bit of controversy.)
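Several of these structures are easy to experiment with numerically. The following is a minimal sketch, assuming distributions are represented as plain Python functions returning $F(x)$; the helper names (`std_normal_cdf`, `location_scale`, `mixture`, `sup_distance`) are illustrative only, and the sup norm is merely approximated on a finite grid.

```python
import math


def std_normal_cdf(x):
    """CDF of the standard Normal distribution (used here as a base F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def location_scale(F, mu, sigma):
    """Property 6: return the CDF x -> F((x - mu) / sigma), with sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: F((x - mu) / sigma)


def mixture(F, G, t):
    """Property 1: return the CDF (1 - t) F + t G for t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return lambda x: (1.0 - t) * F(x) + t * G(x)


def sup_distance(F, G, grid):
    """Property 5: approximate sup |F - G| by a maximum over a finite grid."""
    return max(abs(F(x) - G(x)) for x in grid)


if __name__ == "__main__":
    shifted = location_scale(std_normal_cdf, mu=3.0, sigma=2.0)
    halfway = mixture(std_normal_cdf, shifted, t=0.5)
    grid = [i / 10.0 for i in range(-100, 101)]
    print(sup_distance(std_normal_cdf, halfway, grid))
```

Because each helper returns another CDF, the operations compose: a mixture of two location-scale transforms is again a point of $\mathcal{C}_Y$.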

The properties that matter depend on the statistical problem and on how you intend to analyze the data. Addressing all the variations suggested by the preceding characteristics would take too much space for this medium. Let's focus on one common important application.

Take, for instance, Maximum Likelihood. In most applications you will want to be able to use Calculus to obtain an estimate. For this to work, you must be able to "take derivatives" in the family.

(Technical aside: The usual way in which this is accomplished is to select a domain $\Theta\subset \mathbb{R}^d$ for $d\ge 0$ and specify a continuous, locally invertible function $p$ from $\Theta$ into $\mathcal{C}_Y$. (This means that for every $\theta\in\Theta$ there exists a ball $B(\theta, \epsilon)$, with $\epsilon\gt 0$ for which $p\mid_{B(\theta,\epsilon)}: B(\theta,\epsilon)\cap \Theta \to \mathcal{C}_Y$ is one-to-one. In other words, if we alter $\theta$ by a sufficiently small amount we will always get a different distribution.))

Consequently, in most ML applications we require that $p$ be continuous (and hopefully, almost everywhere differentiable) in the $\Theta$ component. (Without continuity, maximizing the likelihood generally becomes an intractable problem.) This leads to the following likelihood-oriented definition of a parametric family:

A parametric family of (univariate) distributions is a locally invertible map $$\mathcal{F}:\mathbb{R}\times\Theta \to [0,1],$$ with $\Theta\subset \mathbb{R}^n$, for which (a) each $\mathcal{F}_\theta$ is a distribution function and (b) for each $x\in\mathbb R$, the function $\mathcal{L}_x: \Theta\to [0,1]$ given by $\mathcal{L}_x(\theta) = \mathcal{F}(x,\theta)$ is continuous and almost everywhere differentiable.
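To make the two roles of $\mathcal{F}$ concrete, here is a minimal sketch using the Normal distributions with $\theta=(\mu,\sigma)$ as an assumed example. The function names (`normal_family`, `section`, `slice_at_x`, `d_dmu`) are hypothetical, and the derivative is only a finite-difference estimate, not an exact calculation.

```python
import math


def normal_family(x, theta):
    """F(x, theta) for theta = (mu, sigma) with sigma > 0: the Normal CDF at x."""
    mu, sigma = theta
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def section(theta):
    """Fix theta to get the distribution function F_theta: x -> F(x, theta)."""
    return lambda x: normal_family(x, theta)


def slice_at_x(x):
    """Fix x to get L_x: theta -> F(x, theta), the map required to be continuous."""
    return lambda theta: normal_family(x, theta)


def d_dmu(x, theta, h=1e-5):
    """Central finite-difference estimate of the partial derivative in mu."""
    mu, sigma = theta
    upper = normal_family(x, (mu + h, sigma))
    lower = normal_family(x, (mu - h, sigma))
    return (upper - lower) / (2.0 * h)


if __name__ == "__main__":
    theta = (0.0, 1.0)
    print(section(theta)(1.0))      # value of F_theta at x = 1
    print(slice_at_x(1.0)(theta))   # the same number, viewed as a function of theta
    print(d_dmu(0.0, theta))        # slope of L_0 in the mu direction
```

The same number $\mathcal{F}(x,\theta)$ is read in two ways: with $\theta$ fixed it is a distribution function, and with $x$ fixed it is the function of the parameter whose smoothness condition (b) constrains.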

Note that a parametric family $\mathcal F$ is more than just the collection of $\mathcal{F}_\theta$: it also includes the specific way in which parameter values $\theta$ correspond to distributions.

Let's end with some illustrative examples.

  • Let $\mathcal{C}_Y$ be the set of all Normal distributions. As given, this is not a parametric family: it's just a family. To be parametric, we have to choose a parameterization. One way is to choose $\Theta = \{(\mu,\sigma)\in\mathbb{R}^2\mid \sigma \gt 0\}$ and to map $(\mu,\sigma)$ to the Normal distribution with mean $\mu$ and variance $\sigma^2$.

  • The set of Poisson$(\lambda)$ distributions is a parametric family with $\lambda\in\Theta=(0,\infty)\subset\mathbb{R}^1$.

  • The set of Uniform$(\theta, \theta+1)$ distributions (which features prominently in many textbook exercises) is a parametric family with $\theta\in\mathbb{R}^1$. In this case, $F_\theta(x) = \max(0, \min(1, x-\theta))$ is differentiable in $\theta$ except for $\theta\in\{x, x-1\}$.

  • Let $F$ and $G$ be any two distributions. Then $\mathcal{F}(x,\theta)=(1-\theta)F(x)+\theta G(x)$ is a parametric family for $\theta\in[0,1]$. (Proof: the image of $\mathcal F$ is a set of distributions and its partial derivative in $\theta$ equals $-F(x)+G(x)$ which is defined everywhere.)

  • The Pearson family is a four-dimensional family, $\Theta\subset\mathbb{R}^4$, which includes (among others) the Normal distributions, Beta distributions, and Inverse Gamma distributions. This illustrates the fact that any one given distribution may belong to many different distribution families. This is perfectly analogous to observing that any point in a (sufficiently large) space may belong to many paths that intersect there. This, together with the previous construction, shows us that no distribution uniquely determines a family to which it belongs.

  • The family $\mathcal{C}_Y$ of all finite-variance absolutely continuous distributions is not parametric. The proof requires a deep theorem of topology: if we endow $\mathcal{C}_Y$ with any topology (whether statistically useful or not) and $p: \Theta\to\mathcal{C}_Y$ is continuous and locally has a continuous inverse, then locally $\mathcal{C}_Y$ must have the same dimension as that of $\Theta$. However, in all statistically meaningful topologies, $\mathcal{C}_Y$ is infinite dimensional.
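The Uniform$(\theta,\theta+1)$ example can be checked numerically. This is a minimal sketch, assuming the formula $F_\theta(x)=\max(0,\min(1,x-\theta))$ given above; `uniform_cdf` and `one_sided_slopes` are illustrative names, and the slopes are finite-difference estimates with an arbitrarily chosen step `h`.

```python
def uniform_cdf(x, theta):
    """CDF of Uniform(theta, theta + 1) evaluated at x."""
    return max(0.0, min(1.0, x - theta))


def one_sided_slopes(x, theta, h=1e-6):
    """Left and right difference quotients of theta -> F_theta(x) at theta."""
    here = uniform_cdf(x, theta)
    left = (here - uniform_cdf(x, theta - h)) / h
    right = (uniform_cdf(x, theta + h) - here) / h
    return left, right


if __name__ == "__main__":
    x = 0.5
    # Interior point, the two kinks theta = x - 1 and theta = x, and a flat region.
    for theta in (0.0, x - 1.0, x, 2.0):
        print(theta, one_sided_slopes(x, theta))
```

Where the two one-sided slopes agree, the map $\theta\mapsto F_\theta(x)$ is differentiable; they disagree only at $\theta=x$ and $\theta=x-1$, which is the "almost everywhere differentiable" behavior the definition allows.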

whuber
  • (+1) OK, I slogged through it. So is $\mathcal{F}:\mathbb{R}\times\Theta \to [0,1]$ a Polish space or not? Can we do a simple answer so people know how to avoid using the word *family* improperly, please. @JuhoKokkala related, for example, that Wikipedia abused language in their [exponential family](https://en.wikipedia.org/wiki/Exponential_family#Normal_distribution:_unknown_mean,_known_variance), that needs clarification. – Carl Dec 30 '17 at 23:12
  • Doesn't the second sentence of this answer serve that request for simplicity? – whuber Dec 30 '17 at 23:25
  • IMHO, however uninformed, no, it does not due to incompleteness, it doesn't say what a family isn't. The concept "in the space of all distributions" seems to relate to statistics only. – Carl Dec 30 '17 at 23:49
  • How to take "distributions" as continuous paths in relationships, as functions or as relations? Do not understand how this generalizes. – Carl Dec 31 '17 at 00:50
  • The rest of the post shows what a continuous path of distributions means in a rigorous sense and why it's important. It would indeed be possible to amplify this answer--greatly--by including other examples, which abound in mathematics, ranging among perturbations of differential equations, moduli spaces, homotopy theory (which I did mention), hydrodynamics (time-varying flows), and on and on. In the end, the concept is as simple and natural as the scatterplot, which uses points in an abstract space to represent observations in such a way that nearby points represent "similar" observations. – whuber Dec 31 '17 at 15:56
  • Please, I want an answer that is general so that the contrapositive is also true, which then would obviate improper usage of the term "family." – Carl Dec 31 '17 at 18:50
  • Could you articulate a "contrapositive"? That concept of the propositional calculus is not directly relevant to supplying definitions or examples, especially in a case where we seem to agree that there is no universal mathematical definition of "family": the scope is too broad for that. We're discussing a kind of meta-mathematical concept. But I would say that if you are looking at some kind of mathematical object or statistical model and cannot cast it into the general framework I describe here, then you would have little justification to use the term "family." – whuber Dec 31 '17 at 18:55
  • Definition (dictionary) that is not useful: *A function is a relationship or expression involving one or more variables.* Useful definition whose contrapositive is logical: *A function is a relation for which each value from the set of first components of the ordered pairs is associated with exactly one value from the set of second components of the ordered pairs.* – Carl Dec 31 '17 at 19:36
  • Dictionaries are hopeless for communicating mathematical concepts! Regardless, there's no proposition in evidence for which a contrapositive would make sense. The contrapositive of a conditional $P\to Q$ is, by definition, the conditional $\neg Q\to \neg P$. What statements correspond to $P$ and $Q$ here? Moreover, I hope this thread won't devolve to discussing the definition of a function: such basic material is treated in many texts at all levels. – whuber Dec 31 '17 at 20:29
  • A relation is not a function when there exist one or more x-values for which the y-value is not unique. What would be desirable is to say what things are not "families" so that logical errors can be avoided, i.e., a useful definition; a one one. If that is not possible, then at least some guidance as to how to not use the term would be desirable. – Carl Dec 31 '17 at 20:41
  • Almost five years ago I provided a [less-technical answer](https://stats.stackexchange.com/questions/63386/is-any-quantitative-property-of-the-population-a-parameter/63461#63461) to a closely related question. I just rediscovered that post and realized it also answers your question here. Maybe it will help. – whuber Feb 28 '18 at 23:49
  • This [family of curves](https://stats.stackexchange.com/a/330952/99274) shows a normal distribution to be a limiting value of a gamma distribution family. So, is the normal distribution in the gamma distribution family or not? Is it in a related family? Still do not "get" it. – Carl Mar 01 '18 at 16:44
  • @Carl No Gamma distribution is a Normal distribution. There are (infinitely many) families that include both types of distribution. "Related" has no inherent meaning in this context. A perfectly good mathematical analogy is that the curve consisting of all points $(t,1/t)$ for $t\gt 0$ comes arbitrarily close to both axes, but has no point in common with either axis. – whuber Mar 01 '18 at 16:48
  • [Exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family) includes normal and gamma. A gamma distribution with shape equal to one is an exponential distribution. Any distribution with variable parameters is a family, including the exponential distribution. Thus an exponential family is a subset of an exponential family. Still confused. – Carl Mar 01 '18 at 17:11
  • @Carl That is confusing because you are referring to two completely different meanings of "Exponential distribution!" One refers to a [Gamma$(1)$ distribution](https://en.wikipedia.org/wiki/Exponential_distribution) while the other refers to a [large class of distributions](https://en.wikipedia.org/wiki/Exponential_family) (which is not even parametric). – whuber Mar 01 '18 at 17:14
  • Both have the same exact mathematical form. I think I am confused because the language is imprecise. – Carl Mar 01 '18 at 17:18
  • Rather than 'exponential family' which aren't even parametric, why not say "class of exponential form distributions"? – Carl Mar 01 '18 at 17:51
  • @Carl The exponential distribution is one example that follows the form of the exponential family, but they are not "the same." (A mouse is a mammal but it's not the same thing as a mammal.) I'm afraid I have no good solution to the terminology question, because it is what it is and it would take a great deal of effort and influence to change it. – whuber Mar 01 '18 at 18:04

To address a specific point brought up in the question: "exponential family" does not denote a set of distributions. (The standard, say, exponential distribution is a member of the family of exponential distributions, an exponential family; of the family of gamma distributions, also an exponential family; of the family of Weibull distributions, not an exponential family; & of any number of other families you might dream up.) Rather, "exponential" here refers to a property possessed by a family of distributions. So we shouldn't talk of "distributions in the exponential family" but of "exponential families of distributions"—the former is an abuse of terminology, as @JuhoKokkala points out. For some reason no-one commits this abuse when talking of location–scale families.

Scortchi - Reinstate Monica

Thanks to @whuber there is enough information to summarize in what I hope is a simpler form relating to the question from which this post arose. "Another name for a family [Sic, statistical family] is [a] statistical model."

From that Wikipedia entry: A statistical model consists of all distributions that we suppose govern our observations, but we do not otherwise know which distribution is the actual one. What distinguishes a statistical model from other mathematical models is that a statistical model is non-deterministic. Thus, in a statistical model specified via mathematical equations, some of the variables do not have specific values, but instead have probability distributions; i.e., some of the variables are stochastic. A statistical model is usually thought of as a pair $( S , P )$, where $S$ is the set of possible observations, i.e., the sample space, and $P$ is a set of probability distributions on $S$.

Suppose that we have a statistical model $(S, \mathcal{P})$ with $\mathcal{P}=\{P_{\theta} : \theta \in \Theta\}$. The model is said to be a Parametric model if $\Theta$ has a finite dimension. In notation, we write that $\Theta \subseteq \mathbb{R}^d$ where $d$ is a positive integer ($\mathbb{R}$ denotes the real numbers; other sets can be used, in principle). Here, $d$ is called the dimension of the model.

As an example, if we assume that data arise from a univariate Gaussian distribution, then we are assuming that
$$\mathcal{P}=\left\{P_{\mu,\sigma }(x) \equiv \frac{1}{\sqrt{2 \pi} \sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2}\right) : \mu \in \mathbb{R}, \sigma > 0 \right\}. $$ In this example, the dimension, $d$, equals 2. (End of quote.)

Thus, if we reduce the dimensionality by assigning, for the example above, $\mu=0$, we can show a family of curves by plotting $\sigma=1,2,3,4,5$ or whatever choices for $\sigma$.
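A minimal sketch of that family of curves, assuming the Gaussian density above with $\mu=0$ and an arbitrarily chosen grid of $x$ values; the names `normal_pdf`, `xs`, `curves`, and `peaks` are illustrative only.

```python
import math


def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma > 0."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Grid of x values from -20 to 20 in steps of 0.1 (an arbitrary choice).
xs = [i / 10.0 for i in range(-200, 201)]
sigmas = [1, 2, 3, 4, 5]

# One curve per value of sigma, with mu held fixed at 0.
curves = {s: [normal_pdf(x, 0.0, s) for x in xs] for s in sigmas}

# The peak of each curve sits at x = 0 and gets lower as sigma grows.
peaks = {s: max(ys) for s, ys in curves.items()}

if __name__ == "__main__":
    for s in sigmas:
        print(s, peaks[s])
```

Each list in `curves` can be passed, together with `xs`, to any plotting library to draw the five curves on one set of axes.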

Carl