I've gone through a variety of introductions to generalized linear models (GLMs), and there's always a point in the discussion that confuses me. The story often begins by saying that $P(y|x)$ is a member of the exponential family of distributions. Shortly thereafter, and without explanation, everyone switches to the canonical exponential family, and finally to the exponential dispersion family of distributions, with an assumption that the dispersion parameter $\phi$ is known and constant.
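For concreteness, here are the three classes as I understand them. The notation is my own assumption and varies between sources, so please correct me if I have misstated any of them. The general (possibly multi-parameter) exponential family:

$$f(y \mid \theta) = h(y)\,\exp\!\big(\eta(\theta)^\top T(y) - A(\theta)\big)$$

The canonical form, with a scalar natural parameter $\theta$ and sufficient statistic $T(y) = y$:

$$f(y \mid \theta) = h(y)\,\exp\!\big(\theta y - b(\theta)\big)$$

The exponential dispersion family, which adds the dispersion parameter $\phi$:

$$f(y \mid \theta, \phi) = \exp\!\left(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right)$$

Setting $\phi = 1$ and $h(y) = e^{c(y, 1)}$ in the third form recovers the second, so each class appears to be a restriction of the one before it.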
Examples
Here is an example from an MIT OpenCourseWare lecture series (lectures 21-23 are on GLM). The exponential family discussion begins towards the end of lecture 21 and is much of the focus of lecture 22.
As another example, the Wikipedia article on GLM begins the Overview section with the statement
"In a generalized linear model (GLM), each outcome Y of the dependent variables is assumed to be generated from a particular distribution in an exponential family"
...and in the definition states (emphasis mine)
The GLM consists of three elements:
- An exponential family of probability distributions.
- A linear predictor $\eta = X\beta$.
- A link function $g$ such that $E(Y \mid X) = \mu = g^{-1}(\eta)$.
In the following line, however, the article begins discussing the overdispersed exponential family and limits further discussion to scalar parameters.
My Question
What I'm missing is why instructors use these smaller classes of distributions. Is it just because the examples are more tractable for a classroom, or is the log-likelihood not guaranteed to be concave outside of them, or something else?
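To make the concavity part of the question concrete, here is a minimal sketch of what I mean. It is my own illustration with made-up data, and the function names `poisson_loglik` and `midpoint_gap` are hypothetical. It spot-checks midpoint concavity of the log-likelihood in $\beta$ for the one case I'm fairly confident about, a Poisson response with the canonical log link. It says nothing about the broader multi-parameter family, which is exactly the part I'm unsure of.

```python
import math
import random


def poisson_loglik(beta, X, y):
    """Poisson log-likelihood under the canonical log link.

    The additive constant -sum(log(y_i!)) is dropped because it does not
    depend on beta.
    """
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))  # linear predictor
        total += yi * eta - math.exp(eta)
    return total


def midpoint_gap(beta_a, beta_b, X, y):
    """f(midpoint) minus the average of f at the endpoints.

    For a concave f this gap is non-negative for every pair of points.
    """
    mid = [(a + b) / 2 for a, b in zip(beta_a, beta_b)]
    return poisson_loglik(mid, X, y) - 0.5 * (
        poisson_loglik(beta_a, X, y) + poisson_loglik(beta_b, X, y)
    )


random.seed(0)
# Made-up design matrix (intercept plus one covariate) and made-up counts.
X = [[1.0, random.gauss(0, 1)] for _ in range(30)]
y = [random.randint(0, 5) for _ in range(30)]

# Evaluate the gap for 200 random pairs of coefficient vectors.
gaps = [
    midpoint_gap(
        [random.gauss(0, 1), random.gauss(0, 1)],
        [random.gauss(0, 1), random.gauss(0, 1)],
        X,
        y,
    )
    for _ in range(200)
]

# Allow a small tolerance for floating-point rounding.
print(min(gaps) >= -1e-9)
```

A random spot check like this is not a proof, of course. It only illustrates the property I'm asking about.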
If it's the case that GLM only applies to exponential dispersion distributions, why is the requirement always stated as the broader, multi-parameter exponential family?