Questions tagged [generalized-linear-model]

A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "general linear model" which extends the ordinary linear model to general covariance structure and multivariate response.)

A generalized linear model extends the ordinary regression model in three ways: it allows a more general (conditional) distribution for the observations, a variance that is a function of the mean, and a non-linear relationship between the mean and the linear predictor, $X\beta$.

A generalized linear model consists of three components:

  1. Systematic part: $\eta_i = X_i'\beta$. This is the linear predictor.
  2. Random part: $Y_1, Y_2, \ldots, Y_n$ are independent random variables with $$ Y_i \sim D(\mu_i), \quad \mu_i = E(Y_i),$$ where $D$ is an exponential family distribution. More generally, there may be an additional dispersion parameter $\phi$ that scales the variance of $Y_i$.
  3. Link function: an invertible function $g$ such that $\eta_i = g(\mu_i)$, or equivalently, $E(Y_i) = \mu_i = g^{-1}(\eta_i) = g^{-1}(X_i'\beta)$.
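
As a rough illustration of how the three components fit together, here is a minimal sketch in plain Python of a Poisson GLM with a log link, fitted by iteratively reweighted least squares (IRLS). The function names, the restriction to an intercept plus one covariate, and the toy data are all assumptions made for this example; they do not come from the tag description, and in practice one would use a statistics package instead.

```python
import math


def inverse_log_link(eta):
    """Inverse of the log link g(mu) = log(mu), i.e. mu = exp(eta)."""
    return math.exp(eta)


def fit_poisson_irls(x, y, n_iter=50, tol=1e-10):
    """Fit E(Y_i) = exp(b0 + b1 * x_i) by IRLS (hypothetical helper)."""
    # Start from the intercept-only fit; assumes mean(y) > 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        # Systematic part: linear predictor eta_i = b0 + b1 * x_i.
        eta = [b0 + b1 * xi for xi in x]
        # Link function: mu_i = g^{-1}(eta_i).
        mu = [inverse_log_link(e) for e in eta]
        # Random part (Poisson, Var(Y_i) = mu_i) gives, for the log link,
        # working weights w_i = mu_i and
        # working response z_i = eta_i + (y_i - mu_i) / mu_i.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Solve the 2x2 weighted normal equations (X'WX) beta = X'Wz.
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Made-up counts that grow roughly like exp(x).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1, 3, 7, 20, 55]
b0, b1 = fit_poisson_irls(x, y)
print("intercept:", b0, "slope:", b1)
```

Because the log link is the canonical link for the Poisson family, the fitted means satisfy $\sum_i (y_i - \mu_i) = 0$ and $\sum_i x_i (y_i - \mu_i) = 0$ at convergence, which is a convenient check on a sketch like this.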

The similar term "general linear model" is often confused with generalized linear models (both are typically abbreviated GLM). A general linear model is the standard multiple regression setting $Y = X\beta + \varepsilon$ (for a "design matrix" $X$, parameters $\beta$, and "error term" $\varepsilon$). For such cases, use the tags dedicated to those models rather than this one (see discussion).

3987 questions
354 votes, 12 answers

Difference between logit and probit models

What is the difference between Logit and Probit model? I'm more interested here in knowing when to use logistic regression, and when to use Probit. If there is any literature which defines it using R, that would be helpful as well.
Beta
119 votes, 4 answers

When to use gamma GLMs?

The gamma distribution can take on a pretty wide range of shapes, and given the link between the mean and the variance through its two parameters, it seems suited to dealing with heteroskedasticity in non-negative data, in a way that log-transformed…
generic_user
103 votes, 5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…
87 votes, 4 answers

What is the difference between a "link function" and a "canonical link function" for GLM

What's the difference between terms 'link function' and 'canonical link function'? Also, are there any (theoretical) advantages of using one over the other? For example, a binary response variable can be modeled using many link functions such as…
steadyfish
87 votes, 5 answers

What are modern, easily used alternatives to stepwise regression?

I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable. I am aware that the method I was taught for this situation,…
86 votes, 5 answers

What do the residuals in a logistic regression mean?

In answering this question John Christie suggested that the fit of logistic regression models should be assessed by evaluating the residuals. I'm familiar with how to interpret residuals in OLS, they are in the same scale as the DV and very clearly…
russellpierce
79 votes, 1 answer

How to interpret coefficients in a Poisson regression?

How can I interpret the main effects (coefficients for dummy-coded factor) in a Poisson regression? Assume the following example: treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2), …
66 votes, 1 answer

Why is the square root transformation recommended for count data?

It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a response variable…
63 votes, 9 answers

Advanced statistics books recommendation

There are several threads on this site for book recommendations on introductory statistics and machine learning but I am looking for a text on advanced statistics including, in order of priority: maximum likelihood, generalized linear models,…
63 votes, 3 answers

Interpreting Residual and Null Deviance in GLM R

How to interpret the Null and Residual Deviance in GLM in R? Like, we say that smaller AIC is better. Is there any similar and quick interpretation for the deviances also? Null deviance: 1146.1 on 1077 degrees of freedom Residual deviance: 4589.4…
Anjali
61 votes, 4 answers

How are regression, the t-test, and the ANOVA all versions of the general linear model?

How are they all versions of the same basic statistical method?
60 votes, 3 answers

Linear model with log-transformed response vs. generalized linear model with log link

In this paper titled "CHOOSING AMONG GENERALIZED LINEAR MODELS APPLIED TO MEDICAL DATA" the authors write: In a generalized linear model, the mean is transformed, by the link function, instead of transforming the response itself. The two methods …
59 votes, 4 answers

Choosing between LM and GLM for a log-transformed response variable

I'm trying to understand the philosophy behind using a Generalized Linear Model (GLM) vs a Linear Model (LM). I've created an example data set below where: $$\log(y) = x + \varepsilon $$ The example does not have the error $\varepsilon$ as a…
56 votes, 4 answers

Regression for an outcome (ratio or fraction) between 0 and 1

I am thinking of building a model predicting a ratio $a/b$, where $a \le b$ and $a > 0$ and $b > 0$. So, the ratio would be between $0$ and $1$. I could use linear regression, although it doesn't naturally limit to 0..1. I have no reason to believe…
53 votes, 2 answers

How to simulate artificial data for logistic regression?

I know I'm missing something in my understanding of logistic regression, and would really appreciate any help. As far as I understand it, the logistic regression assumes that the probability of a '1' outcome given the inputs, is a linear combination…
zorbar