Most Popular

1500 questions
87
votes
3 answers

What is "restricted maximum likelihood" and when should it be used?

I have read in the abstract of this paper that: "The maximum likelihood (ML) procedure of Hartley and Rao is modified by adapting a transformation from Patterson and Thompson which partitions the likelihood under normality into two parts, one…
Joe King
87
votes
4 answers

What is the difference between a "link function" and a "canonical link function" for GLM

What's the difference between the terms 'link function' and 'canonical link function'? Also, are there any (theoretical) advantages of using one over the other? For example, a binary response variable can be modeled using many link functions such as…
steadyfish
87
votes
26 answers

What is the single most influential book every statistician should read?

If you could go back in time and tell yourself to read a specific book at the beginning of your career as a statistician, which book would it be?
Neil McGuigan
87
votes
11 answers

Why should I be Bayesian when my model is wrong?

Edits: I have added a simple example: inference of the mean of the $X_i$. I have also slightly clarified why the credible intervals not matching confidence intervals is bad. I, a fairly devout Bayesian, am in the middle of a crisis of faith of…
Guillaume Dehaene
87
votes
7 answers

What are principal component scores?

What are principal component scores (PC scores, PCA scores)?
vrish88
87
votes
4 answers

What is an "uninformative prior"? Can we ever have one with truly no information?

Inspired by a comment from this question: What do we consider "uninformative" in a prior - and what information is still contained in a supposedly uninformative prior? I generally see the prior in an analysis where it's either a frequentist-type…
Fomite
87
votes
3 answers

Why is ridge regression called "ridge", why is it needed, and what happens when $\lambda$ goes to infinity?

Ridge regression coefficient estimates $\hat{\beta}^R$ are the values that minimize $$ \text{RSS} + \lambda \sum_{j=1}^p\beta_j^2. $$ My questions are: If $\lambda = 0$, then we see that the expression above reduces to the usual RSS. What if…
cgo
87
votes
5 answers

What are modern, easily used alternatives to stepwise regression?

I have a dataset with around 30 independent variables and would like to construct a generalized linear model (GLM) to explore the relationship between them and the dependent variable. I am aware that the method I was taught for this situation,…
86
votes
3 answers

An example: LASSO regression using glmnet for binary outcome

I am starting to dabble with the use of glmnet with LASSO Regression where my outcome of interest is dichotomous. I have created a small mock data frame below: age <- c(4, 8, 7, 12, 6, 9, 10, 14, 7) gender <- c(1, 0, 1, 1, 1, 0, 1, 0, 0) bmi_p…
Matt Reichenbach
86
votes
6 answers

Why is the L2 regularization equivalent to Gaussian prior?

I keep reading this, and intuitively I can see it, but how does one go analytically from L2 regularization to saying that this is a Gaussian prior? The same goes for saying L1 is equivalent to a Laplace prior. Any further references would be great.
Anonymous
86
votes
5 answers

What do the residuals in a logistic regression mean?

In answering this question, John Christie suggested that the fit of logistic regression models should be assessed by evaluating the residuals. I'm familiar with how to interpret residuals in OLS: they are on the same scale as the DV and very clearly…
russellpierce
86
votes
5 answers

How to interpret an inverse covariance or precision matrix?

I was wondering whether anyone could point me to some references that discuss the interpretation of the elements of the inverse covariance matrix, also known as the concentration matrix or the precision matrix. I have access to Cox and Wermuth's…
Vinh Nguyen
85
votes
14 answers

What is the meaning of "All models are wrong, but some are useful"

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 424, Wiley. ISBN 0471810339. What exactly is the meaning of the above phrase?
gpuguy
85
votes
7 answers

Line of best fit does not look like a good fit. Why?

Have a look at this Excel graph: The 'common sense' line-of-best-fit would appear to be an almost vertical line straight through the center of the points (edited by hand in red). However the linear trend line as decided by Excel is the diagonal black…
ConanTheGerbil
85
votes
5 answers

What is the reason that a likelihood function is not a pdf?

What is the reason that a likelihood function is not a pdf (probability density function)?
John Doe