Most Popular
1500 questions
40 votes, 2 answers
When do Poisson and negative binomial regressions fit the same coefficients?
I’ve noticed that in R, Poisson and negative binomial (NB) regressions always seem to fit the same coefficients for categorical, but not continuous, predictors.
For example, here's a regression with a categorical…

half-pass
40 votes, 4 answers
Recall and precision in classification
I read some definitions of recall and precision, but always in the context of information retrieval. I was wondering if someone could explain this a bit more in a classification context and maybe illustrate it with some examples. Say for…

Olivier_s_j
40 votes, 11 answers
Open Source statistical textbooks?
There have been a few questions about statistical textbooks, such as the question Free statistical textbooks. However, I am looking for textbooks that are Open Source, for example, having a Creative Commons license. The reason is that in course…

Egon Willighagen
40 votes, 2 answers
How to find a good fit for semi-sinusoidal model in R?
I want to assume that the sea surface temperature of the Baltic Sea is the same year after year, and then describe that with a function / linear model. The idea I had was to just input year as a decimal number (or num_months/12) and get out what the…

GaRyu
40 votes, 4 answers
Polynomial regression using scikit-learn
I am trying to use scikit-learn for polynomial regression. From what I read, polynomial regression is a special case of linear regression. I was hoping that maybe one of scikit's generalized linear models can be parameterised to fit higher order…

Mihai Damian
40 votes, 3 answers
What is the Wine/Water Paradox in Bayesian statistics, and what is its resolution?
I have just heard about the Wine/Water Paradox in Bayesian statistics, but didn't understand it very well (see Mikkelson 2004 for an introduction). Can you explain in simple terms what the paradox is (and why it is a paradox), why it matters for…
user314217
40 votes, 2 answers
Purpose of the link function in generalized linear model
What is the purpose of the link function as a component of the generalized linear model? Why do we need it?
Wikipedia states:
It can be convenient to match the domain of the link function to the range of the distribution function's mean
What's the…

Chris
40 votes, 6 answers
Least-angle regression vs. lasso
Least-angle regression and the lasso tend to produce very similar regularization paths (identical except when a coefficient crosses zero).
They both can be efficiently fit by virtually identical algorithms.
Is there ever any practical reason to…

NPE
40 votes, 5 answers
How to derive the least squares estimator for multiple linear regression?
In the simple linear regression case $y=\beta_0+\beta_1x$, you can derive the least squares estimator $\hat\beta_1=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$ such that you don't have to know $\hat\beta_0$ to estimate…

Saber CN
40 votes, 3 answers
How to determine the quality of a multiclass classifier
Given
- a dataset with instances $x_i$ together with $N$ classes, where every instance $x_i$ belongs to exactly one class $y_i$
- a multiclass classifier
After training and testing, I basically have a table with the true class $y_i$ and the…

Gerenuk
40 votes, 3 answers
Does statistical independence mean lack of causation?
Two random variables A and B are statistically independent. That means that in the DAG of the process: $(A {\perp\!\!\!\perp} B)$ and of course $P(A|B)=P(A)$. But does that also mean that there's no front-door from B to A?
Because then we should get…

user1834069
40 votes, 4 answers
L1 regression estimates median whereas L2 regression estimates mean?
So I was asked a question on which central measures L1 (i.e., lasso) and L2 (i.e., ridge regression) estimated. The answer is L1=median and L2=mean. Is there any type of intuitive reasoning to this? Or does it have to be determined algebraically? If…

Bstat
40 votes, 4 answers
When should I use a variational autoencoder as opposed to an autoencoder?
I understand the basic structure of variational autoencoders and normal (deterministic) autoencoders and the math behind them, but when and why would I prefer one type of autoencoder to the other? All I can think of is the prior distribution of…

DiveIntoML
40 votes, 3 answers
What is the rationale of the Matérn covariance function?
The Matérn covariance function is commonly used as a kernel function in Gaussian processes. It is defined like this
$$
{\displaystyle C_{\nu }(d)=\sigma ^{2}{\frac {2^{1-\nu }}{\Gamma (\nu )}}{\Bigg (}{\sqrt {2\nu }}{\frac {d}{\rho }}{\Bigg )}^{\nu…

Recuerdos de la Alhambra
40 votes, 1 answer
PCA objective function: what is the connection between maximizing variance and minimizing error?
The PCA algorithm can be formulated in terms of the correlation matrix (assume the data $X$ has already been normalized and we are only considering projection onto the first PC). The objective function can be written as:
$$ \max_w (Xw)^T(Xw)\; \:…

Cam.Davidson.Pilon