Most Popular

1500 questions
40 votes, 2 answers

When do Poisson and negative binomial regressions fit the same coefficients?

I’ve noticed that in R, Poisson and negative binomial (NB) regressions always seem to fit the same coefficients for categorical, but not continuous, predictors. For example, here's a regression with a categorical…
40 votes, 4 answers

Recall and precision in classification

I have read some definitions of recall and precision, but always in the context of information retrieval. I was wondering if someone could explain them further in a classification context and perhaps illustrate with some examples. Say for…
Olivier_s_j
40 votes, 11 answers

Open Source statistical textbooks?

There have been a few questions about statistical textbooks, such as the question Free statistical textbooks. However, I am looking for textbooks that are Open Source, for example, having a Creative Commons license. The reason is that in course…
Egon Willighagen
40 votes, 2 answers

How to find a good fit for semi-sinusoidal model in R?

I want to assume that the sea surface temperature of the Baltic Sea is the same year after year, and then describe that with a function / linear model. The idea I had was to just input year as a decimal number (or num_months/12) and get out what the…
GaRyu
40 votes, 4 answers

Polynomial regression using scikit-learn

I am trying to use scikit-learn for polynomial regression. From what I read, polynomial regression is a special case of linear regression. I was hoping that maybe one of scikit's generalized linear models can be parameterised to fit higher order…
40 votes, 3 answers

What is the Wine/Water Paradox in Bayesian statistics, and what is its resolution?

I have just heard about the Wine/Water Paradox in Bayesian statistics, but didn't understand it very well (see Mikkelson 2004 for an introduction). Can you explain in simple terms what the paradox is (and why it is a paradox), why it matters for…
user314217
40 votes, 2 answers

Purpose of the link function in generalized linear model

What is the purpose of the link function as a component of the generalized linear model? Why do we need it? Wikipedia states: "It can be convenient to match the domain of the link function to the range of the distribution function's mean." What's the…
Chris
40 votes, 6 answers

Least-angle regression vs. lasso

Least-angle regression and the lasso tend to produce very similar regularization paths (identical except when a coefficient crosses zero). Both can be fit efficiently by virtually identical algorithms. Is there ever any practical reason to…
NPE
40 votes, 5 answers

How to derive the least squares estimator for multiple linear regression?

In the simple linear regression case $y=\beta_0+\beta_1x$, you can derive the least squares estimator $\hat\beta_1=\frac{\sum(x_i-\bar x)(y_i-\bar y)}{\sum(x_i-\bar x)^2}$ such that you don't have to know $\hat\beta_0$ to estimate…
40 votes, 3 answers

How to determine the quality of a multiclass classifier

Given a dataset with instances $x_i$ together with $N$ classes where every instance $x_i$ belongs to exactly one class $y_i$ a multiclass classifier After the training and testing I basically have a table with the true class $y_i$ and the…
Gerenuk
40 votes, 3 answers

Does statistical independence mean lack of causation?

Two random variables A and B are statistically independent. That means that in the DAG of the process: $(A {\perp\!\!\!\perp} B)$ and of course $P(A|B)=P(A)$. But does that also mean that there's no front-door from B to A? Because then we should get…
user1834069
40 votes, 4 answers

L1 regression estimates median whereas L2 regression estimates mean?

So I was asked a question about which central measures L1 (i.e., lasso) and L2 (i.e., ridge regression) estimate. The answer is L1=median and L2=mean. Is there any type of intuitive reasoning to this? Or does it have to be determined algebraically? If…
Bstat
40 votes, 4 answers

When should I use a variational autoencoder as opposed to an autoencoder?

I understand the basic structure of variational autoencoder and normal (deterministic) autoencoder and the math behind them, but when and why would I prefer one type of autoencoder to the other? All I can think about is the prior distribution of…
DiveIntoML
40 votes, 3 answers

What is the rationale of the Matérn covariance function?

The Matérn covariance function is commonly used as a kernel function in Gaussian processes. It is defined as follows: $$ {\displaystyle C_{\nu }(d)=\sigma ^{2}{\frac {2^{1-\nu }}{\Gamma (\nu )}}{\Bigg (}{\sqrt {2\nu }}{\frac {d}{\rho }}{\Bigg )}^{\nu…
40 votes, 1 answer

PCA objective function: what is the connection between maximizing variance and minimizing error?

The PCA algorithm can be formulated in terms of the correlation matrix (assume the data $X$ has already been normalized and we are only considering projection onto the first PC). The objective function can be written as: $$ \max_w (Xw)^T(Xw)\; \:…
Cam.Davidson.Pilon