Most Popular
1500 questions
50 votes · 3 answers
Suppression effect in regression: definition and visual explanation/depiction
What is a suppressor variable in multiple regression, and what might be the ways to display the suppression effect visually (its mechanics or its evidence in results)? I'd like to invite everybody who has a thought to share it.
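As a quick illustration (a toy simulation added here, not taken from the question or any answer): a classical suppressor $x_2$ is nearly uncorrelated with $y$ yet correlated with the noise in $x_1$, so including it increases both $x_1$'s coefficient and $R^2$.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    signal = rng.normal(size=n)             # the part of x1 that actually drives y
    noise  = rng.normal(size=n)             # irrelevant variance shared with x2
    x1 = signal + noise                     # observed predictor
    x2 = noise + 0.1 * rng.normal(size=n)   # suppressor: tied to x1's noise, not to y
    y  = signal + 0.5 * rng.normal(size=n)

    def ols(X, y):
        """Least-squares fit with an intercept; returns (coefficients, R^2)."""
        X = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return beta, 1 - resid.var() / y.var()

    b1, r2_1 = ols(x1[:, None], y)
    b12, r2_12 = ols(np.column_stack([x1, x2]), y)
    print("corr(x2, y):           %.3f" % np.corrcoef(x2, y)[0, 1])   # ~ 0
    print("x1 alone:    slope %.2f, R^2 %.3f" % (b1[1], r2_1))        # ~ 0.50, 0.40
    print("x1 with x2:  slope %.2f, R^2 %.3f" % (b12[1], r2_12))      # ~ 1.00, 0.80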

ttnphns (51,648)
50 votes · 3 answers
Why does a correlation matrix need to be positive semi-definite, and what does it mean to be or not to be positive semi-definite?
I have been researching the meaning of the positive semi-definite property of correlation or covariance matrices.
I am looking for any information on:
the definition of positive semi-definiteness;
its important properties and practical implications;
the…
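As a small numerical companion (toy data assumed purely for illustration): the usual argument is that $a^\top R a$ equals the variance of a weighted sum of the standardized variables, so it can never be negative, which is exactly what positive semi-definiteness says; the eigenvalues of $R$ are therefore non-negative.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))          # toy data: 200 observations, 4 variables
    R = np.corrcoef(X, rowvar=False)       # sample correlation matrix

    print("eigenvalues:", np.round(np.linalg.eigvalsh(R), 4))   # all >= 0

    a = rng.normal(size=4)                 # arbitrary weight vector
    Z = (X - X.mean(0)) / X.std(0)         # standardized variables
    print("a' R a:      ", a @ R @ a)
    print("var of Z @ a:", (Z @ a).var())  # same number, and a variance is >= 0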

Melon (509)
50 votes · 3 answers
How can I calculate $\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$
Suppose $\phi(\cdot)$ and $\Phi(\cdot)$ are the density function and the distribution function of the standard normal distribution.
How can one calculate the integral:
$$\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw$$
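For reference, one standard evaluation (assuming $b > 0$) introduces an auxiliary $Z \sim N(0,1)$ independent of $W \sim N(0,1)$ and reads $\Phi\left(\frac{W-a}{b}\right)$ as a conditional probability:

$$\int^{\infty}_{-\infty}\Phi\left(\frac{w-a}{b}\right)\phi(w)\,\mathrm dw
= \Pr\left(Z \le \frac{W-a}{b}\right)
= \Pr\left(bZ - W \le -a\right)
= \Phi\left(\frac{-a}{\sqrt{1+b^{2}}}\right),$$

since $bZ - W \sim N(0,\, 1 + b^{2})$.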

hadisanji (793)
50 votes · 4 answers
Normality of dependent variable = normality of residuals?
This issue seems to rear its ugly head all the time, and I'm trying to decapitate it for my own understanding of statistics (and sanity!).
The assumptions of general linear models (t-test, ANOVA, regression etc.) include the "assumption of…
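A minimal simulation sketch of the distinction (a toy example added here, not from the thread): a bimodal predictor makes the marginal distribution of $y$ clearly non-normal even though the errors, and hence the residuals, are normal, which is what the linear-model assumption actually refers to.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 5_000
    x = np.r_[rng.normal(-3, 1, n // 2), rng.normal(3, 1, n // 2)]   # bimodal predictor
    y = 2.0 + 1.5 * x + rng.normal(0, 1, n)                          # normal errors

    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)

    print("normality test, y:         p = %.3g" % stats.normaltest(y).pvalue)      # tiny p: y non-normal
    print("normality test, residuals: p = %.3g" % stats.normaltest(resid).pvalue)  # large p: residuals fine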

DeanP (841)
50 votes · 2 answers
Random forest assumptions
I am kind of new to random forests, so I am still struggling with some basic concepts.
In linear regression, we assume independent observations, constant variance…
What are the basic assumptions/hypotheses we make when we use a random forest? …

user1848018 (745)
50 votes · 2 answers
Regression: Transforming Variables
When transforming variables, do you have to use the same transformation for all of them? For example, can I pick and choose differently transformed variables, as in:
Let $x_1, x_2, x_3, x_4$ be age, length of employment, length of residence, and income.
Y =…
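For what it's worth, mixing transformations is mechanically straightforward; here is a minimal sketch with made-up variable names and data (not the asker's), assuming statsmodels' formula interface, where each predictor simply enters the linear predictor through its own transform.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 500
    df = pd.DataFrame({
        "age":        rng.uniform(18, 80, n),
        "employ_len": rng.uniform(0.1, 40, n),
        "income":     rng.lognormal(10, 0.5, n),
    })
    df["y"] = (1 + 0.02 * df["age"] + np.log(df["employ_len"])
               + 0.3 * np.log(df["income"]) + rng.normal(0, 0.5, n))

    # age enters untransformed, the other two predictors enter log-transformed
    fit = smf.ols("y ~ age + np.log(employ_len) + np.log(income)", data=df).fit()
    print(fit.params)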

Brandon Bertelsen (6,672)
50 votes · 6 answers
Debunking wrong CLT statement
The central limit theorem (CLT) gives some nice properties about converging to a normal distribution. Prior to studying statistics formally, I was under the extremely wrong impression that the CLT said that data approached normality.
I now find…
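A minimal simulation sketch of the correct statement (a toy example added here): the raw data stay skewed no matter how much of it we collect, while the distribution of the sample mean across repeated samples becomes approximately normal as $n$ grows.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)

    data = rng.exponential(size=100_000)                        # lots of data, still exponential
    print("skewness of raw data:     %.2f" % stats.skew(data))  # ~ 2, nowhere near 0

    means = rng.exponential(size=(20_000, 100)).mean(axis=1)    # 20,000 sample means, n = 100 each
    print("skewness of sample means: %.2f" % stats.skew(means)) # close to 0 (near-normal)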

Dave (28,473)
50 votes · 15 answers
A smaller dataset is better: Is this statement false in statistics? How to refute it properly?
Dr. Raoult, who promotes Hydroxychloroquine, has made some really intriguing statements about statistics in the biomedical field:
It's counterintuitive, but the smaller the sample size of a clinical test, the more significant its results are. The…
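A minimal power simulation (a sketch with an assumed effect size, not data from any trial) shows the opposite of the quoted claim: with a real treatment effect, larger trials detect it far more often.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    effect = 0.3        # assumed true standardized difference between arms
    n_sims = 2_000

    for n in (10, 50, 200):
        rejections = 0
        for _ in range(n_sims):
            control   = rng.normal(0.0,    1.0, n)
            treatment = rng.normal(effect, 1.0, n)
            if stats.ttest_ind(control, treatment).pvalue < 0.05:
                rejections += 1
        print("n per arm = %3d  ->  power ~ %.2f" % (n, rejections / n_sims))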

Stephane Rolland (654)
50 votes · 6 answers
Motivation for Kolmogorov distance between distributions
There are many ways to measure how similar two probability distributions are. Among the methods that are popular (in different circles) are:
the Kolmogorov distance: the sup-distance between the distribution functions;
the Kantorovich-Rubinstein…
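For concreteness, a small sketch (toy distributions chosen only for illustration) of the Kolmogorov distance as the sup-distance between CDFs, together with its two-sample estimate from scipy.

    import numpy as np
    from scipy import stats

    grid = np.linspace(-6, 6, 10_001)
    F = stats.norm(0, 1).cdf(grid)
    G = stats.norm(0.5, 1.5).cdf(grid)
    print("sup |F - G| on a grid:   %.4f" % np.max(np.abs(F - G)))

    rng = np.random.default_rng(6)
    x = stats.norm(0, 1).rvs(2_000, random_state=rng)
    y = stats.norm(0.5, 1.5).rvs(2_000, random_state=rng)
    print("two-sample KS statistic: %.4f" % stats.ks_2samp(x, y).statistic)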

Mark Meckes (2,916)
50 votes · 5 answers
What is the difference between the forward-backward and Viterbi algorithms?
I want to know what the differences are between the forward-backward algorithm and the Viterbi algorithm for inference in hidden Markov models (HMMs).
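To make the contrast concrete, here is a compact sketch on a made-up two-state HMM (a toy example, not from any answer): forward-backward yields the posterior probability of each state at each time step, whereas Viterbi yields the single most probable state sequence, and the two answers need not agree.

    import numpy as np

    # Toy 2-state, 2-symbol HMM (all numbers assumed for illustration)
    pi = np.array([0.6, 0.4])                  # initial state distribution
    A  = np.array([[0.7, 0.3],                 # A[i, j] = P(next state j | current state i)
                   [0.4, 0.6]])
    B  = np.array([[0.9, 0.1],                 # B[i, k] = P(symbol k | state i)
                   [0.2, 0.8]])
    obs = [0, 0, 1, 0, 1]                      # observed symbol sequence
    T, S = len(obs), len(pi)

    # Forward pass: alpha[t, i] = P(obs[0..t], state_t = i)
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward pass: beta[t, i] = P(obs[t+1..T-1] | state_t = i)
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Forward-backward output: posterior marginals P(state_t = i | all observations)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Viterbi: single most probable state sequence
    delta = np.zeros((T, S))
    psi   = np.zeros((T, S), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A     # scores[i, j]: best path into i, then i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    path.reverse()

    print("posterior marginals (forward-backward):")
    print(np.round(gamma, 3))
    print("most probable path (Viterbi):", path)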

user34790 (6,049)
50 votes · 7 answers
Why would someone use a Bayesian approach with a 'noninformative' improper prior instead of the classical approach?
If the interest is merely in estimating the parameters of a model (pointwise and/or interval estimation) and the prior information is not reliable or is weak (I know this is a bit vague, but I am trying to establish a scenario where the choice of a prior…
user10525
50 votes · 3 answers
Why do we only see $L_1$ and $L_2$ regularization but not other norms?
I am just curious why we usually see only $L_1$ and $L_2$ norm regularization. Are there proofs of why these are better?
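As a quick illustration of the best-known practical difference (toy data assumed, using scikit-learn): the $L_1$ (lasso) penalty tends to set coefficients exactly to zero, while the $L_2$ (ridge) penalty only shrinks them.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(7)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    beta_true = np.r_[2.0, -1.5, np.zeros(p - 2)]    # only the first two predictors matter
    y = X @ beta_true + rng.normal(0, 1, n)

    lasso = Lasso(alpha=0.2).fit(X, y)
    ridge = Ridge(alpha=10.0).fit(X, y)

    print("lasso coefficients:", np.round(lasso.coef_, 2))   # mostly exact zeros
    print("ridge coefficients:", np.round(ridge.coef_, 2))   # shrunken, but nonzero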

user10024395
50 votes · 3 answers
What is the root cause of the class imbalance problem?
I've been thinking a lot about the "class imbalance problem" in machine/statistical learning lately, and am being drawn ever deeper into the feeling that I just don't understand what is going on.
First let me (attempt to) define my terms:
The…

Matthew Drury (33,314)
50 votes · 1 answer
How does centering the data get rid of the intercept in regression and PCA?
I keep reading about instances where we center the data (e.g., with regularization or PCA) in order to remove the intercept (as mentioned in this question). I know it's simple, but I'm having a hard time intuitively understanding this. Could someone…
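For the simple-regression case, the standard OLS formula already answers this in one line (with multiple predictors the same cancellation happens coordinate-wise, and centering before PCA likewise makes the components pass through the origin):

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \quad\Longrightarrow\quad \bar x = \bar y = 0 \ \text{after centering} \ \Longrightarrow\ \hat\beta_0 = 0.$$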

Alec (2,185)
50 votes · 4 answers
When is a biased estimator preferable to an unbiased one?
It's often obvious why one prefers an unbiased estimator. But are there any circumstances under which we might actually prefer a biased estimator over an unbiased one?
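One classic concrete case (a toy simulation added here, not from any answer): for a normal sample, the biased "divide by $n$" variance estimator has smaller mean squared error than the unbiased "divide by $n-1$" estimator.

    import numpy as np

    rng = np.random.default_rng(8)
    sigma2 = 4.0                                 # true variance
    n, reps = 10, 200_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    var_unbiased = samples.var(axis=1, ddof=1)   # divides by n - 1 (unbiased)
    var_biased   = samples.var(axis=1, ddof=0)   # divides by n (biased MLE)

    print("bias (unbiased, biased): %.3f, %.3f"
          % (var_unbiased.mean() - sigma2, var_biased.mean() - sigma2))
    print("MSE  (unbiased, biased): %.3f, %.3f"
          % (((var_unbiased - sigma2) ** 2).mean(), ((var_biased - sigma2) ** 2).mean()))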

Stan Shunpike (3,623)