Most Popular
1500 questions
95 votes, 4 answers
Does the variance of a sum equal the sum of the variances?
Is it (always) true that
$$\mathrm{Var}\left(\sum\limits_{i=1}^m{X_i}\right) = \sum\limits_{i=1}^m{\mathrm{Var}(X_i)} \>?$$

Abe
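The standard identity is

$$\mathrm{Var}\left(\sum\limits_{i=1}^m X_i\right) = \sum\limits_{i=1}^m \mathrm{Var}(X_i) + 2\sum\limits_{i<j} \mathrm{Cov}(X_i, X_j),$$

so the two sides agree exactly when the covariance terms sum to zero, for example when the $X_i$ are pairwise uncorrelated. Below is a small Python sketch on made-up numbers. It checks the same algebra for sample variances and covariances computed with a common denominator.

```python
from itertools import combinations

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Covariance with denominator n; the identity also holds with n - 1 used throughout
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    return cov(xs, xs)

def var_of_sum_decomposition(columns):
    """Return (variance of the row-wise sums, sum of variances, covariance correction)."""
    totals = [sum(row) for row in zip(*columns)]
    sum_of_vars = sum(var(c) for c in columns)
    correction = 2 * sum(cov(a, b) for a, b in combinations(columns, 2))
    return var(totals), sum_of_vars, correction

# Made-up sample: x2 is positively correlated with x1; x3 is uncorrelated with both
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 3.0, 5.0, 6.0]
x3 = [1.0, -1.0, -1.0, 1.0]

lhs, sum_vars, corr = var_of_sum_decomposition([x1, x2, x3])
print(lhs, sum_vars + corr)   # equal: the identity with the covariance term
print(lhs == sum_vars)        # False here, because Cov(x1, x2) > 0

lhs13, sum_vars13, corr13 = var_of_sum_decomposition([x1, x3])
print(lhs13, sum_vars13)      # equal: x1 and x3 are uncorrelated in this sample
```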
95 votes, 6 answers
Convergence in probability vs. almost sure convergence
I've never really grokked the difference between these two measures of convergence. (Or, in fact, any of the different types of convergence, but I mention these two in particular because of the Weak and Strong Laws of Large Numbers.)
Sure, I can…

raegtin
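For reference, these are the standard definitions being contrasted, for random variables $X_1, X_2, \dots$ and $X$ defined on a common probability space:

$$X_n \xrightarrow{p} X \iff \lim_{n\to\infty} P\left(|X_n - X| > \varepsilon\right) = 0 \text{ for every } \varepsilon > 0,$$

$$X_n \xrightarrow{\text{a.s.}} X \iff P\left(\lim_{n\to\infty} X_n = X\right) = 1.$$

Almost sure convergence implies convergence in probability, but the converse does not hold in general. The Weak Law of Large Numbers states that the sample mean converges to the population mean in probability. The Strong Law states that it converges almost surely.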
95 votes, 2 answers
How much do we know about p-hacking "in the wild"?
The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ways to procure a "more significant" result,…

Silverfish
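One mechanism behind "fishing" is easy to quantify. Suppose $k$ independent tests are run at level $\alpha$ on data where every null hypothesis is true. The probability that at least one comes out "significant" is then $1 - (1 - \alpha)^k$. The minimal sketch below rests on that independence assumption; correlated tests inflate the rate less.

```python
def familywise_false_positive_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha among k independent tests when every null is true)."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 20, 100):
    print(k, round(familywise_false_positive_rate(k), 3))
```

With 20 such tests the chance of at least one spurious "finding" is already about 64%.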
95 votes, 5 answers
How to calculate Area Under the Curve (AUC), or the c-statistic, by hand
I am interested in calculating area under the curve (AUC), or the c-statistic, by hand for a binary logistic regression model.
For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not…

Matt Reichenbach
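By hand, the c-statistic can be computed from pairs. Take every (retained, not retained) pair of cases and find the proportion in which the retained case received the higher predicted probability, counting ties as one half. This pairwise definition equals the area under the ROC curve; it is the Mann-Whitney U statistic rescaled to [0, 1]. The minimal sketch below uses hypothetical outcomes and predicted probabilities.

```python
def c_statistic(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # retained case ranked higher: concordant pair
            elif p == n:
                concordant += 0.5   # tied predictions count as half
    return concordant / (len(pos) * len(neg))

# Hypothetical validation data: 1 = retained, 0 = not retained
retention = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.6, 0.6, 0.8, 0.3, 0.7]
print(c_statistic(retention, predicted))
```

Here 7.5 of the 9 pairs are concordant. The double loop is quadratic, which is fine for hand-sized data; larger datasets are usually handled with a rank-based formula instead.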
95 votes, 5 answers
Loadings vs eigenvectors in PCA: when to use one or another?
In principal component analysis (PCA), we get eigenvectors (unit vectors) and eigenvalues. Now, let us define loadings as $$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}.$$
I know that eigenvectors are just directions and…

user2696565
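Loadings, defined this way, scale each unit eigenvector by the standard deviation along that component. One consequence: when all components are kept, the sum of the outer products of the loading columns reproduces the covariance matrix. The eigenvectors alone give only directions. The pure-Python sketch below uses a made-up 2×2 covariance matrix and the closed-form eigendecomposition of a symmetric 2×2 matrix. It assumes the off-diagonal entry is nonzero.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues (descending) and unit eigenvectors of [[a, b], [b, c]], assuming b != 0."""
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    pairs = []
    for lam in (mid + rad, mid - rad):
        v = (b, lam - a)              # solves (A - lam*I) v = 0 when b != 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

def loadings(pairs):
    # Loadings = eigenvector * sqrt(eigenvalue), one tuple per component
    return [tuple(x * math.sqrt(lam) for x in vec) for lam, vec in pairs]

# Made-up covariance matrix [[4, 2], [2, 3]]
pairs = eig_sym_2x2(4.0, 2.0, 3.0)
L = loadings(pairs)

# Summing outer products of the loading vectors reconstructs the covariance matrix
recon = [[sum(col[i] * col[j] for col in L) for j in range(2)] for i in range(2)]
print(recon)  # approximately [[4, 2], [2, 3]]
```

Each loading vector's squared length equals its eigenvalue, the variance along that component. The same reconstruction with the unit eigenvectors instead gives the identity matrix.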
95 votes, 8 answers
If the mean is so sensitive, why use it in the first place?
It is a known fact that the median is resistant to outliers. If that is the case, when and why would we use the mean in the first place?
One use I can think of is detecting the presence of outliers, i.e., if the median is far from the mean,…

Legend
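A quick illustration of the sensitivity being described, with made-up numbers: replacing one value with an outlier moves the mean a lot and leaves the median unchanged. Both statistics come from Python's standard `statistics` module.

```python
import statistics

values = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 140]  # last value replaced by an outlier

print(statistics.mean(values), statistics.median(values))              # 12 and 12
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean jumps, median stays 12
```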
95 votes, 6 answers
What is the difference between multiclass and multilabel problems?
What is the difference between a multiclass problem and a multilabel problem?

Learner
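The distinction shows up in how targets are represented. In a multiclass problem, each sample has exactly one label drawn from several classes. In a multilabel problem, each sample can carry any subset of the labels, and this is often encoded as a binary indicator row. The small sketch below uses a hypothetical label set.

```python
# Hypothetical label set
classes = ["cat", "dog", "bird"]

# Multiclass: exactly one class per sample
multiclass_targets = ["dog", "cat", "bird"]

# Multilabel: any subset of classes per sample, including the empty set
multilabel_targets = [{"dog"}, {"cat", "bird"}, set()]

def to_indicator(label_sets, classes):
    """Encode label sets as 0/1 rows, one column per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]

print(to_indicator([{t} for t in multiclass_targets], classes))  # every row sums to 1
print(to_indicator(multilabel_targets, classes))                 # rows may sum to 0, 1, 2, ...
```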
94 votes, 12 answers
Who Are The Bayesians?
As one becomes interested in statistics, the dichotomy "Frequentist" vs. "Bayesian" soon becomes commonplace (and who hasn't read Nate Silver's The Signal and the Noise, anyway?). In talks and introductory courses, the point of view is…

Antoni Parellada
94 votes, 6 answers
Essential data checking tests
In my job role I often work with other people's datasets; non-experts bring me clinical data, and I help them summarise it and perform statistical tests.
The problem I am having is that the datasets I am brought are almost always riddled with…

Chris Beeley
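A few checks that are commonly run on clinical-style tables can be automated. The sketch below is illustrative only. The column names (`patient_id`, `age`), the plausible age range, and the helper `basic_checks` are hypothetical, and the checks a real study needs will differ.

```python
def basic_checks(rows, id_field="patient_id", ranges=None):
    """Return human-readable problems found in a list of dict rows.

    Hypothetical helper: flags missing values, duplicate IDs, and numeric
    values outside the given plausible ranges.
    """
    ranges = ranges or {}
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value is None or value == "":
                problems.append(f"row {i}: missing {field}")
        rid = row.get(id_field)
        if rid in seen_ids:
            problems.append(f"row {i}: duplicate {id_field} {rid!r}")
        seen_ids.add(rid)
        for field, (lo, hi) in ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not lo <= value <= hi:
                problems.append(f"row {i}: {field}={value} outside [{lo}, {hi}]")
    return problems

# Hypothetical example data
data = [
    {"patient_id": "A1", "age": 54},
    {"patient_id": "A2", "age": None},
    {"patient_id": "A1", "age": 540},   # duplicate ID and implausible age
]
for problem in basic_checks(data, ranges={"age": (0, 120)}):
    print(problem)
```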
93 votes, 2 answers
When to use regularization methods for regression?
In what circumstances should one consider using regularization methods (ridge, lasso or least angle regression) instead of OLS?
In case this helps steer the discussion, my main interest is improving predictive accuracy.

NPE
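The simplest case shows what ridge regression does relative to OLS: one centred predictor and no intercept. Minimising $\sum_i (y_i - \beta x_i)^2 + \lambda \beta^2$ gives $\hat\beta_\lambda = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. Setting $\lambda = 0$ recovers OLS, and the estimate shrinks towards zero as $\lambda$ grows. The minimal sketch below uses made-up data; lasso and least angle regression are not shown.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate for one predictor with no intercept; lam = 0 gives OLS.

    Minimises sum((y_i - b * x_i)**2) + lam * b**2, whose solution is
    sum(x_i * y_i) / (sum(x_i**2) + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

# Made-up data, already centred
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-3.9, -2.2, 0.3, 1.8, 4.0]
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_slope(x, y, lam), 4))
```

In practice the penalty is usually chosen by cross-validation, which matches a goal of predictive accuracy.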
93 votes, 1 answer
What is an ablation study? And is there a systematic way to perform it?
What is an ablation study? And is there a systematic way to perform it? For example, I have $n$ predictors in a linear regression, which I will call my model.
How would I perform an ablation study on this? What metrics should I use?
A…

cgo
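One simple form of ablation for a regression with $n$ predictors works like this: drop each predictor in turn, refit, and record how much a chosen metric degrades relative to the full model. The sketch below is one possible scheme, not a standard procedure. It rests on several assumptions: in-sample $R^2$ as the metric (purely for brevity; held-out error would usually be preferable), made-up data, and a small normal-equations solver so it runs without third-party libraries.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def r_squared(X, y):
    """Fit OLS with an intercept via the normal equations; return in-sample R^2."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(ZtZ, Zty)
    fitted = [sum(bj * zj for bj, zj in zip(beta, z)) for z in Z]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def ablation(X, y, names):
    """Drop each predictor in turn; report the loss in R^2 versus the full model."""
    full = r_squared(X, y)
    losses = {}
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(row) if k != j] for row in X]
        losses[name] = full - r_squared(reduced, y)
    return full, losses

# Made-up data: y is built as 3*x1 + x2 + small noise; x3 is irrelevant by construction
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
x3 = [0.3, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.0]
noise = [0.1, -0.1, 0.05, 0.0, -0.05, 0.1, -0.1, 0.0]
y = [3 * a + b + e for a, b, e in zip(x1, x2, noise)]
X = [list(row) for row in zip(x1, x2, x3)]
names = ["x1", "x2", "x3"]

full_r2, drops = ablation(X, y, names)
print(f"full model R^2 = {full_r2:.4f}")
for name, loss in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"without {name}: R^2 drops by {loss:.4f}")
```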
93 votes, 7 answers
The Book of Why by Judea Pearl: Why is he bashing statistics?
I am reading The Book of Why by Judea Pearl, and it is getting under my skin¹. Specifically, it appears to me that he is unconditionally bashing "classical" statistics by putting up a straw man argument that statistics is never, ever able to…

January
93 votes, 4 answers
How does the correlation coefficient differ from regression slope?
I would have expected the correlation coefficient to be the same as a regression slope (beta); however, having just compared the two, I find they are different. How do they differ, and what different information do they give?

luciano
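The two quantities are linked by a rescaling. The OLS slope of $y$ on $x$ is $b = r \cdot s_y / s_x$, where $r$ is Pearson's correlation and $s_x, s_y$ are the sample standard deviations. They therefore coincide only when $s_x = s_y$, for example when both variables are standardised. The correlation is unit-free and symmetric in $x$ and $y$. The slope is measured in units of $y$ per unit of $x$ and changes if $x$ is regressed on $y$ instead. The minimal check below uses made-up data.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    # Sample standard deviation (n - 1 denominator); the n - 1 cancels in b = r * sd(y) / sd(x)
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # made-up data
r, b = pearson_r(x, y), ols_slope(x, y)
print(r, b, r * sd(y) / sd(x))  # the last two agree up to rounding
```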
93 votes, 9 answers
Are there any examples where Bayesian credible intervals are obviously inferior to frequentist confidence intervals?
A recent question on the difference between confidence and credible intervals led me to start re-reading Edwin Jaynes' article on that topic:
Jaynes, E. T., 1976. "Confidence Intervals vs Bayesian Intervals," in Foundations of Probability Theory,…

Dikran Marsupial
93 votes, 6 answers
Principled way of collapsing categorical variables with many levels?
What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model?
Consider a variable like college student major (discipline chosen by an undergraduate…

shadowtalker
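A simple and not especially principled baseline is to pool every level whose frequency falls below a threshold into a single "Other" level. Approaches based on domain knowledge, such as grouping majors into broader fields, are not shown. In the minimal sketch below, the threshold and the `"Other"` label are arbitrary choices, and the majors are hypothetical.

```python
from collections import Counter

def collapse_rare_levels(values, min_count=2, other="Other"):
    """Replace levels occurring fewer than min_count times with one pooled level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical college majors
majors = ["Biology", "Economics", "Biology", "Classics", "Economics", "Astronomy", "Biology"]
print(collapse_rare_levels(majors))
```

If the collapsed variable feeds a predictive model, the counts would normally be taken from the training data only, so the same mapping can be applied to new data.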