Highest Voted Questions - Statistical Analysis Stack Exchange

95

votes

4 answers

Does the variance of a sum equal the sum of the variances?

Is it (always) true that $$\mathrm{Var}\left(\sum\limits_{i=1}^m{X_i}\right) = \sum\limits_{i=1}^m{\mathrm{Var}(X_i)} \>?$$

variance

asked Jun 26 '12 at 22:44

Abe

3,561
7
27
45

95

votes

6 answers

Convergence in probability vs. almost sure convergence

I've never really grokked the difference between these two measures of convergence. (Or, in fact, any of the different types of convergence, but I mention these two in particular because of the Weak and Strong Laws of Large Numbers.) Sure, I can…

probability random-variable

asked Aug 31 '10 at 03:57

raegtin

9,090
12
48
53

95

votes

2 answers

How much do we know about p-hacking "in the wild"?

The phrase p-hacking (also: "data dredging", "snooping" or "fishing") refers to various kinds of statistical malpractice in which results become artificially statistically significant. There are many ways to procure a "more significant" result,…

hypothesis-testing statistical-significance p-value model-selection reproducible-research

asked Mar 09 '16 at 13:14

Silverfish

20,678
23
92
180

95

votes

5 answers

How to calculate Area Under the Curve (AUC), or the c-statistic, by hand

I am interested in calculating area under the curve (AUC), or the c-statistic, by hand for a binary logistic regression model. For example, in the validation dataset, I have the true value for the dependent variable, retention (1 = retained; 0 = not…

regression logistic classification roc auc

asked Apr 09 '15 at 17:53

Matt Reichenbach

3,404
6
25
43

95

votes

5 answers

Loadings vs eigenvectors in PCA: when to use one or another?

In principal component analysis (PCA), we get eigenvectors (unit vectors) and eigenvalues. Now, let us define loadings as $$\text{Loadings} = \text{Eigenvectors} \cdot \sqrt{\text{Eigenvalues}}.$$ I know that eigenvectors are just directions and…

pca

asked Mar 29 '15 at 09:23

user2696565

1,239
1
10
14

95

votes

8 answers

If mean is so sensitive, why use it in the first place?

It is a known fact that the median is resistant to outliers. If that is the case, when and why would we use the mean in the first place? One thing I can think of perhaps is to understand the presence of outliers, i.e. if the median is far from the mean,…

mathematical-statistics mean median

asked Aug 13 '11 at 07:50

Legend

4,232
7
37
50

95

votes

6 answers

What is the difference between Multiclass and Multilabel Problem

What is the difference between a multiclass problem and a multilabel problem?

classification clustering terminology multi-class multilabel

asked Jun 13 '11 at 05:35

Learner

4,007
11
37
39

94

votes

12 answers

Who Are The Bayesians?

As one becomes interested in statistics, the dichotomy "Frequentist" vs. "Bayesian" soon becomes commonplace (and who hasn't read Nate Silver's The Signal and the Noise, anyway?). In talks and introductory courses, the point of view is…

bayesian mathematical-statistics inference bayes frequentist

asked Aug 13 '15 at 18:11

Antoni Parellada

23,430
15
100
197

94

votes

6 answers

Essential data checking tests

In my job role I often work with other people's datasets: non-experts bring me clinical data and I help them to summarise it and perform statistical tests. The problem I am having is that the datasets I am brought are almost always riddled with…

dataset outliers checking

asked Jun 07 '11 at 08:19

Chris Beeley

5,465
5
36
40

93

votes

2 answers

When to use regularization methods for regression?

In what circumstances should one consider using regularization methods (ridge, lasso or least angle regression) instead of OLS? In case this helps steer the discussion, my main interest is improving predictive accuracy.

regression least-squares lasso ridge-regression fused-lasso

asked Nov 06 '10 at 17:53

NPE

5,351
5
33
44

93

votes

1 answer

What is an ablation study? And is there a systematic way to perform it?

What is an ablation study? And is there a systematic way to perform it? For example, I have $n$ predictors in a linear regression, which I will call my model. How would I perform an ablation study on this? What metrics should I use? A…

regression machine-learning neural-networks

asked Dec 03 '18 at 09:09

cgo

7,445
10
42
61

93

votes

7 answers

The Book of Why by Judea Pearl: Why is he bashing statistics?

I am reading The Book of Why by Judea Pearl, and it is getting under my skin. Specifically, it appears to me that he is unconditionally bashing "classical" statistics by putting up a straw man argument that statistics is never, ever able to…

causality

asked Nov 14 '18 at 09:22

January

6,999
1
32
55

93

votes

4 answers

How does the correlation coefficient differ from regression slope?

I would have expected the correlation coefficient to be the same as a regression slope (beta); however, having just compared the two, I find that they are different. How do they differ, and what different information do they give?

regression correlation

asked Jul 17 '12 at 14:43

luciano

12,197
30
87
119

93

votes

9 answers

Are there any examples where Bayesian credible intervals are obviously inferior to frequentist confidence intervals

A recent question on the difference between confidence and credible intervals led me to start re-reading Edwin Jaynes' article on that topic: Jaynes, E. T., 1976. "Confidence Intervals vs Bayesian Intervals," in Foundations of Probability Theory,…

bayesian confidence-interval

asked Sep 03 '10 at 18:23

Dikran Marsupial

46,962
5
121
178

93

votes

6 answers

Principled way of collapsing categorical variables with many levels?

What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model? Consider a variable like college student major (discipline chosen by an undergraduate…

regression categorical-data feature-engineering many-categories faq

asked Apr 17 '15 at 13:31

shadowtalker

11,395
3
49
109
