Most Popular
1500 questions
131 votes, 3 answers
What if residuals are normally distributed, but y is not?
I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left-skewed. Thus you assume that $u$ is not normally distributed, because this would…

MarkDollar
- 5,575
- 14
- 44
- 60
130 votes, 4 answers
Differences between cross validation and bootstrapping to estimate the prediction error
I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error.
Does one work better for small dataset sizes or large datasets?

grant
- 1,491
- 2
- 11
- 10
129 votes, 14 answers
What's wrong with XKCD's Frequentists vs. Bayesians comic?
This xkcd comic (Frequentists vs. Bayesians) makes fun of a frequentist statistician who derives an obviously wrong result.
However, it seems to me that his reasoning is actually correct, in the sense that it follows the standard frequentist…

repied2
- 1,577
- 2
- 10
- 10
129 votes, 6 answers
Is there an intuitive interpretation of $A^TA$ for a data matrix $A$?
For a given data matrix $A$ (with variables in columns and data points in rows), it seems like $A^TA$ plays an important role in statistics. For example, it is an important part of the analytical solution of ordinary least squares. Or, for PCA, its…

Alec
- 2,185
- 4
- 17
- 14
128 votes, 10 answers
Why does the Cauchy distribution have no mean?
From the density function we could identify a mean (= 0) for the Cauchy distribution, just as the graph below shows. So why do we say that the Cauchy distribution has no mean?

Flying pig
- 5,689
- 11
- 32
- 31
128 votes, 2 answers
Removal of statistically significant intercept term increases $R^2$ in linear model
In a simple linear model with a single explanatory variable,
$\alpha_i = \beta_0 + \beta_1 \delta_i + \epsilon_i$
I find that removing the intercept term improves the fit greatly (value of $R^2$ goes from 0.3 to 0.9). However, the intercept term…

Ernest A
- 2,062
- 3
- 17
- 16
128 votes, 5 answers
How does a Support Vector Machine (SVM) work?
How does a Support Vector Machine (SVM) work, and what differentiates it from other linear classifiers, such as the Linear Perceptron, Linear Discriminant Analysis, or Logistic Regression? *
(* I'm thinking in terms of the underlying motivations for…

tdc
- 7,289
- 5
- 32
- 62
128 votes, 28 answers
Free statistical textbooks
Are there any free statistical textbooks available?

csgillespie
- 11,849
- 9
- 56
- 85
126 votes, 6 answers
How would you explain the difference between correlation and covariance?
The question "How would you explain covariance to someone who understands only the mean?", which addresses the issue of explaining covariance to a lay person, brought up a similar question in my mind.
How would one explain to a…

pmgjones
- 5,543
- 8
- 36
- 36
125 votes, 9 answers
Numerical example to understand Expectation-Maximization
I am trying to get a good grasp on the EM algorithm, to be able to implement and use it. I spent a full day reading the theory and a paper where EM is used to track an aircraft using the position information coming from a radar. Honestly, I don't…

arjsgh21
- 2,403
- 6
- 15
- 8
125 votes, 7 answers
Clustering on the output of t-SNE
I've got an application where it'd be handy to cluster a noisy dataset before looking for subgroup effects within the clusters. I first looked at PCA, but it takes ~30 components to get to 90% of the variability, so clustering on just a couple of…

generic_user
- 11,981
- 8
- 40
- 63
124 votes, 7 answers
How to choose between t-test or non-parametric test e.g. Wilcoxon in small samples
Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed rank test, the Wilcoxon-Mann-Whitney U test, or the…

Silverfish
- 20,678
- 23
- 92
- 180
123 votes, 20 answers
Most interesting statistical paradoxes
Because I find them fascinating, I'd like to hear what folks in this community find as the most interesting statistical paradox and why.

Nick
- 3,327
- 6
- 28
- 24
122 votes, 8 answers
Bias and variance in leave-one-out vs K-fold cross validation
How do different cross-validation methods compare in terms of model variance and bias?
My question is partly motivated by this thread: "Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice?" The answer…

Amelio Vazquez-Reina
- 17,546
- 26
- 74
- 110
121 votes, 4 answers
Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?
Somebody asked me this question in a job interview, and I replied that their joint distribution is always Gaussian. I thought that I could always write a bivariate Gaussian using their means, variances, and covariance. I am wondering if there can be a…

MarkSAlen
- 2,559
- 5
- 24
- 25