Highest Voted Questions - Statistical Analysis Stack Exchange

131

votes

3 answers

What if residuals are normally distributed, but y is not?

I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left skewed. Thus you assume that $u$ is not normally distributed, because this would…

regression residuals error normality-assumption

asked Jun 23 '11 at 06:00

MarkDollar


130

votes

4 answers

Differences between cross validation and bootstrapping to estimate the prediction error

I would like your thoughts about the differences between cross validation and bootstrapping to estimate the prediction error. Does one work better for small dataset sizes or large datasets?

cross-validation predictive-models bootstrap

asked Nov 14 '11 at 14:57

grant


129

votes

14 answers

What's wrong with XKCD's Frequentists vs. Bayesians comic?

This xkcd comic (Frequentists vs. Bayesians) makes fun of a frequentist statistician who derives an obviously wrong result. However, it seems to me that his reasoning is actually correct in the sense that it follows the standard frequentist…

bayesian frequentist

asked Nov 11 '12 at 15:56

repied2


129

votes

6 answers

Is there an intuitive interpretation of $A^TA$ for a data matrix $A$?

For a given data matrix $A$ (with variables in columns and data points in rows), it seems like $A^TA$ plays an important role in statistics. For example, it is an important part of the analytical solution of ordinary least squares. Or, for PCA, its…

matrix covariance-matrix correlation-matrix

asked Feb 09 '12 at 08:05

Alec


128

votes

10 answers

Why does the Cauchy distribution have no mean?

From the distribution's density function, we could identify a mean (= 0) for the Cauchy distribution, just as the graph below shows. But why do we say the Cauchy distribution has no mean?

distributions mathematical-statistics mean density-function cauchy-distribution

asked Sep 10 '12 at 15:28

Flying pig

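For the Cauchy question above, a small numerical sketch (not part of the original question) may help. A mean exists only if $E|X|$ is finite, and for the standard Cauchy density $f(x) = 1/(\pi(1+x^2))$ the truncated integral $\int_{-t}^{t} |x| f(x)\,dx = \ln(1+t^2)/\pi$ keeps growing as $t$ increases. The function names below are hypothetical, chosen only for illustration.

```python
import math


def truncated_abs_moment(t: float) -> float:
    """Closed form of the integral of |x| * f(x) over [-t, t],
    where f is the standard Cauchy density 1 / (pi * (1 + x^2))."""
    return math.log1p(t * t) / math.pi


def truncated_abs_moment_numeric(t: float, steps: int = 200_000) -> float:
    """Midpoint-rule check of the same integral.

    Uses symmetry of the integrand: 2 * integral over [0, t].
    """
    h = t / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x / (math.pi * (1.0 + x * x))
    return 2.0 * total * h


if __name__ == "__main__":
    # The truncated value keeps increasing with the cutoff, so E|X| is
    # infinite and the mean is undefined, even though the density is
    # symmetric about 0.
    for cutoff in (1e1, 1e3, 1e6):
        print(cutoff, truncated_abs_moment(cutoff))
```

Symmetry makes 0 the median and the mode, but it does not by itself give a mean: the positive and negative halves of the integral each diverge.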

128

votes

2 answers

Removal of statistically significant intercept term increases $R^2$ in linear model

In a simple linear model with a single explanatory variable, $\alpha_i = \beta_0 + \beta_1 \delta_i + \epsilon_i$, I find that removing the intercept term improves the fit greatly (value of $R^2$ goes from 0.3 to 0.9). However, the intercept term…

r linear-model interpretation r-squared intercept

asked Apr 10 '12 at 11:29

Ernest A

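For the intercept question above, here is a minimal sketch of one common explanation, offered as an assumption about what is happening rather than something stated in the question. Many packages, including R's `lm`, report $R^2$ for a no-intercept model against the uncentered total sum of squares $\sum y_i^2$ instead of the centered $\sum (y_i - \bar{y})^2$, so the two $R^2$ values are not on the same scale. The function name and the generic `x`, `y` variables are hypothetical.

```python
def r_squared_pair(x, y):
    """Fit y on x with and without an intercept by least squares.

    Returns a dict with both residual sums of squares, the usual
    (centered) R^2 for the intercept model, and the uncentered R^2
    typically reported for the through-the-origin model.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Model with intercept: y = b0 + b1 * x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss_with = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss_centered = sum((yi - mean_y) ** 2 for yi in y)

    # Model through the origin: y = slope * x
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    rss_without = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss_uncentered = sum(yi ** 2 for yi in y)

    return {
        "rss_with": rss_with,
        "rss_without": rss_without,
        "r2_with": 1.0 - rss_with / tss_centered,
        "r2_without": 1.0 - rss_without / tss_uncentered,
    }
```

Because the intercept model nests the no-intercept one, its residual sum of squares can never be larger. A higher reported $R^2$ after dropping the intercept therefore reflects the changed denominator, not a better fit.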

128

votes

5 answers

How does a Support Vector Machine (SVM) work?

How does a Support Vector Machine (SVM) work, and what differentiates it from other linear classifiers, such as the Linear Perceptron, Linear Discriminant Analysis, or Logistic Regression? * (* I'm thinking in terms of the underlying motivations for…

machine-learning classification svm statistical-learning

asked Feb 16 '12 at 13:25

tdc


128

votes

28 answers

Free statistical textbooks

Are there any free statistical textbooks available?

teaching references

asked Jul 19 '10 at 23:29

csgillespie


126

votes

6 answers

How would you explain the difference between correlation and covariance?

Following up on the question "How would you explain covariance to someone who understands only the mean?", which addresses the issue of explaining covariance to a lay person, a similar question came to mind. How would one explain to a…

correlation covariance

asked Nov 08 '11 at 16:52

pmgjones


125

votes

9 answers

Numerical example to understand Expectation-Maximization

I am trying to get a good grasp on the EM algorithm, to be able to implement and use it. I spent a full day reading the theory and a paper where EM is used to track an aircraft using the position information coming from a radar. Honestly, I don't…

regression probability mathematical-statistics intuition expectation-maximization

asked Oct 14 '13 at 22:37

arjsgh21


125

votes

7 answers

Clustering on the output of t-SNE

I've got an application where it'd be handy to cluster a noisy dataset before looking for subgroup effects within the clusters. I first looked at PCA, but it takes ~30 components to get to 90% of the variability, so clustering on just a couple of…

clustering interpretation k-means tsne

asked Feb 23 '17 at 01:39

generic_user


124

votes

7 answers

How to choose between t-test or non-parametric test e.g. Wilcoxon in small samples

Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed rank test, the Wilcoxon-Mann-Whitney U test, or the…

hypothesis-testing t-test nonparametric small-sample wilcoxon-mann-whitney-test

asked Oct 29 '14 at 03:02

Silverfish


123

votes

20 answers

Most interesting statistical paradoxes

Because I find them fascinating, I'd like to hear what folks in this community find as the most interesting statistical paradox and why.

paradox

asked Feb 28 '12 at 04:08

Nick


122

votes

8 answers

Bias and variance in leave-one-out vs K-fold cross validation

How do different cross-validation methods compare in terms of model variance and bias? My question is partly motivated by this thread: "Optimal number of folds in $K$-fold cross-validation: is leave-one-out CV always the best choice?" The answer…

machine-learning variance cross-validation bias bias-variance-tradeoff

asked Jun 14 '13 at 20:14

Amelio Vazquez-Reina


121

votes

4 answers

Is it possible to have a pair of Gaussian random variables for which the joint distribution is not Gaussian?

Somebody asked me this question in a job interview and I replied that their joint distribution is always Gaussian. I thought that I could always write a bivariate Gaussian with their means, variances, and covariances. I am wondering if there can be a…

normal-distribution multivariate-analysis copula bivariate

asked Jun 09 '12 at 22:31

MarkSAlen

