Most Popular
1500 questions
85
votes
7 answers
What are the 'big problems' in statistics?
Mathematics has its famous Millennium Problems (and, historically, Hilbert's 23), questions that helped to shape the direction of the field.
I have little idea, though, what the Riemann Hypotheses and P vs. NP's of statistics would be.
So, what are…

raegtin
- 9,090
- 12
- 48
- 53
85
votes
5 answers
Cross-Validation in plain English?
How would you describe cross-validation to someone without a data analysis background?

Shane
- 11,961
- 17
- 71
- 89
85
votes
11 answers
What is the best way to remember the difference between sensitivity, specificity, precision, accuracy, and recall?
Despite having seen these terms 502847894789 times, I cannot for the life of me remember the difference between sensitivity, specificity, precision, accuracy, and recall. They're pretty simple concepts, but the names are highly unintuitive to me,…

Jessica
- 1,781
- 2
- 15
- 17
85
votes
14 answers
Why haven't robust (and resistant) statistics replaced classical techniques?
When solving business problems using data, it's common that at least one key assumption that underpins classical statistics is invalid. Most of the time, no one bothers to check those assumptions, so you never actually know.
For instance, that so…

doug
- 9,901
- 1
- 22
- 26
84
votes
6 answers
Why does k-means clustering algorithm use only Euclidean distance metric?
Is there a specific purpose in terms of efficiency or functionality why the k-means algorithm does not use for example cosine (dis)similarity as a distance metric, but can only use the Euclidean norm? In general, will K-means method comply and be…

curious
- 971
- 1
- 7
- 7
84
votes
9 answers
Mathematician wants the equivalent knowledge to a quality stats degree
I know people love to close duplicates so I am not asking for a reference to start learning statistics (as here).
I have a doctorate in mathematics but never learned statistics. What is the shortest route to the equivalent knowledge to a top notch…

John Robertson
- 973
- 3
- 15
- 25
84
votes
9 answers
Why is it possible to get significant F statistic (p<.001) but non-significant regressor t-tests?
In a multiple linear regression, why is it possible to have a highly significant F statistic (p<.001) but very high p-values on all the regressors' t-tests?
In my model, there are 10 regressors. One has a p-value of 0.1 and the rest are above…

Ηλίας
- 1,439
- 3
- 15
- 16
84
votes
4 answers
What're the differences between PCA and autoencoder?
Both PCA and autoencoders can do dimension reduction, so what is the difference between them? In what situations should I use one over the other?

RockTheStar
- 11,277
- 31
- 63
- 89
83
votes
5 answers
Mutual information versus correlation
Why and when should we use mutual information over statistical correlation measures such as Pearson, Spearman, or Kendall's tau?

SaZa
- 975
- 1
- 7
- 6
83
votes
11 answers
What are disadvantages of using the lasso for variable selection for regression?
From what I know, using lasso for variable selection handles the problem of correlated inputs. Also, since it is equivalent to Least Angle Regression, it is not slow computationally. However, many people (for example people I know doing…

xuexue
- 2,098
- 2
- 16
- 11
83
votes
4 answers
How to visualize what canonical correlation analysis does (in comparison to what principal component analysis does)?
Canonical correlation analysis (CCA) is a technique related to principal component analysis (PCA). While it is easy to teach PCA or linear regression using a scatter plot (see a few thousand examples on Google image search), I have not seen a…

figure
- 933
- 2
- 7
- 6
83
votes
28 answers
Examples for teaching: Correlation does not mean causation
There is an old saying: "Correlation does not mean causation". When I teach, I tend to use the following standard examples to illustrate this point:
number of storks and birth rate in Denmark;
number of priests in America and alcoholism;
in the…

csgillespie
- 11,849
- 9
- 56
- 85
83
votes
14 answers
When (if ever) is a frequentist approach substantively better than a Bayesian?
Background: I do not have any formal training in Bayesian statistics (though I am very interested in learning more), but I know enough--I think--to get the gist of why many feel as though they are preferable to Frequentist statistics. Even the…

jsakaluk
- 5,006
- 1
- 20
- 45
82
votes
7 answers
How to generate uniformly distributed points on the surface of the 3-d unit sphere?
I am wondering how to generate uniformly distributed points on the surface of the 3-d unit sphere? Also after generating those points, what is the best way to visualize and check whether they are truly uniform on the surface $x^2+y^2+z^2=1$?

Qiang Li
- 1,145
- 2
- 9
- 10
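
For the sphere question above, one common approach (not taken from the listed answers, which are not shown here) is to draw three independent standard normal coordinates and rescale the vector to unit length; the rotational symmetry of the Gaussian makes the resulting direction uniform on $x^2+y^2+z^2=1$. The sketch below is a minimal illustration under that assumption. The function name `sample_unit_sphere`, the sample size, and the seed are hypothetical choices. As a rough check it uses the fact that, for a uniform distribution on the sphere, each coordinate is uniform on [-1, 1].

```python
import math
import random


def sample_unit_sphere(n, seed=None):
    """Return n points on the unit sphere by normalizing Gaussian vectors.

    Hypothetical helper for illustration; uses only the standard library.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        # Three independent N(0, 1) draws give a rotationally symmetric vector.
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue  # practically never happens; avoids division by zero
        points.append((x / norm, y / norm, z / norm))
    return points


points = sample_unit_sphere(20000, seed=0)

# Rough uniformity checks: z should look uniform on [-1, 1], so its mean
# should be near 0 and about a quarter of the points should have z > 0.5.
mean_z = sum(p[2] for p in points) / len(points)
frac_top_cap = sum(1 for p in points if p[2] > 0.5) / len(points)
print(f"mean z: {mean_z:.4f}, fraction with z > 0.5: {frac_top_cap:.4f}")
```

These summary checks only catch gross non-uniformity; a histogram of each coordinate, or a scatter plot of the points, would be a natural visual follow-up.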
82
votes
1 answer
Help me understand Support Vector Machines
I understand the basics of what a Support Vector Machine's aim is in terms of classifying an input set into several different classes, but what I don't understand is some of the nitty-gritty details. For starters, I'm a bit confused by the use of…

rohanbk
- 1,187
- 1
- 10
- 10