Most Popular
1500 questions
370 votes
80 answers
What is your favorite "data analysis" cartoon?
Data analysis cartoons can be useful for many reasons: they help communicate; they show that quantitative people have a sense of humor too; they can instigate good teaching moments; and they can help us remember important principles and…

Shane
- 11,961
- 17
- 71
- 89
362 votes
7 answers
How to normalize data to 0-1 range?
I am lost in normalizing, could anyone guide me please.
I have minimum and maximum values, say -23.89 and 7.54990767, respectively.
If I get a value of 5.6878, how can I scale it to a range of 0 to 1?
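A minimal sketch of the min-max rescaling this question asks about, using the values quoted in the excerpt. The function name `min_max_scale` is illustrative and not taken from the original post.

```python
def min_max_scale(x, lo, hi):
    """Linearly map x from the interval [lo, hi] onto [0, 1]."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    return (x - lo) / (hi - lo)


# Values from the question: min = -23.89, max = 7.54990767, x = 5.6878
scaled = min_max_scale(5.6878, -23.89, 7.54990767)
print(round(scaled, 4))
```

The minimum maps to 0 and the maximum maps to 1; values outside the original range fall outside [0, 1] unless they are clipped.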

Angelo
- 3,989
- 3
- 16
- 12
355 votes
16 answers
Is normality testing 'essentially useless'?
A former colleague once argued to me as follows:
We usually apply normality tests to the results of processes that,
under the null, generate random variables that are only
asymptotically or nearly normal (with the 'asymptotically' part…

shabbychef
- 10,388
- 7
- 50
- 93
354 votes
12 answers
Difference between logit and probit models
What is the difference between the Logit and Probit models?
I'm more interested here in knowing when to use logistic regression, and when to use Probit.
If there is any literature which defines it using R, that would be helpful as well.
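As a rough illustration (in Python rather than R, and not drawn from any answer to this question): the two models differ in their link function, with logit using the logistic CDF and probit using the standard normal CDF. The sketch below compares the two curves; the rescaling constant 1.702 is a commonly used approximation factor and is an assumption of this example.

```python
import math


def logistic_cdf(x):
    """Inverse link of the logit model."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_cdf(x):
    """Inverse link of the probit model (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


SCALE = 1.702  # assumed rescaling so the logistic curve tracks the normal one

grid = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(logistic_cdf(SCALE * x) - normal_cdf(x)) for x in grid)
print(max_gap)  # largest difference between the two curves on the grid
```

After rescaling, the curves are nearly indistinguishable in the middle of the range; the logistic distribution has heavier tails.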

Beta
- 5,784
- 9
- 33
- 44
331 votes
5 answers
What is the trade-off between batch size and number of iterations to train a neural network?
When training a neural network, what difference does it make to set:
batch size to $a$ and number of iterations to $b$
vs. batch size to $c$ and number of iterations to $d$
where $ ab = cd $?
To put it otherwise, assuming that we train the neural…
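A small sketch of the bookkeeping behind the condition $ab = cd$: both settings process the same number of training samples but perform a different number of parameter updates. The batch sizes, iteration counts, and dataset size below are hypothetical values chosen for illustration.

```python
def training_budget(batch_size, iterations, dataset_size):
    """Summarize how much data and how many updates a setting implies."""
    samples_seen = batch_size * iterations
    return {
        "samples_seen": samples_seen,
        "updates": iterations,  # one parameter update per iteration
        "epochs": samples_seen / dataset_size,
    }


# Hypothetical settings with a * b == c * d
small_batches = training_budget(batch_size=32, iterations=1000, dataset_size=8000)
large_batches = training_budget(batch_size=256, iterations=125, dataset_size=8000)
print(small_batches)
print(large_batches)
```

The sample budget and epoch count match, so any difference in the trained model comes from the number and noisiness of the gradient updates.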

Franck Dernoncourt
- 42,093
- 30
- 155
- 271
328 votes
8 answers
Why is Euclidean distance not a good metric in high dimensions?
I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…
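One way to see the effect the question alludes to is distance concentration: for random points, the gap between the nearest and farthest neighbor shrinks relative to the nearest distance as the dimension grows. The sketch below assumes points drawn uniformly from the unit hypercube; the function name and parameters are illustrative.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max distance - min distance) / min distance from one query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
print(low_dim, high_dim)
```

A small relative contrast means the "nearest" and "farthest" points are almost equally far away, which weakens distance-based methods such as hierarchical clustering.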

teaLeef
- 3,497
- 3
- 12
- 11
315 votes
13 answers
How to understand degrees of freedom?
From Wikipedia, there are three interpretations of the degrees of freedom of a statistic:
In statistics, the number of degrees of freedom is the number of
values in the final calculation of a statistic that are free to vary.
Estimates of…

Tim
- 1
- 29
- 102
- 189
296 votes
8 answers
What should I do when my neural network doesn't learn?
I'm training a neural network but the training loss doesn't decrease. How can I fix this?
I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network's performance doesn't improve on the training…

Sycorax
- 76,417
- 20
- 189
- 313
292 votes
10 answers
What's the difference between a confidence interval and a credible interval?
Joris and Srikant's exchange here got me wondering (again) if my internal explanations for the difference between confidence intervals and credible intervals were the correct ones. How would you explain the difference?

Matt Parker
- 5,597
- 5
- 26
- 37
287 votes
8 answers
Bagging, boosting and stacking in machine learning
What are the similarities and differences between these 3 methods:
Bagging,
Boosting,
Stacking?
Which is the best one? And why?
Can you give me an example for each?

Bucsa Lucian
- 2,979
- 3
- 13
- 3
280 votes
16 answers
What is the meaning of p values and t values in statistical tests?
After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It seems that students easily learn how to perform the…

Sharpie
- 4,126
- 5
- 21
- 18
280 votes
16 answers
Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?
It seems that through various related questions here, there is consensus that the "95%" part of what we call a "95% confidence interval" refers to the fact that if we were to exactly replicate our sampling and CI-computation procedures many times,…
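The repeated-sampling reading of "95%" described in this excerpt can be sketched as a simulation: draw many samples, build an interval from each, and count how often the interval covers the true mean. The sketch assumes normal data with a known standard deviation (a z-interval), and all parameter values are hypothetical.

```python
import random
import statistics


def ci_coverage(true_mean=10.0, sigma=2.0, n=30, reps=2000, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(coverage)  # long-run proportion of intervals covering the true mean
```

The 95% figure describes the procedure across replications; any single computed interval either contains the true mean or does not.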

Mike Lawrence
- 12,691
- 8
- 40
- 65
275 votes
151 answers
Famous statistical quotations
What is your favorite statistical quote?
This is community wiki, so please post one quote per answer.

robin girard
- 6,335
- 6
- 46
- 60
275 votes
6 answers
What does AUC stand for and what is it?
I have searched high and low and have not been able to find out what AUC, in the context of prediction, stands for or means.
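In a prediction context, AUC usually stands for "area under the curve", most often the ROC curve. One equivalent reading is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it that way; the function name and the toy labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.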

josh
- 3,119
- 4
- 12
- 14
271 votes
2 answers
Interpretation of R's lm() output
The help pages in R assume I know what those numbers mean, but I don't.
I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write…

Alexander Engelhardt
- 4,161
- 3
- 21
- 25