Most Popular

1500 questions
370
votes
80 answers

What is your favorite "data analysis" cartoon?

Data analysis cartoons can be useful for many reasons: they help communicate; they show that quantitative people have a sense of humor too; they can instigate good teaching moments; and they can help us remember important principles and…
Shane
  • 11,961
  • 17
  • 71
  • 89
362
votes
7 answers

How to normalize data to 0-1 range?

I am lost on normalizing; could anyone guide me, please? I have minimum and maximum values, say -23.89 and 7.54990767, respectively. If I get a value of 5.6878, how can I scale it to the range 0 to 1?
Angelo
  • 3,989
  • 3
  • 16
  • 12
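For readers skimming this entry, the question's numbers can be worked through with standard min-max scaling, (x - min) / (max - min). This is only a minimal sketch, not taken from the thread's answers; the function name `min_max_scale` is made up for illustration.

```python
def min_max_scale(value, lo, hi):
    """Map value from the interval [lo, hi] onto [0, 1] (min-max scaling).

    The name and signature are hypothetical, chosen for this illustration.
    """
    if hi == lo:
        # A zero-width range would divide by zero, so reject it explicitly.
        raise ValueError("hi and lo must differ")
    return (value - lo) / (hi - lo)


# Numbers quoted in the question: min = -23.89, max = 7.54990767, x = 5.6878
scaled = min_max_scale(5.6878, -23.89, 7.54990767)
print(scaled)  # roughly 0.94
```

The minimum maps to 0 and the maximum to 1; a value outside the original range would land outside [0, 1], so clip it if that matters for your use.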
355
votes
16 answers

Is normality testing 'essentially useless'?

A former colleague once argued to me as follows: We usually apply normality tests to the results of processes that, under the null, generate random variables that are only asymptotically or nearly normal (with the 'asymptotically' part…
shabbychef
  • 10,388
  • 7
  • 50
  • 93
354
votes
12 answers

Difference between logit and probit models

What is the difference between logit and probit models? I'm more interested here in knowing when to use logistic regression and when to use probit. If there is any literature that explains this using R, that would be helpful as well.
Beta
  • 5,784
  • 9
  • 33
  • 44
331
votes
5 answers

What is the trade-off between batch size and number of iterations to train a neural network?

When training a neural network, what difference does it make to set: batch size to $a$ and number of iterations to $b$ vs. batch size to $c$ and number of iterations to $d$ where $ ab = cd $? To put it otherwise, assuming that we train the neural…
Franck Dernoncourt
  • 42,093
  • 30
  • 155
  • 271
328
votes
8 answers

Why is Euclidean distance not a good metric in high dimensions?

I read that 'Euclidean distance is not a good distance in high dimensions'. I guess this statement has something to do with the curse of dimensionality, but what exactly? Besides, what is 'high dimensions'? I have been applying hierarchical…
315
votes
13 answers

How to understand degrees of freedom?

From Wikipedia, there are three interpretations of the degrees of freedom of a statistic: In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. Estimates of…
Tim
  • 1
  • 29
  • 102
  • 189
296
votes
8 answers

What should I do when my neural network doesn't learn?

I'm training a neural network but the training loss doesn't decrease. How can I fix this? I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network's performance doesn't improve on the training…
Sycorax
  • 76,417
  • 20
  • 189
  • 313
292
votes
10 answers

What's the difference between a confidence interval and a credible interval?

Joris and Srikant's exchange here got me wondering (again) if my internal explanations for the difference between confidence intervals and credible intervals were the correct ones. How would you explain the difference?
287
votes
8 answers

Bagging, boosting and stacking in machine learning

What are the similarities and differences between these three methods: bagging, boosting, and stacking? Which is the best one, and why? Can you give me an example of each?
280
votes
16 answers

What is the meaning of p values and t values in statistical tests?

After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It seems that students easily learn how to perform the…
280
votes
16 answers

Why does a 95% Confidence Interval (CI) not imply a 95% chance of containing the mean?

It seems that through various related questions here, there is consensus that the "95%" part of what we call a "95% confidence interval" refers to the fact that if we were to exactly replicate our sampling and CI-computation procedures many times,…
Mike Lawrence
  • 12,691
  • 8
  • 40
  • 65
275
votes
151 answers

Famous statistical quotations

What is your favorite statistical quote? This is community wiki, so please one quote per answer.
robin girard
  • 6,335
  • 6
  • 46
  • 60
275
votes
6 answers

What does AUC stand for and what is it?

Searched high and low and have not been able to find out what AUC, as in related to prediction, stands for or means.
josh
  • 3,119
  • 4
  • 12
  • 14
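In the prediction setting, AUC usually stands for "area under the curve", most often the ROC curve. One common reading is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way by brute-force pair counting. It is an illustration only, not drawn from the thread's answers; the function name `auc_from_scores` and the toy data are made up.

```python
def auc_from_scores(labels, scores):
    """Estimate ROC AUC as the fraction of (positive, negative) pairs
    in which the positive case has the higher score; ties count as half.

    Hypothetical helper for illustration; labels must be 0 or 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_from_scores(labels, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the sample size, so for large data a rank-based or library implementation would be the more practical choice.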
271
votes
2 answers

Interpretation of R's lm() output

The help pages in R assume I know what those numbers mean, but I don't. I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write…
Alexander Engelhardt
  • 4,161
  • 3
  • 21
  • 25