Most Popular
1500 questions
103
votes
5 answers
Diagnostic plots for count regression
What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable?
I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…

half-pass (reputation 3,594; badges: 7 gold, 23 silver, 34 bronze)
103
votes
25 answers
Locating freely available data samples
I've been working on a new method for analyzing and parsing datasets to identify and isolate subgroups of a population without foreknowledge of any subgroup's characteristics. While the method works well enough with artificial data samples (i.e.…

EAMann (reputation 163; badges: 3 gold, 4 silver, 7 bronze)
103
votes
2 answers
Why do we need to normalize data before principal component analysis (PCA)?
I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why?
What would happen if I did PCA without normalization?
Why do we normalize data in general?
Could…

jjepsuomi (reputation 5,207; badges: 11 gold, 34 silver, 47 bronze)
103
votes
12 answers
What, precisely, is a confidence interval?
I know roughly and informally what a confidence interval is. However, I can't seem to wrap my head around one rather important detail: According to Wikipedia:
A confidence interval does not predict that the true value of the parameter has a…

dsimcha (reputation 7,375; badges: 7 gold, 32 silver, 29 bronze)
103
votes
32 answers
What book would you recommend for non-statistician scientists?
What book would you recommend for scientists who are not statisticians?
Clear delivery is most appreciated, as is explanation of the appropriate techniques and methods for typical tasks: time series analysis, presentation and aggregation of…

SilentGhost (reputation 329; badges: 3 gold, 6 silver, 9 bronze)
103
votes
11 answers
Explain "Curse of dimensionality" to a child
I have heard about the curse of dimensionality many times, but somehow I'm still unable to grasp the idea; it's all foggy.
Can anyone explain this in the most intuitive way, as you would explain it to a child, so that I (and others as confused as I am)…

Kobe-Wan Kenobi (reputation 2,437; badges: 3 gold, 20 silver, 33 bronze)
102
votes
9 answers
Is this really how p-values work? Can a million research papers per year be based on pure randomness?
I'm very new to statistics, and I'm just learning to understand the basics, including $p$-values. But there is a huge question mark in my mind right now, and I kind of hope my understanding is wrong. Here's my thought process:
Aren't all researches…

n_mu_sigma (reputation 1,071; badges: 2 gold, 8 silver, 6 bronze)
102
votes
4 answers
Why isn't Logistic Regression called Logistic Classification?
Since Logistic Regression is a statistical classification model dealing with categorical dependent variables, why isn't it called Logistic Classification? Shouldn't the "Regression" name be reserved for models dealing with continuous dependent…

Ismael Ghalimi (reputation 1,968; badges: 2 gold, 12 silver, 21 bronze)
102
votes
2 answers
How scared should we be about convergence warnings in lme4
If we are fitting a glmer, we may get a warning that tells us the model is having a hard time converging, e.g.
>Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| =…

user1322296 (reputation 1,485; badges: 3 gold, 12 silver, 16 bronze)
101
votes
3 answers
What are examples where a "naive bootstrap" fails?
Suppose I have a set of sample data from an unknown or complex distribution, and I want to perform some inference on a statistic $T$ of the data. My default inclination is to just generate a bunch of bootstrap samples with replacement, and calculate…

raegtin (reputation 9,090; badges: 12 gold, 48 silver, 53 bronze)
101
votes
17 answers
Under what conditions does correlation imply causation?
We all know the mantra "correlation does not imply causation", which is drummed into all first-year statistics students. There are some nice examples here to illustrate the idea.
But sometimes correlation does imply causation. The following example…

Rob Hyndman (reputation 51,928; badges: 23 gold, 126 silver, 178 bronze)
101
votes
6 answers
How is it possible that validation loss is increasing while validation accuracy is increasing as well
I am training a simple neural network on the CIFAR10 dataset. After some time, the validation loss started to increase while the validation accuracy also kept increasing. The test loss and test accuracy continue to improve.
How is this possible? It seems…

Konstantin Solomatov (reputation 1,203; badges: 2 gold, 10 silver, 8 bronze)
101
votes
3 answers
Feature selection and cross-validation
I have recently been reading a lot on this site (@Aniko, @Dikran Marsupial, @Erik) and elsewhere about the problem of overfitting occurring with cross-validation (Smialowski et al. 2010, Bioinformatics; Hastie, Elements of Statistical Learning).
The…

BGreene (reputation 3,045; badges: 4 gold, 16 silver, 33 bronze)
101
votes
10 answers
What's the difference between correlation and simple linear regression?
In particular, I am referring to the Pearson product-moment correlation coefficient.

Neil McGuigan (reputation 9,292; badges: 13 gold, 54 silver, 62 bronze)
101
votes
7 answers
How does the reparameterization trick for VAEs work and why is it important?
How does the reparameterization trick for variational autoencoders (VAE) work? Is there an intuitive and easy explanation without simplifying the underlying math? And why do we need the 'trick'?

David Dao (reputation 2,474; badges: 3 gold, 12 silver, 16 bronze)