Highest Voted Questions - Statistical Analysis Stack Exchange

103

votes

5 answers

Diagnostic plots for count regression

What diagnostic plots (and perhaps formal tests) do you find most informative for regressions where the outcome is a count variable? I'm especially interested in Poisson and negative binomial models, as well as zero-inflated and hurdle counterparts…

generalized-linear-model residuals negative-binomial-distribution zero-inflation poisson-regression

asked Sep 20 '13 at 01:17

half-pass

3,594
7
23
34

103

votes

25 answers

Locating freely available data samples

I've been working on a new method for analyzing and parsing datasets to identify and isolate subgroups of a population without foreknowledge of any subgroup's characteristics. While the method works well enough with artificial data samples (i.e.…

dataset sample population teaching

asked Jul 19 '10 at 19:15

EAMann

163
3
4
7

103

votes

2 answers

Why do we need to normalize data before principal component analysis (PCA)?

I'm doing principal component analysis on my dataset and my professor told me that I should normalize the data before doing the analysis. Why? What would happen if I did PCA without normalization? Why do we normalize data in general? Could…

pca normalization dimensionality-reduction

asked Sep 04 '13 at 08:12

jjepsuomi

5,207
11
34
47

103

votes

12 answers

What, precisely, is a confidence interval?

I know roughly and informally what a confidence interval is. However, I can't seem to wrap my head around one rather important detail: According to Wikipedia: A confidence interval does not predict that the true value of the parameter has a…

confidence-interval definition

asked Jan 28 '11 at 00:23

dsimcha

7,375
7
32
29

103

votes

32 answers

What book would you recommend for non-statistician scientists?

What book would you recommend for scientists who are not statisticians? Clear delivery is most appreciated, as is explanation of the appropriate techniques and methods for typical tasks: time series analysis, presentation and aggregation of…

references

asked Jul 21 '10 at 15:01

SilentGhost

329
3
6
9

103

votes

11 answers

Explain "Curse of dimensionality" to a child

I have heard about the curse of dimensionality many times, but somehow I'm still unable to grasp the idea; it's all foggy. Can anyone explain this in the most intuitive way, as you would explain it to a child, so that I (and the others confused as I am)…

machine-learning dimensionality-reduction high-dimensional

asked Aug 28 '15 at 09:11

Kobe-Wan Kenobi

2,437
3
20
33

102

votes

9 answers

Is this really how p-values work? Can a million research papers per year be based on pure randomness?

I'm very new to statistics, and I'm just learning to understand the basics, including $p$-values. But there is a huge question mark in my mind right now, and I kind of hope my understanding is wrong. Here's my thought process: Aren't all researches…

hypothesis-testing statistical-significance p-value

asked Jul 19 '15 at 10:25

n_mu_sigma

1,071
2
8
6

102

votes

4 answers

Why isn't Logistic Regression called Logistic Classification?

Since Logistic Regression is a statistical classification model dealing with categorical dependent variables, why isn't it called Logistic Classification? Shouldn't the "Regression" name be reserved to models dealing with continuous dependent…

regression machine-learning logistic classification terminology

asked Dec 07 '14 at 18:44

Ismael Ghalimi

1,968
2
12
21

102

votes

2 answers

How scared should we be about convergence warnings in lme4

If we are fitting a glmer we may get a warning that tells us the model is having a hard time converging, e.g. >Warning message: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| =…

r mixed-model lme4-nlme

asked Jul 30 '14 at 15:02

user1322296

1,485
3
12
16

101

votes

3 answers

What are examples where a "naive bootstrap" fails?

Suppose I have a set of sample data from an unknown or complex distribution, and I want to perform some inference on a statistic $T$ of the data. My default inclination is to just generate a bunch of bootstrap samples with replacement, and calculate…

hypothesis-testing confidence-interval bootstrap

asked Apr 18 '11 at 05:44

raegtin

9,090
12
48
53

101

votes

17 answers

Under what conditions does correlation imply causation?

We all know the mantra "correlation does not imply causation", which is drummed into all first-year statistics students. There are some nice examples here to illustrate the idea. But sometimes correlation does imply causation. The following example…

correlation causality

asked Jul 23 '10 at 01:56

Rob Hyndman

51,928
23
126
178

101

votes

6 answers

How is it possible that validation loss is increasing while validation accuracy is increasing as well

I am training a simple neural network on the CIFAR10 dataset. After some time, validation loss started to increase, whereas validation accuracy is also increasing. The test loss and test accuracy continue to improve. How is this possible? It seems…

neural-networks deep-learning conv-neural-network overfitting

asked May 28 '17 at 14:13

Konstantin Solomatov

1,203
2
10
8

101

votes

3 answers

Feature selection and cross-validation

I have recently been reading a lot on this site (@Aniko, @Dikran Marsupial, @Erik) and elsewhere about the problem of overfitting occurring with cross-validation - (Smialowski et al 2010 Bioinformatics, Hastie, Elements of Statistical Learning). The…

cross-validation feature-selection

asked May 04 '12 at 10:09

BGreene

3,045
4
16
33

101

votes

10 answers

What's the difference between correlation and simple linear regression?

In particular, I am referring to the Pearson product-moment correlation coefficient.

correlation regression

asked Aug 25 '10 at 23:53

Neil McGuigan

9,292
13
54
62

101

votes

7 answers

How does the reparameterization trick for VAEs work and why is it important?

How does the reparameterization trick for variational autoencoders (VAE) work? Is there an intuitive and easy explanation without simplifying the underlying math? And why do we need the 'trick'?

mathematical-statistics autoencoders variational-bayes generative-models

asked Mar 02 '16 at 20:10

David Dao

2,474
3
12
16
