Highest Voted Questions - Statistical Analysis Stack Exchange

121

votes

6 answers

Where should I place dropout layers in a neural network?

Are there any general guidelines on where to place dropout layers in a neural network?

neural-networks dropout

asked Oct 14 '16 at 20:23

Franck Dernoncourt

42,093
30
155
271

120

votes

5 answers

What are the main differences between K-means and K-nearest neighbours?

I know that k-means is unsupervised and is used for clustering, etc., and that k-NN is supervised. But what are the concrete differences between the two?

machine-learning k-means k-nearest-neighbour

asked Apr 18 '13 at 17:15

nsc010

1,467
2
10
9

120

votes

6 answers

Why are neural networks becoming deeper, but not wider?

In recent years, convolutional neural networks (or perhaps deep neural networks in general) have become deeper and deeper, with state-of-the-art networks going from 7 layers (AlexNet) to 1000 layers (Residual Nets) in the space of 4 years. The…

machine-learning classification neural-networks deep-learning conv-neural-network

asked Jul 09 '16 at 06:35

Karnivaurus

5,909
10
36
52

119

votes

4 answers

When to use gamma GLMs?

The gamma distribution can take on a pretty wide range of shapes, and given the link between the mean and the variance through its two parameters, it seems suited to dealing with heteroskedasticity in non-negative data, in a way that log-transformed…

generalized-linear-model gamma-distribution

asked Aug 16 '13 at 08:13

generic_user

11,981
8
40
63

119

votes

20 answers

At each step of a limiting infinite process, put 10 balls in an urn and remove one at random. How many balls are left?

The question (slightly modified) goes as follows and if you have never encountered it before you can check it in example 6a, chapter 2, of Sheldon Ross' A First Course in Probability: Suppose that we possess an infinitely large urn and an infinite …

probability paradox

asked Nov 24 '17 at 18:23

Carlos Cinelli

10,500
5
42
77

119

votes

6 answers

Difference between confidence intervals and prediction intervals

For a prediction interval in linear regression you still use $\hat{E}[Y|x] = \hat{\beta}_0+\hat{\beta}_1 x$ to generate the interval. You also use this to generate a confidence interval of $E[Y|x_0]$. What's the difference between the two?

regression confidence-interval predictive-models prediction-interval

asked Oct 04 '11 at 18:35

question

1,357
4
10
8

118

votes

3 answers

Intuitive explanation of unit root

How would you explain intuitively what a unit root is, in the context of the unit root test? I'm thinking of ways of explaining it much like those I've found in this question. The case with unit root is that I know (little, by the way) that the unit root…

intuition unit-root

asked May 24 '12 at 22:07

Lucas Reis

1,962
3
16
15

118

votes

4 answers

PCA and proportion of variance explained

In general, what is meant by saying that the fraction $x$ of the variance in an analysis like PCA is explained by the first principal component? Can someone explain this intuitively but also give a precise mathematical definition of what "variance…

regression pca linear-model dimensionality-reduction

asked Feb 10 '12 at 05:36

user9097

2,973
7
18
11

117

votes

2 answers

KL divergence between two univariate Gaussians

I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p). I wonder where I am making a mistake and ask if…

normal-distribution kullback-leibler

asked Feb 21 '11 at 10:30

bayerj

12,735
3
35
56
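
For the KL divergence question above, the sanity check it mentions (KL(p, p) should be 0) can be tried against the standard closed form for two univariate Gaussians, $KL(p\,\|\,q) = \log\frac{\sigma_q}{\sigma_p} + \frac{\sigma_p^2 + (\mu_p - \mu_q)^2}{2\sigma_q^2} - \frac{1}{2}$. The following is a minimal sketch, not the asker's code; the function name `kl_gaussian` and its parameter names are assumptions made for illustration.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for p = N(mu_p, sigma_p^2) and q = N(mu_q, sigma_q^2).

    Hypothetical helper: sigma_p and sigma_q are standard deviations
    (not variances) and must be strictly positive.
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Identical distributions: the divergence is zero.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))

# KL is not symmetric: swapping p and q generally changes the value.
print(kl_gaussian(0.0, 1.0, 0.0, 2.0), kl_gaussian(0.0, 2.0, 0.0, 1.0))
```

A nonzero value for identical inputs usually points to passing variances where standard deviations are expected, or to dropping the $-\frac{1}{2}$ term.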

117

votes

14 answers

Maximum Likelihood Estimation (MLE) in layman terms

Could anyone explain maximum likelihood estimation (MLE) to me in detail, in layman's terms? I would like to know the underlying concept before going into the mathematical derivation or equations.

mathematical-statistics maximum-likelihood intuition definition philosophical

asked Aug 19 '14 at 12:46

StatsUser

1,529
4
13
13

116

votes

4 answers

Assessing approximate distribution of data based on a histogram

Suppose I want to see whether my data is exponential based on a histogram (i.e. skewed to the right). Depending on how I group or bin the data, I can get wildly different histograms. One set of histograms will make it seem that the data is…

distributions data-visualization histogram binning

asked Mar 08 '13 at 17:58

guestoeijreor

1,161
3
8
3

116

votes

10 answers

ASA discusses limitations of $p$-values - what are the alternatives?

We already have multiple threads tagged as p-values that reveal lots of misunderstandings about them. Ten months ago we had a thread about a psychological journal that "banned" $p$-values, now the American Statistical Association (2016) says that with our…

hypothesis-testing bayesian p-value frequentist

asked Mar 08 '16 at 08:32

Tim

108,699
20
212
390

116

votes

5 answers

Comprehensive list of activation functions in neural networks with pros/cons

Are there any reference document(s) that give a comprehensive list of activation functions in neural networks along with their pros/cons (and ideally some pointers to publications where they were successful or not so successful)?

neural-networks references

asked Sep 12 '14 at 13:28

Franck Dernoncourt

42,093
30
155
271

115

votes

16 answers

If 900 out of 1000 people say a car is blue, what is the probability that it is blue?

This initially arose in connection with some work we are doing on a model to classify natural text, but I've simplified it... Perhaps too much. You have a blue car (by some objective scientific measure - it is blue). You show it to 1000 people. 900 say…

probability

asked Aug 20 '17 at 19:57

Pat Molloy

1,115
2
7
6

115

votes

21 answers

What's a real-world example of "overfitting"?

I kind of understand what "overfitting" means, but I need help coming up with a real-world example of overfitting.

overfitting

asked Dec 11 '14 at 06:28

user3851283

307
2
4
3
