Most Popular
1500 questions
121
votes
6 answers
Where should I place dropout layers in a neural network?
Are there any general guidelines on where to place dropout layers in a neural network?

Franck Dernoncourt
120
votes
5 answers
What are the main differences between K-means and K-nearest neighbours?
I know that k-means is unsupervised and is used for clustering, etc., and that k-NN is supervised. But what are the concrete differences between the two?

nsc010
120
votes
6 answers
Why are neural networks becoming deeper, but not wider?
In recent years, convolutional neural networks (or perhaps deep neural networks in general) have become deeper and deeper, with state-of-the-art networks going from 7 layers (AlexNet) to 1000 layers (Residual Nets) in the space of 4 years. The…

Karnivaurus
119
votes
4 answers
When to use gamma GLMs?
The gamma distribution can take on a pretty wide range of shapes, and given the link between the mean and the variance through its two parameters, it seems suited to dealing with heteroskedasticity in non-negative data, in a way that log-transformed…

generic_user
119
votes
20 answers
At each step of a limiting infinite process, put 10 balls in an urn and remove one at random. How many balls are left?
The question (slightly modified) goes as follows and if you have never encountered it before you can check it in example 6a, chapter 2, of Sheldon Ross' A First Course in Probability:
Suppose that we possess an infinitely large urn and an infinite
…

Carlos Cinelli
119
votes
6 answers
Difference between confidence intervals and prediction intervals
For a prediction interval in linear regression you still use $\hat{E}[Y|x] = \hat{\beta}_{0}+\hat{\beta}_{1}x$ to generate the interval. You also use this to generate a confidence interval of $E[Y|x_0]$. What's the difference between the two?

question
118
votes
3 answers
Intuitive explanation of unit root
How would you intuitively explain what a unit root is, in the context of the unit root test?
I'm thinking of ways of explaining it much like what I found in this question.
The case with unit root is that I know (little, by the way) that the unit root…

Lucas Reis
118
votes
4 answers
PCA and proportion of variance explained
In general, what is meant by saying that the fraction $x$ of the variance in an analysis like PCA is explained by the first principal component? Can someone explain this intuitively but also give a precise mathematical definition of what "variance…

user9097
117
votes
2 answers
KL divergence between two univariate Gaussians
I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).
I wonder where I am making a mistake and ask if…

bayerj
117
votes
14 answers
Maximum Likelihood Estimation (MLE) in layman terms
Could anyone explain maximum likelihood estimation (MLE) to me in detail, in layman's terms? I would like to know the underlying concept before going into the mathematical derivation or equations.

StatsUser
116
votes
4 answers
Assessing approximate distribution of data based on a histogram
Suppose I want to see whether my data is exponential based on a histogram (i.e. skewed to the right).
Depending on how I group or bin the data, I can get wildly different histograms.
One set of histograms will make it seem that the data is…

guestoeijreor
116
votes
10 answers
ASA discusses limitations of $p$-values - what are the alternatives?
We already have multiple threads tagged as p-values that reveal lots of misunderstandings about them. Ten months ago we had a thread about a psychological journal that "banned" $p$-values, now the American Statistical Association (2016) says that with our…

Tim
116
votes
5 answers
Comprehensive list of activation functions in neural networks with pros/cons
Are there any reference document(s) that give a comprehensive list of activation functions in neural networks along with their pros/cons (and ideally some pointers to publications where they were successful or not so successful)?

Franck Dernoncourt
115
votes
16 answers
If 900 out of 1000 people say a car is blue, what is the probability that it is blue?
This initially arose in connection with some work we are doing on a model to classify natural text, but I've simplified it... Perhaps too much.
You have a blue car (by some objective scientific measure - it is blue).
You show it to 1000 people.
900 say…

Pat Molloy
115
votes
21 answers
What's a real-world example of "overfitting"?
I kind of understand what "overfitting" means, but I need help coming up with a real-world example of overfitting.

user3851283