Most Popular
1500 questions
32
votes
3 answers
How and why does Batch Normalization use moving averages to track the accuracy of the model as it trains?
I was reading the batch normalization (BN) paper (1) and didn't understand the need to use moving averages to track the accuracy of the model. Even if I accept that it is the right thing to do, I don't understand what they are doing…

Charlie Parker
- 5,836
- 11
- 57
- 113
32
votes
5 answers
What do confidence intervals say about precision (if anything)?
Morey et al. (2015) argue that confidence intervals are misleading and that there are multiple biases in how they are understood. Among others, they describe the precision fallacy as follows:
The Precision fallacy
The width of a confidence interval…

Tim
- 108,699
- 20
- 212
- 390
32
votes
2 answers
What is the difference between dropout and drop connect?
AFAIK, dropout randomly drops hidden nodes during training but keeps them in testing, and drop connect drops connections.
But isn't dropping connections equivalent to dropping the hidden…

Machina333
- 863
- 2
- 9
- 10
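The distinction drawn in the excerpt above can be sketched in a few lines. This is a minimal illustration, not either method's reference implementation: the function names, the plain-list weight matrix `W`, and the `keep_prob` parameter are assumptions made here, and the activation function and the usual rescaling by `1 / keep_prob` are left out to keep the masks visible.

```python
import random


def matvec(W, x):
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dropout_forward(W, x, keep_prob, rng):
    """Dropout: one Bernoulli mask entry per output unit."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in W]
    h = matvec(W, x)
    return [m * hi for m, hi in zip(mask, h)], mask


def dropconnect_forward(W, x, keep_prob, rng):
    """DropConnect: one Bernoulli mask entry per individual weight."""
    mask = [[1 if rng.random() < keep_prob else 0 for _ in row] for row in W]
    masked_W = [[m * w for m, w in zip(mrow, row)] for mrow, row in zip(mask, W)]
    return matvec(masked_W, x), mask


# Hypothetical toy layer: 2 output units, 3 inputs.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 1.0, 1.0]
rng = random.Random(0)
h_dropout, unit_mask = dropout_forward(W, x, 0.5, rng)
h_dropconnect, weight_mask = dropconnect_forward(W, x, 0.5, rng)
```

In this sketch a dropped unit outputs exactly zero, while under the per-weight mask a unit can still receive part of its input. The dropout mask has one entry per unit and the DropConnect mask has one entry per weight.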
32
votes
10 answers
Recommendations for non-technical yet deep articles in statistics
The inspiration for this question comes from the late Leo Breiman's well-known article Statistical Modeling: The Two Cultures (available open access). The author compares what he sees as two disparate approaches to analyzing data, touching upon key…

Richard Border
- 1,128
- 9
- 26
32
votes
2 answers
How to make a reward function in reinforcement learning?
While studying Reinforcement Learning, I have come across many forms of the reward function: $R(s,a)$, $R(s,a,s')$, and even a reward function that only depends on the current state. Having said that, I realized it is not very easy to 'make' or…

cgo
- 7,445
- 10
- 42
- 61
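The three signatures mentioned in the excerpt above, $R(s)$, $R(s,a)$ and $R(s,a,s')$, can be written side by side for a toy problem. Everything below is an assumption made for illustration: the four-state chain, the constants `GOAL` and `STEP_COST`, and the function names are hypothetical and do not come from the question.

```python
GOAL = 3          # hypothetical goal state on a chain of states 0..3
STEP_COST = -1.0  # hypothetical cost charged for each move


def step(s, a):
    """Deterministic transition: a is -1 or +1, clipped to the chain 0..GOAL."""
    return max(0, min(GOAL, s + a))


def reward_state(s):
    """R(s): depends only on the current state."""
    return 10.0 if s == GOAL else 0.0


def reward_state_action(s, a):
    """R(s, a): charges every move, regardless of where it leads."""
    return STEP_COST


def reward_transition(s, a, s_next):
    """R(s, a, s'): pays on entering the goal, charges the move otherwise."""
    if s_next == GOAL and s != GOAL:
        return 10.0
    return STEP_COST


s, a = 2, +1
s_next = step(s, a)
rewards = (reward_state(s), reward_state_action(s, a), reward_transition(s, a, s_next))
```

The same transition receives a different reward under each form, so the choice of signature is itself part of the design.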
32
votes
3 answers
Why are bias nodes used in neural networks?
How many should you use?
In which layers should you use them: all hidden layers and the output layer?

grmmhp
- 421
- 1
- 4
- 4
32
votes
2 answers
Diagnostics for generalized linear (mixed) models (specifically residuals)
I am currently struggling with finding the right model for difficult count data (dependent variable). I have tried various different models (mixed effects models are necessary for my kind of data) such as lmer and lme4 (with a log transform) as well…

fsociety
- 1,084
- 1
- 12
- 25
32
votes
7 answers
Why is a comma a bad record separator/delimiter in CSV files?
I was reading this article and I'm curious for the proper answer to this question.
The only thing that comes to mind is that in some countries the decimal separator is a comma, which may cause problems when sharing data in CSV, but I'm…

David Gasquez
- 498
- 1
- 5
- 11
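The decimal-comma concern raised in the excerpt above can be reproduced with Python's standard `csv` module. This is a small sketch. The helper names `write_row` and `read_row` and the sample values are made up for illustration.

```python
import csv
import io


def write_row(fields, delimiter=","):
    """Serialize one record with csv.writer and return the resulting text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerow(fields)
    return buf.getvalue()


def read_row(text, delimiter=","):
    """Parse the first record of the text with csv.reader."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# Two numbers written with a decimal comma (hypothetical sample values).
fields = ["3,14", "2,71"]

# Naive joining makes the record ambiguous: two values now look like four.
naive_line = ",".join(fields)
naive_fields = naive_line.split(",")

# csv.writer quotes fields that contain the delimiter, so they round-trip.
quoted_line = write_row(fields)
round_trip = read_row(quoted_line)

# A different delimiter avoids the clash without any quoting.
semicolon_line = write_row(fields, delimiter=";")
```

The ambiguity arises only when fields are joined without quoting. A writer that quotes, or a delimiter that cannot occur in the data, keeps the two values intact.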
32
votes
2 answers
Why is Lasso penalty equivalent to the double exponential (Laplace) prior?
I have read in a number of references that the Lasso estimate for the regression parameter vector $B$ is equivalent to the posterior mode of $B$ in which the prior distribution for each $B_i$ is a double exponential distribution (also known as…

Wintermute
- 1,207
- 2
- 16
- 24
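One way to see the equivalence stated in the excerpt above, sketched under assumptions the excerpt does not spell out: a Gaussian likelihood $y \mid B \sim N(XB, \sigma^2 I)$ and independent double exponential priors $p(B_i) = \frac{1}{2b}\exp(-|B_i|/b)$. The symbols $y$, $X$, $\sigma^2$ and $b$ are notation introduced here for illustration. The negative log posterior is then

$$
-\log p(B \mid y) = \frac{1}{2\sigma^2}\lVert y - XB \rVert_2^2 + \frac{1}{b}\sum_i |B_i| + \text{const}.
$$

Multiplying by $2\sigma^2$ shows that the posterior mode minimizes $\lVert y - XB \rVert_2^2 + \lambda \sum_i |B_i|$ with $\lambda = 2\sigma^2 / b$, which has the form of the Lasso objective.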
32
votes
5 answers
Fisher's Exact Test in contingency tables larger than 2x2
I was taught to only apply Fisher's Exact Test in contingency tables that were 2x2.
Questions:
Did Fisher himself ever envision this test to be used in tables larger than 2x2 (I am aware of the tale of him devising the test while trying to guess…

pmgjones
- 5,543
- 8
- 36
- 36
32
votes
4 answers
Can anyone explain conjugate priors in simplest possible terms?
I have been trying to understand the idea of conjugate priors in Bayesian statistics for a while but I simply don't get it. Can anyone explain the idea in the simplest possible terms, perhaps using the "Gaussian prior" as an example?

Jenna Maiz
- 779
- 7
- 17
32
votes
4 answers
Choosing the best model from among different "best" models
How do you choose a model from among different models chosen by different methods (e.g. backwards or forwards selection)?
Also, what is a parsimonious model?

tom
- 361
- 1
- 4
- 6
32
votes
1 answer
predict() Function for lmer Mixed Effects Models
The problem:
I have read in other posts that predict is not available for mixed effects lmer {lme4} models in [R].
I tried exploring this subject with a toy dataset...
Background:
The dataset is adapted from this source, and available…

Antoni Parellada
- 23,430
- 15
- 100
- 197
32
votes
5 answers
What does interaction depth mean in GBM?
I had a question on the interaction depth parameter in gbm in R. This may be a noob question, for which I apologize, but how does the parameter, which I believe denotes the number of terminal nodes in a tree, basically indicate X-way interaction…

tomas
- 1,715
- 4
- 20
- 26
32
votes
2 answers
Can PCA be applied for time series data?
I understand that Principal Component Analysis (PCA) can basically be applied to cross-sectional data. Can PCA be used effectively for time series data by specifying year as the time series variable and running PCA normally? I have found that dynamic…

Nisha Simon
- 471
- 1
- 6
- 5