Most Popular
1500 questions
141 votes, 5 answers
Percentile vs quantile vs quartile
What is the difference between the three terms below?
percentile
quantile
quartile

luciano (12,197 reputation; 30 gold, 87 silver, 119 bronze badges)
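For the percentile/quantile/quartile question above, here is a minimal sketch of how the three terms relate. A quantile is indexed by a fraction q in [0, 1], a percentile is the same quantity with q written on a 0-100 scale, and the quartiles are the three quantiles at q = 0.25, 0.5 and 0.75. Several interpolation conventions exist; this sketch assumes linear interpolation between order statistics, which is NumPy's default. The function names are illustrative.

```python
def quantile(data, q):
    """Sample quantile by linear interpolation between sorted values; q in [0, 1]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    xs = sorted(data)
    pos = q * (len(xs) - 1)          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def percentile(data, p):
    """A percentile is a quantile with q expressed on a 0-100 scale."""
    return quantile(data, p / 100.0)


def quartiles(data):
    """Quartiles: the three quantiles that split the data into four parts."""
    return [quantile(data, q) for q in (0.25, 0.5, 0.75)]


data = [7, 1, 3, 9, 5]
print(quantile(data, 0.25), percentile(data, 25), quartiles(data))
# prints: 3.0 3.0 [3.0, 5.0, 7.0]
```

So "the 25th percentile", "the 0.25 quantile" and "the first quartile" all name the same number. Only the scale used to index it differs.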
140 votes, 21 answers
What's the difference between probability and statistics?
What's the difference between probability and statistics, and why are they studied together?

hslc (1,537 reputation; 3 gold, 10 silver, 3 bronze badges)
140 votes, 7 answers
What is the difference between off-policy and on-policy learning?
An artificial intelligence website defines off-policy and on-policy learning as follows:
"An off-policy learner learns the value of the optimal policy independently of the agent's actions. Q-learning is an off-policy learner. An on-policy learner…

cgo (7,445 reputation; 10 gold, 42 silver, 61 bronze badges)
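For the off-policy versus on-policy question above, the difference shows up in the bootstrap target of the tabular update. The quoted definition names only Q-learning, so SARSA is swapped in here as a standard on-policy counterpart. This is a minimal sketch. The step size `ALPHA`, the discount `GAMMA`, and the state and action names are hypothetical.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # hypothetical step size and discount factor


def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: the target uses the greedy (max-value) next action,
    whatever action the agent actually goes on to take."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: the target uses the action the agent actually takes next."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


actions = ["left", "right"]
Q1 = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 4.0})
Q2 = defaultdict(float, Q1)  # identical starting table for SARSA

# Suppose the agent explores and actually takes "left" in state s1.
q_learning_update(Q1, "s0", "right", 0.0, "s1", actions)
sarsa_update(Q2, "s0", "right", 0.0, "s1", "left")
print(Q1[("s0", "right")], Q2[("s0", "right")])
```

Q-learning moves Q(s0, right) toward 0.9 × 4.0, the value of the greedy action, even though the agent took "left". SARSA moves it toward 0.9 × 1.0, the value of the action actually chosen by its own (exploring) policy.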
139 votes, 5 answers
Batch gradient descent versus stochastic gradient descent
Suppose we have some training set $(x_{(i)}, y_{(i)})$ for $i = 1, \dots, m$. Also suppose we run some type of supervised learning algorithm on the training set. Hypotheses are represented as $h_{\theta}(x_{(i)}) = \theta_0+\theta_{1}x_{(i)1} +…

user20616 (1,431 reputation; 3 gold, 11 silver, 7 bronze badges)
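For the batch versus stochastic gradient descent question above, here is a minimal sketch using the hypothesis $h_{\theta}(x) = \theta_0 + \theta_1 x$ restricted to a single feature and the squared-error cost. The learning rate, epoch count, and toy data are illustrative assumptions. Batch GD computes one gradient over all m examples per update. SGD updates after each individual example.

```python
import random


def h(theta, x):
    """Hypothesis with a single feature: theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def batch_gd(data, lr=0.1, epochs=2000):
    """Batch GD on (1/2m) * sum of squared errors: one update per pass,
    using the gradient averaged over all m examples."""
    theta = [0.0, 0.0]
    m = len(data)
    for _ in range(epochs):
        g0 = sum(h(theta, x) - y for x, y in data) / m
        g1 = sum((h(theta, x) - y) * x for x, y in data) / m
        theta = [theta[0] - lr * g0, theta[1] - lr * g1]
    return theta


def sgd(data, lr=0.1, epochs=2000, seed=0):
    """SGD: one update per example, visiting examples in a shuffled order."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not touch the caller's list
    theta = [0.0, 0.0]
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = h(theta, x) - y
            theta = [theta[0] - lr * err, theta[1] - lr * err * x]
    return theta


# Noise-free toy data generated from y = 2 + 3x.
data = [(x, 2.0 + 3.0 * x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
print(batch_gd(data), sgd(data))
```

Both runs approach (2, 3) here. Per pass over the data, batch GD makes one parameter update while SGD makes m of them. With noisy data, SGD's iterates keep fluctuating around the minimum unless the learning rate is decayed.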
139 votes, 9 answers
Obtaining knowledge from a random forest
Random forests are considered to be black boxes, but recently I was wondering what knowledge can be obtained from a random forest.
The most obvious thing is the importance of the variables, in the simplest variant it can be done just by calculating…

Tomek Tarczynski (3,854 reputation; 7 gold, 29 silver, 37 bronze badges)
137 votes, 5 answers
Why normalize images by subtracting the dataset's image mean, instead of the current image mean, in deep learning?
There are some variations on how to normalize the images, but most seem to use these two methods:
Subtract the mean per channel calculated over all images (e.g. VGG_ILSVRC_16_layers)
Subtract the mean per pixel/channel calculated over all images (e.g. CNN_S,…

Max Gordon (5,616 reputation; 8 gold, 30 silver, 51 bronze badges)
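For the image-normalization question above, here is a minimal sketch of what the two listed statistics are. The first is one mean per channel over all pixels of all images. The second is a full mean image with one value per (pixel, channel) position. Both are computed once over the dataset and then subtracted from each image. Images are assumed to be nested lists indexed [row][col][channel], and the toy data and function names are hypothetical.

```python
def channel_means(images):
    """Method 1: one mean per channel, averaged over every pixel of every image."""
    n_channels = len(images[0][0][0])
    totals = [0.0] * n_channels
    count = 0
    for img in images:
        for row in img:
            for px in row:
                for c in range(n_channels):
                    totals[c] += px[c]
                count += 1
    return [t / count for t in totals]


def mean_image(images):
    """Method 2: a mean image with one value per (row, col, channel) position."""
    n = len(images)
    h, w, c = len(images[0]), len(images[0][0]), len(images[0][0][0])
    return [[[sum(img[i][j][k] for img in images) / n for k in range(c)]
             for j in range(w)]
            for i in range(h)]


def subtract_channel_means(img, means):
    """Subtract the same per-channel constant from every pixel."""
    return [[[v - means[k] for k, v in enumerate(px)] for px in row] for row in img]


def subtract_mean_image(img, mean_img):
    """Subtract the dataset mean at each pixel position."""
    return [[[v - m for v, m in zip(px, mpx)] for px, mpx in zip(row, mrow)]
            for row, mrow in zip(img, mean_img)]


# Two toy 1x2 images with 2 channels, indexed [row][col][channel].
images = [
    [[[0, 10], [2, 20]]],
    [[[4, 30], [6, 40]]],
]
print(channel_means(images))   # [3.0, 25.0]
print(mean_image(images))      # [[[2.0, 20.0], [4.0, 30.0]]]
```

In both methods the statistics come from the dataset, not from the image being normalized. Every image, including test images, is shifted by the same amounts.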
136 votes, 6 answers
Should one remove highly correlated variables before doing PCA?
I'm reading a paper where the author discards several variables due to high correlation with other variables before doing PCA. The total number of variables is around 20.
Does this give any benefits? It looks like unnecessary overhead to me, as PCA should handle…

type2 (1,471 reputation; 3 gold, 10 silver, 4 bronze badges)
136 votes, 3 answers
What is the difference between linear regression and logistic regression?
What is the difference between linear regression and logistic regression?
When would you use each?

B Seven (2,873 reputation; 4 gold, 24 silver, 29 bronze badges)
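For the linear versus logistic regression question above, here is a minimal sketch of the core contrast. Linear regression models a continuous outcome as $b_0 + b_1 x$, and its predictions are unbounded. Logistic regression models the probability of a binary (0/1) outcome as $\sigma(b_0 + b_1 x)$ with the logistic sigmoid $\sigma$. The toy data, learning rate and step count are illustrative assumptions. Plain gradient ascent is used only for brevity, and production libraries typically use more robust optimizers.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (continuous outcome)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Logistic regression for a 0/1 outcome, P(y = 1 | x) = sigmoid(b0 + b1 * x),
    fitted by gradient ascent on the average log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / n
        b0, b1 = b0 + lr * g0, b1 + lr * g1
    return b0, b1


a0, a1 = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
c0, c1 = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
print(a0 + a1 * 10)           # linear: an unbounded real-valued prediction
print(sigmoid(c0 + c1 * 10))  # logistic: a probability strictly between 0 and 1
```

As a rough guide, use linear regression when the outcome is a quantity and logistic regression when the outcome is a yes/no label whose probability you want to model.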
136 votes, 4 answers
What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders?
Recently I have been reading about deep learning, and I am confused about the terms (or, say, technologies). What is the difference between
Convolutional neural networks (CNN),
Restricted Boltzmann machines (RBM) and
Auto-encoders?

RockTheStar (11,277 reputation; 31 gold, 63 silver, 89 bronze badges)
135 votes, 9 answers
Why does a time series have to be stationary?
I understand that a stationary time series is one whose mean and variance are constant over time. Can someone please explain why we have to make sure our data set is stationary before we can run different ARIMA or ARMA models on it? Does this also…

alex (1,351 reputation; 3 gold, 9 silver, 3 bronze badges)
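For the stationarity question above, here is a minimal sketch of what a non-stationary series looks like and what the differencing step (the "I" in ARIMA) does to it. A Gaussian random walk is assumed as the example process, and the path count, length and seed are arbitrary. Across many simulated paths, the walk's variance at time t grows with t, so no single mean and variance describe the series. Its first difference is white noise with the same variance at every index.

```python
import random
import statistics


def random_walk(n, rng):
    """y_t = y_{t-1} + e_t with e_t ~ N(0, 1): a simple non-stationary series."""
    y, out = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, 1.0)
        out.append(y)
    return out


def difference(series):
    """First difference y_t - y_{t-1}, the 'I' (integration) step in ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(1)
walks = [random_walk(400, rng) for _ in range(500)]
diffs = [difference(w) for w in walks]

# Variance across the 500 simulated paths at an early and a late time index.
print("walk:", statistics.variance([w[9] for w in walks]),
      statistics.variance([w[399] for w in walks]))
print("differenced:", statistics.variance([d[9] for d in diffs]),
      statistics.variance([d[398] for d in diffs]))
```

In expectation, the walk's variance is about t at index t - 1 (roughly 10 versus 400 here), while the differenced series stays near 1 throughout.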
134 votes, 5 answers
What is the .632+ rule in bootstrapping?
Here @gung makes reference to the .632+ rule. A quick Google search doesn't yield an easy-to-understand answer as to what this rule means and for what purpose it is used. Would someone please elucidate the .632+ rule?

russellpierce (17,079 reputation; 16 gold, 67 silver, 98 bronze badges)
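For the .632+ question above, here is a sketch of the quantities involved, following the .632 and .632+ estimators as commonly stated after Efron and Tibshirani. Check the details against their paper before relying on it. The constant 0.632 is the limit of the chance that a given observation lands in a bootstrap sample, $1 - (1 - 1/n)^n \to 1 - e^{-1}$. The inputs `err_train` (resubstitution error), `err_oob` (leave-one-out bootstrap error) and `gamma` (no-information error rate) are assumed to be computed elsewhere.

```python
import math


def inclusion_probability(n):
    """P(a given observation appears in a bootstrap sample of size n)."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def err_632(err_train, err_oob):
    """.632 estimator: fixed 0.368 / 0.632 weights on training and out-of-bag error."""
    return 0.368 * err_train + 0.632 * err_oob


def err_632_plus(err_train, err_oob, gamma):
    """.632+ estimator: shifts weight toward the out-of-bag error as the
    relative overfitting rate r rises (gamma = no-information error rate)."""
    err_oob = min(err_oob, gamma)
    if err_oob > err_train and gamma > err_train:
        r = (err_oob - err_train) / (gamma - err_train)
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)  # w = 0.632 when r = 0, w is about 1 when r = 1
    return (1.0 - w) * err_train + w * err_oob


print(inclusion_probability(1000), 1.0 - math.exp(-1.0))
print(err_632(0.05, 0.30), err_632_plus(0.05, 0.30, 0.50))
```

The plain .632 rule can be optimistic for heavily overfit models, whose training error is near zero. The "+" correction raises the weight on the out-of-bag error toward 1 as that overfitting grows, and it reduces to the .632 rule when there is none.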
134 votes, 9 answers
What is the difference between linear regression on y with x and x with y?
The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but I don't think that's the case.
Can…

user9097 (2,973 reputation; 7 gold, 18 silver, 11 bronze badges)
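For the question above about regressing y on x versus x on y, here is a minimal sketch of the asymmetry, using hypothetical toy data. Pearson's r is symmetric in its arguments. The least-squares slopes are $b_{yx} = r\,s_y/s_x$ and $b_{xy} = r\,s_x/s_y$, so their product is $r^2$. The two fitted lines therefore coincide only when $|r| = 1$. Each fit minimizes errors in a different direction (vertical versus horizontal).

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope for regressing y on x (minimizes vertical errors)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
b_yx = ols_slope(x, y)  # regress y on x
b_xy = ols_slope(y, x)  # regress x on y
r = pearson(x, y)
# Slope of each fitted line drawn in the (x, y) plane, then r^2 and the slope product.
print(b_yx, 1 / b_xy, r ** 2, b_yx * b_xy)
```

Here r = 0.8 in both directions, yet the y-on-x line has slope 0.8 while the x-on-y line, redrawn in the same plane, has slope 1.25.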
131 votes, 4 answers
Nested cross validation for model selection
How can one use nested cross validation for model selection?
From what I read online, nested CV works as follows:
There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold for every available model, e.g. combination of…

Amelio Vazquez-Reina (17,546 reputation; 26 gold, 74 silver, 110 bronze badges)
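For the nested cross-validation question above, here is a minimal sketch of the two loops. The inner loop chooses a hyperparameter using only the outer training fold. The outer loop then scores that whole selection procedure on data it never saw. The question's grid search over models is replaced here by a hypothetical 1-D k-nearest-neighbour regressor with k as the only hyperparameter. The fold counts, candidate grid and toy data are illustrative assumptions.

```python
import random


def knn_predict(train, x, k):
    """Hypothetical model: 1-D k-NN regression (mean y of the k nearest points)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, test, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)


def folds(data, n_folds):
    """Yield (train, test) splits; assumes the data are already in random order."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, test


def select_k(data, candidates, n_folds):
    """Inner loop: pick the k with the lowest cross-validated MSE on `data` only."""
    def cv_error(k):
        return sum(mse(tr, te, k) for tr, te in folds(data, n_folds)) / n_folds
    return min(candidates, key=cv_error)


def nested_cv(data, candidates, outer=5, inner=4):
    """Outer loop: estimate the error of the whole tune-then-fit procedure."""
    scores, chosen = [], []
    for train, test in folds(data, outer):
        k = select_k(train, candidates, inner)  # tuned without seeing `test`
        chosen.append(k)
        scores.append(mse(train, test, k))
    return sum(scores) / len(scores), chosen


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(60)]
data = [(x, x * x + rng.gauss(0.0, 0.1)) for x in xs]
err, ks = nested_cv(data, candidates=[1, 3, 5, 9, 15])
print(round(err, 4), ks)
```

The outer score estimates how well "run the inner search, then fit" generalizes. It does not score any single chosen k. Different outer folds may pick different k values, which is expected.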
131 votes, 4 answers
Difference between neural net weight decay and learning rate
In the context of neural networks, what is the difference between the learning rate and weight decay?

Ryan Zotti (5,927 reputation; 6 gold, 29 silver, 33 bronze badges)
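For the weight decay versus learning rate question above, here is a minimal sketch of a single plain-SGD step. It assumes the L2-penalty form of weight decay, where the loss gains a term $(\lambda/2)\lVert w\rVert^2$. Some optimizers, such as AdamW, instead apply decay separately from the gradient. The learning rate scales the entire step. Weight decay adds a pull toward zero proportional to each weight. The parameter values are illustrative.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * ||w||^2.

    lr scales the whole update; weight_decay adds weight_decay * w_i to each
    gradient component, shrinking every weight toward zero."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


w = [1.0, -2.0]
zero_grad = [0.0, 0.0]  # isolate the decay effect: no data gradient at all
print(sgd_step(w, zero_grad, lr=0.1, weight_decay=0.0))  # weights unchanged
print(sgd_step(w, zero_grad, lr=0.1, weight_decay=0.5))  # each weight scaled by 0.95
```

With zero data gradient, the step multiplies every weight by $1 - \eta\lambda$. The two hyperparameters therefore interact: a larger learning rate also strengthens the per-step shrinkage.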
131 votes, 19 answers
Books for self-studying time series analysis?
I started with Time Series Analysis by Hamilton, but I am hopelessly lost. This book is really too theoretical for me to learn from on my own.
Does anybody have a recommendation for a textbook on time series analysis that's suitable for self-study?

CuriousMind (2,133 reputation; 5 gold, 24 silver, 32 bronze badges)