Highest Voted Questions - Statistical Analysis Stack Exchange

141

votes

5 answers

Percentile vs quantile vs quartile

What is the difference between the following three terms: percentile, quantile, and quartile?

descriptive-statistics quantiles median percentage

asked Jun 13 '15 at 11:25

luciano

12,197
30
87
119
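As a rough illustration of the question above: quartiles and percentiles can both be read as special cases of quantiles, with 4 and 100 equal-probability groups respectively. The sketch below uses Python's standard `statistics.quantiles`; the sample data and the choice of `method="inclusive"` are assumptions made for illustration, and other interpolation conventions give slightly different cut points.

```python
from statistics import median, quantiles

# Illustrative data (an assumption, not from the question): the integers 1..100.
data = list(range(1, 101))

# q-quantiles split sorted data into q equal-probability groups using q - 1 cut points.
quartiles = quantiles(data, n=4, method="inclusive")      # 3 cut points
percentiles = quantiles(data, n=100, method="inclusive")  # 99 cut points

# The k-th quartile coincides with the (25 * k)-th percentile.
for k in (1, 2, 3):
    print(f"Q{k} = {quartiles[k - 1]}, P{25 * k} = {percentiles[25 * k - 1]}")

# The second quartile (50th percentile) is the median.
print(median(data), quartiles[1])
```

Under this convention, "quantile" is the general term, while "quartile" and "percentile" name particular choices of the number of groups.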

140

votes

21 answers

What's the difference between probability and statistics?

What's the difference between probability and statistics, and why are they studied together?

probability teaching mathematical-statistics

asked Jul 26 '10 at 20:17

hslc

1,537
3
10
3

140

votes

7 answers

What is the difference between off-policy and on-policy learning?

The Artificial Intelligence website defines off-policy and on-policy learning as follows: "An off-policy learner learns the value of the optimal policy independently of the agent's actions. Q-learning is an off-policy learner. An on-policy learner…

machine-learning reinforcement-learning artificial-intelligence

asked Dec 02 '15 at 14:21

cgo

7,445
10
42
61
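The distinction quoted in the excerpt above can be made concrete by comparing update targets. The sketch below contrasts the Q-learning target with that of SARSA, used here as a standard example of an on-policy method (SARSA is not named in the excerpt). The Q-table, state and action names, reward, and discount factor are all hypothetical.

```python
# Hypothetical Q-table: Q[(state, action)] -> estimated return.
Q = {("s1", "left"): 1.0, ("s1", "right"): 3.0}
ACTIONS = ("left", "right")

def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the greedy action in next_state,
    whichever action the agent actually takes next."""
    return reward + gamma * max(q[(next_state, a)] for a in ACTIONS)

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action the behaviour policy
    actually selected in next_state."""
    return reward + gamma * q[(next_state, next_action)]

# Suppose the agent explores and picks "left" in s1 although "right" looks better.
off_policy = q_learning_target(Q, reward=1.0, next_state="s1", gamma=0.5)
on_policy = sarsa_target(Q, reward=1.0, next_state="s1", next_action="left", gamma=0.5)
print(off_policy, on_policy)  # 2.5 1.5
```

The two targets differ only when the action actually taken is not the greedy one, which is exactly when exploration happens.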

139

votes

5 answers

Batch gradient descent versus stochastic gradient descent

Suppose we have some training set $(x_{(i)}, y_{(i)})$ for $i = 1, \dots, m$. Also suppose we run some type of supervised learning algorithm on the training set. Hypotheses are represented as $h_{\theta}(x_{(i)}) = \theta_0+\theta_{1}x_{(i)1} +…

optimization gradient-descent stochastic-gradient-descent

asked Feb 07 '13 at 19:34

user20616

1,431
3
11
7
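A minimal sketch of the two update schemes in the question above, for a single-feature version of the hypothesis $h_{\theta}(x) = \theta_0 + \theta_1 x$. The excerpt is truncated before it states a cost function, so the squared-error loss, the toy data, the learning rate, and the iteration counts below are all assumptions.

```python
import random

# Toy data (an assumption): noise-free points on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]

def h(theta, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x for a single feature."""
    return theta[0] + theta[1] * x

def batch_gd(xs, ys, lr=0.05, steps=2000):
    """Each update uses the squared-error gradient averaged over all m examples."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(steps):
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - lr * grad0, theta[1] - lr * grad1]
    return theta

def sgd(xs, ys, lr=0.05, epochs=2000, seed=0):
    """Each update uses the gradient of one example, visited in shuffled order."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = h(theta, xs[i]) - ys[i]
            theta = [theta[0] - lr * error, theta[1] - lr * error * xs[i]]
    return theta

theta_batch = batch_gd(xs, ys)
theta_sgd = sgd(xs, ys)
print(theta_batch, theta_sgd)
```

On this noise-free toy problem both variants approach the same parameters; the difference is that batch gradient descent touches every example per update, while the stochastic variant updates after each example.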

139

votes

9 answers

Obtaining knowledge from a random forest

Random forests are considered to be black boxes, but recently I have been wondering what knowledge can be obtained from a random forest. The most obvious thing is the importance of the variables; in the simplest variant it can be done just by calculating…

machine-learning data-mining interaction random-forest cart

asked Jan 16 '12 at 11:09

Tomek Tarczynski

3,854
7
29
37

137

votes

5 answers

Why normalize images by subtracting dataset's image mean, instead of the current image mean in deep learning?

There are some variations on how to normalize the images, but most seem to use these two methods: (1) subtract the mean per channel calculated over all images (e.g. VGG_ILSVRC_16_layers); (2) subtract by pixel/channel calculated over all images (e.g. CNN_S,…

deep-learning image-processing

asked May 08 '16 at 11:11

Max Gordon

5,616
8
30
51
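To make the options in the question above concrete, the sketch below applies dataset-level and per-image centering to two tiny single-channel "images". The pixel values and the flattened list layout are hypothetical; real pipelines typically work on multi-channel arrays.

```python
from statistics import mean

# Two hypothetical single-channel "images", flattened to 4 pixels each.
# The second is the first with every pixel brightened by 2.
images = [
    [0.0, 2.0, 4.0, 6.0],
    [2.0, 4.0, 6.0, 8.0],
]

# Option A: one mean per channel, computed over every pixel of every image.
channel_mean = mean(p for img in images for p in img)

# Option B: one mean per pixel position, computed over all images.
pixel_mean = [mean(column) for column in zip(*images)]

# Alternative from the question title: each image's own mean.
def center_per_image(img):
    m = mean(img)
    return [p - m for p in img]

by_channel = [[p - channel_mean for p in img] for img in images]
by_pixel = [[p - m for p, m in zip(img, pixel_mean)] for img in images]
by_image = [center_per_image(img) for img in images]

print(by_channel)  # the brightness difference between the two images is kept
print(by_image)    # both images become identical; the brightness difference is lost
```

In this toy case, centering each image by its own mean discards the overall brightness difference, whereas both dataset-level options preserve it.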

136

votes

6 answers

Should one remove highly correlated variables before doing PCA?

I'm reading a paper where the author discards several variables due to high correlation with other variables before doing PCA. The total number of variables is around 20. Does this give any benefit? It looks like overhead to me, as PCA should handle…

correlation pca

asked Feb 21 '13 at 16:41

type2

1,471
3
10
4

136

votes

3 answers

What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression? When would you use each?

regression logistic linear-model

asked May 28 '12 at 18:17

B Seven

2,873
4
24
29

136

votes

4 answers

What is the difference between convolutional neural networks, restricted Boltzmann machines, and auto-encoders?

Recently I have been reading about deep learning and I am confused about the terms (or, say, technologies). What is the difference between convolutional neural networks (CNNs), restricted Boltzmann machines (RBMs), and auto-encoders?

neural-networks deep-learning conv-neural-network autoencoders restricted-boltzmann-machine

asked Sep 04 '14 at 20:52

RockTheStar

11,277
31
63
89

135

votes

9 answers

Why does a time series have to be stationary?

I understand that a stationary time series is one whose mean and variance are constant over time. Can someone please explain why we have to make sure our data set is stationary before we can run different ARIMA or ARMA models on it? Does this also…

regression time-series stationarity

asked Dec 12 '11 at 21:11

alex

1,351
3
9
3
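The definition in the question above (constant mean and variance over time) can be checked by simulation. The sketch below uses a Gaussian random walk as an assumed example of a non-stationary series and shows that first differencing, the step that the "I" in ARIMA refers to, gives a series whose variance no longer depends on time. The seed, number of series, and series length are arbitrary choices.

```python
import random
from statistics import pvariance

rng = random.Random(0)
n_series, length = 500, 100

def random_walk(length):
    """Cumulative sum of independent standard normal steps (non-stationary)."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

walks = [random_walk(length) for _ in range(n_series)]

# Variance across the simulated series at an early and a late time point.
var_early = pvariance([w[9] for w in walks])    # theory: about 10
var_late = pvariance([w[99] for w in walks])    # theory: about 100

# First differences recover the i.i.d. steps, whose variance does not depend on time.
diffs = [[w[t] - w[t - 1] for t in range(1, length)] for w in walks]
dvar_early = pvariance([d[9] for d in diffs])   # theory: about 1
dvar_late = pvariance([d[98] for d in diffs])   # theory: about 1

print(var_early, var_late, dvar_early, dvar_late)
```

A model with fixed parameters cannot describe a variance that keeps growing, which is one informal way to see why the differenced series is the easier one to fit.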

134

votes

5 answers

What is the .632+ rule in bootstrapping?

Here @gung makes reference to the .632+ rule. A quick Google search doesn't yield an easy-to-understand answer as to what this rule means and for what purpose it is used. Would someone please elucidate the .632+ rule?

bootstrap

asked May 07 '14 at 12:16

russellpierce

17,079
16
67
98
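For background on the name only: the 0.632 figure is usually traced to the chance that a given observation appears at least once in a bootstrap sample of size n drawn with replacement, which is 1 - (1 - 1/n)^n and tends to 1 - 1/e. The sketch below checks that limit numerically. It does not implement the .632+ estimator itself, and the sample sizes are arbitrary.

```python
import math

def inclusion_probability(n):
    """P(a fixed observation is drawn at least once in n draws with replacement)."""
    return 1.0 - (1.0 - 1.0 / n) ** n

limit = 1.0 - math.exp(-1.0)  # about 0.632

for n in (10, 100, 1000, 10000):
    print(n, round(inclusion_probability(n), 4))
print("limit", round(limit, 4))
```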

134

votes

9 answers

What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but I don't think that's the case. Can…

regression correlation linear-model pearson-r

asked Feb 13 '12 at 05:15

user9097

2,973
7
18
11
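The asymmetry raised in the question above can be checked numerically. The sketch below computes the Pearson correlation and both least-squares slopes from scratch; the data are made up for illustration. It relies on the standard identity that the product of the two slopes equals r squared, so the two fitted lines coincide only when |r| = 1.

```python
from math import sqrt
from statistics import mean

# Illustrative data (an assumption, not from the question).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 9.0]

def centered_sums(a, b):
    """Return (S_ab, S_aa, S_bb): sums of products of deviations from the means."""
    ma, mb = mean(a), mean(b)
    s_ab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    s_aa = sum((ai - ma) ** 2 for ai in a)
    s_bb = sum((bi - mb) ** 2 for bi in b)
    return s_ab, s_aa, s_bb

def pearson(a, b):
    """Pearson correlation; symmetric in its arguments."""
    s_ab, s_aa, s_bb = centered_sums(a, b)
    return s_ab / sqrt(s_aa * s_bb)

def ols_slope(response, predictor):
    """Least-squares slope from regressing `response` on `predictor`."""
    s_rp, _, s_pp = centered_sums(response, predictor)
    return s_rp / s_pp

r = pearson(x, y)
slope_yx = ols_slope(y, x)  # regress y on x
slope_xy = ols_slope(x, y)  # regress x on y

print(r, pearson(y, x))           # identical: correlation is symmetric
print(slope_yx, 1.0 / slope_xy)   # differ: the two fitted lines are not the same
print(slope_yx * slope_xy, r**2)  # equal: product of slopes is r squared
```

Regressing y on x minimizes vertical distances, while regressing x on y minimizes horizontal ones, which is why the slopes are not reciprocals of each other.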

131

votes

4 answers

Nested cross validation for model selection

How can one use nested cross validation for model selection? From what I read online, nested CV works as follows: There is the inner CV loop, where we may conduct a grid search (e.g. running K-fold for every available model, e.g. combination of…

cross-validation model-selection

asked Jul 22 '13 at 15:53

Amelio Vazquez-Reina

17,546
26
74
110
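The procedure described in the excerpt above can be sketched with the standard library alone. The toy "model" (a constant prediction equal to a shrunken training mean), the candidate `shrink` values, the data, and the fold counts are all hypothetical, chosen only to keep the example short. The point is the structure: the inner loop selects a hyperparameter using only the outer training fold, and the outer loop scores that whole selection procedure on held-out data.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous (train, test) index pairs."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds

def fit(ys, shrink):
    """Toy model (hypothetical): a constant prediction, the shrunken training mean."""
    return (1.0 - shrink) * mean(ys)

def mse(prediction, ys):
    return mean((y - prediction) ** 2 for y in ys)

def select_shrink(ys, candidates, inner_k):
    """Inner loop: pick the candidate with the lowest mean validation MSE."""
    def cv_error(shrink):
        return mean(
            mse(fit([ys[i] for i in tr], shrink), [ys[i] for i in va])
            for tr, va in k_fold_indices(len(ys), inner_k)
        )
    return min(candidates, key=cv_error)

def nested_cv(ys, candidates, outer_k=3, inner_k=2):
    """Outer loop: score the whole 'tune, then refit' procedure on held-out folds."""
    outer_scores = []
    for train_idx, test_idx in k_fold_indices(len(ys), outer_k):
        train_ys = [ys[i] for i in train_idx]
        test_ys = [ys[i] for i in test_idx]
        best = select_shrink(train_ys, candidates, inner_k)  # sees only outer-train data
        outer_scores.append(mse(fit(train_ys, best), test_ys))
    return outer_scores

ys = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.1]
scores = nested_cv(ys, candidates=[0.0, 0.5, 0.9])
print(scores, mean(scores))
```

The mean of the outer scores estimates how well the tuning procedure generalizes; it is not the score of any single selected hyperparameter value.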

131

votes

4 answers

Difference between neural net weight decay and learning rate

In the context of neural networks, what is the difference between the learning rate and weight decay?

neural-networks terminology

asked May 25 '12 at 05:17

Ryan Zotti

5,927
6
29
33
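One common way to see the two quantities side by side is a single gradient step on a loss with an L2 penalty. The sketch below assumes that formulation (weight decay as the coefficient of a penalty of the form (weight_decay / 2) * w**2); the function name and all numbers are hypothetical.

```python
def sgd_step(w, grad, lr, weight_decay=0.0):
    """One gradient step on loss(w) + (weight_decay / 2) * w**2.

    lr scales the whole step; weight_decay adds a pull of w toward zero.
    """
    return w - lr * (grad + weight_decay * w)

w = 1.0
plain = sgd_step(w, grad=0.5, lr=0.1)                      # data gradient only
decayed = sgd_step(w, grad=0.5, lr=0.1, weight_decay=0.1)  # extra shrinkage toward zero
print(plain, decayed)
```

In this form the learning rate controls how far each update moves, while weight decay changes what is being minimized by penalizing large weights.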

131

votes

19 answers

Books for self-studying time series analysis?

I started with Time Series Analysis by Hamilton, but I am hopelessly lost. This book is really too theoretical for me to learn from by myself. Does anybody have a recommendation for a textbook on time series analysis that's suitable for self-study?

time-series self-study references

asked Jan 03 '12 at 01:22

CuriousMind

2,133
5
24
32
