Most Popular
1500 questions
76
votes
8 answers
How and why do normalization and feature scaling work?
I see that lots of machine learning algorithms work better with mean cancellation and covariance equalization. For example, Neural Networks tend to converge faster, and K-Means generally gives better clustering with pre-processed features. I do not…

erogol
76
votes
4 answers
Why does including latitude and longitude in a GAM account for spatial autocorrelation?
I have produced generalized additive models for deforestation. To account for spatial autocorrelation, I have included latitude and longitude as a smoothed interaction term (i.e. s(x,y)).
I've based this on reading many papers where the authors say…

gisol
76
votes
2 answers
Multivariate multiple regression in R
I have 2 dependent variables (DVs), each of whose scores may be influenced by a set of 7 independent variables (IVs). The DVs are continuous, while the IVs are a mix of continuous and binary-coded variables. (In the code below, continuous…

Andrej
76
votes
1 answer
Understanding ROC curve
I'm having trouble understanding the ROC curve.
Is there any advantage / improvement in area under the ROC curve if I build different models from each unique subset of the training set and use it to produce a probability?
For example, if $y$ has…

Tay Shin
75
votes
1 answer
How to split the dataset for cross validation, learning curve, and final evaluation?
What is an appropriate strategy for splitting the dataset?
I would like feedback on the following approach (not on individual parameters like test_size or n_iter, but on whether I used X, y, X_train, y_train, X_test, and y_test appropriately and whether the…

tobip
75
votes
4 answers
How should tiny $p$-values be reported? (and why does R set a minimum of 2.22e-16?)
For some tests in R, there is a lower limit on the p-value calculations of $2.22 \cdot 10^{-16}$. I'm not sure why it's this number, whether there is a good reason for it or whether it's just arbitrary. A lot of other stats packages just go to 0.0001, so this…

paul
75
votes
4 answers
What is the difference between cross-entropy and KL divergence?
Both the cross-entropy and the KL divergence are tools to measure the distance between two probability distributions, but what is the difference between them?
$$ H(P,Q) = -\sum_x P(x)\log Q(x) $$
$$ KL(P | Q) = \sum_{x} P(x)\log {\frac{P(x)}{Q(x)}}…

yoyo
75
votes
6 answers
What is an intuitive explanation for how PCA turns from a geometric problem (with distances) to a linear algebra problem (with eigenvectors)?
I've read a lot about PCA, including various tutorials and questions (such as this one, this one, this one, and this one).
The geometric problem that PCA is trying to optimize is clear to me: PCA tries to find the first principal component by…

stackoverflowuser2010
75
votes
6 answers
What method can be used to detect seasonality in data?
I want to detect seasonality in data that I receive. I have found some methods, such as the seasonal subseries plot and the autocorrelation plot, but I don't understand how to read the graphs. Could anyone help? The other…

Danial
74
votes
2 answers
Practical questions on tuning Random Forests
My questions are about Random Forests. The concept of this beautiful classifier is clear to me, but still there are a lot of practical usage questions. Unfortunately, I failed to find any practical guide to RF (I've been searching for something like…

lithuak
74
votes
5 answers
Understanding stratified cross-validation
I read in Wikipedia:
In stratified k-fold cross-validation, the folds are selected so that the mean response value is approximately equal in all the folds. In the case of a dichotomous classification, this means that each fold contains roughly…

Amelio Vazquez-Reina
74
votes
6 answers
Model for predicting number of Youtube views of Gangnam Style
PSY's music video "Gangnam Style" is popular: after a little more than 2 months, it has about 540 million viewers. I learned this from my preteen children at dinner last week, and soon the discussion turned to whether it was possible to do…

FredrikD
74
votes
12 answers
What are some of the most common misconceptions about linear regression?
I'm curious, for those of you who have extensive experience collaborating with other researchers, what are some of the most common misconceptions about linear regression that you encounter?
I think it can be a useful exercise to think about common…

ST21
74
votes
4 answers
What makes the Gaussian kernel so magical for PCA, and also in general?
I was reading about kernel PCA (1, 2, 3) with Gaussian and polynomial kernels.
How does the Gaussian kernel separate seemingly any sort of nonlinear data exceptionally well? Please give an intuitive analysis, as well as a mathematically involved…

Simon Kuang
74
votes
5 answers
Unified view on shrinkage: what is the relation (if any) between Stein's paradox, ridge regression, and random effects in mixed models?
Consider the following three phenomena.
Stein's paradox: given some data from a multivariate normal distribution in $\mathbb R^n, \: n\ge 3$, the sample mean is not a very good estimator of the true mean. One can obtain an estimator with lower mean…

amoeba