Most Popular
1500 questions
43 votes, 5 answers
LDA vs word2vec
I am trying to understand the similarity between Latent Dirichlet Allocation and word2vec for calculating word similarity.
As I understand it, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…

Piotr Migdal
43 votes, 1 answer
PCA and Correspondence analysis in their relation to Biplot
Biplot is often used to display results of principal component analysis (and of related techniques). It is a dual or overlay scatterplot showing component loadings and component scores simultaneously. I was informed by @amoeba today that he has…

ttnphns
43 votes, 2 answers
Differences between Bhattacharyya distance and KL divergence
I'm looking for an intuitive explanation for the following questions:
In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability…

JewelSue
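
For two discrete distributions $p$ and $q$ over the same outcomes, the standard definitions are $D_B(p,q) = -\ln \sum_i \sqrt{p_i q_i}$ and $D_{KL}(p\,\|\,q) = \sum_i p_i \ln(p_i/q_i)$. A minimal Python sketch of both, using two made-up distributions, makes one practical difference visible: $D_B$ is symmetric in its arguments, while KL divergence generally is not (and it becomes infinite when $q_i = 0$ where $p_i > 0$).

```python
import math


def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)) for discrete distributions."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    if bc == 0:
        return math.inf  # the two distributions share no support
    return -math.log(bc)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * ln(p_i / q_i); terms with p_i == 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Two made-up distributions over three outcomes
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]

print(bhattacharyya_distance(p, q), bhattacharyya_distance(q, p))  # equal
print(kl_divergence(p, q), kl_divergence(q, p))                    # differ
```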
43 votes, 9 answers
When teaching statistics, use "normal" or "Gaussian"?
I use mostly "Gaussian distribution" in my book, but someone just suggested I switch to "normal distribution". Any consensus on which term to use for beginners?
Of course the two terms are synonyms, so this is not a question about substance, but…

Harvey Motulsky
43 votes, 3 answers
R: Confused about Residual Terminology
Root mean square error
residual sum of squares
residual standard error
mean squared error
test error
I thought I understood these terms, but the more statistics problems I do, the more confused I get, to the point where I second-guess…

user3788557
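
Most of these terms are built from the same residuals $e_i = y_i - \hat y_i$. The minimal Python sketch below uses one common set of conventions, in which the residual standard error matches what R's `summary()` reports for an `lm` fit with an intercept. Conventions do vary: ANOVA tables, for instance, often call RSS divided by the residual degrees of freedom the "mean squared error". The data and the choice of $p = 1$ predictor are made up for illustration.

```python
import math


def error_summaries(y, y_hat, n_predictors):
    """Error summaries for a linear model with an intercept.

    n_predictors is p, the number of slope coefficients (intercept excluded).
    """
    n = len(y)
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    rss = sum(e * e for e in residuals)            # residual sum of squares
    mse = rss / n                                  # mean squared error (divides by n)
    rmse = math.sqrt(mse)                          # root mean square error
    rse = math.sqrt(rss / (n - n_predictors - 1))  # residual standard error
    return {"RSS": rss, "MSE": mse, "RMSE": rmse, "RSE": rse}


# Made-up observations and fitted values, assuming a one-predictor model
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
print(error_summaries(y, y_hat, n_predictors=1))
```

"Test error" is then usually the same MSE (or RMSE) computed on observations that were not used to fit the model.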
42 votes, 6 answers
Improve classification with many categorical variables
I'm working on a dataset with 200,000+ samples and approximately 50 features per sample: 10 continuous variables and the other ~40 are categorical variables (countries, languages, scientific fields etc.). For these categorical variables, you have…

Bertrand R
42 votes, 4 answers
Ridge, lasso and elastic net
How do ridge, LASSO and elastic net regularization methods compare? What are their respective advantages and disadvantages? Any good technical papers or lecture notes would be appreciated as well.

user3269
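
One way to see the practical difference is the special case of an orthonormal design ($X^\top X = I$), where each method has a closed form in terms of the ordinary least squares estimates $b = X^\top y$. With the objective $\tfrac12\|y - X\beta\|^2$ plus $\tfrac{\lambda}{2}\|\beta\|_2^2$ (ridge), $\lambda\|\beta\|_1$ (lasso), or $\lambda_1\|\beta\|_1 + \tfrac{\lambda_2}{2}\|\beta\|_2^2$ (elastic net), ridge rescales every coefficient, lasso soft-thresholds (setting small ones exactly to zero), and elastic net does both. A minimal Python sketch under that orthonormal assumption; the function names and coefficients are hypothetical, and with correlated predictors lasso and elastic net have no such closed form, so software typically uses iterative solvers such as coordinate descent.

```python
def soft_threshold(b, lam):
    """Shrink b towards 0 by lam, returning exactly 0 when |b| <= lam."""
    if abs(b) <= lam:
        return 0.0
    return b - lam if b > 0 else b + lam


def ridge_orthonormal(b, lam):
    # argmin 1/2||y - X beta||^2 + lam/2 ||beta||_2^2, assuming X^T X = I
    return [bj / (1.0 + lam) for bj in b]


def lasso_orthonormal(b, lam):
    # argmin 1/2||y - X beta||^2 + lam ||beta||_1, assuming X^T X = I
    return [soft_threshold(bj, lam) for bj in b]


def elastic_net_orthonormal(b, lam1, lam2):
    # argmin 1/2||y - X beta||^2 + lam1 ||beta||_1 + lam2/2 ||beta||_2^2, assuming X^T X = I
    return [soft_threshold(bj, lam1) / (1.0 + lam2) for bj in b]


b_ols = [3.0, -0.5, 1.2]  # made-up OLS coefficients
print(ridge_orthonormal(b_ols, 1.0))             # every coefficient halved, none zero
print(lasso_orthonormal(b_ols, 1.0))             # -0.5 is set to exactly zero
print(elastic_net_orthonormal(b_ols, 1.0, 1.0))  # thresholded, then shrunk
```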
42 votes, 3 answers
Random numbers: set.seed(N) in R
I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results.
But what I don't understand is what the values themselves mean. I am playing…

mylesg
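
The seed's numeric value has no meaning of its own: it only selects the starting state of a deterministic generator, so equal seeds give identical streams and different seeds give unrelated-looking ones. In R, the integer passed to `set.seed()` is turned into the full internal state of the default Mersenne-Twister generator. A minimal sketch of the same idea using Python's `random` module, which also uses a Mersenne Twister but will not reproduce R's numbers for the same seed (the `draws` helper is hypothetical):

```python
import random


def draws(seed, k=5):
    """Return k uniform(0, 1) draws from a generator started at `seed`."""
    rng = random.Random(seed)  # private generator; does not touch the global state
    return [rng.random() for _ in range(k)]


first = draws(123)
again = draws(123)
other = draws(124)

print(first == again)  # True: same seed, same sequence
print(first == other)  # False: a different seed starts somewhere else
```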
42 votes, 5 answers
Statistical test to tell whether two samples are pulled from the same population?
Let's say I have two samples. If I want to tell whether they are pulled from different populations, I can run a t-test. But let's say I want to test whether the samples are from the same population. How does one do this? That is, how do I calculate…

user1566200
42 votes, 4 answers
Bound for the correlation of three random variables
There are three random variables, $x,y,z$. The three correlations between the three variables are the same. That is,
$$\rho=\textrm{cor}(x,y)=\textrm{cor}(x,z)=\textrm{cor}(y,z)$$
What is the tightest bound you can give for $\rho$?

user1352399
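
One standard argument: any valid correlation matrix must be positive semidefinite, so the question reduces to the eigenvalues of the equicorrelation matrix.
$$R=\begin{pmatrix}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\end{pmatrix}=(1-\rho)I_3+\rho\,\mathbf{1}\mathbf{1}^\top$$
$$R\mathbf{1}=(1+2\rho)\,\mathbf{1},\qquad Rv=(1-\rho)\,v\quad\text{for } v\perp\mathbf{1}$$
$$1+2\rho\ge 0\ \text{ and }\ 1-\rho\ge 0\ \Longrightarrow\ -\tfrac12\le\rho\le 1$$
Both ends are attainable: $\rho=1$ when $x=y=z$, and $\rho=-\tfrac12$ when, for example, $x+y+z=0$ with equal variances.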
42 votes, 1 answer
When and how to use standardized explanatory variables in linear regression
I have 2 simple questions about linear regression:
When is it advised to standardize the explanatory variables?
Once estimation is carried out with standardized values, how can one predict with new values (how one should standardize the new…

teucer
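
For the second question, the usual approach is to store the mean and standard deviation computed on the training data and apply exactly those to any new values before predicting. A minimal one-predictor Python sketch (the function names and the made-up data are illustrative assumptions) that also recovers the slope on the original scale by dividing by the training SD:

```python
import statistics


def fit_standardized(x, y):
    """Simple linear regression of y on the standardized predictor z = (x - mean) / sd."""
    mu = statistics.mean(x)
    sd = statistics.stdev(x)  # sample SD of the TRAINING predictor
    z = [(xi - mu) / sd for xi in x]
    y_bar = statistics.mean(y)
    # OLS slope on z; the intercept is mean(y) because z has mean 0
    slope = sum(zi * (yi - y_bar) for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
    return {"mu": mu, "sd": sd, "intercept": y_bar, "slope": slope}


def predict(model, x_new):
    """Standardize new values with the training mean and SD, never their own."""
    return [model["intercept"] + model["slope"] * (xi - model["mu"]) / model["sd"]
            for xi in x_new]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # made-up data lying exactly on y = 1 + 2x
m = fit_standardized(x, y)

print(predict(m, [6.0, 10.0]))  # approximately [13.0, 21.0]
print(m["slope"] / m["sd"])     # slope per original unit of x, approximately 2.0
```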
42 votes, 2 answers
How can I test whether a random effect is significant?
I am trying to understand when to use a random effect and when it is unnecessary. I've been told a rule of thumb is to use one if you have 4 or more groups/individuals, which I do (15 individual moose). Some of those moose were experimented on 2 or 3 times for…

Kerry
42 votes, 3 answers
Why is it that my colleagues and I learned opposite definitions for test and validation sets?
In my master's program I learned that when building an ML model you:
1. train the model on the training set
2. compare the performance of this against the validation set
3. tweak the settings and repeat steps 1-2
4. when you are satisfied, compare the final…

Jacob Myer
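
Under the convention described in the question (tune on the validation set, then evaluate on the test set), the workflow can be sketched in Python. Everything here is a hypothetical illustration: the simulated data, the 60/20/20 split, and a 1-D k-nearest-neighbour regressor whose `k` plays the role of the setting being tweaked. It also assumes the truncated step 4 is a single final evaluation on the test set.

```python
import random
import statistics


def three_way_split(items, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle and split into training, validation and test portions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * frac_train)
    n_val = int(len(items) * frac_val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def knn_predict(train, x, k):
    """1-D k-nearest-neighbour regression: mean y of the k closest training points."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return statistics.mean(y for _, y in nearest)


def mse(train, eval_data, k):
    """Mean squared error on eval_data of the k-NN model built from train."""
    return statistics.mean((y - knn_predict(train, x, k)) ** 2 for x, y in eval_data)


# Hypothetical data: y = x^2 plus Gaussian noise
rng = random.Random(1)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 0.1)) for i in range(100)]
train, val, test = three_way_split(data)

# Steps 1-3: fit on the training set, compare on the validation set, repeat per setting
best_k = min([1, 3, 5, 10, 20], key=lambda k: mse(train, val, k))
# Step 4 (assumed): evaluate only the chosen setting, once, on the untouched test set
test_error = mse(train, test, best_k)
print(best_k, test_error)
```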
42 votes, 6 answers
What are best practices in identifying interaction effects?
Other than literally testing every possible combination of variables in a model (x1:x2 or x1*x2 ... xn-1 * xn), how do you identify whether an interaction SHOULD or COULD exist between your (hopefully) independent variables?
What are best practices in…

Brandon Bertelsen
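
For reference, in R's formula notation `x1*x2` expands to `x1 + x2 + x1:x2`, where for numeric predictors `x1:x2` is the product term. Building every pairwise product by hand, as in this hypothetical Python helper, shows why exhaustive testing scales badly: $n$ predictors already give $n(n-1)/2$ two-way terms, before any higher-order interactions.

```python
from itertools import combinations


def add_pairwise_interactions(rows, names):
    """Append an x_i * x_j column (R's x_i:x_j) for every pair of numeric predictors."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}:{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


rows, names = add_pairwise_interactions([[1.0, 2.0, 3.0]], ["x1", "x2", "x3"])
print(names)  # ['x1', 'x2', 'x3', 'x1:x2', 'x1:x3', 'x2:x3']
print(rows)   # [[1.0, 2.0, 3.0, 2.0, 3.0, 6.0]]

# Number of two-way terms for 50 predictors: 50 * 49 / 2
print(len(list(combinations(range(50), 2))))  # 1225
```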
42 votes, 7 answers
How often do you have to roll a 6-sided die to obtain every number at least once?
I've just played a game with my kids that basically boils down to: whoever rolls every number at least once on a 6-sided die wins.
I won, eventually, and the others finished 1-2 turns later. Now I'm wondering: what is the expectation of the length…

Jonas
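
This is the coupon collector's problem. Once $k$ distinct faces have been seen, each roll shows a new face with probability $(6-k)/6$, so the wait for the next new face is geometric with mean $6/(6-k)$. Summing over $k = 0,\dots,5$ gives $E[T] = 6\left(1 + \tfrac12 + \dots + \tfrac16\right) = 14.7$ rolls. A short Python check, computing the exact value with `fractions` and comparing it with a seeded simulation (the trial count and seed are arbitrary choices):

```python
import random
from fractions import Fraction


def expected_rolls(faces=6):
    """Exact coupon-collector expectation: sum over k of faces / (faces - k)."""
    return sum(Fraction(faces, faces - k) for k in range(faces))


def simulate_rolls(faces=6, trials=20_000, seed=0):
    """Average number of rolls until every face has appeared, by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen = set()
        rolls = 0
        while len(seen) < faces:
            seen.add(rng.randint(1, faces))
            rolls += 1
        total += rolls
    return total / trials


print(expected_rolls(), float(expected_rolls()))  # 147/10 14.7
print(simulate_rolls())                           # close to 14.7
```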