Most Popular

1500 questions
43 votes, 5 answers

LDA vs word2vec

I am trying to understand the similarity between Latent Dirichlet Allocation and word2vec for calculating word similarity. As I understand it, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…
43 votes, 1 answer

PCA and Correspondence analysis in their relation to Biplot

Biplot is often used to display results of principal component analysis (and of related techniques). It is a dual or overlay scatterplot showing component loadings and component scores simultaneously. I was informed by @amoeba today that he has…
ttnphns
43 votes, 2 answers

Differences between Bhattacharyya distance and KL divergence

I'm looking for an intuitive explanation for the following questions: In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability…
43 votes, 9 answers

When teaching statistics, use "normal" or "Gaussian"?

I use mostly "Gaussian distribution" in my book, but someone just suggested I switch to "normal distribution". Any consensus on which term to use for beginners? Of course the two terms are synonyms, so this is not a question about substance, but…
Harvey Motulsky
43 votes, 3 answers

R - Confused on Residual Terminology

Root mean square error, residual sum of squares, residual standard error, mean squared error, test error. I thought I used to understand these terms, but the more statistics problems I do, the more confused I have become, to the point where I second-guess…
user3788557
42 votes, 6 answers

Improve classification with many categorical variables

I'm working on a dataset with 200,000+ samples and approximately 50 features per sample: 10 continuous variables and the other ~40 are categorical variables (countries, languages, scientific fields etc.). For these categorical variables, you have…
42 votes, 4 answers

Ridge, lasso and elastic net

How do ridge, LASSO and elastic net regularization methods compare? What are their respective advantages and disadvantages? Any good technical papers or lecture notes would be appreciated as well.
user3269
42 votes, 3 answers

Random number-Set.seed(N) in R

I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results. But what I don't get is what the values themselves mean. I am playing…
mylesg
42 votes, 5 answers

Statistical test to tell whether two samples are pulled from the same population?

Let's say I have two samples. If I want to tell whether they are pulled from different populations, I can run a t-test. But let's say I want to test whether the samples are from the same population. How does one do this? That is, how do I calculate…
user1566200
42 votes, 4 answers

Bound for the correlation of three random variables

There are three random variables, $x,y,z$. The three correlations between the three variables are the same. That is, $$\rho=\textrm{cor}(x,y)=\textrm{cor}(x,z)=\textrm{cor}(y,z)$$ What is the tightest bound you can give for $\rho$?
user1352399
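
One way to approach this bound, offered as a sketch rather than as a summary of the thread's answers: the $3\times 3$ correlation matrix with every off-diagonal entry equal to $\rho$ must be positive semidefinite, so all of its eigenvalues must be non-negative. The symbol $R$ below is notation introduced here, not taken from the question.

$$R=\begin{pmatrix}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\end{pmatrix},\qquad \lambda_1 = 1+2\rho,\quad \lambda_2=\lambda_3 = 1-\rho$$

$$\lambda_1\ge 0 \ \text{and}\ \lambda_2\ge 0 \quad\Longrightarrow\quad -\tfrac{1}{2}\le\rho\le 1$$

Both ends are attainable: $\rho=1$ when the three variables are identical, and $\rho=-\tfrac12$ when, for example, they are centered so that $x+y+z=0$ with equal variances.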
42 votes, 1 answer

When and how to use standardized explanatory variables in linear regression

I have 2 simple questions about linear regression: When is it advised to standardize the explanatory variables? Once estimation is carried out with standardized values, how can one predict with new values (how one should standardize the new…
teucer
42 votes, 2 answers

How can I test whether a random effect is significant?

I am trying to understand when to use a random effect and when it is unnecessary. I've been told a rule of thumb is that you need 4 or more groups/individuals, which I have (15 individual moose). Some of those moose were experimented on 2 or 3 times for…
Kerry
42 votes, 3 answers

Why is it that my colleagues and I learned opposite definitions for test and validation sets?

In my master's program I learned that when building an ML model you: (1) train the model on the training set; (2) compare the performance of this against the validation set; (3) tweak the settings and repeat steps 1-2; (4) when you are satisfied, compare the final…
42 votes, 6 answers

What are best practices in identifying interaction effects?

Other than literally testing each possible combination of variable(s) in a model (x1:x2 or x1*x2 ... xn-1 * xn), how do you identify whether an interaction SHOULD or COULD exist between your independent (hopefully) variables? What are best practices in…
Brandon Bertelsen
42 votes, 7 answers

How often do you have to roll a 6-sided die to obtain every number at least once?

I've just played a game with my kids that basically boils down to: whoever rolls every number at least once on a 6-sided die wins. I won, eventually, and the others finished 1-2 turns later. Now I'm wondering: what is the expectation of the length…
Jonas