Highest Voted Questions - Statistical Analysis Stack Exchange

43

votes

5 answers

LDA vs word2vec

I am trying to understand the similarity between Latent Dirichlet Allocation and word2vec for calculating word similarity. As I understand, LDA maps words to a vector of probabilities of latent topics, while word2vec maps them to a vector of…

machine-learning self-study natural-language latent-variable word2vec

asked Apr 09 '15 at 09:17

Piotr Migdal

5,586
2
26
70

43

votes

1 answer

PCA and Correspondence analysis in their relation to Biplot

Biplot is often used to display results of principal component analysis (and of related techniques). It is a dual or overlay scatterplot showing component loadings and component scores simultaneously. I was informed by @amoeba today that he has…

pca multivariate-analysis svd correspondence-analysis biplot

asked Mar 14 '15 at 19:47

ttnphns

51,648
40
253
462

43

votes

2 answers

Differences between Bhattacharyya distance and KL divergence

I'm looking for an intuitive explanation for the following questions: In statistics and information theory, what's the difference between Bhattacharyya distance and KL divergence, as measures of the difference between two discrete probability…

mathematical-statistics information-theory kullback-leibler bhattacharyya

asked Dec 27 '14 at 08:11

JewelSue

533
1
4
7
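As a rough numerical companion to the Bhattacharyya vs. KL question above, the sketch below computes both quantities for two discrete distributions using their standard textbook definitions. The function names and the example vectors are illustrative assumptions, not taken from the question or its answers. It shows one concrete difference: the Bhattacharyya distance is symmetric in its arguments, while KL divergence generally is not.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * ln(p_i / q_i) for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -ln(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.inf if bc == 0 else -math.log(bc)

# Hypothetical example distributions over two outcomes
p = [0.5, 0.5]
q = [0.9, 0.1]

print("KL(p||q) =", kl_divergence(p, q))
print("KL(q||p) =", kl_divergence(q, p))
print("D_B(p,q) =", bhattacharyya_distance(p, q))
print("D_B(q,p) =", bhattacharyya_distance(q, p))
```

Swapping the arguments changes the KL value but leaves the Bhattacharyya distance unchanged.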

43

votes

9 answers

When teaching statistics, use "normal" or "Gaussian"?

I use mostly "Gaussian distribution" in my book, but someone just suggested I switch to "normal distribution". Any consensus on which term to use for beginners? Of course the two terms are synonyms, so this is not a question about substance, but…

normal-distribution terminology

asked Sep 08 '14 at 23:43

Harvey Motulsky

14,903
11
51
98

43

votes

3 answers

R - Confused on Residual Terminology

Root mean square error, residual sum of squares, residual standard error, mean squared error, test error: I thought I used to understand these terms, but the more I do statistics problems, the more I have gotten myself confused where I second guess…

r regression residuals

asked Aug 07 '14 at 05:57

user3788557

1,479
4
22
24

42

votes

6 answers

Improve classification with many categorical variables

I'm working on a dataset with 200,000+ samples and approximately 50 features per sample: 10 continuous variables and the other ~40 are categorical variables (countries, languages, scientific fields etc.). For these categorical variables, you have…

machine-learning classification categorical-data random-forest many-categories

asked Apr 25 '14 at 17:14

Bertrand R

656
1
7
8

42

votes

4 answers

Ridge, lasso and elastic net

How do ridge, LASSO and elasticnet regularization methods compare? What are their respective advantages and disadvantages? Any good technical paper, or lecture notes would be appreciated as well.

references lasso regularization ridge-regression elastic-net

asked Apr 09 '14 at 14:40

user3269

4,622
8
43
53

42

votes

3 answers

Random number-Set.seed(N) in R

I realize that one uses set.seed() in R for pseudo-random number generation. I also realize that using the same number, like set.seed(123), ensures you can reproduce results. But what I don't get is what the values themselves mean. I am playing…

r random-generation

asked Feb 12 '14 at 02:09

mylesg

613
1
5
6

42

votes

5 answers

Statistical test to tell whether two samples are pulled from the same population?

Let's say I have two samples. If I want to tell whether they are pulled from different populations, I can run a t-test. But let's say I want to test whether the samples are from the same population. How does one do this? That is, how do I calculate…

statistical-significance

asked Jan 23 '14 at 20:41

user1566200

837
1
9
18

42

votes

4 answers

Bound for the correlation of three random variables

There are three random variables, $x,y,z$. The three correlations between the three variables are the same. That is, $$\rho=\textrm{cor}(x,y)=\textrm{cor}(x,z)=\textrm{cor}(y,z)$$ What is the tightest bound you can give for $\rho$?

correlation correlation-matrix

asked Oct 15 '13 at 01:55

user1352399

521
1
5
3
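One standard way to reason about the equal-correlation question above is through positive semidefiniteness of the correlation matrix. This is a sketch of that argument, not a summary of the posted answers; the symbols $R$, $I$ and $\mathbf{1}$ are introduced here for illustration.

$$R=\begin{pmatrix}1&\rho&\rho\\ \rho&1&\rho\\ \rho&\rho&1\end{pmatrix}=(1-\rho)\,I+\rho\,\mathbf{1}\mathbf{1}^{\top}$$

The eigenvalues of $R$ are $1+2\rho$ (eigenvector $\mathbf{1}$) and $1-\rho$ (with multiplicity two). A valid correlation matrix must have no negative eigenvalues, so

$$1+2\rho\ge 0 \quad\text{and}\quad 1-\rho\ge 0 \quad\Longrightarrow\quad -\tfrac{1}{2}\le\rho\le 1.$$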

42

votes

1 answer

When and how to use standardized explanatory variables in linear regression

I have 2 simple questions about linear regression: When is it advised to standardize the explanatory variables? Once estimation is carried out with standardized values, how can one predict with new values (how one should standardize the new…

regression predictive-models references standardization predictor

asked Feb 11 '11 at 23:09

teucer

1,801
2
16
29

42

votes

2 answers

How can I test whether a random effect is significant?

I am trying to understand when to use a random effect and when it is unnecessary. I've been told a rule of thumb is if you have 4 or more groups/individuals, which I do (15 individual moose). Some of those moose were experimented on 2 or 3 times for…

mixed-model lme4-nlme random-effects-model glmm

asked Apr 15 '13 at 12:37

Kerry

1,129
3
14
20

42

votes

3 answers

Why is it that my colleagues and I learned opposite definitions for test and validation sets?

In my master's program I learned that when building an ML model you: (1) train the model on the training set; (2) compare its performance against the validation set; (3) tweak the settings and repeat steps 1-2; (4) when you are satisfied, compare the final…

machine-learning neural-networks cross-validation terminology validation

asked May 24 '21 at 13:59

Jacob Myer

585
4
8

42

votes

6 answers

What are best practices in identifying interaction effects?

Other than literally testing each possible combination of variable(s) in a model (x1:x2 or x1*x2 ... xn-1 * xn), how do you identify whether an interaction SHOULD or COULD exist between your independent (hopefully) variables? What are best practices in…

regression modeling interaction

asked Nov 25 '10 at 05:32

Brandon Bertelsen

6,672
9
35
46

42

votes

7 answers

How often do you have to roll a 6-sided die to obtain every number at least once?

I've just played a game with my kids that basically boils down to: whoever rolls every number at least once on a 6-sided die wins. I won, eventually, and the others finished 1-2 turns later. Now I'm wondering: what is the expectation of the length…

probability dice coupon-collector-problem

asked Jan 24 '13 at 02:04

Jonas

1,578
1
13
16
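The die-rolling question above is an instance of the coupon collector problem named in its tags. The sketch below assumes a fair die and treats each wait for a new face as a geometric random variable. The function names are hypothetical. It computes the exact expectation and checks it against a small simulation.

```python
import random
from fractions import Fraction
from typing import Optional

def expected_rolls(sides: int = 6) -> Fraction:
    """Exact expected number of rolls until every face has appeared once."""
    # With k distinct faces already seen, a new face shows up with
    # probability (sides - k) / sides, so the expected wait is sides / (sides - k).
    return sum((Fraction(sides, sides - k) for k in range(sides)), Fraction(0))

def simulate_rolls(sides: int = 6, rng: Optional[random.Random] = None) -> int:
    """Roll a fair die until all faces have been seen; return the roll count."""
    rng = rng or random.Random()
    seen = set()
    rolls = 0
    while len(seen) < sides:
        seen.add(rng.randint(1, sides))
        rolls += 1
    return rolls

if __name__ == "__main__":
    exact = expected_rolls(6)
    rng = random.Random(0)  # fixed seed so the simulation is repeatable
    trials = 20000
    mean = sum(simulate_rolls(6, rng) for _ in range(trials)) / trials
    print("exact expectation:", exact, "=", float(exact))
    print("simulated mean over", trials, "trials:", mean)
```

For a six-sided die the exact sum is 6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 14.7 rolls.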
