Highest Voted Questions - Statistical Analysis Stack Exchange

57

votes

8 answers

Why continue to teach and use hypothesis testing (when confidence intervals are available)?

Why continue to teach and use hypothesis testing (with all its difficult concepts, and which is among the most common sources of statistical sins) for problems where there is an interval estimator (confidence, bootstrap, credibility, or whatever)? What is the best…

hypothesis-testing confidence-interval teaching

asked Feb 07 '11 at 18:05

Washington S. Silva

781
7
6

57

votes

3 answers

ANOVA assumption normality/normal distribution of residuals

The Wikipedia page on ANOVA lists three assumptions, namely: Independence of cases – this is an assumption of the model that simplifies the statistical analysis. Normality – the distributions of the residuals are normal. Equality (or "homogeneity")…

anova residuals normality-assumption assumptions faq

asked Jan 18 '11 at 19:07

Roman Luštrik

3,338
3
31
39

57

votes

3 answers

What is the difference between a Normal and a Gaussian Distribution?

Is there a deep difference between a Normal and a Gaussian distribution? I've seen many papers using them without distinction, and I usually also refer to them as the same thing. However, my PI recently told me that a normal is the specific case of…

normal-distribution terminology

asked Apr 12 '13 at 17:28

Leon palafox

825
1
6
9

57

votes

4 answers

Manually Calculating P value from t-value in t-test

I have a sample dataset with 31 values. I ran a two-tailed t-test using R to test if the true mean is equal to 10: t.test(x=data, mu=10, conf.level=0.95) Output: t = 11.244, df = 30, p-value = 2.786e-12 alternative hypothesis: true mean is not…

r statistical-significance t-test p-value

asked Dec 05 '12 at 01:51

herbps10

673
1
5
6
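
For the t-test question above, the manual calculation can be sketched in a few lines. This is a minimal sketch rather than the method from the thread's answers: it assumes the two-sided p-value is 2 * P(T > |t|) for a Student's t variable with the reported degrees of freedom, and it integrates the tail numerically with Simpson's rule so that only the standard library is needed. The helper names `t_pdf` and `two_sided_p` are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, n=2000):
    """Two-sided p-value 2 * P(T > |t|) via Simpson's rule.

    The substitution x = |t| / s maps the tail [|t|, inf) onto s in (0, 1].
    Assumes df > 1 (so the integrand vanishes at s = 0) and an even n.
    """
    t = abs(t)
    if t == 0:
        return 1.0

    def g(s):
        # Transformed integrand: f(|t| / s) * |t| / s^2, with limit 0 at s = 0.
        if s == 0.0:
            return 0.0
        return t_pdf(t / s, df) * t / (s * s)

    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return 2 * total * h / 3

# Values quoted in the question: t = 11.244, df = 30.
p = two_sided_p(11.244, 30)
print(f"{p:.3e}")
```

The result should land close to the p-value reported in the question; small differences are expected because the quoted t statistic is rounded. In R itself, the same tail probability is available through `pt`, for example `2 * pt(-abs(t), df)`.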

57

votes

7 answers

Interview question: If correlation doesn't imply causation, how do you detect causation?

I got this question: If correlation doesn't imply causation, how do you detect causation? in an interview. My answer was: You do some form of A/B testing. The interviewer kept prodding me for another approach but I couldn't think of any, and he…

self-study correlation causality

asked Nov 08 '19 at 21:15

Akaike's Children

1,251
7
15

57

votes

3 answers

Box-Cox like transformation for independent variables?

Is there a Box-Cox-like transformation for independent variables? That is, a transformation that optimizes the $x$ variable so that y ~ f(x) gives a more reasonable fit for a linear model? If so, is there a function to perform this in R?

r regression data-transformation normality-assumption

asked Sep 05 '12 at 10:37

Tal Galili

19,935
32
133
195
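
To make the Box-Cox question above concrete, here is a rough sketch of one way to search for a power transformation of a predictor. It is an assumption-laden illustration, not an established procedure from the thread: it applies the Box-Cox family to $x$ and picks the exponent from a small grid that maximizes the absolute Pearson correlation with $y$. The function names, the grid, and the toy data are all hypothetical, and $x$ must be strictly positive.

```python
import math

def power_transform(x, lam):
    """Box-Cox family applied to a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

def best_lambda(x, y, grid=(-2, -1, -0.5, 0, 0.5, 1, 2)):
    """Grid value whose transform of x is most linearly related to y."""
    def score(lam):
        return abs(pearson_r([power_transform(xi, lam) for xi in x], y))
    return max(grid, key=score)

# Toy data (made up): y is exactly linear in log(x).
x = [float(i) for i in range(1, 21)]
y = [3.0 + 2.0 * math.log(xi) for xi in x]
print(best_lambda(x, y))
```

Because the toy response is linear in log(x), the search selects the log transform (lambda = 0). A grid search on correlation ignores the error structure of the model, so treat it as a screening device only.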

57

votes

1 answer

Why do we use Kullback-Leibler divergence rather than cross entropy in the t-SNE objective function?

In my mind, KL divergence from sample distribution to true distribution is simply the difference between cross entropy and entropy. Why do we use cross entropy as the cost function in many machine learning models, but use Kullback-Leibler…

kullback-leibler tsne cross-entropy

asked Mar 07 '17 at 13:26

JimSpark

673
1
6
5
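
The identity mentioned in the question above, KL(p || q) = H(p, q) - H(p), can be checked numerically. The snippet below is a small sketch on made-up discrete distributions; it assumes natural logarithms and that q is positive wherever p is.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions, chosen only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

gap = cross_entropy(p, q) - entropy(p)
print(math.isclose(kl_divergence(p, q), gap))
```

Since H(p) does not depend on q, the two quantities differ by a constant when only q is being optimized; they behave differently when p itself changes.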

57

votes

11 answers

How to decide on the correct number of clusters?

In k-means clustering, we find the cluster centers and assign points to k different cluster bins. It is a very well-known algorithm and is found in almost every machine learning package on the net. But the missing and most important part in my…

clustering k-means

asked Feb 09 '12 at 14:45

petrichor

1,615
2
15
17

57

votes

10 answers

Who are frequentists?

We already had a thread asking who are Bayesians and one asking if frequentists are Bayesians, but there was no thread asking directly who are frequentists? This is a question that was asked by @whuber as a comment to this thread and it begs to be…

bayesian frequentist

asked Aug 29 '16 at 18:48

Tim

108,699
20
212
390

57

votes

6 answers

Alternatives to logistic regression in R

I would like as many algorithms as possible that perform the same task as logistic regression. That is, algorithms/models that can give a prediction for a binary response (Y) with some explanatory variable (X). I would be glad if after you name the algorithm,…

r regression logistic classification predictive-models

asked Aug 31 '10 at 10:02

Tal Galili

19,935
32
133
195

57

votes

10 answers

What are some examples of anachronistic practices in statistics?

I am referring to practices that still maintain their presence, even though the problems (usually computational) they were designed to cope with have been mostly solved. For example, Yates' continuity correction was invented to approximate Fisher's…

references philosophical

asked Jun 18 '16 at 05:42

Francis

2,972
1
20
26

57

votes

9 answers

Reference book for linear algebra applied to statistics?

I have been working in R for a bit and have been faced with things like PCA, SVD, QR decompositions and many such linear algebra results (when inspecting how weighted regressions are estimated, and such), so I wanted to know if anyone has a recommendation on…

references matrix linear-algebra weighted-regression

asked Jan 19 '12 at 17:32

Palace Chan

759
3
9
17

57

votes

1 answer

Should I normalize word2vec's word vectors before using them?

After training word vectors with word2vec, is it better to normalize them before using them for some downstream applications? I.e., what are the pros/cons of normalizing them?

natural-language word2vec word-embeddings

asked Oct 20 '15 at 23:56

Franck Dernoncourt

42,093
30
155
271
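
One concrete effect of the normalization asked about above: after scaling vectors to unit L2 length, the dot product equals cosine similarity, so vector magnitude no longer influences comparisons. The sketch below uses made-up two-dimensional vectors in place of real word2vec embeddings, and the helper names are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy stand-ins for word vectors (not real embeddings).
u = [3.0, 4.0]
w = [1.0, 2.0]

u_hat, w_hat = l2_normalize(u), l2_normalize(w)
print(math.isclose(dot(u_hat, w_hat), cosine(u, w)))
```

Whether discarding the magnitude helps or hurts depends on the downstream task, which is the trade-off the question asks about.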

57

votes

1 answer

How to apply standardization/normalization to train- and testset if prediction is the goal?

Do I transform all my data or folds (if CV is applied) at the same time? e.g. (allData - mean(allData)) / sd(allData) Do I transform trainset and testset separately? e.g. (trainData - mean(trainData)) / sd(trainData) (testData - mean(testData)) /…

r cross-validation data-transformation normalization standardization

asked Sep 30 '15 at 12:39

DerTom

737
1
6
10
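
The options listed in the standardization question above can be made concrete with a small sketch. It shows one common convention, offered here as an assumption and not as the thread's accepted answer: estimate the mean and standard deviation on the training data only, then reuse those same values for the test data. The data and helper names are made up.

```python
import statistics

def fit_scaler(train):
    """Estimate centering and scaling parameters from training data only."""
    return statistics.mean(train), statistics.stdev(train)

def transform(values, mean, sd):
    """Standardize values with previously estimated parameters."""
    return [(v - mean) / sd for v in values]

# Toy data (made up for illustration).
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mean, sd = fit_scaler(train)
train_z = transform(train, mean, sd)
test_z = transform(test, mean, sd)  # reuses the training mean and sd

print(train_z)
print(test_z)
```

With cross-validation, the same idea would apply within each fold: fit the scaler on the training part of the fold and apply it to the held-out part.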

57

votes

3 answers

Won't highly-correlated variables in random forest distort accuracy and feature-selection?

In my understanding, highly correlated variables won't cause multi-collinearity issues in a random forest model (please correct me if I'm wrong). However, on the other hand, if I have too many variables containing similar information, will the model…

random-forest multicollinearity ensemble-learning

asked Mar 13 '15 at 14:46

Yoki

739
1
7
10
