Highest Voted Questions - Statistical Analysis Stack Exchange

64

votes

5 answers

Is it meaningful to calculate Pearson or Spearman correlation between two Boolean vectors?

I have two Boolean vectors that contain only 0s and 1s. If I calculate the Pearson or Spearman correlation between them, is the result meaningful or reasonable?

correlation binary-data pearson-r spearman-rho

asked Jun 18 '14 at 07:52

Zhilong Jia

785
1
6
9
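As a rough illustration for the Boolean-vector question, the sketch below computes the Pearson correlation of two 0/1 vectors and compares it with the phi coefficient taken from their 2x2 contingency table. The vectors `x` and `y` and the helper names `pearson` and `phi` are made up for this example and do not come from the question. On 0/1 data the two quantities coincide, which is one way to read Pearson's r as meaningful here.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def phi(x, y):
    """Phi coefficient from the 2x2 table of two 0/1 sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical example data (not from the question).
x = [1, 1, 0, 0, 1, 0, 1, 1]
y = [1, 0, 0, 0, 1, 1, 1, 0]

r = pearson(x, y)
p = phi(x, y)
print(f"Pearson r = {r:.4f}, phi = {p:.4f}")
```

Both functions assume that neither vector is constant; otherwise the denominator is zero and the correlation is undefined.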

63

votes

3 answers

Explain the xkcd jelly bean comic: What makes it funny?

I see that one time out of the twenty total tests they run, $p < 0.05$, so they wrongly assume that during one of the twenty tests, the result is significant ($0.05 = 1/20$). xkcd jelly bean comic - "Significant" Title: Significant Hover text:…

hypothesis-testing statistical-significance confidence-interval p-value humor

asked Feb 27 '14 at 00:34

DJG

693
1
7
6

63

votes

4 answers

Perform feature normalization before or within model validation?

A common good practice in machine learning is to normalize or standardize the predictor variables, that is, to center the data by subtracting the mean and to scale it by dividing by the variance (or the standard deviation). For…

machine-learning normalization standardization

asked Nov 22 '13 at 13:16

SkyWalker

825
1
7
12

63

votes

3 answers

What is the difference in Bayesian estimate and maximum likelihood estimate?

Please explain to me the difference between the Bayesian estimate and the maximum likelihood estimate.

bayesian maximum-likelihood

asked Oct 29 '13 at 23:15

triomphe

787
1
6
9

63

votes

32 answers

What are the worst (commonly adopted) ideas/principles in statistics?

In my statistical teaching, I encounter some stubborn ideas/principles relating to statistics that have become popularised, yet seem to me to be misleading, or in some cases utterly without merit. I would like to solicit the views of others on this…

inference teaching philosophical

asked Jul 10 '20 at 01:57

Ben

91,027
3
150
376

63

votes

9 answers

Advanced statistics books recommendation

There are several threads on this site for book recommendations on introductory statistics and machine learning but I am looking for a text on advanced statistics including, in order of priority: maximum likelihood, generalized linear models,…

generalized-linear-model pca maximum-likelihood references saddlepoint-approximation

asked Jul 27 '12 at 16:15

Robert Kubrick

4,078
8
38
55

63

votes

5 answers

Why does collecting data until finding a significant result increase Type I error rate?

I was wondering exactly why collecting data until a significant result (e.g., $p \lt .05$) is obtained (i.e., p-hacking) increases the Type I error rate? I would also highly appreciate an R demonstration of this phenomenon.

r hypothesis-testing p-value simulation type-i-and-ii-errors

asked Oct 26 '17 at 17:29

Reza

876
7
10
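The question on collecting data until significance asks for an R demonstration; what follows is only a minimal Python sketch of the same idea, under assumptions that are not in the question: data are drawn from a standard normal with a true mean of 0 (so the null holds), a two-sided z-test with known variance 1 is used, and the analyst peeks after every 10 observations up to 100. The function names and all parameter values are hypothetical.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming known variance 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims, max_n, step, peek, alpha=0.05, seed=0):
    """Share of null simulations that end in a rejection.

    peek=False: test once, at the final sample size max_n.
    peek=True:  test after every `step` observations and stop
                at the first p-value below alpha (optional stopping).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = []
        rejected = False
        while len(sample) < max_n:
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            if peek and z_test_p_value(sample) < alpha:
                rejected = True
                break
        if not peek:
            rejected = z_test_p_value(sample) < alpha
        rejections += rejected
    return rejections / n_sims


fixed_rate = false_positive_rate(2000, max_n=100, step=10, peek=False)
peeking_rate = false_positive_rate(2000, max_n=100, step=10, peek=True)
print(f"fixed sample size: {fixed_rate:.3f}")
print(f"stop when p < .05: {peeking_rate:.3f}")
```

Each look is another chance for a null result to cross the threshold, so the rate with peeking should come out well above the nominal 5%, while the fixed-sample rate should stay close to it.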

63

votes

14 answers

If we fail to reject the null hypothesis in a large study, isn't it evidence for the null?

"A basic limitation of null hypothesis significance testing is that it does not allow a researcher to gather evidence in favor of the null" (Source). I see this claim repeated in multiple places, but I can't find justification for it. If we perform a…

hypothesis-testing

asked Apr 25 '17 at 04:55

Atte Juvonen

1,199
1
10
18

63

votes

4 answers

Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?

First of all, I realized if I need to perform binary predictions, I have to create at least two classes through performing a one-hot-encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…

machine-learning neural-networks loss-functions tensorflow cross-entropy

asked Feb 07 '17 at 15:02

infomin101

1,363
4
14
20

63

votes

3 answers

Who created the first standard normal table?

I'm about to introduce the standard normal table in my introductory statistics class, and that got me wondering: who created the first standard normal table? How did they do it before computers came along? I shudder to think of someone brute-force…

normal-distribution algorithms history tables

asked Sep 04 '16 at 23:16

Daniel Smolkin

633
5
7

63

votes

6 answers

Test if two binomial distributions are statistically different from each other

I have three groups of data, each with a binomial distribution (i.e. each group has elements that are either success or failure). I do not have a predicted probability of success, but instead can only rely on the success rate of each as an…

statistical-significance binomial-distribution bernoulli-distribution

asked Aug 28 '14 at 17:14

Scott

900
1
8
12

63

votes

3 answers

Interpreting Residual and Null Deviance in GLM R

How to interpret the Null and Residual Deviance in GLM in R? Like, we say that smaller AIC is better. Is there any similar and quick interpretation for the deviances also? Null deviance: 1146.1 on 1077 degrees of freedom Residual deviance: 4589.4…

generalized-linear-model deviance

asked Jul 23 '14 at 10:18

Anjali

891
3
10
10

62

votes

7 answers

Why doesn't Random Forest handle missing values in predictors?

What are the theoretical reasons for not handling missing values? Gradient boosting machines and regression trees handle missing values. Why doesn't Random Forest do that?

random-forest missing-data boosting

asked May 16 '14 at 13:08

Fedorenko Kristina

723
1
6
6

62

votes

3 answers

A generalization of the Law of Iterated Expectations

I recently came across this identity: $$E \left[ E \left(Y|X,Z \right) |X \right] =E \left[Y | X \right]$$ I am of course familiar with the simpler version of that rule, namely that $E \left[ E \left(Y|X \right) \right]=E \left(Y\right) $ but I was…

self-study conditional-probability conditional-expectation

asked May 01 '14 at 13:17

JohnK

18,298
10
60
103
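For the iterated-expectations identity quoted in the question, here is a short worked derivation, offered as a sketch that assumes $X$, $Y$ and $Z$ are discrete with finite expectations (the general case uses the same idea with conditional densities or sigma-algebras). The lowercase $x$, $y$, $z$ are values of the corresponding variables and are notation introduced here.

$$
\begin{aligned}
E\left[ E\left(Y \mid X, Z\right) \mid X = x \right]
&= \sum_z E\left(Y \mid X = x, Z = z\right) P\left(Z = z \mid X = x\right) \\
&= \sum_z \sum_y y \, P\left(Y = y \mid X = x, Z = z\right) P\left(Z = z \mid X = x\right) \\
&= \sum_y y \sum_z P\left(Y = y, Z = z \mid X = x\right) \\
&= \sum_y y \, P\left(Y = y \mid X = x\right) \\
&= E\left[Y \mid X = x\right]
\end{aligned}
$$

The third line uses $P(Y = y \mid X = x, Z = z)\,P(Z = z \mid X = x) = P(Y = y, Z = z \mid X = x)$, and the fourth sums $Z$ out. Dropping the conditioning on $X$ throughout gives the simpler version $E \left[ E \left(Y|X \right) \right]=E \left(Y\right)$ mentioned in the question.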

62

votes

3 answers

What is the effect of having correlated predictors in a multiple regression model?

I learned in my linear models class that if two predictors are correlated and both are included in a model, one will be insignificant. For example, assume the size of a house and the number of bedrooms are correlated. When predicting the cost of a…

regression multiple-regression p-value linear-model multicollinearity

asked Feb 11 '14 at 22:23

Vivek Subramanian

2,613
2
19
34
