Most Popular

1500 questions
64
votes
5 answers

Is it meaningful to calculate Pearson or Spearman correlation between two Boolean vectors?

There are two Boolean vectors, which contain 0 and 1 only. If I calculate the Pearson or Spearman correlation, are they meaningful or reasonable?
Zhilong Jia
  • 785
  • 1
  • 6
  • 9
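
A minimal sketch related to this question: for two 0/1 vectors, the Pearson correlation coincides with the phi coefficient computed from the 2x2 contingency table. The vectors `x` and `y` and the helper names `pearson` and `phi` are made up for illustration and do not come from the question.

```python
import math

def pearson(x, y):
    """Pearson correlation computed directly from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def phi(x, y):
    """Phi coefficient from the 2x2 table of two 0/1 vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical Boolean vectors, chosen only for illustration.
x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 0, 1, 1, 0]

r = pearson(x, y)
p = phi(x, y)
print(r, p)
```

Both functions assume neither vector is constant; otherwise the denominator is zero and the correlation is undefined.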
63
votes
3 answers

Explain the xkcd jelly bean comic: What makes it funny?

I see that one time out of the twenty total tests they run, $p < 0.05$, so they wrongly assume that during one of the twenty tests, the result is significant ($0.05 = 1/20$). xkcd jelly bean comic - "Significant" Title: Significant Hover text:…
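
A quick check of the arithmetic behind this excerpt, assuming the twenty tests are independent and every null hypothesis is true (both are assumptions, not stated in the question):

$$P(\text{at least one } p < 0.05) = 1 - (1 - 0.05)^{20} = 1 - 0.95^{20} \approx 0.64$$

The expected number of false positives is $20 \times 0.05 = 1$, so seeing at least one "significant" result is more likely than not.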
63
votes
4 answers

Perform feature normalization before or within model validation?

A common good practice in Machine Learning is to do feature normalization or data standardization of the predictor variables, that is, center the data by subtracting the mean and normalize it by dividing by the variance (or standard deviation too). For…
SkyWalker
  • 825
  • 1
  • 7
  • 12
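
A minimal sketch of one common reading of "within model validation": the centering and scaling statistics are estimated on the training portion only and then reused on the held-out portion. The data, the split, and the helper names `fit_scaler` and `apply_scaler` are hypothetical; the sketch divides by the standard deviation.

```python
import statistics

def fit_scaler(train_column):
    """Estimate mean and (population) standard deviation from training data only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    return mean, std

def apply_scaler(column, mean, std):
    """Standardize a column using previously estimated statistics."""
    return [(x - mean) / std for x in column]

# Hypothetical single feature; the last value plays the role of a held-out fold.
values = [2.0, 4.0, 6.0, 8.0, 100.0]
train, held_out = values[:4], values[4:]

mean, std = fit_scaler(train)                      # held-out data is not used here
train_scaled = apply_scaler(train, mean, std)
held_out_scaled = apply_scaler(held_out, mean, std)
print(train_scaled, held_out_scaled)
```

In a cross-validation loop, the same two steps would be repeated inside each fold, so that no held-out value influences the statistics used to scale it.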
63
votes
3 answers

What is the difference in Bayesian estimate and maximum likelihood estimate?

Please explain the difference between the Bayesian estimate and the maximum likelihood estimate.
triomphe
  • 787
  • 1
  • 6
  • 9
63
votes
32 answers

What are the worst (commonly adopted) ideas/principles in statistics?

In my statistical teaching, I encounter some stubborn ideas/principles relating to statistics that have become popularised, yet seem to me to be misleading, or in some cases utterly without merit. I would like to solicit the views of others on this…
Ben
  • 91,027
  • 3
  • 150
  • 376
63
votes
9 answers

Advanced statistics books recommendation

There are several threads on this site for book recommendations on introductory statistics and machine learning but I am looking for a text on advanced statistics including, in order of priority: maximum likelihood, generalized linear models,…
63
votes
5 answers

Why does collecting data until finding a significant result increase Type I error rate?

I was wondering exactly why collecting data until a significant result (e.g., $p \lt .05$) is obtained (i.e., p-hacking) increases the Type I error rate? I would also highly appreciate an R demonstration of this phenomenon.
Reza
  • 876
  • 7
  • 10
63
votes
14 answers

If we fail to reject the null hypothesis in a large study, isn't it evidence for the null?

A basic limitation of null hypothesis significance testing is that it does not allow a researcher to gather evidence in favor of the null (Source) I see this claim repeated in multiple places, but I can't find justification for it. If we perform a…
Atte Juvonen
  • 1,199
  • 1
  • 10
  • 18
63
votes
4 answers

Should I use a categorical cross-entropy or binary cross-entropy loss for binary predictions?

First of all, I realized if I need to perform binary predictions, I have to create at least two classes through performing a one-hot-encoding. Is this correct? However, is binary cross-entropy only for predictions with only one class? If I were to…
63
votes
3 answers

Who created the first standard normal table?

I'm about to introduce the standard normal table in my introductory statistics class, and that got me wondering: who created the first standard normal table? How did they do it before computers came along? I shudder to think of someone brute-force…
Daniel Smolkin
  • 633
  • 5
  • 7
63
votes
6 answers

Test if two binomial distributions are statistically different from each other

I have three groups of data, each with a binomial distribution (i.e. each group has elements that are either success or failure). I do not have a predicted probability of success, but instead can only rely on the success rate of each as an…
63
votes
3 answers

Interpreting Residual and Null Deviance in GLM R

How to interpret the Null and Residual Deviance in GLM in R? Like, we say that smaller AIC is better. Is there any similar and quick interpretation for the deviances also? Null deviance: 1146.1 on 1077 degrees of freedom Residual deviance: 4589.4…
Anjali
  • 891
  • 3
  • 10
  • 10
62
votes
7 answers

Why doesn't Random Forest handle missing values in predictors?

What are theoretical reasons to not handle missing values? Gradient boosting machines, regression trees handle missing values. Why doesn't Random Forest do that?
Fedorenko Kristina
  • 723
  • 1
  • 6
  • 6
62
votes
3 answers

A generalization of the Law of Iterated Expectations

I recently came across this identity: $$E \left[ E \left(Y|X,Z \right) |X \right] =E \left[Y | X \right]$$ I am of course familiar with the simpler version of that rule, namely that $E \left[ E \left(Y|X \right) \right]=E \left(Y\right) $ but I was…
JohnK
  • 18,298
  • 10
  • 60
  • 103
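
One way to see why the identity in this excerpt holds, sketched under the assumption that $Y$ is integrable and writing $h$ for an arbitrary bounded function of $X$ (the symbol $h$ is introduced here for illustration):

$$E\left[h(X)\,E\left(Y|X,Z\right)\right] = E\left[E\left(h(X)\,Y|X,Z\right)\right] = E\left[h(X)\,Y\right] = E\left[h(X)\,E\left(Y|X\right)\right]$$

The first step uses that $h(X)$ is known given $(X,Z)$, the second is the simpler rule $E\left[E\left(W|X,Z\right)\right]=E\left(W\right)$ applied to $W = h(X)\,Y$, and the third applies the same two steps in reverse with $X$ alone. Because the equality holds for every such $h$, the defining property of conditional expectation gives $E \left[ E \left(Y|X,Z \right) |X \right] =E \left[Y | X \right]$ almost surely.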
62
votes
3 answers

What is the effect of having correlated predictors in a multiple regression model?

I learned in my linear models class that if two predictors are correlated and both are included in a model, one will be insignificant. For example, assume the size of a house and the number of bedrooms are correlated. When predicting the cost of a…