Most Popular
1500 questions
67 votes, 3 answers
Is standardization needed before fitting logistic regression?
My question is: do we need to standardize the data set so that all variables share the same scale, in [0,1], before fitting logistic regression? The formula is:
$$\frac{x_i-\min(x_i)}{\max(x_i)-\min(x_i)}$$
My data set has 2 variables,…
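
As a rough illustration of the min-max formula quoted above, here is a minimal Python sketch. The function name `min_max_scale` and the sample values are assumptions made for this example; they do not come from the question.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has max - min = 0, so the formula is undefined.
        raise ValueError("cannot rescale a constant column")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample column, chosen only to make the arithmetic easy to follow.
scaled = min_max_scale([2.0, 4.0, 10.0])
print(scaled)
```

The smallest value maps to 0 and the largest to 1; everything else lands in between in proportion to its distance from the minimum.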

user1946504
67 votes, 8 answers
Regression with multiple dependent variables?
Is it possible to have a (multiple) regression equation with two or more dependent variables? Sure, you could run two separate regression equations, one for each DV, but that doesn't seem like it would capture any relationship between the two DVs?

Jeff
67 votes, 9 answers
Taleb and the Black Swan
Taleb's book "The Black Swan" was a New York Times best seller when it came out several years ago. The book is now in its second edition. After meeting with statisticians at a JSM (an annual statistical conference), Taleb toned down his criticism…

Michael R. Chernick
67 votes, 5 answers
How small a quantity should be added to x to avoid taking the log of zero?
I have analysed my data as they are. Now I want to look at my analyses after taking the log of all variables. Many variables contain many zeros. Therefore I add a small quantity to avoid taking the log of zero.
So far I've added 10^-10, without any…
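
To make the concern in the excerpt above concrete, here is a small Python sketch of how the choice of added constant changes where zeros end up after the log. The helper name `shifted_log` and the constants tried are assumptions for illustration only; this is not a recommendation of any particular value.

```python
import math


def shifted_log(x, c):
    """Return log(x + c); c is the small quantity added to avoid log(0)."""
    return math.log(x + c)


# Hypothetical constants: every zero is mapped to log(c), so the gap between
# transformed zeros and transformed positive values depends heavily on c.
for c in (1e-10, 1e-3, 1.0):
    zero_image = shifted_log(0.0, c)
    five_image = shifted_log(5.0, c)
    print(f"c={c:g}  log(0+c)={zero_image:.3f}  log(5+c)={five_image:.3f}")

# With c = 1 the transform is log(1 + x), which math.log1p computes directly.
print(math.log1p(0.0))
```

With a very small constant, the zeros sit far from the rest of the data on the log scale, which can dominate any later analysis.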

miura
67 votes, 6 answers
Criticism of Pearl's theory of causality
In the year 2000, Judea Pearl published Causality. What controversies surround this work? What are its major criticisms?

Neil G
67 votes, 7 answers
Where did the frequentist-Bayesian debate go?
The world of statistics was divided between frequentists and Bayesians. These days it seems everyone does a bit of both. How can this be? If the different approaches are suitable for different problems, why did the founding fathers of statistics did…

JohnRos
67 votes, 3 answers
Variables are often adjusted (e.g. standardised) before making a model - when is this a good idea, and when is it a bad one?
In what circumstances would you want to, or not want to, scale or standardize a variable prior to model fitting? And what are the advantages and disadvantages of scaling a variable?

Andrew
67 votes, 3 answers
References containing arguments against null hypothesis significance testing?
In the last few years I've read a number of papers arguing against the use of null hypothesis significance testing in science, but didn't think to keep a persistent list. A colleague recently asked me for such a list, so I thought I'd ask everyone…

Mike Lawrence
66 votes, 4 answers
Why is logistic regression a linear classifier?
Since we are using the logistic function to transform a linear combination of the input into a non-linear output, how can logistic regression be considered a linear classifier?
Linear regression is just like a neural network without the hidden…
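
One way to see the point behind the question above: the logistic function is monotone, so the predicted probability crosses 0.5 exactly where the linear score w·x + b crosses 0. The Python sketch below checks this on a few points; the weights, bias, and test points are hypothetical values chosen for illustration, not taken from the question.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Return the linear score z = w.x + b and the probability sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, sigmoid(z)


# Hypothetical model parameters and inputs.
w, b = [2.0, -1.0], 0.5
points = [[0.0, 0.5], [1.0, 0.0], [-1.0, 2.0], [0.25, 3.0]]

for x in points:
    z, p = predict(w, b, x)
    # Thresholding p at 0.5 gives the same label as thresholding z at 0,
    # so the decision boundary is the hyperplane w.x + b = 0.
    assert (p >= 0.5) == (z >= 0.0)
    print(x, round(z, 3), round(p, 3))
```

The output probabilities are a non-linear function of the inputs, but the set of points where the predicted class changes is a straight line (a hyperplane in higher dimensions).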

Jack Twain
66 votes, 4 answers
What is a contrast matrix?
What exactly is a contrast matrix (a term pertaining to analysis with categorical predictors), and how exactly is a contrast matrix specified? I.e., what are the columns, what are the rows, what are the constraints on that matrix, and what does the number in…

Tomas
66 votes, 9 answers
How to visualize what ANOVA does?
What way (or ways) is there to visually explain what ANOVA is?
Any references, link(s) (R packages?) will be welcomed.

Tal Galili
66 votes, 1 answer
Why is the square root transformation recommended for count data?
It is often recommended to take the square root when you have count data. (For some examples on CV, see @HarveyMotulsky's answer here, or @whuber's answer here.) On the other hand, when fitting a generalized linear model with a response variable…
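
One commonly cited motivation for the recommendation mentioned above is variance stabilization: for Poisson counts the variance equals the mean, while the variance of the square root settles near 1/4 as the mean grows. The Python sketch below computes this directly from the Poisson probabilities. It assumes Poisson-distributed counts, and the function name `sqrt_poisson_variance` and the means tried are illustrative choices, not from the question.

```python
import math


def sqrt_poisson_variance(lam):
    """Var(sqrt(X)) for X ~ Poisson(lam), by summing over the pmf.

    The sum is truncated far into the upper tail, where the remaining
    probability mass is negligible for the means used here.
    """
    k_max = int(lam + 12 * math.sqrt(lam) + 30)
    p = math.exp(-lam)      # P(X = 0); contributes 0 to both moments below
    mean_sqrt = 0.0         # accumulates E[sqrt(X)]
    mean_x = 0.0            # accumulates E[X] = E[sqrt(X)^2]
    for k in range(1, k_max + 1):
        p *= lam / k        # P(X = k) from P(X = k - 1)
        mean_sqrt += p * math.sqrt(k)
        mean_x += p * k
    return mean_x - mean_sqrt ** 2


# Hypothetical means: the raw variance equals lam, so it grows with the mean,
# whereas the variance after the square root stays roughly constant.
for lam in (4.0, 25.0, 100.0):
    print(lam, sqrt_poisson_variance(lam))
```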

gung - Reinstate Monica
66 votes, 1 answer
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
TL;DR
See title.
Motivation
I am hoping for a canonical answer along the lines of "(1) No, (2) Not applicable, because (1)", which we can use to close many wrong questions about unbalanced datasets and oversampling. I would be quite as happy to be…

Stephan Kolassa
66 votes, 8 answers
Is the R language reliable for the field of economics?
I am a graduate student in economics who recently converted to R from other very well-known statistical packages (I was using SPSS mainly). My little problem at the moment is that I am the only R user in my class. My classmates use Stata and Gauss…

SavedByJESUS
66 votes, 1 answer
40,000 neuroscience papers might be wrong
I saw this article in the Economist about a seemingly devastating paper [1] casting doubt on "something like 40,000 published [fMRI] studies." The error, they say, is because of "erroneous statistical assumptions." I read the paper and see it's…

R Greg Stacey