Most Popular
1500 questions
218
votes
13 answers
How should I transform non-negative data including zeros?
If I have highly skewed positive data I often take logs. But what should I do with highly skewed non-negative data that include zeros? I have seen two transformations used:
$\log(x+1)$ which has the neat feature that 0 maps to 0.
$\log(x+c)$ where…

Rob Hyndman
- 51,928
- 23
- 126
- 178
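As a minimal sketch of the first transformation this question mentions, the snippet below checks that $\log(x+1)$ maps 0 to 0. The sample values are made up for illustration, and Python's `math.log1p` is used here as one convenient way to compute $\log(1+x)$; it says nothing about which transformation is preferable.

```python
import math

# Hypothetical non-negative sample containing zeros (illustrative values only).
data = [0.0, 0.5, 3.0, 120.0]

# math.log1p(x) computes log(1 + x) and stays accurate for very small x.
transformed = [math.log1p(x) for x in data]

for x, t in zip(data, transformed):
    print(f"x = {x:7.1f}  ->  log(x + 1) = {t:.4f}")
```

Zeros stay at zero and the ordering of the values is preserved, since the transformation is monotone.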
217
votes
5 answers
Which "mean" to use and when?
So we have arithmetic mean (AM), geometric mean (GM) and harmonic mean (HM). Their mathematical formulations are well known, along with their associated stereotypical examples (e.g., harmonic mean and its application to 'speed' related…

PhD
- 13,429
- 19
- 45
- 47
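For readers who want the three means side by side, here is a small sketch using Python's standard `statistics` module. The two speeds are invented for illustration: they follow the usual "same distance at two speeds" example, which is an assumption about what the truncated excerpt refers to.

```python
import statistics

# Hypothetical example: the same distance is covered once at 40 km/h
# and once at 60 km/h (illustrative numbers, not from the question).
speeds = [40, 60]

am = statistics.mean(speeds)            # arithmetic mean
gm = statistics.geometric_mean(speeds)  # geometric mean (Python 3.8+)
hm = statistics.harmonic_mean(speeds)   # harmonic mean

# Average speed over the whole trip is total distance / total time,
# which for equal distances is the harmonic mean of the two speeds.
d = 1.0  # length of each leg; any positive value gives the same result
trip_average = (2 * d) / (d / speeds[0] + d / speeds[1])

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
print(f"trip average speed = {trip_average:.3f}")
```

For positive data that are not all equal, the three values are ordered HM < GM < AM.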
215
votes
4 answers
How to interpret a QQ plot
I am working with a small dataset (21 observations) and have the following normal QQ plot in R:
Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed…

JohnK
- 18,298
- 10
- 60
- 103
204
votes
17 answers
Intuitive explanation for dividing by $n-1$ when calculating standard deviation?
I was asked today in class why you divide the sum of squared errors by $n-1$ instead of by $n$ when calculating the standard deviation.
I said I was not going to answer it in class (since I didn't want to go into unbiased estimators), but later I…

Tal Galili
- 19,935
- 32
- 133
- 195
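To make the two divisors in this question concrete, the sketch below computes the variance of a small made-up sample both ways and compares the results with `statistics.pvariance` (divides by $n$) and `statistics.variance` (divides by $n-1$). It only illustrates the arithmetic, not the intuition the question asks for.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = sum(sample) / n
# Sum of squared deviations from the sample mean.
ss = sum((x - mean) ** 2 for x in sample)

var_n = ss / n                   # divide by n
var_n_minus_1 = ss / (n - 1)     # divide by n - 1

# The standard library exposes both conventions.
assert math.isclose(var_n, statistics.pvariance(sample))
assert math.isclose(var_n_minus_1, statistics.variance(sample))

print(f"divide by n:     variance = {var_n:.4f}, sd = {math.sqrt(var_n):.4f}")
print(f"divide by n - 1: variance = {var_n_minus_1:.4f}, sd = {math.sqrt(var_n_minus_1):.4f}")
```

The $n-1$ version is always the larger of the two, and the gap shrinks as $n$ grows.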
204
votes
8 answers
In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?
Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?

d_2
- 2,191
- 3
- 14
- 5
201
votes
6 answers
Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?
I have a dataset that has both continuous and categorical data. I am analyzing it using PCA and am wondering if it is fine to include the categorical variables as part of the analysis. My understanding is that PCA can only be applied to continuous…

Nikolina Icitovic
- 2,011
- 3
- 13
- 4
200
votes
4 answers
What does the hidden layer in a neural network compute?
I'm sure many people will respond with links to 'let me google that for you', so I want to say that I've tried to figure this out so please forgive my lack of understanding here, but I cannot figure out how the practical implementation of a neural…

FAtBalloon
- 2,137
- 3
- 13
- 8
198
votes
3 answers
When should I use lasso vs ridge?
Say I want to estimate a large number of parameters, and I want to penalize some of them because I believe they should have little effect compared to the others. How do I decide what penalization scheme to use? When is ridge regression more…

Larry Wang
- 2,091
- 3
- 13
- 8
197
votes
3 answers
R's lmer cheat sheet
There's a lot of discussion going on on this forum about the proper way to specify various hierarchical models using lmer.
I thought it would be great to have all the information in one place.
A couple of questions to start:
How to specify multiple…
DBR
197
votes
3 answers
Generative vs. discriminative
I know that generative means "based on $P(x,y)$" and discriminative means "based on $P(y|x)$," but I'm confused on several points:
Wikipedia (+ many other hits on the web) classify things like SVMs and decision trees as being discriminative. But…

Yang
- 2,981
- 3
- 20
- 18
196
votes
7 answers
PCA on correlation or covariance?
What are the main differences between performing principal component analysis (PCA) on the correlation matrix and on the covariance matrix? Do they give the same results?

Random
- 2,140
- 3
- 13
- 8
193
votes
10 answers
How to deal with perfect separation in logistic regression?
If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
We…

user333
- 6,621
- 17
- 44
- 54
192
votes
14 answers
What is a data scientist?
Having recently graduated from my PhD program in statistics, I have for the last couple of months been searching for work in the field of statistics. Almost every company I considered had a job posting with a job title of "Data Scientist". In fact,…

RustyStatistician
- 1,709
- 3
- 13
- 35
191
votes
7 answers
What are the advantages of ReLU over sigmoid function in deep neural networks?
The state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages?
I know that training a network when ReLU is used would be faster, and it is more biological…

RockTheStar
- 11,277
- 31
- 63
- 89
190
votes
10 answers
Why is accuracy not the best measure for assessing classification models?
This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference.
Accuracy, the proportion of correct classifications among…

Tim
- 108,699
- 20
- 212
- 390