Highest Voted Questions - Statistical Analysis Stack Exchange

218

votes

13 answers

How should I transform non-negative data including zeros?

If I have highly skewed positive data I often take logs. But what should I do with highly skewed non-negative data that include zeros? I have seen two transformations used: $\log(x+1)$ which has the neat feature that 0 maps to 0. $\log(x+c)$ where…

data-transformation large-data

asked Aug 09 '10 at 13:57

Rob Hyndman

51,928
23
126
178
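
As a rough illustration of the two transformations named in the question above, the sketch below applies $\log(x+c)$ to a few non-negative values. The function name `log_shift`, the default `c=1.0`, and the sample data are assumptions made for illustration; they do not come from the original question.

```python
import math

def log_shift(x, c=1.0):
    """Return log(x + c) for non-negative x.

    `c` is a hypothetical shift constant; c = 1 gives log(x + 1),
    which maps 0 to 0.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if c == 1.0:
        # log1p is more accurate than log(1 + x) for small x
        return math.log1p(x)
    return math.log(x + c)

# Made-up skewed sample that includes a zero
data = [0, 1, 10, 100, 1000]
transformed = [log_shift(x) for x in data]
print(transformed)
```

With `c=1.0` the zero stays at zero and the order of the values is preserved, since the logarithm is monotone.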

217

votes

5 answers

Which "mean" to use and when?

So we have arithmetic mean (AM), geometric mean (GM) and harmonic mean (HM). Their mathematical formulation is also well known along with their associated stereotypical examples (e.g., Harmonic mean and its application to 'speed' related…

types-of-averages

asked Feb 19 '12 at 19:43

PhD

13,429
19
45
47
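
To make the three means from the question above concrete, here is a minimal sketch using Python's standard `statistics` module. The two speeds (40 and 60) are made-up values chosen for illustration, loosely following the 'speed' example the question alludes to.

```python
import statistics

# Hypothetical speeds over two legs of equal distance
speeds = [40, 60]

am = statistics.mean(speeds)            # arithmetic mean
gm = statistics.geometric_mean(speeds)  # geometric mean (Python 3.8+)
hm = statistics.harmonic_mean(speeds)   # harmonic mean

print(am, gm, hm)
```

For positive data the three always satisfy HM ≤ GM ≤ AM. For equal-distance legs, the harmonic mean is the one that gives the average speed over the whole trip.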

215

votes

4 answers

How to interpret a QQ plot

I am working with a small dataset (21 observations) and have the following normal QQ plot in R: Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed…

r data-visualization inference qq-plot faq

asked Jun 05 '14 at 10:44

JohnK

18,298
10
60
103

204

votes

17 answers

Intuitive explanation for dividing by $n-1$ when calculating standard deviation?

I was asked today in class why you divide the sum of squared errors by $n-1$ instead of by $n$ when calculating the standard deviation. I said I am not going to answer it in class (since I didn't want to go into unbiased estimators), but later I…

standard-error intuition teaching bessels-correction faq

asked Oct 23 '10 at 22:04

Tal Galili

19,935
32
133
195
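
The difference between dividing by $n$ and by $n-1$ in the question above can be seen numerically. This is only a sketch on a made-up sample; `statistics.pvariance` divides by $n$, while `statistics.variance` divides by $n-1$ (Bessel's correction).

```python
import statistics

# Made-up sample of n = 8 observations with mean 5
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = statistics.mean(sample)

# Sum of squared deviations from the sample mean
ss = sum((x - mean) ** 2 for x in sample)

var_n = ss / n                # divide by n
var_n_minus_1 = ss / (n - 1)  # divide by n - 1

# The standard library functions agree with the manual calculation
assert abs(var_n - statistics.pvariance(sample)) < 1e-12
assert abs(var_n_minus_1 - statistics.variance(sample)) < 1e-12

print(var_n, var_n_minus_1)
```

The $n-1$ version is always slightly larger, and the gap shrinks as $n$ grows.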

204

votes

8 answers

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?

regression distributions data-transformation logarithm faq

asked Jul 20 '10 at 13:11

d_2

2,191
3
14
5

201

votes

6 answers

Can principal component analysis be applied to datasets containing a mix of continuous and categorical variables?

I have a dataset that has both continuous and categorical data. I am analyzing by using PCA and am wondering if it is fine to include the categorical variables as a part of the analysis. My understanding is that PCA can only be applied to continuous…

categorical-data pca correspondence-analysis mixed-type-data

asked Dec 28 '10 at 03:47

Nikolina Icitovic

2,011
3
13
4

200

votes

4 answers

What does the hidden layer in a neural network compute?

I'm sure many people will respond with links to 'let me google that for you', so I want to say that I've tried to figure this out so please forgive my lack of understanding here, but I cannot figure out how the practical implementation of a neural…

machine-learning neural-networks nonlinear-regression

asked Jul 02 '13 at 15:59

FAtBalloon

2,137
3
13
8

198

votes

3 answers

When should I use lasso vs ridge?

Say I want to estimate a large number of parameters, and I want to penalize some of them because I believe they should have little effect compared to the others. How do I decide what penalization scheme to use? When is ridge regression more…

regression lasso ridge-regression

asked Jul 28 '10 at 01:10

Larry Wang

2,091
3
13
8

197

votes

3 answers

R's lmer cheat sheet

There's a lot of discussion on this forum about the proper way to specify various hierarchical models using lmer. I thought it would be great to have all the information in one place. A couple of questions to start: How to specify multiple…

r mixed-model random-effects-model fixed-effects-model lme4-nlme

asked Jul 17 '11 at 21:50

DBR

197

votes

3 answers

Generative vs. discriminative

I know that generative means "based on $P(x,y)$" and discriminative means "based on $P(y|x)$," but I'm confused on several points: Wikipedia (+ many other hits on the web) classify things like SVMs and decision trees as being discriminative. But…

machine-learning generative-models

asked Jun 27 '11 at 20:40

Yang

2,981
3
20
18

196

votes

7 answers

PCA on correlation or covariance?

What are the main differences between performing principal component analysis (PCA) on the correlation matrix and on the covariance matrix? Do they give the same results?

correlation pca covariance factor-analysis

asked Jul 19 '10 at 19:39

Random

2,140
3
13
8

193

votes

10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message: Warning message: glm.fit: fitted probabilities numerically 0 or 1 occurred We…

r regression logistic separation

asked May 22 '11 at 10:37

user333

6,621
17
44
54

192

votes

14 answers

What is a data scientist?

Having recently graduated from my PhD program in statistics, I have spent the last couple of months searching for work in the field of statistics. Almost every company I considered had a job posting with a job title of "Data Scientist". In fact,…

terminology definition careers

asked Feb 11 '16 at 08:44

RustyStatistician

1,709
3
13
35

191

votes

7 answers

What are the advantages of ReLU over sigmoid function in deep neural networks?

The state of the art for non-linearities is to use rectified linear units (ReLU) instead of the sigmoid function in deep neural networks. What are the advantages? I know that training a network when ReLU is used would be faster, and it is more biological…

machine-learning neural-networks sigmoid-curve

asked Dec 02 '14 at 02:13

RockTheStar

11,277
31
63
89

190

votes

10 answers

Why is accuracy not the best measure for assessing classification models?

This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference. Accuracy, the proportion of correct classifications among…

machine-learning model-evaluation accuracy scoring-rules faq

asked Nov 09 '17 at 07:32

Tim

108,699
20
212
390
