Questions tagged [faq]

Added by moderators to the canonical versions of frequently asked questions.

The FAQ tag helps people find information that may be difficult to search for but is often requested.

29 questions
522 votes, 23 answers

Why square the difference instead of taking the absolute value in standard deviation?

In the definition of standard deviation, why do we square the difference from the mean, take the mean (E), and then take the square root at the end? Can't we simply take the absolute value of the difference instead and get the expected…
c4il
516 votes, 3 answers

Relationship between SVD and PCA. How to use SVD to perform PCA?

Principal component analysis (PCA) is usually explained via an eigen-decomposition of the covariance matrix. However, it can also be performed via singular value decomposition (SVD) of the data matrix $\mathbf X$. How does it work? What is the…
amoeba
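For reference, a brief sketch of the standard linear-algebra identity behind this question. It assumes $\mathbf X$ is an $n \times p$ data matrix whose columns have been centered; the symbols $\mathbf C$, $\mathbf U$, $\mathbf S$, $\mathbf V$, $n$ and $p$ are notation introduced here for illustration and do not come from the excerpt.

$$
\mathbf C = \frac{\mathbf X^\top \mathbf X}{n-1},
\qquad
\mathbf X = \mathbf U \mathbf S \mathbf V^\top
\;\Longrightarrow\;
\mathbf C = \mathbf V \, \frac{\mathbf S^2}{n-1} \, \mathbf V^\top
$$

Under that assumption, the columns of $\mathbf V$ are the principal directions, the eigenvalues of $\mathbf C$ are $\lambda_i = s_i^2/(n-1)$ where $s_i$ are the singular values, and the principal component scores are $\mathbf X \mathbf V = \mathbf U \mathbf S$.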
296 votes, 8 answers

What should I do when my neural network doesn't learn?

I'm training a neural network but the training loss doesn't decrease. How can I fix this? I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network's performance doesn't improve on the training…
Sycorax
228 votes, 8 answers

Algorithms for automatic model selection

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem is that I am unable to find a methodology, or an…
S4M
215 votes, 4 answers

How to interpret a QQ plot

I am working with a small dataset (21 observations) and have the following normal QQ plot in R: [figure: normal QQ plot] Seeing that the plot does not support normality, what could I infer about the underlying distribution? It seems to me that a distribution more skewed…
JohnK
204 votes, 17 answers

Intuitive explanation for dividing by $n-1$ when calculating standard deviation?

I was asked today in class why you divide the sum of squared errors by $n-1$ instead of by $n$ when calculating the standard deviation. I said I am not going to answer it in class (since I didn't wanna go into unbiased estimators), but later I…
Tal Galili
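A minimal simulation sketch of the effect this question asks about, not an answer taken from the thread. It assumes samples drawn from a standard normal distribution (true variance 1); the function name `average_variance_estimates` and its parameters are hypothetical choices made for this illustration.

```python
import random


def average_variance_estimates(n=5, trials=20000, seed=0):
    """Average the divide-by-n and divide-by-(n-1) variance estimates
    over many small samples from a standard normal (true variance 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # Sum of squared deviations from the *sample* mean
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


biased, unbiased = average_variance_estimates()
print(f"average estimate dividing by n:     {biased:.3f}")
print(f"average estimate dividing by n - 1: {unbiased:.3f}")
```

With $n = 5$, the divide-by-$n$ average should land near $(n-1)/n = 0.8$ while the divide-by-$(n-1)$ average should land near the true variance of 1, because deviations are measured from the sample mean rather than the unknown population mean.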
204 votes, 8 answers

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
d_2
190 votes, 10 answers

Why is accuracy not the best measure for assessing classification models?

This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference. Accuracy, the proportion of correct classifications among…
Tim
174 votes, 6 answers

Can a probability distribution value exceeding 1 be OK?

On the Wikipedia page about naive Bayes classifiers, there is this line: $p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve that is equal to 1.) How can a value $>1$ be OK? I…
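A small sketch of the point made in the quoted parenthetical: a density value can exceed 1 while the area under the curve is still 1. The distribution below (mean 0, standard deviation 0.1) is a hypothetical example chosen for illustration; it is not the height example from the quoted page.

```python
from statistics import NormalDist

# Hypothetical narrow normal distribution (an assumption for illustration)
narrow = NormalDist(mu=0.0, sigma=0.1)

# Density at the mean: 1 / (sigma * sqrt(2 * pi)), about 3.99 here
peak_density = narrow.pdf(0.0)

# Riemann sum of the density over [-1, 1], i.e. ten standard deviations
# on each side of the mean, to approximate the total area under the curve
step = 0.001
area = sum(narrow.pdf(-1.0 + i * step) * step for i in range(2001))

print(f"density at the mean: {peak_density:.4f}")
print(f"approximate area under the curve: {area:.4f}")
```

The density is a height, not a probability. Probabilities come from integrating the density over an interval, so a tall, narrow curve is consistent with a total area of 1.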
100 votes, 9 answers

Is there an intuitive explanation why multicollinearity is a problem in linear regression?

The wiki discusses the problems that arise when multicollinearity is an issue in linear regression. The basic problem is that multicollinearity results in unstable parameter estimates, which makes it very difficult to assess the effect of independent…
user28
98 votes, 8 answers

What is the benefit of breaking up a continuous predictor variable?

I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model. It seems to me that by binning the variable we lose information. Is this just so we can model…
Tom
93 votes, 6 answers

Principled way of collapsing categorical variables with many levels?

What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model? Consider a variable like college student major (discipline chosen by an undergraduate…
69 votes, 8 answers

What are good basic statistics to use for ordinal data?

I have some ordinal data gained from survey questions. In my case they are Likert style responses (Strongly Disagree-Disagree-Neutral-Agree-Strongly Agree). In my data they are coded as 1-5. I don't think means would mean much here, so what basic…
PaulHurleyuk
67 votes, 5 answers

How small a quantity should be added to x to avoid taking the log of zero?

I have analysed my data as they are. Now I want to look at my analyses after taking the log of all variables. Many variables contain many zeros. Therefore I add a small quantity to avoid taking the log of zero. So far I've added $10^{-10}$, without any…
miura
59 votes, 5 answers

Best practice when analysing pre-post treatment-control designs

Imagine the following common design: 100 participants are randomly allocated to either a treatment or a control group; the dependent variable is numeric and measured pre- and post-treatment. Three obvious options for analysing such data are: Test…
Jeromy Anglim