Most Popular
1500 questions
101
votes
18 answers
Including the interaction but not the main effects in a model
Is it ever valid to include a two-way interaction in a model without including the main effects? What if your hypothesis is only about the interaction, do you still need to include the main effects?

Glen
- 6,320
- 4
- 37
- 59
100
votes
5 answers
Why is ANOVA taught / used as if it is a different research methodology compared to linear regression?
ANOVA is equivalent to linear regression with the use of suitable dummy variables. The conclusions remain the same irrespective of whether you use ANOVA or linear regression.
In light of their equivalence, is there any reason why ANOVA is used…
user28
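A minimal sketch of the equivalence described in this question, assuming made-up data for three groups (the group means, sample sizes, and seed are illustrative, not from the question). It computes the one-way ANOVA F statistic from sums of squares, then the overall F statistic of a regression on dummy variables, and the two should agree up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical groups of 10 observations each (made-up data).
groups = [rng.normal(loc=m, scale=1.0, size=10) for m in (0.0, 0.5, 1.5)]
y = np.concatenate(groups)
labels = np.repeat(np.arange(3), 10)

# Classical one-way ANOVA F statistic from between/within sums of squares.
grand_mean = y.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between, df_within = 3 - 1, len(y) - 3
f_anova = (ss_between / df_between) / (ss_within / df_within)

# Same test via linear regression: intercept plus dummies for groups 1 and 2.
X = np.column_stack([np.ones_like(y), labels == 1, labels == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = ((y - X @ beta) ** 2).sum()
tss = ((y - grand_mean) ** 2).sum()
f_regression = ((tss - rss) / df_between) / (rss / df_within)

print(f_anova, f_regression)
```

With this coding, the intercept is the mean of the reference group and each dummy coefficient is a difference in group means, which is why the two F statistics coincide.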
100
votes
9 answers
Is there an intuitive explanation why multicollinearity is a problem in linear regression?
The wiki discusses the problems that arise when multicollinearity is an issue in linear regression. The basic problem is that multicollinearity results in unstable parameter estimates, which makes it very difficult to assess the effect of independent…
user28
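One way to see the instability mentioned in this question is a small simulation. The sketch below is only an illustration under assumed settings (two standard-normal predictors, true coefficients of 1, unit noise; the function name `slope_spread` is hypothetical). It refits the same model on many simulated data sets and reports how much the estimated coefficient on the first predictor varies.

```python
import numpy as np

def slope_spread(rho, n=50, reps=500, seed=0):
    """Std. dev. of the estimated coefficient on x1 across simulated data sets."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = []
    for _ in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # true coefficients are both 1
        design = np.column_stack([np.ones(n), X])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates.append(beta[1])
    return float(np.std(estimates))

# Uncorrelated predictors versus nearly collinear predictors.
print(slope_spread(0.0), slope_spread(0.99))
```

The spread should be several times larger when the predictors are nearly collinear, because the data carry little information for separating the two effects.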
99
votes
1 answer
Interpreting plot.lm()
I had a question about interpreting the graphs generated by plot(lm) in R. I was wondering if you guys could tell me how to interpret the scale-location and leverage-residual plots? Any comments would be appreciated. Assume basic knowledge of…

Guest
- 991
- 2
- 7
- 3
99
votes
5 answers
What is the relation between k-means clustering and PCA?
It is a common practice to apply PCA (principal component analysis) before a clustering algorithm (such as k-means). It is believed that it improves the clustering results in practice (noise reduction).
However, I am interested in a comparative and…

mic
- 3,848
- 3
- 23
- 38
98
votes
8 answers
What is the benefit of breaking up a continuous predictor variable?
I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model.
It seems to me that by binning the variable we lose information.
Is this just so we can model…

Tom
- 1,511
- 1
- 12
- 17
98
votes
5 answers
Mean absolute error OR root mean squared error?
Why use Root Mean Squared Error (RMSE) instead of Mean Absolute Error (MAE)?
I've been investigating the error generated in a calculation - I initially calculated the error as a Root Mean Normalised Squared Error.
Looking a little closer, I…

user1665220
- 1,105
- 1
- 8
- 6
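As a small illustration of how the two measures in this question differ, the sketch below compares them on two made-up error vectors with the same mean absolute error. The vectors and helper names are assumptions for illustration only, and this is plain RMSE rather than the normalised variant the question mentions.

```python
import numpy as np

def mae(errors):
    """Mean absolute error of a vector of errors."""
    return float(np.mean(np.abs(errors)))

def rmse(errors):
    """Root mean squared error of a vector of errors."""
    return float(np.sqrt(np.mean(np.square(errors))))

even = np.array([2.0, -2.0, 2.0, -2.0])   # errors of equal size
spiky = np.array([0.0, 0.0, 0.0, 8.0])    # one large error

print(mae(even), rmse(even))    # 2.0 2.0
print(mae(spiky), rmse(spiky))  # 2.0 4.0
```

Both vectors have the same MAE, but RMSE doubles when the error is concentrated in one observation, since squaring weights large errors more heavily.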
98
votes
13 answers
What is the best way to identify outliers in multivariate data?
Suppose I have a large set of multivariate data with at least three variables. How can I find the outliers? Pairwise scatterplots won't work, as it is possible for an outlier to exist in 3 dimensions that is not an outlier in any of the 2-dimensional…

Rob Hyndman
- 51,928
- 23
- 126
- 178
98
votes
9 answers
Understanding "variance" intuitively
What is the cleanest, easiest way to explain the concept of variance to someone? What does it intuitively mean? If one is to explain this to their child, how would one go about it?
It's a concept that I have difficulty in articulating - especially when…

PhD
- 13,429
- 19
- 45
- 47
98
votes
30 answers
Is there a way to remember the definitions of Type I and Type II Errors?
I'm not a statistician by education, I'm a software engineer. Yet statistics comes up a lot. In fact, questions specifically about Type I and Type II error are coming up a lot in the course of my studying for the Certified Software Development…

Thomas Owens
- 1,091
- 1
- 10
- 19
98
votes
8 answers
Generate a random variable with a defined correlation to an existing variable(s)
For a simulation study I have to generate random variables that show a predefined (population) correlation to an existing variable $Y$.
I looked into the R packages copula and CDVine which can produce random multivariate distributions with a given…

Felix S
- 4,432
- 4
- 26
- 34
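One possible approach to this question, sketched under assumptions: the function name `correlated_with` is hypothetical, the exponential `y` is a stand-in for the existing variable, and the sketch fixes the sample Pearson correlation exactly, which is stricter than the population correlation the question asks about. It mixes the centred `y` with noise that has been made orthogonal to it.

```python
import numpy as np

def correlated_with(y, rho, rng):
    """Return x whose sample Pearson correlation with y equals rho."""
    y_c = y - y.mean()
    noise = rng.normal(size=y.shape)
    noise_c = noise - noise.mean()
    # Remove the part of the noise that is linearly related to y.
    resid = noise_c - (noise_c @ y_c) / (y_c @ y_c) * y_c
    # Scale both pieces to unit length, then mix them with weights
    # rho and sqrt(1 - rho^2).
    return (rho * y_c / np.linalg.norm(y_c)
            + np.sqrt(1 - rho**2) * resid / np.linalg.norm(resid))

rng = np.random.default_rng(1)
y = rng.exponential(size=200)   # stand-in for the existing variable
x = correlated_with(y, 0.6, rng)
print(np.corrcoef(x, y)[0, 1])
```

Skipping the orthogonalisation step and mixing standardised `y` with fresh noise instead would give the target correlation only on average rather than exactly in each sample.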
98
votes
3 answers
Can someone explain Gibbs sampling in very simple words?
I'm doing some reading on topic modeling (with Latent Dirichlet Allocation) which makes use of Gibbs sampling. As a newbie in statistics (well, I know things like binomials, multinomials, priors, etc.), I find it difficult to grasp how Gibbs sampling…

Thea
- 983
- 1
- 7
- 4
97
votes
1 answer
Correlation between a nominal (IV) and a continuous (DV) variable
I have a nominal variable (different topics of conversation, coded as topic0=0, etc.) and a number of scale variables (DV) such as the length of a conversation.
How can I derive correlations between the nominal and scale variables?

Paul Miller
- 971
- 2
- 7
- 3
96
votes
2 answers
Solving for regression parameters in closed-form vs gradient descent
In Andrew Ng's machine learning course, he introduces linear regression and logistic regression, and shows how to fit the model parameters using gradient descent and Newton's method.
I know gradient descent can be useful in some applications of…

Jeff
- 3,525
- 5
- 27
- 38
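For readers comparing the two fitting strategies in this question, here is a minimal sketch on simulated data. The data, step size, and iteration count are assumptions chosen for illustration, not taken from the course. It fits the same linear regression once by solving the normal equations and once by plain gradient descent on the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated design matrix with an intercept column and two predictors.
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=100)

# Closed form: solve the normal equations (X'X) beta = X'y.
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean squared error, starting from zero.
beta_gd = np.zeros(3)
step = 0.1
for _ in range(2000):
    gradient = 2.0 / len(y) * X.T @ (X @ beta_gd - y)
    beta_gd -= step * gradient

print(beta_closed, beta_gd)
```

On a small, well-conditioned problem like this one, both routes should reach essentially the same coefficients. The trade-off only starts to matter when the number of predictors makes forming and solving the normal equations expensive.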
95
votes
4 answers
How to choose nlme or lme4 R library for mixed effects models?
I have fit a few mixed effects models (particularly longitudinal models) using lme4 in R but would like to really master the models and the code that goes with them.
However, before diving in with both feet (and buying some books) I want to be sure…

Chris Beeley
- 5,465
- 5
- 36
- 40