Refers to the conditions under which a statistical procedure yields valid estimates and/or inference. For example, many statistical techniques require the assumption that the data are randomly sampled in some way. Theoretical results about estimators usually require assumptions about the data-generating mechanism.
Questions tagged [assumptions]
878 questions
90 votes, 10 answers
Is there a minimum sample size required for the t-test to be valid?
I'm currently working on a quasi-experimental research paper. I have a sample size of only 15 because the population within the chosen area is small and only 15 fit my criteria. Is 15 the minimum sample size needed to compute a t-test and an F-test? If so, where…

Czarina Francoise
89 votes, 10 answers
What is a complete list of the usual assumptions for linear regression?
What are the usual assumptions for linear regression?
Do they include:
a linear relationship between the independent and dependent variables
independent errors
normal distribution of errors
homoscedasticity
Are there any others?

tony
57 votes, 3 answers
ANOVA assumption normality/normal distribution of residuals
The Wikipedia page on ANOVA lists three assumptions, namely:
Independence of cases – this is an assumption of the model that simplifies the statistical analysis.
Normality – the distributions of the residuals are normal.
Equality (or "homogeneity")…

Roman Luštrik
56 votes, 5 answers
Regression when the OLS residuals are not normally distributed
There are several threads on this site discussing how to determine if the OLS residuals are asymptotically normally distributed. Another way to evaluate the normality of the residuals with R code is provided in this excellent answer. This is another…

Robert Kubrick
53 votes, 3 answers
Why do we care so much about normally distributed error terms (and homoskedasticity) in linear regression when we don't have to?
I suppose I get frustrated every time I hear someone say that non-normality of residuals and/or heteroskedasticity violates OLS assumptions. To estimate parameters in an OLS model, neither of these assumptions is necessary by the Gauss-Markov…

Zachary Blumenfeld
43 votes, 2 answers
Interpreting the residuals vs. fitted values plot for verifying the assumptions of a linear model
Consider the following figure from Faraway's Linear Models with R (2005, p. 59).
The first plot seems to indicate that the residuals and the fitted values are uncorrelated, as they should be in a homoscedastic linear model with normally distributed…

Evan Aad
38 votes, 10 answers
Why are survival times assumed to be exponentially distributed?
I am learning survival analysis from this post on UCLA IDRE and got tripped up at section 1.2.1. The tutorial says:
... if the survival times were known to be exponentially distributed, then the probability of observing a survival time ...
Why…

Haitao Du
37 votes, 6 answers
Assumptions of linear models and what to do if the residuals are not normally distributed
I am a little confused about what the assumptions of linear regression are.
So far I checked whether:
all of the explanatory variables correlated linearly with the response variable. (This was the case)
there was any collinearity among the…

Stefan
35 votes, 5 answers
What are the dangers of violating the homoscedasticity assumption for linear regression?
As an example, consider the ChickWeight data set in R. The variance obviously grows over time, so if I use a simple linear regression like:
m <- lm(weight ~ Time*Diet, data=ChickWeight)
My questions:
Which aspects of the model will be…

Dan M.
34 votes, 2 answers
What are the assumptions of negative binomial regression?
I'm working with a large data set (confidential, so I can't share too much), and came to the conclusion that a negative binomial regression would be necessary. I've never done a GLM regression before, and I can't find any clear information about what the…

Carly
32 votes, 6 answers
Sample size for logistic regression?
I want to make a logistic model from my survey data. It is a small survey of four residential colonies in which only 154 respondents were interviewed. My dependent variable is "satisfactory transition to work". I found that, of the 154 respondents,…

Braj-Stat
31 votes, 2 answers
Are 50% confidence intervals more robustly estimated than 95% confidence intervals?
My question flows out of this comment on one of Andrew Gelman's blog posts, in which he advocates the use of 50% confidence intervals instead of 95% confidence intervals, although not on the grounds that they are more robustly estimated:
I prefer 50% to…

user1205901 - Reinstate Monica
30 votes, 4 answers
Checking assumptions lmer/lme mixed models in R
I ran a repeated design whereby I tested 30 males and 30 females across three different tasks. I want to understand how the behaviour of males and females is different and how that depends on the task. I used both the lmer and lme4 package to…

crazjo
30 votes, 1 answer
How incorrect is a regression model when assumptions are not met?
When fitting a regression model, what happens if the assumptions of the outputs are not met, specifically:
What happens if the residuals are not homoscedastic? If the residuals show an increasing or decreasing pattern in Residuals vs. Fitted…

SpeedBirdNine
30 votes, 3 answers
What does "independent observations" mean?
I'm trying to understand what the assumption of independent observations means. Some definitions are:
"Two events are independent if and only if $P(a \cap b) = P(a) * P(b)$." (Statistical Terms Dictionary)
"the occurrence of one event doesn't…

RubenGeert