Questions tagged [regression]

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

"Regression" is a general term for a wide variety of techniques to analyze the relationship between one (or more) dependent variables and independent variables. Typically the dependent variables are modeled with probability distributions whose parameters are assumed to vary (deterministically) with the independent variables.

Ordinary least squares (OLS) regression affords a simple example in which the expectation of one dependent variable is assumed to depend linearly on the independent variables. The unknown coefficients in the assumed linear function are estimated by choosing values for them that minimize the sum of squared differences between the values of the dependent variable and the corresponding fitted values.
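In symbols, with design matrix $X$ and response vector $y$, the OLS coefficients are the minimizer $\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2$, which has the closed form $\hat{\beta} = (X^\top X)^{-1} X^\top y$ whenever $X^\top X$ is invertible.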

26,055 questions
387 votes • 18 answers

What happens if the explanatory and response variables are sorted independently before regression?

Suppose we have data set $(X_i,Y_i)$ with $n$ points. We want to perform a linear regression, but first we sort the $X_i$ values and the $Y_i$ values independently of each other, forming data set $(X_i,Y_j)$. Is there any meaningful interpretation…
arbitrary user
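A minimal R sketch of the setup this question describes, on simulated data (the numbers here are illustrative):

    set.seed(1)
    x <- rnorm(100)
    y <- 2 * x + rnorm(100)          # true slope is 2
    coef(lm(y ~ x))                  # recovers roughly 2
    coef(lm(sort(y) ~ sort(x)))      # sorting breaks the pairing; this fit
                                     # traces the quantile-quantile line instead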
271 votes • 2 answers

Interpretation of R's lm() output

The help pages in R assume I know what those numbers mean, but I don't. I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write…
Alexander Engelhardt
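For reference, output of the kind being asked about can be reproduced with the built-in mtcars data (an illustrative model, not the asker's):

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit)   # Estimate, Std. Error, t value, Pr(>|t|),
                   # residual standard error, R-squared, F-statistic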
269 votes • 6 answers

Is $R^2$ useful or dangerous?

I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear model. To paraphrase Shalizi's example: suppose…
raegtin
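A sketch of the phenomenon in Shalizi's example: the model below is exactly linear, yet $R^2$ is tiny because the noise variance dwarfs the signal (the numbers are arbitrary):

    set.seed(42)
    x <- runif(1000)
    y <- 1 + 2 * x + rnorm(1000, sd = 10)   # the mean of y is exactly linear in x
    summary(lm(y ~ x))$r.squared            # tiny, despite a correctly specified model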
204 votes • 8 answers

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

Am I looking for a better-behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
d_2
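One common pattern, sketched on simulated data: log-transform a right-skewed predictor so that a multiplicative relationship becomes linear:

    set.seed(7)
    x <- rlnorm(200)                 # right-skewed predictor
    y <- 3 + 1.5 * log(x) + rnorm(200)
    coef(lm(y ~ log(x)))             # slope: change in y per unit change in log(x)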
198 votes • 3 answers

When should I use lasso vs ridge?

Say I want to estimate a large number of parameters, and I want to penalize some of them because I believe they should have little effect compared to the others. How do I decide what penalization scheme to use? When is ridge regression more…
Larry Wang
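A sketch of trying both penalties, assuming the glmnet package is installed (alpha = 1 gives the lasso, alpha = 0 gives ridge):

    library(glmnet)                          # assumed to be installed
    set.seed(3)
    x <- matrix(rnorm(100 * 20), 100, 20)
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)    # only 2 of 20 predictors matter
    coef(cv.glmnet(x, y, alpha = 1), s = "lambda.min")  # lasso: exact zeros
    coef(cv.glmnet(x, y, alpha = 0), s = "lambda.min")  # ridge: all nonzero, shrunk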
193 votes • 10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi-perfect separation" warning: "glm.fit: fitted probabilities numerically 0 or 1 occurred". We…
user333
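The warning is easy to reproduce; a minimal sketch with a perfectly separating predictor:

    x <- 1:6
    y <- c(0, 0, 0, 1, 1, 1)               # x separates y perfectly at 3.5
    fit <- glm(y ~ x, family = binomial)   # emits the warning in question
    coef(fit)                              # enormous, unstable estimates; Firth's
                                           # penalized likelihood (e.g. the logistf
                                           # package) is one standard remedy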
190 votes • 5 answers

How exactly does one “control for other variables”?

Here is the article that motivated this question: Does impatience make us fat? I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc.) in order to best isolate the true…
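In regression terms, "controlling for" a variable usually means including it as a covariate, so each remaining coefficient is interpreted with the others held fixed. A schematic sketch on simulated data (the variable names merely echo the article and are illustrative):

    set.seed(11)
    n <- 500
    iq <- rnorm(n, 100, 15); income <- rnorm(n, 50, 10); age <- rnorm(n, 40, 12)
    impatience <- 0.02 * income + rnorm(n)           # made-up relationships
    bmi <- 25 + 0.5 * impatience - 0.02 * iq + rnorm(n)
    coef(lm(bmi ~ impatience + iq + income + age))   # the impatience coefficient is its
                                                     # association with bmi holding iq,
                                                     # income and age fixed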
160 votes • 9 answers

When is it ok to remove the intercept in a linear regression model?

I am running linear regression models and wondering what the conditions are for removing the intercept term. In comparing results from two different regressions where one has the intercept and the other does not, I notice that the $R^2$ of the…
analyticsPierce
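In R the intercept is dropped with - 1 (or + 0) in the formula. A sketch of why the reported $R^2$ values are not comparable across the two fits:

    set.seed(5)
    x <- rnorm(50, mean = 10)
    y <- 3 + 2 * x + rnorm(50)
    summary(lm(y ~ x))$r.squared       # variation explained about the mean of y
    summary(lm(y ~ x - 1))$r.squared   # computed about zero instead, so it is
                                       # inflated and not comparable with the above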
157 votes • 3 answers

How are the standard errors of coefficients calculated in a regression?

For my own understanding, I am interested in manually replicating the calculation of the standard errors of estimated coefficients as they appear, for example, in the output of R's lm() function, but I haven't been able to pin it down. What is the…
ako
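The usual hand calculation, $\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$, sketched in R and checked against lm() on built-in data:

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    X <- model.matrix(fit)
    s2 <- sum(residuals(fit)^2) / (nrow(X) - ncol(X))  # estimated error variance
    se <- sqrt(diag(s2 * solve(t(X) %*% X)))           # square roots of the diagonal
    cbind(manual = se, lm = summary(fit)$coefficients[, "Std. Error"])  # identical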
147 votes • 3 answers

When is R squared negative?

My understanding is that $R^2$ cannot be negative, as it is the square of R. However, I ran a simple linear regression in SPSS with a single independent variable and a dependent variable. My SPSS output gives me a negative value for $R^2$. If I was to…
Anne
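One way $R^2$ can come out negative is when it is computed as $1 - \mathrm{SSE}/\mathrm{SST}$ for a model that fits worse than the constant mean, for instance a no-intercept fit to data with a large offset; a sketch:

    set.seed(9)
    x <- rnorm(100)
    y <- 10 + rnorm(100)                   # y is unrelated to x, mean far from zero
    fit <- lm(y ~ x - 1)                   # intercept wrongly suppressed
    1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)   # well below zero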
141 votes • 8 answers

Why L1 norm for sparse models

I am reading books about linear regression. There are some sentences about the L1 and L2 norms. I know the formulas, but I don't understand why the L1 norm enforces sparsity in models. Can someone give a simple explanation?
Yongwei Xing
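One concrete way to see it: with a single coefficient and an orthonormal design, the L2 penalty shrinks the OLS estimate proportionally, while the L1 penalty soft-thresholds it, setting it exactly to zero below a cutoff. A sketch of the two update rules:

    beta_ols <- seq(-3, 3, 0.5)   # unpenalized estimates
    lambda   <- 1
    ridge <- beta_ols / (1 + lambda)                          # L2: proportional shrinkage
    lasso <- sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0) # L1: exact zeros
    cbind(beta_ols, ridge, lasso)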
136 votes • 3 answers

What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression? When would you use each?
B Seven
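The mechanical difference in R, as a sketch on built-in data: linear regression models a continuous mean with lm(), while logistic regression models the log-odds of a binary outcome with glm():

    lm(mpg ~ wt, data = mtcars)                      # continuous response, identity link
    glm(am ~ wt, data = mtcars, family = binomial)   # 0/1 response, logit link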
135 votes • 9 answers

Why does a time series have to be stationary?

I understand that a stationary time series is one whose mean and variance are constant over time. Can someone please explain why we have to make sure our data set is stationary before we can run different ARIMA or ARMA models on it? Does this also…
alex
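A sketch of the standard fix when a series is not stationary: difference it and check again (a random walk is nonstationary; its first difference is white noise):

    set.seed(2)
    rw <- cumsum(rnorm(300))   # random walk: variance grows over time
    acf(rw)                    # autocorrelation decays very slowly
    acf(diff(rw))              # first difference: drops off immediately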
134 votes • 9 answers

What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but I don't think that's the case. Can…
user9097
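A quick sketch of the asymmetry: the two regressions minimize different distances (vertical versus horizontal), so the slope of x on y is not the reciprocal of the slope of y on x unless the correlation is exactly $\pm 1$:

    set.seed(4)
    x <- rnorm(200)
    y <- 0.5 * x + rnorm(200)
    b_yx <- coef(lm(y ~ x))[2]   # r * sd(y) / sd(x)
    b_xy <- coef(lm(x ~ y))[2]   # r * sd(x) / sd(y)
    c(b_yx, 1 / b_xy)            # differ unless |r| = 1
    b_yx * b_xy                  # their product equals r^2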
131 votes • 3 answers

What if residuals are normally distributed, but y is not?

I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left-skewed. Thus you assume that $u$ is not normally distributed, because this would…
MarkDollar
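A sketch of why this can be fine: the normality assumption concerns the errors, not the marginal distribution of the response; with a skewed predictor, y can be strongly skewed even though the errors are exactly normal:

    set.seed(8)
    x <- rexp(500)                # skewed predictor
    y <- 1 + 2 * x + rnorm(500)   # errors are exactly normal
    hist(y)                       # yet the marginal distribution of y is skewed
    qqnorm(residuals(lm(y ~ x)))  # residuals look normal, as the model assumes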