Questions tagged [regression]

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

"Regression" is a general term for a wide variety of techniques to analyze the relationship between one (or more) dependent variables and independent variables. Typically the dependent variables are modeled with probability distributions whose parameters are assumed to vary (deterministically) with the independent variables.

Ordinary least squares (OLS) regression affords a simple example in which the expectation of one dependent variable is assumed to depend linearly on the independent variables. The unknown coefficients in the assumed linear function are estimated by choosing values for them that minimize the sum of squared differences between the values of the dependent variable and the corresponding fitted values.
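In symbols, with design matrix $X$ and response vector $y$, the OLS coefficients are the minimizer $\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^2$, which has the closed form $\hat{\beta} = (X^\top X)^{-1} X^\top y$ whenever $X^\top X$ is invertible.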

26,055 questions
387 votes • 18 answers

What happens if the explanatory and response variables are sorted independently before regression?

Suppose we have data set $(X_i,Y_i)$ with $n$ points. We want to perform a linear regression, but first we sort the $X_i$ values and the $Y_i$ values independently of each other, forming data set $(X_i,Y_j)$. Is there any meaningful interpretation…
arbitrary user
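A minimal R sketch of the setup this question describes, on simulated data (the numbers here are illustrative):

    set.seed(1)
    x <- rnorm(100)
    y <- 2 * x + rnorm(100)          # true slope is 2
    coef(lm(y ~ x))                  # recovers roughly 2
    coef(lm(sort(y) ~ sort(x)))      # sorting breaks the pairing; this fit
                                     # traces the quantile-quantile line instead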
271 votes • 2 answers

Interpretation of R's lm() output

The help pages in R assume I know what those numbers mean, but I don't. I'm trying to really intuitively understand every number here. I will just post the output and comment on what I found out. There might (will) be mistakes, as I'll just write…
Alexander Engelhardt
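For reference, output of the kind being asked about can be reproduced with the built-in mtcars data (an illustrative model, not the asker's):

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit)   # Estimate, Std. Error, t value, Pr(>|t|),
                   # residual standard error, R-squared, F-statistic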
269 votes • 6 answers

Is $R^2$ useful or dangerous?

I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear model. To paraphrase Shalizi's example: suppose…
raegtin
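A sketch of the phenomenon in Shalizi's example: the model below is exactly linear, yet $R^2$ is tiny because the noise variance dwarfs the signal (the numbers are arbitrary):

    set.seed(42)
    x <- runif(1000)
    y <- 1 + 2 * x + rnorm(1000, sd = 10)   # the mean of y is exactly linear in x
    summary(lm(y ~ x))$r.squared            # tiny, despite a correctly specified model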
204 votes • 8 answers

In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?

Am I looking for a better-behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
d_2
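One common pattern, sketched on simulated data: log-transform a right-skewed predictor so that a multiplicative relationship becomes linear:

    set.seed(7)
    x <- rlnorm(200)                 # right-skewed predictor
    y <- 3 + 1.5 * log(x) + rnorm(200)
    coef(lm(y ~ log(x)))             # slope: change in y per unit change in log(x)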
198 votes • 3 answers

When should I use lasso vs ridge?

Say I want to estimate a large number of parameters, and I want to penalize some of them because I believe they should have little effect compared to the others. How do I decide what penalization scheme to use? When is ridge regression more…
Larry Wang
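A sketch of trying both penalties, assuming the glmnet package is installed (alpha = 1 gives the lasso, alpha = 0 gives ridge):

    library(glmnet)                          # assumed to be installed
    set.seed(3)
    x <- matrix(rnorm(100 * 20), 100, 20)
    y <- x[, 1] - 2 * x[, 2] + rnorm(100)    # only 2 of 20 predictors matter
    coef(cv.glmnet(x, y, alpha = 1), s = "lambda.min")  # lasso: exact zeros
    coef(cv.glmnet(x, y, alpha = 0), s = "lambda.min")  # ridge: all nonzero, shrunk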
193 votes • 10 answers

How to deal with perfect separation in logistic regression?

If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi-perfect separation" warning: "glm.fit: fitted probabilities numerically 0 or 1 occurred". We…
user333
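The warning is easy to reproduce; a minimal sketch with a perfectly separating predictor:

    x <- 1:6
    y <- c(0, 0, 0, 1, 1, 1)               # x separates y perfectly at 3.5
    fit <- glm(y ~ x, family = binomial)   # emits the warning in question
    coef(fit)                              # enormous, unstable estimates; Firth's
                                           # penalized likelihood (e.g. the logistf
                                           # package) is one standard remedy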
190 votes • 5 answers

How exactly does one “control for other variables”?

Here is the article that motivated this question: Does impatience make us fat? I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, age, etc.) in order to best isolate the true…
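In regression terms, "controlling for" a variable usually means including it as a covariate, so each remaining coefficient is interpreted with the others held fixed. A schematic sketch on simulated data (the variable names merely echo the article and are illustrative):

    set.seed(11)
    n <- 500
    iq <- rnorm(n, 100, 15); income <- rnorm(n, 50, 10); age <- rnorm(n, 40, 12)
    impatience <- 0.02 * income + rnorm(n)           # made-up relationships
    bmi <- 25 + 0.5 * impatience - 0.02 * iq + rnorm(n)
    coef(lm(bmi ~ impatience + iq + income + age))   # the impatience coefficient is its
                                                     # association with bmi holding iq,
                                                     # income and age fixed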
160 votes • 9 answers

When is it ok to remove the intercept in a linear regression model?

I am running linear regression models and wondering what the conditions are for removing the intercept term. In comparing results from two different regressions where one has the intercept and the other does not, I notice that the $R^2$ of the…
analyticsPierce
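In R the intercept is dropped with - 1 (or + 0) in the formula. A sketch of why the reported $R^2$ values are not comparable across the two fits:

    set.seed(5)
    x <- rnorm(50, mean = 10)
    y <- 3 + 2 * x + rnorm(50)
    summary(lm(y ~ x))$r.squared       # variation explained about the mean of y
    summary(lm(y ~ x - 1))$r.squared   # computed about zero instead, so it is
                                       # inflated and not comparable with the above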
157 votes • 3 answers

How are the standard errors of coefficients calculated in a regression?

For my own understanding, I am interested in manually replicating the calculation of the standard errors of estimated coefficients as they appear, for example, in the output of R's lm() function, but I haven't been able to pin it down. What is the…
ako
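The usual hand calculation, $\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}$, sketched in R and checked against lm() on built-in data:

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    X <- model.matrix(fit)
    s2 <- sum(residuals(fit)^2) / (nrow(X) - ncol(X))  # estimated error variance
    se <- sqrt(diag(s2 * solve(t(X) %*% X)))           # square roots of the diagonal
    cbind(manual = se, lm = summary(fit)$coefficients[, "Std. Error"])  # identical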
147 votes • 3 answers

When is R squared negative?

My understanding is that $R^2$ cannot be negative, as it is the square of R. However, I ran a simple linear regression in SPSS with a single independent variable and a dependent variable. My SPSS output gives me a negative value for $R^2$. If I was to…
Anne
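One way $R^2$ can come out negative is when it is computed as $1 - \mathrm{SSE}/\mathrm{SST}$ for a model that fits worse than the constant mean, for instance a no-intercept fit to data with a large offset; a sketch:

    set.seed(9)
    x <- rnorm(100)
    y <- 10 + rnorm(100)                   # y is unrelated to x, mean far from zero
    fit <- lm(y ~ x - 1)                   # intercept wrongly suppressed
    1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)   # well below zero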
141 votes • 8 answers

Why L1 norm for sparse models

I am reading books about linear regression. There are some sentences about the L1 and L2 norms. I know the formulas, but I don't understand why the L1 norm enforces sparsity in models. Can someone give a simple explanation?
Yongwei Xing
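One concrete way to see it: with a single coefficient and an orthonormal design, the L2 penalty shrinks the OLS estimate proportionally, while the L1 penalty soft-thresholds it, setting it exactly to zero below a cutoff. A sketch of the two update rules:

    beta_ols <- seq(-3, 3, 0.5)   # unpenalized estimates
    lambda   <- 1
    ridge <- beta_ols / (1 + lambda)                          # L2: proportional shrinkage
    lasso <- sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0) # L1: exact zeros
    cbind(beta_ols, ridge, lasso)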
136 votes • 3 answers

What is the difference between linear regression and logistic regression?

What is the difference between linear regression and logistic regression? When would you use each?
B Seven
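The mechanical difference in R, as a sketch on built-in data: linear regression models a continuous mean with lm(), while logistic regression models the log-odds of a binary outcome with glm():

    lm(mpg ~ wt, data = mtcars)                      # continuous response, identity link
    glm(am ~ wt, data = mtcars, family = binomial)   # 0/1 response, logit link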
135 votes • 9 answers

Why does a time series have to be stationary?

I understand that a stationary time series is one whose mean and variance are constant over time. Can someone please explain why we have to make sure our data set is stationary before we can run different ARIMA or ARMA models on it? Does this also…
alex
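A sketch of the standard fix when a series is not stationary: difference it and check again (a random walk is nonstationary; its first difference is white noise):

    set.seed(2)
    rw <- cumsum(rnorm(300))   # random walk: variance grows over time
    acf(rw)                    # autocorrelation decays very slowly
    acf(diff(rw))              # first difference: drops off immediately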
134 votes • 9 answers

What is the difference between linear regression on y with x and x with y?

The Pearson correlation coefficient of x and y is the same, whether you compute pearson(x, y) or pearson(y, x). This suggests that doing a linear regression of y given x or x given y should be the same, but I don't think that's the case. Can…
user9097
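A quick sketch of the asymmetry: the two regressions minimize different distances (vertical versus horizontal), so the slope of x on y is not the reciprocal of the slope of y on x unless the correlation is exactly $\pm 1$:

    set.seed(4)
    x <- rnorm(200)
    y <- 0.5 * x + rnorm(200)
    b_yx <- coef(lm(y ~ x))[2]   # r * sd(y) / sd(x)
    b_xy <- coef(lm(x ~ y))[2]   # r * sd(x) / sd(y)
    c(b_yx, 1 / b_xy)            # differ unless |r| = 1
    b_yx * b_xy                  # their product equals r^2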
131 votes • 3 answers

What if residuals are normally distributed, but y is not?

I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left-skewed. Thus you assume that $u$ is not normally distributed, because this would…
MarkDollar
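A sketch of why this can be fine: the normality assumption concerns the errors, not the marginal distribution of the response; with a skewed predictor, y can be strongly skewed even though the errors are exactly normal:

    set.seed(8)
    x <- rexp(500)                # skewed predictor
    y <- 1 + 2 * x + rnorm(500)   # errors are exactly normal
    hist(y)                       # yet the marginal distribution of y is skewed
    qqnorm(residuals(lm(y ~ x)))  # residuals look normal, as the model assumes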