
If I run a regression over my full data set in Excel, it provides a formula:

y = 100x

or y = 100x + some other value, if my example weren't so simplistic.


In this simplistic example, if I have x = 20, I would simply multiply 100 * 20 to get 2000.

My question is: in traditional regression calculations, how do I obtain a slope formula like the one Excel provides,

y = 100x?

Is there a way to calculate this from my data?

  • Regression works if there is an exact linear relationship, but it is designed for cases where the original data includes noise in the dependent variable and so the original points only approximate a straight line – Henry Jul 02 '18 at 23:30
  • Ah yes, for sure. I was just testing a really simple version – Andrew Bannerman Jul 02 '18 at 23:35
  • https://en.wikipedia.org/wiki/Simple_linear_regression .. or see almost any basic statistics text – Glen_b Jul 03 '18 at 03:13
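To illustrate Henry's point about noise, here is a minimal Python sketch, assuming made-up data: an exact line y = 100x plus a deterministic alternating ±5 "noise" term. The data and the helper name `fit_line` are hypothetical. With noise, the least-squares slope is close to, but not exactly, 100.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

# Hypothetical data: y = 100x plus noise of -5 (odd x) or +5 (even x).
xs = list(range(1, 11))
ys = [100 * x + (5 if x % 2 == 0 else -5) for x in xs]

alpha, beta = fit_line(xs, ys)
print(f"slope = {beta:.4f}, intercept = {alpha:.4f}")
# prints: slope = 100.3030, intercept = -1.6667
```

The points no longer sit exactly on one line, so the fitted line only approximates them.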

2 Answers


Simple linear regression suggests calculating

$$\hat\beta = \frac{ \sum\limits_{i=1}^n \left(x_i - \frac1n\sum\limits_{j=1}^n x_j\right)\left(y_i - \frac1n \sum\limits_{j=1}^n y_j \right) }{ \sum\limits_{i=1}^n \left(x_i - \frac1n\sum_{j=1}^n x_j\right)^2 }$$ $$\hat\alpha = \frac1n \sum\limits_{j=1}^n y_j - \hat\beta\,\frac1n\sum\limits_{j=1}^n x_j$$

and then the regression line

$$\hat y_i = \hat\alpha + \hat\beta x_i $$

which passes through the point $(\bar x, \bar y)$ with the optimal slope, chosen so as to minimise the sum of squared residuals $\sum\limits_{i=1}^n (y_i-\hat y _i)^2$.
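A quick numerical check of the least-squares property, assuming a small hypothetical data set: keeping the line through $(\bar x, \bar y)$ but nudging the slope away from $\hat\beta$ only increases the residual sum of squares. The helper names `fit_line` and `rss` are illustrative, not from any library.

```python
def fit_line(xs, ys):
    """Least-squares (alpha_hat, beta_hat) for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

def rss(xs, ys, alpha, beta):
    """Residual sum of squares: sum of (y_i - (alpha + beta * x_i))^2."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy data.
xs = [1, 2, 3, 4, 5, 6]
ys = [98, 205, 297, 410, 495, 603]

alpha, beta = fit_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
best = rss(xs, ys, alpha, beta)

for delta in (-1.0, -0.1, 0.1, 1.0):
    b = beta + delta
    a = y_bar - b * x_bar  # keep the line through (x_bar, y_bar)
    assert rss(xs, ys, a, b) > best
print("the least-squares slope gives the smallest RSS among the slopes tried")
```

For lines through $(\bar x, \bar y)$ the increase in RSS is exactly $\delta^2 \sum_i (x_i - \bar x)^2$, which is why any nonzero nudge makes the fit worse.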

With your example, it will give $\hat\beta=100$ and $\hat\alpha=0$.
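The two formulas above translate directly into code. This sketch assumes hypothetical data lying exactly on y = 100x, since the asker's actual data set is not shown; it computes $\hat\beta$ and $\hat\alpha$ term by term.

```python
# Hypothetical data lying exactly on y = 100x.
xs = [10, 20, 30, 40, 50]
ys = [100 * x for x in xs]

n = len(xs)
x_bar = sum(xs) / n  # (1/n) * sum of x_j
y_bar = sum(ys) / n  # (1/n) * sum of y_j

# beta_hat: sum of cross-deviations divided by sum of squared x-deviations
beta_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
# alpha_hat: mean of y minus beta_hat times mean of x
alpha_hat = y_bar - beta_hat * x_bar

print(beta_hat, alpha_hat)  # 100.0 0.0
```

Substituting x = 20 into $\hat\alpha + \hat\beta x$ then gives 2000, matching the Excel trendline in the question.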

Henry
  • OK, thanks. So let's say I have 1000 data points. If I work out the simple regression over the full sample and obtain B and A, and then I need a slope data point at, for example, data point 10, could I simply run the calculation y = 100X + a at data point 10? – Andrew Bannerman Jul 02 '18 at 23:22
  • If you want the point on the regression line corresponding to a particular value for $x$, then just substitute that value for $x$ into $\hat y = \hat\alpha + \hat\beta x$ to find the corresponding value of $\hat{y}$ and so getting the point $(x, \hat y)$ – Henry Jul 02 '18 at 23:27
  • OK, and that will give me, for that given (x, y) position, the data point value for the slope, right? – Andrew Bannerman Jul 02 '18 at 23:31
  • The slope of the regression line is $\hat \beta$ – Henry Jul 02 '18 at 23:32
  • OK, is there a way, using the output of a regression, to find the (x, y) point of the slope at any given data point after calculating it for the entire sample? The reason I ask is that I want to correlate another data set against the best-fit line and obtain the correlation coefficient. I can increment the regression one point at a time to obtain a slope value for each data point across the whole sample, but that becomes very computationally expensive. So I wanted to check whether there was a way to do a single simple regression over the whole sample and backtrack to fill in the values. – Andrew Bannerman Jul 02 '18 at 23:34
  • OK, so it's not possible, since the fit is the best fit at that point in time and its accuracy increases with more data points. So I can only use the best-known fit to estimate the future, not backtrack? – Andrew Bannerman Jul 02 '18 at 23:59
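Following Henry's suggestion of substituting $x$ into $\hat y = \hat\alpha + \hat\beta x$, a single fit over the whole sample is enough to get the point on the regression line for every data point in one pass. This is a sketch under assumptions: the ten data points are hypothetical stand-ins for the asker's 1000-point sample, and `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Least-squares (alpha_hat, beta_hat) for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

# Hypothetical sample standing in for the full data set.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [103.0, 198.0, 305.0, 396.0, 502.0, 597.0, 704.0, 799.0, 901.0, 1002.0]

# Fit once over the whole sample.
alpha_hat, beta_hat = fit_line(xs, ys)

# Fitted value y_hat_i = alpha_hat + beta_hat * x_i for every data point.
fitted = [alpha_hat + beta_hat * x for x in xs]

# The point (x, y_hat) on the regression line for the 10th data point (index 9).
print((xs[9], fitted[9]))
```

Every one of these fitted points lies on the same line, so the slope at each of them is the same $\hat\beta$.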

“Ok - is there a way with the output of a regression to find the x,y point of the slope at any given data point, post calculating it for the entire sample?”

For a given value of x, the output of the simple linear regression model is a point on the fitted line, and its height is the predicted value of y. There is no 'x,y point' of the slope: the slope is a single number describing the whole line.

“I can increment the regression +1 point at a time to obtain a slope value for each data point for the whole sample but it becomes very computationally expensive.”

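The slope is the same for every value of the predictor variable. A simple linear regression fitted to the whole sample has exactly one slope, $\hat\beta$, so there is no separate slope to compute at each data point.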
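One way to see that the slope does not vary from point to point, assuming hypothetical data and an illustrative `fit_line` helper: take any two fitted points $(x_i, \hat y_i)$ and $(x_j, \hat y_j)$ and compute the slope between them. Every pair gives $\hat\beta$, up to floating-point rounding.

```python
from itertools import combinations

def fit_line(xs, ys):
    """Least-squares (alpha_hat, beta_hat) for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

# Hypothetical, unevenly spaced data.
xs = [2.0, 3.5, 5.0, 7.0, 11.0]
ys = [210.0, 340.0, 515.0, 690.0, 1120.0]

alpha_hat, beta_hat = fit_line(xs, ys)
fitted = [alpha_hat + beta_hat * x for x in xs]

# Slope between every pair of fitted points.
pair_slopes = [
    (fitted[j] - fitted[i]) / (xs[j] - xs[i])
    for i, j in combinations(range(len(xs)), 2)
]
assert all(abs(s - beta_hat) < 1e-9 for s in pair_slopes)
print(f"beta_hat = {beta_hat:.4f} between every pair of fitted points")
```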

“Ok it's not possible. As the fit is best fit at that point in time. Increases accuracy with more data points. So can only use the best know fit to estimate future not back track?”

There is no time dimension in the values ‘predicted’ by the model, and the order in which the observations appear does not enter the calculation. The model explains the sample data and estimates a level of the response variable as a function of the coefficients and the independent variable.
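A small check of the "no time dimension" point, assuming hypothetical data and an illustrative `fit_line` helper: ordinary least squares depends only on the set of (x, y) pairs, so reversing or permuting the rows leaves $\hat\alpha$ and $\hat\beta$ unchanged apart from floating-point rounding.

```python
import math

def fit_line(xs, ys):
    """Least-squares (alpha_hat, beta_hat) for simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

# Hypothetical data in its original row order.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [101.0, 199.0, 302.0, 398.0, 500.0]

a1, b1 = fit_line(xs, ys)
# Same pairs, rows reversed (as if the "time" order were flipped).
a2, b2 = fit_line(xs[::-1], ys[::-1])
# Same pairs, rows in an arbitrary fixed order.
order = [3, 0, 4, 1, 2]
a3, b3 = fit_line([xs[k] for k in order], [ys[k] for k in order])

for a, b in ((a2, b2), (a3, b3)):
    assert math.isclose(a, a1, rel_tol=1e-9, abs_tol=1e-9)
    assert math.isclose(b, b1, rel_tol=1e-9, abs_tol=1e-9)
print(f"alpha_hat = {a1:.4f}, beta_hat = {b1:.4f} regardless of row order")
```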

This question received a thorough answer explaining the similarities and differences between correlation and regression:

What is the difference between linear regression on y with x and x with y?