I have a data set (X,Y) and need the fitted linear regression line to pass through a given point (x,y). How do I implement this in R?
-
@cardinal Good answer! (Maybe you could post it as a reply?) – whuber Jun 29 '11 at 20:37
-
@cardinal. Right on the money. Please post this as an answer -- let's minimize the number of unanswered questions. :O) Ps. Up-votes headed your way. – M. Tibbits Jun 30 '11 at 01:30
-
Thank you cardinal. By the way, is there a way to force the regression line to have a negative slope? – reisner Jun 30 '11 at 02:31
-
If the fitted line does not have a negative slope, the best you can do is a zero slope, which will pass through the point $(x,y)$, thereby uniquely determining it. – whuber Jun 30 '11 at 02:57
-
I have deleted my comment and expanded it slightly into a full answer. – cardinal Jun 30 '11 at 13:11
-
The general question "pass through a data set" (instead of just one point) was asked again a couple years later and answered fully at https://stats.stackexchange.com/questions/50447. – whuber Feb 09 '21 at 22:55
-
While this doesn't use R, for others who found this question and need a more general answer, here is an interactive example I made in Desmos: https://www.desmos.com/calculator/0ejtjkz6hh (by slope) https://www.desmos.com/calculator/389wfvp3w0 (by angle) – idealius Feb 09 '21 at 19:35
1 Answer
If $(x_0,y_0)$ is the point through which the regression line must pass, fit the model $y-y_0=\beta (x-x_0)+\varepsilon$, i.e., a linear regression with "no intercept" on a translated data set. In R, this might look like `lm( I(y-y0) ~ I(x-x0) + 0)`. Note the `+ 0` at the end, which indicates to `lm` that no intercept term should be fit.
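As a minimal sketch of the translated, no-intercept fit, here is a small self-contained example. The data are simulated, and the fixed point `x0`, `y0` and the helper `predict_line` are made up purely for illustration.

```r
# Simulated data; every value here is hypothetical.
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50, sd = 0.3)

# Point the fitted line is required to pass through (hypothetical).
x0 <- 4
y0 <- 3

# No-intercept regression on the translated data.
fit <- lm(I(y - y0) ~ I(x - x0) + 0)
slope <- unname(coef(fit))

# In the original coordinates the line is y = y0 + slope * (x - x0).
intercept <- y0 - slope * x0
predict_line <- function(xnew) y0 + slope * (xnew - x0)

predict_line(x0)  # equals y0 by construction
```

To draw the line on a scatter plot of the original data, `abline(a = intercept, b = slope)` after `plot(x, y)` should work.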
Depending on how easily you are convinced, there are multiple ways to demonstrate that this does, indeed, yield the correct answer. If you want to establish it formally, one simple method is to use Lagrange multipliers.
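For readers who want the Lagrange multiplier argument spelled out, one way to sketch it is as follows. The symbols $a$, $b$, $\lambda$ and $r_i$ are introduced here for illustration, and the derivation assumes that not all $x_i$ equal $x_0$. Minimize the residual sum of squares for the line $y = a + bx$ subject to the constraint $a + b x_0 = y_0$:

$$
\mathcal{L}(a,b,\lambda) = \sum_i (y_i - a - b x_i)^2 + 2\lambda\,(a + b x_0 - y_0).
$$

Writing $r_i = y_i - a - b x_i$ and setting the partial derivatives with respect to $a$ and $b$ to zero gives

$$
\sum_i r_i = \lambda, \qquad \sum_i x_i r_i = \lambda x_0 \quad\Longrightarrow\quad \sum_i (x_i - x_0)\, r_i = 0 .
$$

Substituting the constraint $a = y_0 - b x_0$ yields $r_i = (y_i - y_0) - b\,(x_i - x_0)$, so

$$
\hat b = \frac{\sum_i (x_i - x_0)(y_i - y_0)}{\sum_i (x_i - x_0)^2},
$$

which is the least-squares slope of a no-intercept regression of $y - y_0$ on $x - x_0$.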
Whether or not it is actually a good idea to force a regression line to go through a particular point is a separate matter and is problem dependent. Generally, I would personally caution against this, unless there is a very good reason (e.g., very strong theoretical considerations). For one thing, fitting the full model can provide a means for measuring lack of fit. As a second matter, if you are mostly interested in evaluating model explanatory power for values of $x$ and $y$ "far away" from $(x_0,y_0)$, then the relevance of the fixed point becomes suspect.
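One possible way to use the full model as a check, offered only as a sketch, is to compare the constrained fit with the unconstrained one using an F-test. This particular test is an assumption about what "measuring lack of fit" could look like in practice, not something prescribed above. The data and the point `x0`, `y0` are simulated for illustration.

```r
# Simulated data; every value here is hypothetical.
set.seed(2)
x <- runif(60, 0, 10)
y <- 1 + 0.8 * x + rnorm(60, sd = 0.5)

# Hypothetical fixed point, deliberately placed off the true line.
x0 <- 5
y0 <- 6

# Constrained fit: forced through (x0, y0).
constrained <- lm(I(y - y0) ~ I(x - x0) + 0)

# Full fit: same translated variables, but with an intercept.
full <- lm(I(y - y0) ~ I(x - x0))

# The models are nested, so anova() gives an F-test of the constraint.
cmp <- anova(constrained, full)
rss <- cmp[["RSS"]]               # residual sums of squares, constrained then full
p_value <- cmp[["Pr(>F)"]][2]     # small values suggest the constraint fits poorly
```

A small `p_value` here would indicate that the data are hard to reconcile with a line through the chosen point.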
