The most common methods I've seen to find a line of best fit are Least Squares regression and median-median. Are there other good ways? Is there a way to minimize the absolute value difference and find a line of best fit that way? Or to find the distance straight to a line instead of the vertical distance to the line? Thoughts?
-
Yes! Many posts on this site. One keyword to search for is L1. – Nick Cox Jul 11 '13 at 00:37
-
Could you clarify what you mean by 'median-median' please? – Glen_b Jul 11 '13 at 01:33
-
An effective method for more general curve fitting (advocated by John Tukey for exploratory data analysis) that is not as well known as it should be is described and illustrated at http://stats.stackexchange.com/questions/35711/box-cox-like-transformation-for-independent-variables/35717#35717. A simpler version of it is to pick two "representative" points near the extremes of a scatterplot and draw the line they determine, *provided* that line appears to be a reasonable first-order description of the points in the scatterplot. – whuber Jul 11 '13 at 01:36
-
@Glen_b, "Median-median" can mean the line through [median($x$), median($y$)] with slope = median( $y_i / x_i$ ). This is robust and takes only a few lines of code, but is afaik hard to analyze theoretically. – denis Jul 10 '15 at 10:15
-
@denis yes, thanks for that one -- there are a few things "median-median" could mean. My guess is the OP means something else, though it's hard to be sure. [If you know of a book or paper that discusses the one you mention, I'd be interested to take a look.] – Glen_b Jul 10 '15 at 11:00
-
@Glen_b, sorry, not really. There's [cran views Robust](http://cran.r-project.org/web/views/Robust.html) ff. -- more methods than test cases. For fun, how would you do this with $n$ predictors -- first PCA ? – denis Jul 10 '15 at 11:51
-
@denis The reason I ask is I expect the slope calculation is done on median-centered variables, but I wanted to be sure. – Glen_b Jul 10 '15 at 12:31
-
@Glen_b yes, that's right, or at least how I do it -- medianline.py under [gist.github.com/denis-bz](https://gist.github.com/denis-bz) . – denis Jul 10 '15 at 13:59
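The median-median variant described in the comments above (line through the coordinate-wise medians, slope taken as the median of slopes from the median-centered variables) takes only a few lines. A minimal sketch, assuming NumPy; the function name is ours, not denis's medianline.py:

```python
import numpy as np

def median_median_line(x, y):
    """Line through (median x, median y) with slope
    median((y_i - med_y) / (x_i - med_x)) over median-centered data.
    Illustrative sketch of the variant discussed in the comments."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = np.median(x), np.median(y)
    dx, dy = x - mx, y - my
    mask = dx != 0              # skip points sitting exactly at the x-median
    slope = np.median(dy[mask] / dx[mask])
    intercept = my - slope * mx
    return slope, intercept
```

Because both the centering and the slope use medians, a single wild point moves the fit very little.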
1 Answer
Minimizing the sum of absolute differences is quite common; as Nick Cox suggests, it's often called L1 regression or least absolute deviations (LAD) regression. It's also a special case of quantile regression, and many posts here relate to it.
http://en.wikipedia.org/wiki/Least_absolute_deviations
http://en.wikipedia.org/wiki/Quantile_regression
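As a rough sketch of what minimizing absolute deviations looks like, assuming NumPy/SciPy are available (a generic optimizer is used purely for illustration; dedicated LAD or quantile-regression solvers are more reliable in practice):

```python
import numpy as np
from scipy.optimize import minimize

def l1_line(x, y):
    """Fit intercept + slope * x by minimizing the sum of absolute
    residuals (L1 / least absolute deviations). Illustrative only."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sad = lambda b: np.sum(np.abs(y - b[0] - b[1] * x))
    # warm-start from the ordinary least squares fit
    b0 = np.polyfit(x, y, 1)[::-1]          # (intercept, slope)
    res = minimize(sad, b0, method="Nelder-Mead")
    return res.x                            # (intercept, slope)
```

Unlike least squares, the L1 objective grows only linearly in each residual, so a single outlier cannot dominate the fit.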
The orthogonal distance (what I assume you mean by "straight-line distance") corresponds to orthogonal regression, a particular case of Deming regression, which is itself a particular case of total least squares; the fitted line is the direction of the first principal component of the centered data.
https://en.wikipedia.org/wiki/Principal_component_analysis
http://en.wikipedia.org/wiki/Deming_regression
http://en.wikipedia.org/wiki/Total_least_squares
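A sketch of the orthogonal (first-principal-component) fit via the SVD, assuming NumPy; note that, unlike ordinary least squares, the result changes if you rescale either variable:

```python
import numpy as np

def orthogonal_line(x, y):
    """Line minimizing perpendicular distances: the first principal
    component of the centered data, computed via SVD. Sketch only."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X = np.column_stack([x - x.mean(), y - y.mean()])
    # rows of Vt are right singular vectors; the first is the
    # direction of maximum variance (the first principal component)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    vx, vy = Vt[0]
    slope = vy / vx
    intercept = y.mean() - slope * x.mean()
    return slope, intercept
```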
There are many, many other lines that might be fitted; a couple of examples include Theil–Sen regression or, more generally, robust regression, which encompasses many different techniques.
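Theil–Sen takes the median of the slopes over all pairs of points, which makes it resistant to outliers. SciPy ships an implementation:

```python
import numpy as np
from scipy.stats import theilslopes

# Theil-Sen line: median of pairwise slopes, with an intercept from
# the medians; robust to a gross outlier in y.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[-1] = 500.0                   # one gross outlier
slope, intercept, lo_slope, hi_slope = theilslopes(y, x)
```

With 20 points only 19 of the 190 pairwise slopes involve the outlier, so the median slope is unaffected.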
Some discussion of robust regression (including some comparison of Theil-Sen and L1 regression) is here.
There's some interesting discussion relating correlation measures to straight-line fits here.
