3

Context

I must find the general form of a linear equation. The $X$ and $Y$ values are the location coordinates of touches on a screen. I want to find the best fit line, described by equation $mX + nY + p = 0$ , so I need to find $m, n, p$.

I want to be able to make regression of horizontal lines as well as vertical lines, or diagonal lines. I decided to make Orthogonal Distance Regression (ODR) instead of Ordinary Least Square (OLS) one.

I am using SciPy library (python langage), and I've written below function to make the job. It is using Single Value Decomposition (SVD). I've tested it, and it is giving excellent results. Now, I want to get the coefficient of determination $R^2$.

Coefficient of determination

According to Wikipedia, the definition is $$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} $$ where $SS_{res}$ is the sum of squares of residuals, and $SS_{tot}$ is the total sum of squares. To apply it in my case I considered: $$ SS_{res} = \sum_id((x_i,y_i);(\hat{x}, \hat{y}))^2 $$ where $ d((x_i,y_i);(\hat{x}, \hat{y}))$ is the distance of point $(x_i,y_i)$ to the best fit line.

and $$ SS_{tot} = \sum_id((x_i,y_i);(\overline{x}, \overline{y}))^2 $$ where $\overline{x}$ is the mean of all the $x_i$ and $\overline{y}$ is the mean of all the $y_i$.

My question

Is it the correct way to compute the coefficient of determination?

I am asking because it looks optimistic, in the sense that it is very often closed to 1.

gung - Reinstate Monica
  • 132,789
  • 81
  • 357
  • 650
RawBean
  • 256
  • 2
  • 8
  • I don't follow your study or analysis. What is your response variable, where someone touched the screen? What are the predictor variables? – gung - Reinstate Monica Feb 13 '14 at 18:04
  • It could sounds weird, but the predictor are not Y, neither X. I'm gathering a list of (X,Y) coordinates, and I need to compute the max distance of the points to the best fit line. So I need to compute the best fit line first. – RawBean Feb 13 '14 at 18:22
  • The best fit line of what? I have no idea what this is supposed to be about. – gung - Reinstate Monica Feb 13 '14 at 18:23
  • 3
    @gung I believe the objective is to identify an affine linear subspace of the plane which minimizes the sum of squared distances from the touchpoints to that line. To put the question into a more familiar language, the fitting is done by PCA (by means of a singular value decomposition of the covariance matrix of the centered coordinates, as is usual). The question asks how to compute the proportion of variance explained by the first principal component. – whuber Feb 13 '14 at 20:14
  • @whuber Yes, you're right. That's exactly what I am doing. Thanks for your clear wording. – RawBean Feb 13 '14 at 21:27

0 Answers0