I have a set of data with 6 features and an attached score for each data point within the set.

I have built a linear regressor for my data.

The thing is that the output values are pretty large, approximately 10000, and I obtain values between 9600 and 10000. How can I measure the error of my system such that it falls between 0 and 1?

chl
Simon
  • Are you saying you fit a linear regression model to your data? When you say "How can I measure the error of my system such that it falls between 0 and 1?" are you just asking for a measure of how well your model describes the data? Have you heard of $R^2$? – Macro Jun 29 '12 at 12:46
  • note that _regressor_ is another term for the _predictor variable_ (see http://en.wikipedia.org/wiki/Dependent_and_independent_variables#Alternative_terminology_in_statistics) – Pardis Jun 29 '12 at 16:35

1 Answer


This is what $R^2$ does. Perhaps even better would be the adjusted version, $R^2_{adj}$.

$$ R^2 = 1 - \dfrac{\sum(\hat y_i - y_i)^2}{\sum (y_i - \bar y)^2} $$

$$ R^2_{adj} = 1 - \dfrac{n-1}{n-p-1}(1-R^2) $$

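(In these equations, $y_i$ is an observed score, $\hat y_i$ is the model's prediction for it, $\bar y$ is the mean of the observed scores, $n$ is the sample size, and $p$ is the number of parameters, not counting the intercept.)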
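Both scores can be computed directly from the observed and predicted values. The following is a minimal sketch, not the asker's actual setup: the data are synthetic (6 features and scores near 10000, with made-up coefficients and noise), and the function names `r_squared` and `adjusted_r_squared` are hypothetical helpers. It uses the usual definition, residual sum of squares over total sum of squares around $\bar y$.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


# Synthetic stand-in for the data in the question (an assumption):
# 6 features, scores of roughly 9600 to 10000.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([40.0, -25.0, 10.0, 5.0, 0.0, 0.0])  # hypothetical coefficients
y = 9800.0 + X @ beta + rng.normal(scale=30.0, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n, p)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

Because $R^2$ is a ratio of sums of squares, the size of the scores does not matter: subtracting 9000 from every observed and predicted value leaves it unchanged. For a least-squares fit with an intercept, evaluated on the data used for fitting, it lies between 0 and 1.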

$R^2_{adj}$ gives a penalty for having many parameters without much improvement in $R^2$ compared to a model with fewer parameters. This is nice, because we can make $R^2 = 1$ by fitting a model that just connects the dots, but that would not be expected to generalize. A common term you might hear about this is overfitting to the training data.
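The "connects the dots" point can be checked numerically. The sketch below is an illustration on synthetic data (an assumption: one genuinely useful feature plus columns of pure noise, all made up), and `fit_r2` is a hypothetical helper. It refits ordinary least squares while adding one junk column at a time and reports both scores.

```python
import numpy as np


def fit_r2(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(1)
n = 20

# One real feature and n - 2 columns of pure noise (synthetic, for illustration).
x_real = rng.normal(size=(n, 1))
y = 9800.0 + 50.0 * x_real[:, 0] + rng.normal(scale=50.0, size=n)
junk = rng.normal(size=(n, n - 2))
X_all = np.column_stack([x_real, junk])  # n - 1 columns in total

# R^2 for nested models using the first p columns, p = 1, ..., n - 1.
r2_by_p = {p: fit_r2(X_all[:, :p], y) for p in range(1, n)}

for p in (1, 5, 10, 15, 18):
    r2 = r2_by_p[p]
    r2_adj = 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)
    print(f"p = {p:2d}: R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")

# With p = n - 1 predictors plus an intercept there are as many coefficients
# as data points, so the fit passes through every point. Adjusted R^2 is
# undefined here because n - p - 1 = 0.
print(f"p = {n - 1:2d}: R^2 = {r2_by_p[n - 1]:.4f}")
```

In-sample $R^2$ never decreases as columns are added, even when the columns are noise, which is why it alone cannot flag overfitting. The adjusted score is free to fall.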

There are a few warnings.

  1. It is easy to think of these scores as being like grades in school, where $95\%$ is an A that will make you happy, and $65\%$ is a D that will make you sad. A score of even much lower than $65\%$ could be quite splendid, and a score of higher than $95\%$ could be quite pedestrian.

  2. There is an interpretation of $R^2$ as being a proportion of variance explained by the model. This breaks down when the model is nonlinear$^{\dagger}$ or linear but estimated with a method other than least squares.

For these reasons, I am skeptical of how useful $R^2$ is as a measure of absolute performance.

$^{\dagger}$The derivation of $R^2$ as a proportion of variance explained relies on that "other" term in my link being zero.
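Assuming the "other" term is the cross term in the usual decomposition of the total sum of squares, the identity in question is:

$$ \sum (y_i - \bar y)^2 = \sum (y_i - \hat y_i)^2 + \sum (\hat y_i - \bar y)^2 + 2\sum (y_i - \hat y_i)(\hat y_i - \bar y) $$

For ordinary least squares with an intercept, the last sum is zero, so dividing through by $\sum (y_i - \bar y)^2$ gives

$$ R^2 = 1 - \dfrac{\sum (y_i - \hat y_i)^2}{\sum (y_i - \bar y)^2} = \dfrac{\sum (\hat y_i - \bar y)^2}{\sum (y_i - \bar y)^2} $$

which is the share of the variance accounted for by the fitted values. For other models or estimation methods the cross term need not vanish, and the two expressions for $R^2$ can differ.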

Dave