
I am trying to really understand Sum of Squares, but what I read is either the summation formula for each one or answers like this.

Could someone point me towards a good reference that gives a bit more detail: one that explains the purpose of each of them and how each is used, and how to think about $S_{xx}$, or that explains how and why it relates to $S_{b}$ intuitively?

MaoYiyi
  • Have you searched our site? Some very closely related posts appear at http://stats.stackexchange.com/questions/22501/is-there-an-intuitive-interpretation-of-ata and http://stats.stackexchange.com/questions/1447/r-squared-i-have-never-fully-grasped-the-interpretation/1448#1448. The first one is about $A'A$, which is the sum-of-squares matrix, while the second is about $R^2$. – whuber Jan 20 '13 at 10:47
  • @whuber I typed in the key terms and read many of the questions. How does everyone else seem to find these questions that I have missed? I'm not trying to be rude, and your comment is helpful. – MaoYiyi Jan 21 '13 at 11:31
  • 1
  • Many of us remember those questions :-). When I really need to search carefully, I use both the site search engine (trying various combinations of keywords) and Google (including the `site:stats.stackexchange.com` term). In this particular case, it took some creativity: remembering there is a nice geometrical interpretation of sums of squares, I included `geometry` in the site search. – whuber Jan 21 '13 at 15:28

1 Answer


Let's say you have a distribution of observations

1 2 3 4 5 6 7 8

To figure out the mean you would sum the values and divide by the number of values

1 + 2 + 3 + 4 + 5 + 6 + 7 + 8
_____________________________  =  4.5
              8
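If you prefer to see this in code, here is a minimal Python sketch of the same calculation:

```python
# Mean of the example distribution 1..8
data = [1, 2, 3, 4, 5, 6, 7, 8]
mean = sum(data) / len(data)
print(mean)  # 4.5
```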

Consider one data point

3

How much does 3 deviate from the mean?

The answer is

4.5 - 3 = 1.5

Next, consider the data point

6

Again, how much does 6 deviate from the mean?

6 - 4.5 = 1.5

We just calculated the absolute deviation from the mean. Both 3 and 6 are 1.5 data points away from the mean 4.5.

If we were to describe a data point in our data set, we could say: "This data point is 1.5 units away from the mean". However, we then face a problem. Are we describing the value 3 or are we describing 6?

To solve this problem, we have to use the +/- signs to indicate the relative deviation from the mean. Is the number to the left or to the right of the mean?

3 + 1.5 = 4.5          3 is to the left of the mean

6 - 1.5 = 4.5          6 is to the right

If we always subtract the mean from the value, the sign tells us the direction:

3 - 4.5 = -1.5

6 - 4.5 = 1.5

Next, consider the distribution

3 4.5 6

We want to know not only how each number deviates from the mean; we would also like to represent the deviation of all the numbers in the set from the mean using a single number: the standard deviation (SD).

Let's try adding up the deviations we calculated above

3 is -1.5 units away from 4.5

6 is 1.5 units away from 4.5

4.5 is 0 units away from 4.5

We have

-1.5 + 1.5 + 0 = 0

So the average deviation is 0? That can't be right.

Solution

SQR(-1.5) + SQR(1.5) + SQR(0) = 2.25 + 2.25 + 0 = 4.5

This is the sum of squares. When we square each number, the - signs turn into +, so the deviations can no longer cancel each other out. Now, 4.5 does not really describe the average of 1.5, 1.5 and 0, does it? Because we want an average, we have to divide by the number of values. For reasons I cannot explain briefly here, we divide by the number of values n minus 1 rather than by n.

4.5 / (3 - 1) = 4.5 / 2 = 2.25

This is the variance. It is still on a squared scale, though. To undo the squaring we did, we take the square root of that number

SQRT(2.25) = 1.5

And that's how we arrive at the standard deviation. A value of 1.5 sounds reasonable when two of the deviations are 1.5 and one is 0.
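The whole chain of calculations can be checked with a short Python sketch of the same arithmetic (plain Python, no statistics library needed):

```python
# From deviations to sum of squares, variance (n - 1) and standard deviation
data = [3, 4.5, 6]
mean = sum(data) / len(data)                      # 4.5

deviations = [x - mean for x in data]             # [-1.5, 0.0, 1.5]
print(sum(deviations))                            # 0.0 -- signed deviations cancel out

sum_of_squares = sum(d ** 2 for d in deviations)  # 4.5
variance = sum_of_squares / (len(data) - 1)       # 2.25
sd = variance ** 0.5                              # 1.5
print(sum_of_squares, variance, sd)
```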

In summary, the sum of squares lets us describe deviation from the mean with a single number, without the positive and negative deviations cancelling each other out. I haven't touched upon the relevance of the sum of squares to statistical testing, but if you think of a sum of squares as a measure of deviation, the formulas will make more sense. I'm afraid the best way to grasp these formulas is to calculate the tests on small data sets by hand. Keep doing it until you are fluent at the task. Once you can visualize the operations, which numbers go where, you will develop an intuitive feel for the formulas.

[EDIT below]

Sxx, Syy, Sxy

Sxx measures how far the sample values x lie from their mean (x bar). Because Sxx is written with a Sigma, read it as "for every x, sum the result of...". Thus, Sxx takes one sample value at a time, subtracts the mean, and squares the result. These squared results are then summed up to give Sxx.

Data set (x,y)

x = 3, 4.5, 6 
The mean of x (x bar) = 4.5

y = 1, 4, 12
The mean of y (y bar) = 17/3 ≈ 5.67

Sxx = SQR(3 - 4.5) + SQR(4.5 - 4.5) + SQR(6 - 4.5) = 4.5

Similarly

Syy = SQR(1 - 5.67) + SQR(4 - 5.67) + SQR(12 - 5.67) ≈ 64.67

And

Sxy = (3 - 4.5)*(1 - 5.67) + (4.5 - 4.5)*(4 - 5.67) + (6 - 4.5)*(12 - 5.67) = 16.5
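Spelled out as a minimal Python sketch (same data, same formulas):

```python
# Sxx, Syy and Sxy for the small data set above
x = [3, 4.5, 6]
y = [1, 4, 12]

x_bar = sum(x) / len(x)   # 4.5
y_bar = sum(y) / len(y)   # 17/3, about 5.67

Sxx = sum((xi - x_bar) ** 2 for xi in x)                        # 4.5
Syy = sum((yi - y_bar) ** 2 for yi in y)                        # about 64.67
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 16.5

print(Sxx, Syy, Sxy)
```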

SST and SSR in Linear Regression

In a graph with data points along the x and y axes, the total sum of squares, SST, measures how much each observed data point deviates from the mean of the observed values. In the same plot, we could draw a line that best fits the data points. This line approximates the original observations. If we only had access to the line, but not the original observations, our estimates of the original data points would all lie, by definition, on that line. The deviation of these fitted points from the mean of the observed values gives the regression sum of squares, SSR: the part of the total variation that the fitted line accounts for. What the line fails to capture, the deviation of each observed point from its fitted point, is the error (residual) sum of squares, SSE, and SST = SSR + SSE.
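To make this concrete, here is a minimal Python sketch using the standard least-squares formulas; the slope b = Sxy / Sxx shows one place where Sxx enters the regression calculations. (Note that some textbooks use the name SSR for the residual rather than the regression sum of squares.)

```python
# SST, SSR and SSE for a least-squares line through the same data
x = [3, 4.5, 6]
y = [1, 4, 12]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = Sxy / Sxx                                           # slope of the fitted line
a = y_bar - b * x_bar                                   # intercept
y_hat = [a + b * xi for xi in x]                        # fitted points, all on the line

SST = sum((yi - y_bar) ** 2 for yi in y)                # total: observed values vs. their mean (= Syy)
SSR = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression: fitted values vs. the mean
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error: observed vs. fitted values

print(SST, SSR, SSE)                                    # SST = SSR + SSE
```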

noumenal
  • So for Sxx, is it the distance of x from x bar? – MaoYiyi Jan 21 '13 at 11:33
  • Yes: Sxx takes each x, subtracts the mean (x bar), squares the result, and sums these up. I've edited the answer with a worked example; see above. It should be easier to read. – noumenal Jan 21 '13 at 14:10
  • What is it used for? Most of these just seem to be measurements; I really want to know the purpose and use of each part of SST. – MaoYiyi Jan 21 '13 at 19:36
  • See the section "SST and SSR in Linear Regression" that I have added to the answer above. – noumenal Jan 22 '13 at 07:47
  • My textbook said nothing about this. Do you know of one that explains regression in this amount of detail? – MaoYiyi Jan 22 '13 at 14:16
  • Few textbooks that I've come across introduce regression by using the term sum of squares. It's more common to introduce the term when referring to analysis of variance (ANOVA). If you're taking a course in statistics I would recommend working on the practicals rather than theory. There will always be more time for theory later :) Try browsing a few introductory text books in statistics at your library. I could provide you with several recommendations on general books, but none of them introduces sum of squares in the context of regression. – noumenal Jan 22 '13 at 17:52
  • Good books on regression that I have read are listed below. A good introduction: Miles, J., & Shevlin, M. (2001). Applying Regression and Correlation: A guide for students and researchers. LA: Sage. More advanced (lucidly written nevertheless): Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum. – noumenal Jan 22 '13 at 18:13