Let's say you have a distribution of observations
1 2 3 4 5 6 7 8
To figure out the mean you would sum the values and divide by the number of values
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8
_____________________________ = 4.5
8
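This step can be checked with a couple of lines of plain Python:

```python
# Mean of the distribution 1..8: sum the values, divide by the count.
data = [1, 2, 3, 4, 5, 6, 7, 8]
mean = sum(data) / len(data)
print(mean)  # 4.5
```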
Consider one data point
3
How much does 3 deviate from the mean?
The answer is
4.5 - 3 = 1.5
Next, consider the data point
6
Again, how much does 6 deviate from the mean?
6 - 4.5 = 1.5
We just calculated the absolute deviation from the mean. Both 3 and 6 are 1.5 units away from the mean of 4.5.
If we were to describe a data point in our data set, we could say: "This data point is 1.5 units away from the mean". However, we then face a problem. Are we describing the value 3 or are we describing 6?
To solve this problem, we have to use the +/- signs to indicate the relative deviation from the mean. Is the number to the left or to the right of the mean?
3 + 1.5 = 4.5 (3 is to the left of the mean)
6 - 1.5 = 4.5 (6 is to the right)
This means
4.5 - 3 = 1.5
4.5 - 6 = -1.5
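Following the sign convention above (mean minus value), a small Python sketch makes the direction explicit:

```python
# Signed deviation from the mean: mean - x is positive when x lies
# to the left of the mean, negative when x lies to the right.
mean = 4.5
for x in (3, 6):
    print(x, mean - x)  # 3 -> 1.5, 6 -> -1.5
```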
Next, consider the distribution
3 4.5 6
We don't just want to know how each number deviates from the mean; we would like to represent the deviation of all numbers in the set from the mean with a single number: the standard deviation (SD).
Let's try adding up the deviations we calculated above
3 is 1.5 units away from 4.5
6 is -1.5 units away from 4.5
4.5 is 0 units away from 4.5
We have
1.5 + (-1.5) + 0 = 0
So the average deviation is 0? That can't be right.
Solution
SQR(1.5) + SQR(-1.5) + SQR(0) = 2.25 + 2.25 + 0 = 4.5
This is the sum of squares. When we square each number, the - sign becomes +. Now, 4.5 does not really describe the average of 1.5, 1.5, 0, does it? Because it's an average, we have to divide by the number of values. For reasons too lengthy to explain here, we have to divide by the number of values n minus 1.
4.5 / (3 - 1) = 4.5 / 2 = 2.25
That sounds more reasonable for two deviations of 1.5 and one of 0. This is the variance.
Next, to undo the squaring we did, we take the square root of that number
SQRT(2.25) = 1.5
And that's how we arrive at the standard deviation.
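The whole chain (deviations, squares, variance with n - 1, square root) can be checked with a few lines of plain Python:

```python
import math

data = [3, 4.5, 6]
mean = sum(data) / len(data)

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1
variance = ss / (len(data) - 1)

# The standard deviation undoes the squaring
sd = math.sqrt(variance)
print(ss, variance, sd)  # 4.5 2.25 1.5
```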
In summary, the sum of squares lets us describe the total deviation from the mean without the + and - signs cancelling each other out. I haven't touched on the relevance of the sum of squares to statistical testing, but if you think of the sum of squares as a measure of deviation, the formulas will make more sense. I'm afraid the best way to grasp these formulas is to calculate tests on small data sets by hand. Keep doing it until you become fluent at the task. Once you can visualize the operations, which numbers go where, you will develop an intuitive feel for the formulas.
[EDIT below]
Sxx, Syy, Sxy
Sxx is built from the distances of each sample x to the mean (x bar). Because Sxx is written with a Sigma, we read it as "for every x, sum the result of...". Thus, Sxx takes one sample at a time, subtracts the mean, and squares the result. The results for each sample are then summed up to give Sxx.
Data set (x,y)
x = 3, 4.5, 6
The mean of x (x bar) = 4.5
y = 1, 4, 12
The mean of y (y bar) = 5.67
Sxx = SQR(3 - 4.5) + SQR(4.5 - 4.5) + SQR(6 - 4.5)
Similarly
Syy = SQR(1 - 5.67) + SQR(4 - 5.67) + SQR(12 - 5.67)
And
Sxy = (3 - 4.5)*(1 - 5.67) + (4.5 - 4.5)*(4 - 5.67) +(6 - 4.5)*(12 - 5.67)
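The three sums above translate directly into plain Python:

```python
x = [3, 4.5, 6]
y = [1, 4, 12]
x_bar = sum(x) / len(x)  # 4.5
y_bar = sum(y) / len(y)  # ~5.67

# The Sigma in the formulas: "for every sample, sum the result of..."
Sxx = sum((xi - x_bar) ** 2 for xi in x)                       # 4.5
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) # 16.5
print(Sxx, Syy, Sxy)
```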
SST and SSR in Linear Regression
In a graph with data points along the x and y axes, the total sum of squares SST represents how much each observed data point deviates from the mean. In the same plot, we could draw a line that best fits the data points. This line approximates the original observations. If we only had access to the line, but not the original observations, every estimate of an original data point would, by definition, lie on this line. The deviation of these estimated points from the mean of the observed values gives us the regression sum of squares: the SSR.
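As a sketch of these two quantities, we can fit the least-squares line by hand (slope b = Sxy / Sxx, intercept a = y bar - b * x bar) for the small (x, y) data set above, then compare the observed and fitted points against the mean of y:

```python
x = [3, 4.5, 6]
y = [1, 4, 12]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Least-squares line: slope b = Sxy / Sxx, intercept a = y_bar - b * x_bar
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = Sxy / Sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]  # the estimated points, all on the line

# SST: deviation of the observed points from the mean of y
SST = sum((yi - y_bar) ** 2 for yi in y)
# SSR: deviation of the fitted (estimated) points from the mean of y
SSR = sum((yh - y_bar) ** 2 for yh in y_hat)
print(SST, SSR)
```

SSR can never exceed SST; the gap between the two is the part of the variation the line fails to explain.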