I was studying linear regression lately and checking the assumptions of the Ordinary Least Squares (OLS) method for the regression problem. I was not sure about the intuition behind squared differences being used as the error measure (instead of something like x^4 or x^6), and I found an answer somewhere on Stack Overflow that said:
The main idea of OLS is not to minimise the error but to make it equal to zero. The errors (residuals) follow a normal distribution with a variance on the order of the square. So to minimise the errors we need to use least squares.
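As far as I can tell, the argument behind that answer is the maximum-likelihood one: if the residuals are assumed i.i.d. normal, then maximising the likelihood of the data is the same as minimising the sum of squared residuals (this is my own restatement, not part of the quoted answer):

$$\hat\beta = \arg\max_\beta \prod_{i} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2}\right) = \arg\min_\beta \sum_{i} (y_i - x_i^\top \beta)^2,$$

since taking the log turns the product into a sum and the constants and the factor $1/(2\sigma^2)$ do not affect the minimiser.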
That led me to the squared differences used in the definition of variance. So why is the square function employed in the definition of variance? Is it just convenience, or is there a reason not to use functions of the order of x^4 or x^6?
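For context, here is a small experiment I put together to see what actually changes when the exponent changes (the toy data, the `fit` helper, and the use of `scipy.optimize.minimize` are just my own illustration, not something from the quoted answer). Minimising residuals raised to the power 2, 4 and 6 gives slightly different lines, with the higher powers dominated by the largest residuals:

```python
import numpy as np
from scipy.optimize import minimize

# Toy data: a straight line with Gaussian noise (purely illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

def fit(power):
    """Fit y = a*x + b by minimising sum(|residual| ** power)."""
    loss = lambda p: np.sum(np.abs(y - (p[0] * x + p[1])) ** power)
    return minimize(loss, x0=[0.0, 0.0]).x

print("power 2:", fit(2))   # ordinary least squares
print("power 4:", fit(4))   # large residuals penalised much more heavily
print("power 6:", fit(6))
```

So the different powers are clearly not interchangeable, which is why I am asking whether the choice of 2 is mere convenience or something deeper.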