Why is covariance defined the way it is? $$\sigma(x,y)=\mathbb{E}[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])]$$ How do we know that this definition behaves in the following way?
Covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
Is there any justification for correctness of this definition or history of its development? Do we just take this interpretation as an axiom? Obviously definitions cannot be wrong, but still they might somehow not agree with our intentions on how they are supposed to work.