
I'm new to statistics. Is the correlation coefficient obtained from the covariance by standardization or by normalization?

I understand that we calculate the correlation coefficient because it is easier to interpret. Since we divide the covariance by the standard deviations of the two random variables, it makes sense to say that correlation is a "standardized value of covariance". But in some textbooks and blogs, I see people referring to the correlation coefficient as a "normalized value of covariance". Are they using these terms interchangeably?

(To my knowledge, normalization rescales values to the range 0-1, and standardization transforms values so that they have mean 0 and standard deviation 1.)

chqdrian
  • Normalization is a broader term than you say. For example, making the mean or the sum of _squares_ of the data values equal to 1 is also called "normalization" (i.e. bringing the "norm" to unit value). – ttnphns Oct 25 '19 at 16:52
  • Even if you are new to statistics, I would recommend reading [this](https://stats.stackexchange.com/a/22520/3277), which explains how and why covariance and correlation are kin. – ttnphns Oct 25 '19 at 16:56

1 Answer

"Normalization" and "standardization" are used ambiguously in the statistics and machine learning literature. Neither term has a single fixed meaning.

Pearson's correlation coefficient is defined as

$$ \operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma(X) \; \sigma(Y)} $$

It is a "normalized covariance" in the sense that this operation rescales the covariance to the $[-1, 1]$ range. The result is unitless and does not depend on the scale (standard deviations) of $X$ and $Y$.
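As a rough numerical illustration of the formula above, here is a minimal NumPy sketch. The simulated data and the helper names `corr_from_cov` and `zscore` are assumptions made up for this example, not part of any library. It checks that dividing the covariance by the two standard deviations gives the same number as taking the covariance of the z-scored ("standardized") variables, and that linearly rescaling either variable with a positive factor leaves the result unchanged.

```python
import numpy as np

# Simulated data, assumed purely for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)


def corr_from_cov(a, b):
    """Covariance divided by the product of the standard deviations."""
    cov_ab = np.cov(a, b, ddof=1)[0, 1]
    return cov_ab / (np.std(a, ddof=1) * np.std(b, ddof=1))


def zscore(a):
    """Shift to mean 0 and rescale to standard deviation 1."""
    return (a - a.mean()) / a.std(ddof=1)


# Correlation computed directly from the definition
r = corr_from_cov(x, y)

# Covariance of the z-scored variables gives the same value
r_std = np.cov(zscore(x), zscore(y), ddof=1)[0, 1]

# Changing units (positive scaling plus a shift) does not change the result
r_rescaled = corr_from_cov(100.0 * x, y / 7.0 + 3.0)

# NumPy's built-in Pearson correlation, for comparison
r_numpy = np.corrcoef(x, y)[0, 1]

print(r, r_std, r_rescaled, r_numpy)
```

All four printed values should agree up to floating-point error, which is one way to see why both "standardized" and "normalized" get attached to the same quantity.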

Tim