I have been reading about Principal Components Analysis, and my understanding is that it tries to extract as much "variance" as possible from the predictors $ \vec{X} = (X_1, X_2, \ldots, X_n)$ by selecting an optimal loading vector $\vec{\phi} = (\phi_1, \ldots, \phi_n)$ such that
$$Z_1 = \vec{X}^T \vec{\phi} = \phi_1 X_1 + \cdots + \phi_n X_n $$
has maximal variance. We want maximal variance because the variance in the predictors can potentially explain the variance in some response $Y$ that might be analysed later.
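Here is a minimal numpy sketch of how I currently picture this (the data are made up purely for illustration, and I am assuming the optimal $\vec{\phi}$ is the leading eigenvector of the sample covariance matrix):

```python
import numpy as np

# Made-up data: three correlated predictors with different variances.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(
    mean=np.zeros(3),
    cov=[[4.0, 1.5, 0.5],
         [1.5, 2.0, 0.3],
         [0.5, 0.3, 1.0]],
    size=500,
)

# My understanding: the first loading vector is the unit vector that
# maximises Var(Z_1), i.e. the leading eigenvector of the sample
# covariance matrix.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
phi = eigvecs[:, -1]                   # ||phi|| = 1 by construction

Z1 = X @ phi                           # Z_1 = phi_1 X_1 + ... + phi_n X_n
print(np.var(Z1, ddof=1))              # matches the largest eigenvalue
print(eigvals[-1])
```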
However, I have heard that you must standardize the predictors (for example, to mean 0 and variance 1) if they are not measured in the same units, and also restrict the loading vector so that $\|\vec{\phi}\|=1$. As I understand it, this is so that no predictor dominates simply because of its units, and so that the variance of $Z_1$ cannot be made arbitrarily large just by rescaling $\vec{\phi}$.
But after I standardize, all predictors have variance 1, so how will principal components analysis identify the most "explanatory" predictors (those with high variance) if they are all the same now?
(How will we choose a loading vector and weight the predictors if all of them have the same variance, i.e. what would make one variable more favourable than another? A small example of what I mean follows.)
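To make the confusion concrete, here is a small made-up example: after standardising, every column has variance 1, yet the procedure still returns a very particular set of loadings, and I don't see what is driving that choice.

```python
import numpy as np

# Made-up data again: x1 and x2 are strongly related, x3 is unrelated.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardise each predictor: mean 0, variance 1.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
print(Xs.var(axis=0, ddof=1))          # all three are (essentially) 1

# Yet the leading eigenvector of the (now correlation) matrix is far
# from weighting the three predictors equally.
S = np.cov(Xs, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
print(eigvecs[:, -1])
```

The weights that come out are clearly not all equal, so something other than the individual variances must be deciding them, and that is exactly the part I don't follow.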
Thanks in advance