Let $X_i=(x_{i1},\dots,x_{ip})$, $i=1,\dots,n$, be $n$ observations of a $p$-dimensional vector. Suppose that we apply principal component analysis and find the $p$ principal components. Also, suppose that the $p$ loadings of the first component $u_1$ are all positive and roughly of the same size.
Why can we regard the first component as an approximate scaled average of the $p$ original variables?
(I suppose this question is related to "Can averaging all the variables be seen as a crude form of PCA?". In addition to the answer given to that question, I would like to see why the above claim holds in a more formal fashion.)
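
To make the claim concrete, here is a sketch of how it could be stated, under assumptions not spelled out above: the variables are centered, the loadings are normalized so that $\|u_1\|=1$, and the symbols $u_{1j}$, $z_{i1}$, $c$ and $\bar{x}_i$ are introduced only for illustration. Write $u_1=(u_{11},\dots,u_{1p})$ and let $z_{i1}$ be the score of observation $i$ on the first component. If $u_{1j}\approx c>0$ for every $j$, then

$$
z_{i1} \;=\; u_1^\top X_i \;=\; \sum_{j=1}^{p} u_{1j}\,x_{ij} \;\approx\; c\sum_{j=1}^{p} x_{ij} \;=\; c\,p\,\bar{x}_i,
\qquad \bar{x}_i=\frac{1}{p}\sum_{j=1}^{p} x_{ij}.
$$

The unit-norm constraint gives $\sum_{j} u_{1j}^2 \approx p\,c^2 = 1$, so $c\approx 1/\sqrt{p}$ and $z_{i1}\approx \sqrt{p}\,\bar{x}_i$. The approximation error is $\sum_{j}(u_{1j}-c)\,x_{ij}$, which by the Cauchy-Schwarz inequality is at most $\|u_1-c\mathbf{1}\|\,\|X_i\|$ in absolute value, where $\mathbf{1}$ is the vector of ones. It is therefore small exactly when the loadings are close to a common value.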