I am fairly familiar with the practical application of principal component analysis (PCA). PCA finds the first PC, for example, by minimizing the sum of squared perpendicular distances of the observations $x_1, x_2, \ldots, x_n$ from the lower-dimensional subspace.
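To check that this objective really recovers the usual PCA solution, here is a minimal NumPy sketch (the 2-D covariance matrix and the sample size are arbitrary choices) comparing a brute-force minimizer of the perpendicular distances with the top eigenvector of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=500)
X -= X.mean(axis=0)  # PCA assumes centered data

# First PC as the top eigenvector of the sample covariance matrix
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
pc1 = eigvecs[:, -1]

# Brute force over unit directions: the squared perpendicular distance of a
# point x to the line spanned by a unit vector d is ||x||^2 - (x . d)^2
angles = np.linspace(0.0, np.pi, 10_000)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
loss = (X ** 2).sum() - ((X @ dirs.T) ** 2).sum(axis=0)
best = dirs[np.argmin(loss)]

print(pc1, best)  # equal up to sign: minimizing perpendicular distances
                  # and maximizing projected variance pick the same axis
```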
However, as far as I understand, PCA adds no 'value' as a data-reduction technique if the variables (the columns of the original design matrix $X$) hardly correlate. In the case of zero correlations, for instance, the orthonormal matrix $B$ ends up being the identity matrix (up to signs and a reordering of its columns by variance), and the transformation $Z = XB$ essentially yields the initial design matrix $X$.
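To illustrate, here is a small simulation of the zero-correlation case (the variances 9, 4, 1 are arbitrary choices); the estimated $B$ comes out as approximately a signed permutation of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent (hence uncorrelated) variables with distinct variances 9, 4, 1,
# so the population covariance matrix is diagonal
X = rng.normal(size=(100_000, 3)) * np.array([3.0, 2.0, 1.0])
X -= X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigvals, B = np.linalg.eigh(S)
B = B[:, ::-1]  # reorder columns so the eigenvalues are decreasing

print(np.round(B, 2))
# ~ a signed permutation of the identity: each PC coincides with one
# original axis, and Z = X B merely reorders/flips the columns of X
```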
How could I show mathematically that PCA still performs poorly when the correlations between the original variables are low but nonzero? My hunch is that in those cases the rotation of the original coordinate axes is rather insubstantial; in other words, the angle between the original coordinate axes and the transformed coordinate axes is small. But how can I prove this? The simulation sketched below is at least consistent with the hunch.
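In the $2 \times 2$ case I believe the principal axes of a covariance matrix $\begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix}$ are rotated against the coordinate axes by an angle $\theta$ satisfying $\tan(2\theta) = 2 s_{12} / (s_{11} - s_{22})$, so $\theta \to 0$ as $s_{12} \to 0$ whenever $s_{11} \neq s_{22}$; is that the right starting point, and does it generalize? Here is a quick NumPy sketch (the variances 4 and 1 and the grid of correlations are arbitrary choices) that measures the angle between the first PC and the first coordinate axis:

```python
import numpy as np

rng = np.random.default_rng(2)

def pc1_angle(rho, n=50_000):
    """Angle (degrees) between the first PC and the first coordinate axis
    for 2-D data with variances 4 and 1 and correlation rho."""
    cov = [[4.0, 2.0 * rho], [2.0 * rho, 1.0]]  # off-diagonal = rho * 2 * 1
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    S = np.cov(X, rowvar=False)
    pc1 = np.linalg.eigh(S)[1][:, -1]  # eigenvector of the largest eigenvalue
    return np.degrees(np.arccos(min(abs(pc1[0]), 1.0)))

for rho in [0.7, 0.3, 0.1, 0.05, 0.0]:
    print(f"rho = {rho:4.2f} -> rotation angle ~ {pc1_angle(rho):5.2f} deg")
# the rotation angle shrinks toward zero together with the correlation
```

Thanks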