In principal components analysis, the principal components are chosen one after another according to a variance criterion. The first component is the direction in the data with the greatest variance. The second is the direction with the greatest variance GIVEN that it is orthogonal to the first, and so on for each later component.
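To make the sequential definition concrete, here is a minimal sketch, assuming NumPy and synthetic 3-D data. The helper names `top_direction` and `greedy_components` are hypothetical. Each step uses power iteration to find the single direction of greatest variance. It then removes ("deflates") that direction from every data point before searching again, which is one way of enforcing the "orthogonal to the earlier components" constraint. The directions found this way agree, up to sign, with the eigenvectors of the covariance matrix from `np.linalg.eigh`, sorted by decreasing eigenvalue.

```python
import numpy as np

def top_direction(C, iters=500, seed=0):
    """Power iteration: unit vector w that maximizes w^T C w (the projected
    variance) for a symmetric positive semi-definite matrix C."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=C.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        w = C @ w
        w /= np.linalg.norm(w)
    return w

def greedy_components(X, k):
    """Pick k directions one at a time. Each maximizes the variance of what is
    left after the parts of the data along earlier directions are removed."""
    Xc = X - X.mean(axis=0)
    comps = []
    for _ in range(k):
        C = np.cov(Xc, rowvar=False)
        w = top_direction(C)
        comps.append(w)
        # Deflate: subtract each point's component along w, so w now has zero
        # variance and the next search can only find directions orthogonal to it.
        Xc = Xc - np.outer(Xc @ w, w)
    return np.array(comps)

# Synthetic data: spreads of 5, 2 and 0.5 along three axes, then randomly rotated.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = (rng.normal(size=(5000, 3)) * [5.0, 2.0, 0.5]) @ Q.T

comps = greedy_components(X, k=3)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
evecs = evecs[:, ::-1]  # eigh sorts ascending; put the largest eigenvalue first
for i in range(3):
    # |dot product| close to 1 means the same direction up to sign
    print(i, abs(comps[i] @ evecs[:, i]), np.var(X @ comps[i]))
```

The printed variances come out in decreasing order. Each later component captures the most variance still available once the earlier directions have been projected out.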
Is there an intuitive way to understand this without resorting to the proof? I find it hard to extract intuition from the proof. Thanks.