As I understand it, PCA first chooses the axis along which the data has the maximum variance. After that, we are restricted to choosing only axes that are orthogonal to every axis already chosen.
Does that mean that if we want to maximize the sum (or average) of the variances along all the axes we choose, PCA may not reach an optimal solution?
From what I understand, PCA chooses the axis with the most variance at each step. So does that make it a greedy algorithm?
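
To make the question concrete, here is a minimal sketch of one way to probe it numerically. It assumes NumPy, uses synthetic data, and the helper names `captured_variance` and `random_orthonormal_basis` are made up for illustration. It compares the sum of variances along the first `k` PCA axes with the same sum for many randomly drawn sets of `k` orthonormal axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data (an assumption for illustration): 500 samples, 5 features.
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)  # center the data, as PCA requires

# Sample covariance matrix and its eigendecomposition.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]      # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
W_pca = eigvecs[:, :k]  # the first k axes, as picked step by step by PCA


def captured_variance(W):
    """Sum of the variances of the data projected onto the columns of W."""
    return np.trace(W.T @ S @ W)


def random_orthonormal_basis(d, k):
    """Draw k orthonormal directions in d dimensions via a QR decomposition."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, k)))
    return Q


pca_total = captured_variance(W_pca)
random_totals = [
    captured_variance(random_orthonormal_basis(X.shape[1], k)) for _ in range(1000)
]
best_random = max(random_totals)

print("sum of variances, first k PCA axes: ", pca_total)
print("best sum over random orthonormal axes:", best_random)
```

A random search like this cannot prove optimality, but if the step-by-step choice were suboptimal for the sum, one would expect some other orthonormal set of axes to beat `pca_total`.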