i) What is the main purpose of restricting PCA to orthogonal components only?
I understand that we want to rule out the zero solution and to find orthogonal directions that explain most of the variance. When we view the problem as finding a projection matrix that preserves the Gram matrix of the mean-centered data,
ii) What would the optimization produce under a non-orthonormal constraint, as long as we make sure the solution is non-zero? Why would that be useful, or not?
I am also aware of the concept of non-orthogonal eigenfunctions, so an answer that generalizes to kernel PCA would be fine too.
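To make question ii) concrete, here is a minimal sketch of the comparison I have in mind. It assumes the variance-maximization form of PCA, i.e. maximizing trace(W^T S W) for the sample covariance S, and it replaces the orthonormality constraint W^T W = I with the weaker requirement that each column of W has unit norm (my own choice of a "non-zero" constraint, not a standard formulation). The data, the names `W_orth`, `W_unit`, and `captured_variance` are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a different variance along each axis (illustrative only).
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.5, 0.5])
X -= X.mean(axis=0)                 # mean-center
S = X.T @ X / (X.shape[0] - 1)      # sample covariance matrix

# Eigendecomposition of S, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2

# Orthonormal constraint W^T W = I: the maximizer is the top-k eigenvectors.
W_orth = eigvecs[:, :k]

# Unit-norm columns only (assumed relaxation): each column maximizes
# w^T S w independently, so every column is the leading eigenvector
# (up to sign).
W_unit = np.repeat(eigvecs[:, :1], k, axis=1)


def captured_variance(W):
    """Sum of the variances of the projected coordinates, trace(W^T S W)."""
    return np.trace(W.T @ S @ W)


print("orthonormal:", captured_variance(W_orth))      # lambda_1 + lambda_2
print("unit norm only:", captured_variance(W_unit))   # 2 * lambda_1
print("rank of projection:", np.linalg.matrix_rank(X @ W_unit))
```

Under this relaxation the objective is larger, but only because the same direction is counted twice: both columns coincide and the projected data has rank one. If that reading is right, it suggests the orthogonality constraint is what forces each new component to capture variance not already explained, but I would like to understand whether other non-orthonormal constraints lead to something more useful.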