There are different views about PCA, but basically we often think of it as either a reduction technique (you reduce your feature space to a smaller one, often much more "readable", provided you take care of properly centering/standardizing the data when needed) or a way to construct latent factors or dimensions that account for a significant part of the inter-individual dispersion (here, the "individuals" stand for the statistical units on which data are collected; these may be countries, people, etc.). In both cases, we construct linear combinations of the original variables that account for the maximum of variance (when projected on the principal axis), subject to a constraint of orthogonality between any two principal components. Now, what has been described is purely algebraic or mathematical, and we don't think of it as a (generating) model, contrary to what is done in the factor analysis tradition, where we include an error term to account for some kind of measurement error. I also like the introduction given by William Revelle in his forthcoming handbook on applied psychometrics using R (Chapter 6): if we want to analyze the structure of a correlation matrix, then
"The first [approach, PCA] is a model that approximates the correlation matrix in terms of the product of components where each component is a weighted linear sum of the variables, the second model [factor analysis] is also an approximation of the correlation matrix by the product of two factors, but the factors in this are seen as causes rather than as consequences of the variables."
In other words, with PCA you are expressing each component (factor) as a linear combination of the variables, whereas in FA it is the variables that are expressed as linear combinations of the factors. It is well acknowledged that both methods will generally yield quite similar results (see e.g. Harman, 1976 or Cattell, 1978), especially in the "ideal" case where we have a large number of individuals and a good ratio of variables to factors (typically varying between 2 and 10 depending on the authors you consider!). This is because, by estimating the diagonals in the correlation matrix (as is done in FA, and these elements are known as the communalities), the error variance is eliminated from the factor matrix. This is the reason why PCA is often used as a way to uncover latent factors or psychological constructs, in place of the FA methods developed in the last century. But, as we go on this way, we often want to reach an easier interpretation of the resulting factor structure (or the so-called pattern matrix). And then comes the useful trick of rotating the factorial axes so that we maximize the loadings of variables on a specific factor, or equivalently reach a "simple structure". Using orthogonal rotation (e.g. VARIMAX), we preserve the independence of the factors. With oblique rotation (e.g. OBLIMIN, PROMAX), we break it and factors are allowed to correlate. This has been largely debated in the literature, and has led some authors (not psychometricians, but statisticians in the early 1960s) to conclude that FA is an unfair approach because researchers might seek the factor solution that is the most convenient to interpret.
But the point is that rotation methods were originally developed in the context of the FA approach and are now routinely used with PCA. I don't think this contradicts the algorithmic computation of the principal components: you can rotate your factorial axes the way you want, provided you keep in mind that once the factors are correlated (by oblique rotation), the interpretation of the factorial space becomes less obvious.
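To illustrate rotating principal components, here is a minimal sketch in Python with NumPy. It is not taken from any of the references: the simulated data, the loading values, and the helper names (`pca_loadings`, `varimax`, `varimax_criterion`) are all assumptions made for the example. The rotation uses the commonly described SVD-based varimax iteration and is orthogonal, so the communality of each variable is left unchanged.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Loadings (variable-component correlations) from the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax_criterion(L):
    """Sum over components of the variance of the squared loadings."""
    return np.sum(np.var(L**2, axis=0))

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ T
        G = L.T @ (LR**3 - LR @ np.diag(np.sum(LR**2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                                    # nearest orthogonal matrix to G
        crit = s.sum()
        if crit_old != 0 and crit / crit_old < 1 + tol:
            break
        crit_old = crit
    return L @ T, T

# Hypothetical data: 500 individuals, 6 variables driven by 2 independent factors.
rng = np.random.default_rng(0)
W = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
F = rng.standard_normal((500, 2))
X = F @ W.T + 0.5 * rng.standard_normal((500, 6))

L = pca_loadings(X, n_components=2)
L_rot, T = varimax(L)

print("unrotated loadings:\n", np.round(L, 2))
print("varimax-rotated loadings:\n", np.round(L_rot, 2))
print("criterion before/after:", varimax_criterion(L), varimax_criterion(L_rot))
```

An oblique rotation (OBLIMIN, PROMAX) would replace the orthogonal matrix `T` with a non-orthogonal transformation, at which point the rotated components are no longer uncorrelated.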
PCA is routinely used when developing new questionnaires, although FA is probably a better approach in this case because we are trying to extract meaningful factors that take into account measurement errors and whose relationships might be studied on their own (e.g. by factoring out the resulting pattern matrix, we get a second-order factor model). But PCA is also used for checking the factorial structure of already validated ones. Researchers don't really care about FA vs. PCA when they have, say, 500 representative subjects who are asked to rate a 60-item questionnaire tackling five dimensions (this is the case of the NEO-FFI, for example), and I think they are right because in this case we aren't very much interested in identifying a generating or conceptual model (the term "representative" is used here to alleviate the issue of measurement invariance).
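The claim that PCA and FA tend to agree in such a setting can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the data are synthetic (500 subjects, 60 items, five uncorrelated factors, every item loading 0.6 on a single factor), not NEO-FFI data, and FA is approximated by a simple iterated principal-axis factoring in which the diagonal of the correlation matrix is replaced by communality estimates. The names `top_loadings`, `pca_L`, and `fa_L` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_items, n_factors = 500, 60, 5

# Assumed simple structure: each item loads 0.6 on exactly one factor.
true_loadings = np.zeros((n_items, n_factors))
for j in range(n_items):
    true_loadings[j, j % n_factors] = 0.6

factors = rng.standard_normal((n_subjects, n_factors))
unique_sd = np.sqrt(1 - (true_loadings**2).sum(axis=1))
X = factors @ true_loadings.T + rng.standard_normal((n_subjects, n_items)) * unique_sd
R = np.corrcoef(X, rowvar=False)

def top_loadings(M, k):
    """Loadings from the k largest eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

# PCA: decompose R as is, with ones on the diagonal.
pca_L = top_loadings(R, n_factors)

# Principal-axis factoring: put communality estimates on the diagonal and iterate.
h2 = 1 - 1 / np.diag(np.linalg.inv(R))   # squared multiple correlations as a start
for _ in range(100):
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, h2)
    fa_L = top_loadings(R_reduced, n_factors)
    h2_new = (fa_L**2).sum(axis=1)
    if np.max(np.abs(h2_new - h2)) < 1e-6:
        break
    h2 = h2_new

# Loadings are defined only up to rotation, so align PCA to FA (orthogonal
# Procrustes) before computing Tucker's congruence coefficient per factor.
U, _, Vt = np.linalg.svd(pca_L.T @ fa_L)
pca_aligned = pca_L @ (U @ Vt)
congruence = (pca_aligned * fa_L).sum(axis=0) / np.sqrt(
    (pca_aligned**2).sum(axis=0) * (fa_L**2).sum(axis=0))

print("mean communality, PCA:", (pca_L**2).sum(axis=1).mean())
print("mean communality, FA: ", (fa_L**2).sum(axis=1).mean())
print("congruence per factor:", np.round(congruence, 3))
```

In a run like this, the PCA communalities come out somewhat larger than the FA ones, because PCA also absorbs part of the unique variance left on the diagonal, while the pattern of loadings is expected to be very similar once the two solutions are aligned.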
Now, about the choice of rotation method and why some authors argue against the strict use of orthogonal rotation, I would like to quote Paul Kline, as I did in response to the following question, FA: Choosing Rotation matrix, based on “Simple Structure Criteria”,
"(...) in the real world, it is not unreasonable to think that factors, as important determiners of behavior, would be correlated." -- P. Kline, Intelligence. The Psychometric View, 1991, p. 19
I would thus conclude that, depending on the objective of your study (do you want to highlight the main patterns of your correlation matrix, or do you seek to provide a sensible interpretation of the underlying mechanisms that may have caused you to observe such a correlation matrix?), it is up to you to choose the most appropriate method: this doesn't have to do with the construction of linear combinations, but merely with the way you want to interpret the resulting factorial space.
References
- Harman, H.H. (1976). Modern Factor Analysis. Chicago, University of Chicago Press.
- Cattell, R.B. (1978). The Scientific Use of Factor Analysis. New York, Plenum.
- Kline, P. (1991). Intelligence. The Psychometric View. Routledge.