The beginning of the Wikipedia article on PCA seems completely wrong to me (my italics):
“The principal components of a collection of points in a real coordinate space are a sequence of p unit vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i-1 vectors. Here, a best-fitting line is defined as one that minimizes the average squared distance from the points to the line.”
This sounds like regression, but in PCA the principal components are chosen to maximize the variance of the transformed data; no line is being “fit.” Am I missing something?
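One way to compare the two descriptions is to compute both quantities on the same data. The sketch below is only an illustration under assumptions of my own: the data are synthetic, they are centered first, every candidate line passes through the origin, and "distance" means perpendicular distance to the line (not the vertical residual used in ordinary least squares). The helper names `projected_variance` and `mean_sq_distance` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data (hypothetical, for illustration only).
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X = X - X.mean(axis=0)  # center the data


def projected_variance(X, u):
    """Variance of the data after projecting onto the unit vector u."""
    return np.mean((X @ u) ** 2)


def mean_sq_distance(X, u):
    """Average squared perpendicular distance to the line spanned by u."""
    proj = np.outer(X @ u, u)  # foot of the perpendicular for each point
    return np.mean(np.sum((X - proj) ** 2, axis=1))


# Scan unit vectors over half a turn (u and -u describe the same line).
angles = np.linspace(0.0, np.pi, 1801)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])

var = np.array([projected_variance(X, u) for u in dirs])
dist = np.array([mean_sq_distance(X, u) for u in dirs])

best_var = int(np.argmax(var))    # direction of maximum projected variance
best_dist = int(np.argmin(dist))  # direction of minimum squared distance
total = np.mean(np.sum(X ** 2, axis=1))  # average squared norm of the points

print("angle maximizing variance:", angles[best_var])
print("angle minimizing distance:", angles[best_dist])
print("max |var + dist - total|:", np.max(np.abs(var + dist - total)))
```

For each direction, the squared length of a centered point splits (by the Pythagorean theorem) into its squared projection onto the line plus its squared perpendicular distance from it, so the two printed angles can be compared directly.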