In PCA we start with a dataset and reduce its dimensionality by constructing new features, each a linear combination of the original features, and keeping only the ones with maximum variance.
The directions defining these new features are eigenvectors of the covariance matrix of our original dataset. For some reason, choosing the eigenvectors of the covariance matrix does two things:
The first of these new features will have variance at least as large as any feature in the original dataset had (in fact, the largest variance of any direction). I don't see why…
These new features will all have zero covariance with one another… I also don't see why. (Both properties show up in the numerical sketch below.)
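Here's a minimal numerical sketch of both claims with numpy; the dataset and variable names are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up correlated 2-D dataset, purely for illustration.
n = 1000
x = rng.normal(size=n)
data = np.column_stack([x, 0.8 * x + 0.3 * rng.normal(size=n)])

# Center the data and form its covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigendecomposition; eigh is for symmetric matrices and returns
# eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The new features are the projections of the data onto the eigenvectors.
new_features = centered @ eigvecs

# Claim 1: the first new feature's variance is at least as large as
# any original feature's variance (it equals the largest eigenvalue).
print("original feature variances:", cov.diagonal())
print("new feature variances:    ", new_features.var(axis=0, ddof=1))

# Claim 2: the new features have zero covariance with one another
# (the off-diagonal entries are ~0 up to floating-point noise).
print("covariance of new features:\n", np.cov(new_features, rowvar=False))
```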
Update - I understand why they have zero covariance with one another: it's because the covariance matrix is symmetric, meaning its eigenvectors (for distinct eigenvalues) are perpendicular.
Covariance can be thought of as a dot product between two vectors, and if the vectors are perpendicular, the dot product is zero.
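Spelled out, with $\Sigma$ the covariance matrix of the centered data $X$, and $u_i$, $u_j$ eigenvectors with distinct eigenvalues:

$$\operatorname{Cov}(Xu_i,\, Xu_j) = u_i^\top \Sigma\, u_j = \lambda_j\, u_i^\top u_j = 0 \quad (i \neq j),$$

so the covariance of two projected features really does reduce to a dot product of perpendicular eigenvectors.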
But...I'm still confused as to why we use THOSE directions.