I came across this paragraph about logistic regression with PCA in Kevin P. Murphy's book on machine learning:
If we use PCA first and then apply logistic regression, the overall model is still representable as a logistic regression, but the problem is constrained, since we have forced the logistic regression to work in the subspace spanned by the PCA vectors. Consider 100 training vectors randomly positioned in a 1000-dimensional space, each assigned a random class, 0 or 1. With very high probability, these 100 vectors will be linearly separable. Now project these vectors onto a 10-dimensional space: with very high probability, 100 vectors in a 10-dimensional space will not be linearly separable. Hence, arguably, we should not use PCA first, since we could potentially transform a linearly separable problem into a non-linearly separable one.
a) Please explain how to understand/visualize why 100 vectors randomly positioned in a 1000-dimensional space will be linearly separable, whereas the same 100 vectors projected into a 10-dimensional space will not be linearly separable.
b) How can this intuition be applied to other problems, if applicable?
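
For reference, here is a small simulation I sketched to sanity-check the claim (my own code, not from the book). It assumes NumPy and scikit-learn are available, and it uses a weakly regularized logistic regression fit as a rough proxy for a linear-separability check (training accuracy of 1.0 taken to mean "separable in practice"):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))   # 100 random points in 1000-D
y = rng.integers(0, 2, size=100)   # random class labels 0 or 1

def train_accuracy(features, labels):
    # Weak regularization (large C) so the fit approximates an
    # unconstrained search for a separating hyperplane.
    clf = LogisticRegression(C=1e6, max_iter=10_000)
    clf.fit(features, labels)
    return clf.score(features, labels)

print("1000-D training accuracy:", train_accuracy(X, y))
X10 = PCA(n_components=10).fit_transform(X)   # project onto top 10 PCs
print("10-D (PCA) training accuracy:", train_accuracy(X10, y))
```

If the book's claim holds, the first accuracy should come out as 1.0 (perfectly separable) and the second should be noticeably lower, since the 10-dimensional projection generally cannot separate 100 randomly labelled points.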