PCA (Principal Component Analysis) is often used to produce a 2D or 3D plot of the data, with x = PC1 and y = PC2 (and optionally z = PC3). Since the components are ordered, it makes sense to use the first two (or three) to represent the data: the first component is the direction that maximizes the variance of the data, the second is the direction of maximal variance among those uncorrelated with the first, and so on.
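To make the PCA ordering concrete, here is a minimal NumPy sketch (an illustration, not any particular library's implementation). It assumes synthetic 4D Gaussian data with deliberately unequal spreads, and the helper name `pca` is hypothetical. The principal directions come from the SVD of the centered data, and `np.linalg.svd` returns singular values in non-increasing order, so the variance along PC1 is at least that along PC2, and so on.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper: minimal PCA via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                     # center each feature
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = s**2 / (X.shape[0] - 1)     # sample variance along each direction
    scores = Xc @ Vt[:n_components].T           # coordinates used for plotting (PC1, PC2, ...)
    return scores, explained_var

rng = np.random.default_rng(0)
# Synthetic data (assumption): 200 points in 4D with very different spreads per axis
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
scores, var = pca(X, 2)
print(scores.shape)   # (200, 2)
print(var)            # non-increasing: PC1 carries the most variance
```

The variances in `var` sum to the total variance of the data, which is why keeping the leading components is a natural choice for a low-dimensional plot.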
LDA (Linear Discriminant Analysis) is also sometimes used to plot data. When more than three classes are involved (so that k > 2 LDs are produced), should one assume that the first two (or three) linear discriminants are the ones that best represent the data, as one does with PCA? Why or why not?
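For comparison, the linear discriminants are conventionally sorted by a different quantity: the eigenvalue λ of Sw⁻¹Sb, where Sw and Sb are the within-class and between-class scatter matrices, so λ is the ratio of between-class to within-class scatter along each direction (Fisher's criterion). With C classes there are at most C − 1 non-zero eigenvalues. The minimal NumPy sketch below assumes 4 synthetic Gaussian classes in 5D; the helper name `lda_directions` is hypothetical, and a plain eigen-decomposition is used rather than any library's solver.

```python
import numpy as np

def lda_directions(X, y):
    """Hypothetical helper: Fisher LDA directions sorted by eigenvalue of Sw^-1 Sb."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Eigenvalues of Sw^-1 Sb are real and non-negative in theory; drop numerical imaginary noise
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    evals, evecs = evals.real, evecs.real
    order = np.argsort(evals)[::-1]          # largest between/within ratio first
    k = len(classes) - 1                     # at most C - 1 non-zero eigenvalues
    return evals[order][:k], evecs[:, order][:, :k]

rng = np.random.default_rng(1)
# Synthetic data (assumption): 4 classes of 50 points each in 5D
n_classes, n_per_class, d = 4, 50, 5
means = rng.normal(scale=3.0, size=(n_classes, d))
X = np.vstack([m + rng.normal(size=(n_per_class, d)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)

evals, W = lda_directions(X, y)
Z = X @ W          # LD1, LD2, LD3 coordinates for plotting
print(evals)       # sorted from largest to smallest
print(Z.shape)     # (200, 3)
```

Note that this ordering ranks directions by class separation, not by the variance of the data as a whole.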