Resources about LDA usually say the number of components is bounded by the number of classes - 1. E.g, in the binary case, only one component can be found.
In LDA, the first discriminant direction $\Phi_1$ is calculated as argmax of $\frac{\Phi_1^T S_b \Phi_1}{\Phi_1^T S_w \Phi_1}$ where $S_b$ and $S_w$ are the between-class and within-class covariance matrices, respectively. Why can't we continue this way and compute the $i$th direction $\Phi_i$ to be the argmax of $\frac{\Phi_i^T S_b \Phi_i}{\Phi_i^T S_w \Phi_i}$ under the constraint of orthogonality to $\Phi_1, \Phi_2 \dots \Phi_{i-1}$, as is done in PCA?
Ostenbily, in the binary case, where each $\Phi_i$ is a vector, one can do it $n$ times, if the inputs are $n$ dimensional vectors.