Computational complexity for linear discriminant analysis

Question

The linear discriminant analysis algorithm is as follows:

LDA algorithm

I want to conduct a computational complexity for it. For each step, the complexity is as follows:

For each $c$, there are $N_cd$ additions and $1$ division. Thus, in total, there are $Nd+C$ operations.
$Nd$ additions and $1$. Thus, in total, there are $(Nd+1)$ operations.
There are $N(d+d^2)$, the first $d$ is for $x_j-m_c$, and the second $d^2$ is for ${(x_j-m_c)}{(x_j-m_c)}^T$.
There are $C(d+d^2+d^2)$, the first $d$ is for $m_c-m$, the second $d^2$ is for ${(m_c-m)}{(m_c-m)}^T$, and the third $d^2$ is for the multiplication between $N_c$ and the matrix.
There are $d^3+d^3+d^3$, the first $d^3$ is for ${S_w}^{-1}$ and the second $d^3$ is for the multiplication between ${S_w}^{-1}$ and $S_b$, and the third $d^3$ is for the eigen-decomposition.

In general, we have $N>d>C$, thus the dominated term is $Nd^2$. To sum up, the computational time complexity is $O(Nd^2)$.

My question is: is the analysis right? If not, please help correct it. I have this question because I've read some papers on this subject, but there are different results among them. Any comments would be highly appreciated.

What's about the complexity of the asymmetric eigen-problem (point 5)? I think that this question is not statistical and would better go to a computer science forum. — ttnphns, Mar 20 '15 at 06:44
Hi, @ttnphns, thank you for replying! In this case, the matrix is symmetric, thus I just consider the symmetric eigen-problem. In fact, the complexity of an eigen-problem have several complexity that depends on the solution. A description is referred to [http://mathoverflow.net/questions/62904/complexity-of-eigenvalue-decomposition]. Thus I just use the ordinal algorithms, not the fastest speeding algorithms. In addition, because I think the linear discriminant analysis is on statistical, I post it here. — mining, Mar 20 '15 at 06:56
Are you sure that it is symmetric? For all that I know it is not ([see](http://stats.stackexchange.com/a/48859/3277) LDA algo in footnote 1). — ttnphns, Mar 20 '15 at 07:04
@ttnphns, oh, yes! I made a mistake, the $S_w$ may not be symmetric. Thus, the eigen-problem is for a asymmetric matrix. Thank you very much for pointing out this problem and the reference. — mining, Mar 20 '15 at 07:14
I'm a little surprised at the close votes--this seems like a pretty reasonable question for Cross Validated, though it'd probably be on topic at CS too.... — Matt Krause, Mar 21 '15 at 01:21
@MattKrause,thanks for your support! In fact, I've also post this question in the Theoretical Computer Science [http://cstheory.stackexchange.com/questions/30868/computational-complexity-for-linear-discriminant-analysis]. Maybe this problem is too simple or basic. Thus, I've been reading some books on the algorithmic analysis. — mining, Mar 21 '15 at 01:48
Just FYI, people generally don't like cross-posting the same question to multiple sites (at least, not right away). If you want to move it to CS, I think a mod can make that happen for you. — Matt Krause, Mar 21 '15 at 03:23
For general (i.e. k-class, also called "canonical") LDA algorithm please see https://stats.stackexchange.com/a/48859/3277 and further links there. — ttnphns, Jan 11 '18 at 17:12

score 3 · Accepted Answer · answered Mar 21 '15 at 03:17

There are two potential bottlenecks:

As you pointed out, finding the within-class scatter in step 3 takes $N(d+d^2)$ steps, or $O(Nd^2)$.
Both the matrix multiplication and eigendecomposition in step 5 take approximately $O(d^3)$. Depending on the algorithm, matrix multiplication can be a bit faster: there are algorirthms that are $\sim O(n^{2.4})$ and you can also cheat a bit because you don't need to compute every single eigenvector.

The other steps are relatively trivial, so your answer depends on whether $N>d$. If you have more features than examples, it's $O(d^3)$, otherwise it's $O(Nd^2)$

This paper by Cai, He, and Han walks through the space/time complexity in more detail. Annoyingly, they use $m$ to mean the number of examples, which you called $N$, while using $N$ to refer to the number of features, which you call $d$, so beware.

The paper also shows a related algorithm that is more time and space efficient.

Hi, @MattKrause, thank you very much for affirming this description. In fact, I've read Cai's paper that you referenced. Because they proposed a speeding algorithm for LDA, there are some steps that are different from the steps I posted. They also discussed the cases $m>n$ and $m — mining, Mar 21 '15 at 04:28

Computational complexity for linear discriminant analysis

1 Answers1

Linked