
I'm trying to understand the computational bounds of t-SNE. It is optimized with gradient descent, so it has to run some number of iterations. We can set the number of iterations aside here and focus on the cost of each iteration. Barnes-Hut reduces the per-iteration cost from $O(N^2)$ to $O(N \log N)$ for $N$ data points. But how does it scale with the input dimensionality $M$?

It would have to be at least $O(M N \log N)$. But is there an $M^2$ component anywhere?
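One way to probe this empirically is to time fits while varying $N$ and $M$ separately. Below is a minimal sketch, assuming scikit-learn's `TSNE` with `method='barnes_hut'` is an acceptable stand-in for the Barnes-Hut implementation; wall-clock time mixes the neighbor search and the gradient iterations, so it is only a rough proxy for the per-iteration cost.

```python
# Rough empirical probe of how Barnes-Hut t-SNE run time scales with N and M.
# Total fit time is measured, not per-iteration time, so treat the numbers
# as an approximate scaling indicator only.
import time
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

def time_tsne(n_points, n_dims):
    """Time one Barnes-Hut t-SNE fit on random Gaussian data."""
    X = rng.standard_normal((n_points, n_dims))
    t0 = time.perf_counter()
    TSNE(n_components=2, method='barnes_hut', perplexity=30,
         init='random', random_state=0).fit_transform(X)
    return time.perf_counter() - t0

# Vary N with M fixed, then vary M with N fixed.
for n in (1000, 2000, 4000):
    print(f"N={n:5d}, M=50   -> {time_tsne(n, 50):.2f}s")
for m in (10, 100, 1000):
    print(f"N=2000, M={m:4d} -> {time_tsne(2000, m):.2f}s")
```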
