I'm trying to understand the computational bounds of t-SNE. It's optimized with gradient descent, so it has to run through some number of iterations. We can ignore the iteration count here and focus on the cost of each iteration. Barnes-Hut brings each iteration from $O(N^2)$ down to $O(N \log N)$ for $N$ data points. But how does it scale with the input dimensionality $M$?
It would have to be at least $O(M N \log N)$. But is there an $M^2$ component anywhere?
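For reference, a minimal timing sketch to probe this empirically, assuming scikit-learn's `TSNE` with `method="barnes_hut"` (the choices of $N$, the $M$ values, and the random data are arbitrary). One caveat: this measures total fit time, which folds the one-time high-dimensional neighbor/distance computation in with the per-iteration cost, so it only gives a rough picture of the $M$-scaling.

```python
# Rough empirical check of how Barnes-Hut t-SNE wall-clock time grows with
# input dimensionality M, holding the number of points N fixed.
import time

import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
N = 2000  # fixed number of points; arbitrary choice

for M in (10, 50, 100, 500):
    X = rng.standard_normal((N, M))  # synthetic data, just for timing
    t0 = time.perf_counter()
    TSNE(n_components=2, method="barnes_hut", init="random").fit_transform(X)
    print(f"M={M:4d}: {time.perf_counter() - t0:.2f}s")
```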