First a theoretical question. I know that natively, an hierarchical clustering algorithm is of complexity on the cube of number of samples N. This is due to the fact that in each iteration, one has to go over the entire distance matrix to find the smallest value.
But, it is possible to implement it in lower complexity (of N square). How it is done and can you refer me to an article or description of the implementation?
Finally, I used scikit library to do HClustering
. It was fine but it is limited to 15,000 samples or so. I want to cluster much more. can someone refer me to an implementation that fits larger numbers (and preferably runs at square root N)?