Assume we divide a large data set D into m different partitions in a distributed learning setting. We train a model on each partition (cluster), so each one acts as a local expert. Now we have some new test points that we want to assign to the partitions in order to obtain predictions. I want to know which distance metric is faster to compute as the number of new points increases. Also, apart from distance metrics, are there any similarity-based measures that can be used to link new points to the existing partitions?

Ham82
It is a bit strange to start by asking which measure is faster to compute without first deciding what can conceptually serve as a proximity between a cluster and a point. Will it be the distance to the centroid? To the medoid? To the nearest neighbour? Etc. – ttnphns Mar 05 '22 at 08:56
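To make the options in the comment concrete, here is a minimal sketch in plain Python, not a definitive implementation. It assumes each partition is a small list of numeric vectors, and every name in it (`centroid`, `assign_by_centroid`, `assign_by_nearest_neighbour`, `assign_by_cosine`, the toy data) is hypothetical and chosen only for illustration. It shows three possible notions of proximity between a new point and a partition: squared Euclidean distance to the centroid, squared Euclidean distance to the nearest member, and cosine similarity to the centroid as one example of a similarity-based measure.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_euclidean(a, b):
    """Squared Euclidean distance; skipping the sqrt does not change the ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_by_centroid(x, centroids):
    """Index of the partition whose centroid is closest to x."""
    return min(range(len(centroids)), key=lambda j: sq_euclidean(x, centroids[j]))

def assign_by_nearest_neighbour(x, partitions):
    """Index of the partition containing the single closest member to x."""
    return min(
        range(len(partitions)),
        key=lambda j: min(sq_euclidean(x, p) for p in partitions[j]),
    )

def assign_by_cosine(x, centroids):
    """Index of the partition whose centroid is most similar in direction to x."""
    return max(range(len(centroids)), key=lambda j: cosine_similarity(x, centroids[j]))

# Toy data (hypothetical): two partitions of 2-D points.
partitions = [
    [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]],
    [[0.0, 5.0], [1.0, 6.0], [0.0, 7.0]],
]
centroids = [centroid(p) for p in partitions]  # computed once, reused for every new point
new_point = [0.5, 5.5]

print(assign_by_centroid(new_point, centroids))
print(assign_by_nearest_neighbour(new_point, partitions))
print(assign_by_cosine(new_point, centroids))
```

In this sketch, the choice of proximity affects the cost more than the choice of metric does. With centroids precomputed, each new point needs m comparisons, while the nearest-neighbour rule compares against every stored point unless an index structure is used. Note also that cosine similarity ignores vector length, so it can assign differently from Euclidean distance when partitions differ mainly in magnitude.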