"Cluster of rocks", "cluster of islands", "cluster of factories" etc. can easily be traced back to the 19th century (and probably much longer). Of course statistics early on started to look for a way to formalize this. So good luck, you will likely need to walk to a lot of libraries (the physical one, not the software library)!
Don't look at "machine learning". ML did not invent cluster analysis, and most cluster analysis research happens outside the ML community.
The term "cluster analysis" dates back to 1930s statistics, but you can imagine that "cluster" in the sense above was used much earlier, and cluster analysis attempts to discover exactly this notion of "clusters".
Much of the early usage concerned clustering observations in nature, such as species, either by location or by similarity. No computers were involved: it probably wasn't until 1957 that the first algorithms for "cluster analysis" arrived (before that, cluster analysis was done with pen and paper).
P. H. Sneath: The application of computers to taxonomy. In: Journal of General Microbiology. 17(1), 1957, pp. 201–226.