5

I'm trying to track down the origins of the word "cluster" and its usage in the context of cluster analysis.

Please, does anyone know when and by whom it was first used? Perhaps there was a paper or a book which coined these terms?

Alternatively, maybe there is some literature describing the beginnings of the fields which work with the term "cluster" such as machine learning or statistics of some kind?

amoeba
Ecir Hana
  • 2
    Cross-posted at http://english.stackexchange.com/questions/274166/etymology-of-cluster-analysis-why-cluster. – whuber Sep 15 '15 at 13:57
  • Upon looking at @Whuber's comment: from what perspective do you want to know? Is it the first usage of "cluster" in ML or in English? If the latter, then it might be off-topic. – Dawny33 Sep 15 '15 at 14:21
  • @Dawny33 Both, that's why I posted it there as well. I understand cross-posting is generally discouraged. – Ecir Hana Sep 15 '15 at 19:51
  • 2
    A number of things, including [this page](https://www.fmi.uni-sofia.bg/fmi/statist/education/textbook/ENG/stcluan.html), suggest that the first use of the term in statistics was Tryon's book (Tryon, R. C., 1939, *Cluster analysis*. Ann Arbor: Edwards Brothers). This paper by Roger Blashfield might be worth a look: Blashfield, R.K. (1980), "The Growth Of Cluster Analysis: Tryon, Ward, and Johnson", *Multivariate Behavioral Research*, **15**(4), 439-458. – Glen_b Sep 16 '15 at 00:39
  • 1
    @Glen_b, +1, these are interesting references. A [google ngram search](https://books.google.com/ngrams/graph?content=cluster+analysis&year_start=1900&year_end=2000&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Ccluster%20analysis%3B%2Cc0#t1%3B%2Ccluster%20analysis%3B%2Cc1) confirms that the term "cluster analysis" appears at the end of the 1930s. Unfortunately, Blashfield's paper is behind a paywall; here is a paragraph about Tryon's 1939 book: "Tryon's major area of interest in psychology concerned individual differences. During the early 1930's, the major theory ..." – amoeba Sep 16 '15 at 09:39
  • 1
    "... about individual differences among human abilities was Spearman's two factor theory of intelligence. Tryon challenged this theory (1932, 1935). He was influenced by Thurstone who also was concerned with individual differences and who was developing factor analysis as a method for studying content area. However, Tryon did not like factor analysis because it involved "a complicated mathematics which few psychologists understand," "certain undesirable assumptions," and a great deal of labor to solve (Tryon, 1939, p. 2); ..." – amoeba Sep 16 '15 at 09:41
  • 1
    "... so Tryon proposed a simpler and more direct method of finding "clusters" of variables. In his 1939 monograph, Tryon perceived of cluster analysis as a "poor man's factor analysis" (Wrigley, 1970)". – amoeba Sep 16 '15 at 09:41

2 Answers

9

"Cluster of rocks", "cluster of islands", "cluster of factories", etc. can easily be traced back to the 19th century (and probably much further). Of course, statistics started looking for a way to formalize this early on. So good luck: you will likely need to visit a lot of libraries (the physical kind, not software libraries)!

Don't look at "machine learning". ML did not invent cluster analysis; and most cluster analysis research happens outside the ML community.

The term "cluster analysis" dates back to 1930s statistics, but "cluster" in the sense above was used much earlier, and cluster analysis attempts to discover exactly this notion of "clusters". Much of the early usage concerned clustering observations in nature, such as species, either by location or by similarity. No computers were involved: it probably wasn't until 1957 that the first algorithms for "cluster analysis" arrived (before that, cluster analysis was "pen & paper").

P. H. Sneath: The application of computers to taxonomy. In: Journal of General Microbiology. 17(1), 1957, pp. 201–226.

Has QUIT--Anony-Mousse
  • 2
    "Cluster" was present in almost its modern form in many proto-germanic languages by the 14th century! – whuber Sep 15 '15 at 14:03
  • Could you please expand on that "Sneath"? Is it a book? – Ecir Hana Sep 15 '15 at 20:01
  • P. H. Sneath: The application of computers to taxonomy. In: Journal of General Microbiology. 17(1), 1957, pp. 201–226. It is one of the earliest works on clustering that I know of (thanks, Wikipedia). – Has QUIT--Anony-Mousse Sep 15 '15 at 20:30
3

According to the Oxford Dictionary, the word "cluster" is derived from the Old English word 'clyster'. It was "probably related to clot [or clott]" and is derived from the Germanic 'klotz'.

user89547
  • 2
    Yes, this information can be found almost anywhere. The reason the question is allowed on this particular site concerns its connection with *statistics.* What information can you provide about the early occurrences of this word in statistical analysis specifically? – whuber Sep 15 '15 at 17:28