I have text data from customer inquiries and want to figure out the main topics customers ask about.
My approach is to embed the sentences with a pre-trained SentenceTransformer model ('paraphrase-MiniLM-L6-v2') and then cluster the embeddings with HDBSCAN.
My question: what metric should I use to evaluate the accuracy of the clusters? By accuracy I mean that inquiries about the same topic fall in the same cluster. I also need this metric to tell whether I am actually making improvements when I fine-tune the pre-trained model or change the clustering parameters.
My attempt so far: one idea for measuring accuracy is to compute the percentage of our replies to customer inquiries that get assigned to the same cluster as their corresponding inquiry. The hypothesis is that an inquiry and its reply belong to the same topic, so they should end up in the same cluster. Does this sound reasonable?
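
To make the idea concrete, here is a minimal sketch of how the inquiry/reply agreement could be computed. It assumes the cluster labels for inquiries and replies are already available as two aligned lists (position i in each list refers to the same inquiry/reply pair). The function name `pair_agreement` and the `count_noise_as_miss` flag are hypothetical names chosen for illustration. The only library behaviour relied on is that HDBSCAN labels noise points as -1.

```python
from typing import Sequence

NOISE_LABEL = -1  # HDBSCAN assigns -1 to points it treats as noise


def pair_agreement(
    inquiry_labels: Sequence[int],
    reply_labels: Sequence[int],
    count_noise_as_miss: bool = True,
) -> float:
    """Fraction of inquiry/reply pairs that land in the same cluster.

    Hypothetical helper: labels are assumed to be aligned, so that
    inquiry_labels[i] and reply_labels[i] describe the same pair.
    """
    if len(inquiry_labels) != len(reply_labels):
        raise ValueError("label sequences must have the same length")

    matches = 0
    total = 0
    for inquiry, reply in zip(inquiry_labels, reply_labels):
        if inquiry == NOISE_LABEL or reply == NOISE_LABEL:
            # A pair with a noise point never counts as a match.
            # It either counts against the score or is skipped entirely.
            if count_noise_as_miss:
                total += 1
            continue
        total += 1
        if inquiry == reply:
            matches += 1

    return matches / total if total else float("nan")


# Made-up labels, for illustration only.
inquiry_labels = [0, 0, 1, 2, -1]
reply_labels = [0, 1, 1, 2, -1]

print(pair_agreement(inquiry_labels, reply_labels))
print(pair_agreement(inquiry_labels, reply_labels, count_noise_as_miss=False))
```

One thing I am unsure about is how to treat noise points, which is why the sketch exposes both options. I also realise the score can be gamed: a clustering that puts everything into one big cluster would score 100%, so the number probably needs to be read alongside the number of clusters and the share of points labelled as noise.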