The adjusted Rand index could work. It's a popular method for measuring the similarity of two ways of assigning discrete labels to the same data, ignoring permutations of the labels themselves. Instead of checking whether the raw class/cluster labels match, you'd look at pairs of points and ask: to what extent are pairs in the same class assigned to the same cluster, and pairs in different classes assigned to different clusters?
To compute the Rand index, you'd measure:
- $a$ = Number of pairs that have the same class label and same cluster assignment
- $b$ = Number of pairs that have different class labels and different cluster assignments
The raw Rand index is:
$$RI = \frac{a + b}{\binom{n}{2}}$$
where $\binom{n}{2}$ is the number of possible pairs of points. $RI$ ranges from 0 to 1, with 1 indicating total agreement.
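The pairwise definition above can be sketched directly in Python (a minimal illustration, not an optimized implementation; the example labelings are made up):

```python
from itertools import combinations

def rand_index(classes, clusters):
    """Raw Rand index: fraction of point pairs on which the two
    labelings agree (grouped together in both, or separated in both)."""
    a = b = 0
    n_pairs = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            a += 1          # agreement: pair together in both labelings
        elif not same_class and not same_cluster:
            b += 1          # agreement: pair separated in both labelings
        n_pairs += 1        # n_pairs ends up equal to C(n, 2)
    return (a + b) / n_pairs

# Identical partitions up to a label permutation score 1.0:
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Note that swapping the cluster labels has no effect, since only the pair structure matters.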
However, the raw Rand index has a drawback: its expected value under a random assignment of labels is not zero (and depends on the number and sizes of the clusters), so even chance-level labelings can score well above 0. The adjusted Rand index (ARI) corrects for this by subtracting the expected index under random labeling, which makes this type of null result easy to spot. ARI is bounded above by 1 and can be negative: near-zero and negative values indicate chance-level labelings, positive values indicate similar labelings, and 1 indicates perfect agreement.
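In practice you'd rarely compute this by hand; a sketch using scikit-learn (assuming it's installed; the example labelings are made up):

```python
from sklearn.metrics import rand_score, adjusted_rand_score

classes  = [0, 0, 0, 1, 1, 1]
clusters = [1, 1, 0, 0, 2, 2]

print(rand_score(classes, clusters))           # raw RI: optimistic under chance
print(adjusted_rand_score(classes, clusters))  # ARI: near 0 for chance-level labelings

# Permuting the cluster labels (1->0, 0->2, 2->1) changes neither score,
# since both metrics depend only on the pair structure of the partition:
relabeled = [0, 0, 2, 2, 1, 1]
print(adjusted_rand_score(classes, relabeled)
      == adjusted_rand_score(classes, clusters))  # True
```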
You can also take a look at other clustering performance metrics (e.g. in scikit-learn's clustering evaluation guide). The metrics that might be useful to you are the ones that compare cluster assignments to ground-truth labels (i.e. your class labels): normalized/adjusted mutual information; homogeneity, completeness, and V-measure; and the Fowlkes-Mallows score.
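All of these ground-truth-based metrics share the same call signature in scikit-learn, so trying them side by side is cheap (again with made-up labelings):

```python
from sklearn.metrics import (
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
    homogeneity_completeness_v_measure,
    fowlkes_mallows_score,
)

classes  = [0, 0, 0, 1, 1, 2]
clusters = [0, 0, 1, 1, 2, 2]

print(normalized_mutual_info_score(classes, clusters))
print(adjusted_mutual_info_score(classes, clusters))       # chance-corrected, like ARI
print(homogeneity_completeness_v_measure(classes, clusters))  # returns a 3-tuple
print(fowlkes_mallows_score(classes, clusters))
```

Like ARI, the adjusted mutual information is chance-corrected, which matters if you're comparing labelings with different numbers of clusters.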