(Not an expert in anomaly detection.)
I'd like to experiment with per-class anomaly detection.
That is, we have a feature vector $x$ and a classifier that predicts its class $\hat{y}$. I'd like to determine whether the combination $(x, \hat{y})$ is anomalous, given a training set of non-anomalous $(x, y)$ pairs.
It seems that I can either train one joint anomaly detector on $P(x, y)$ or train multiple independent detectors, one per class, each on $P(x \mid y)$.
I think the latter is easier and sufficient. Are there any downsides? Also, is there a name for this technique?
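
To make the per-class option concrete, here is a minimal sketch of what I have in mind. It assumes scikit-learn's `IsolationForest` as the detector, which is only my placeholder choice; any detector with a fit/score interface would do. The helper names `fit_per_class_detectors` and `score_pairs` are hypothetical, and treating a class never seen in training as maximally anomalous is my own assumption.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def fit_per_class_detectors(X, y, random_state=0):
    """Fit one detector per class label, each on that class's rows only."""
    detectors = {}
    for label in np.unique(y):
        det = IsolationForest(random_state=random_state)
        det.fit(X[y == label])
        detectors[label] = det
    return detectors


def score_pairs(detectors, X, y_hat):
    """Score each (x, y_hat) pair with the detector of the predicted class.

    Lower scores mean more anomalous (the convention of score_samples).
    """
    scores = np.empty(len(X))
    for label in np.unique(y_hat):
        mask = y_hat == label
        if label in detectors:
            scores[mask] = detectors[label].score_samples(X[mask])
        else:
            # Assumption: a class absent from training is maximally anomalous.
            scores[mask] = -np.inf
    return scores


# Toy data: class 0 clusters around (0, 0), class 1 around (10, 10).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(10.0, 1.0, size=(200, 2)),
])
y_train = np.repeat([0, 1], 200)

detectors = fit_per_class_detectors(X_train, y_train)

# The same point x = (0, 0), once paired with class 0 and once with class 1.
X_query = np.zeros((2, 2))
y_hat = np.array([0, 1])
scores = score_pairs(detectors, X_query, y_hat)
print(scores)  # the second pair should get the lower (more anomalous) score
```

Note that scores from different per-class detectors are not obviously on a comparable scale, and with this setup the class prior $P(y)$ from $P(x, y) = P(x \mid y)\,P(y)$ never enters the score. I'm not sure whether either of these matters in practice.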