6

From the definition of one-class classification in wikipedia:

In machine learning, one-class classification, also known as unary classification or class-modelling, tries to identify objects of a specific class amongst all objects, by learning from a training set containing only the objects of that class.

And:

A similar problem is PU learning, in which a binary classifier is learned in a semi-supervised way from only positive and unlabeled sample points.

I'm looking for examples of this in the context of Bayesian Networks. I imagine a case in which for one node of the network there are only positive examples available, but a binary classifier is desired. I want to know of any examples of such a case, if possible with associated code. I'm mostly using the bnlearn package in R but I could use anything.

Note: although this has been used extensively for outlier detection, the problem I would like to tackle isn't really an outlier detection one. I want to model animal species distribution and I only have positive examples: places where an animal species was observed. But naturally cannot be sure of the species absence if it wasn't observed during a certain sampling effort.

JEquihua
  • 3,525
  • 2
  • 24
  • 44

1 Answers1

2

Quite simple, yet actionable approach:

  • Collect your data, preprocess them to get categorical features $X$.
  • Create, tune, optimize the Bayesian network for $X$ with bnlearn. As a result we have practically a probability distribution $p(X)$.
  • Take all your observations and calculate their likelihoods $L_i=p(x_i)$.
  • Based on the likelihoods define a threshold $\theta$ for false negatives, i.e. if the desired sensitivity is e.g. 95%, you should take the likelihood that corresponds to the 5th quantile.

The resulting classifier is then: $p(X)>\theta$.

The trick is that you never know how similar are the unobserved counter examples. However, based on some ex-post observations, you can tune the threshold also with the respect to specificity.

Karel Macek
  • 2,463
  • 11
  • 23
  • Sounds interesting. I'm testing it out right now. Do you by any chance know of literature were an approach like this has been used? Thank you! – JEquihua Oct 09 '18 at 22:01
  • It seems it's not possible, out of the box, to fit a bayesian network with a node with only 1 level: Error in check.data(data, allow.missing = TRUE) : variable Species must have at least two levels. How would you fit the bn in this context? – JEquihua Oct 10 '18 at 12:32
  • I would not involve Species in the model. – Karel Macek Oct 11 '18 at 17:38