I'm trying to do binary classification with OpenNLP. I have already managed to classify movies into different genres. The training data has the form:
Genre1 Description 1
Genre1 Description 2
Genre2 Description 3
etc.
When I test the model on another movie description, I get the correct classification:
Thriller : 2.1301979251908337E-14
Romantic : 0.9999999999999787
---------------------------------
Romantic : is the predicted category for the given sentence.
But how can I apply a binary classification, where the result should be either yes or no? So in my case, I would like to know if a movie belongs to the genre romantic or not. I'm not interested in the other genres.
When I just delete all the other genres from my data sample:
Genre1 Description 1
Genre1 Description 2
I get the following error:
Training data must contain more than one outcome
This is my code (source):
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import opennlp.tools.doccat.DoccatFactory;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizer;
import opennlp.tools.doccat.DocumentCategorizerME;
import opennlp.tools.doccat.DocumentSample;
import opennlp.tools.doccat.DocumentSampleStream;
import opennlp.tools.ml.AbstractTrainer;
import opennlp.tools.ml.naivebayes.NaiveBayesTrainer;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public static void main(String[] args) {
    try {
        // read the training data (one sample per line: category, whitespace, description)
        Path path = Paths.get("C:", "Users");
        InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File(path + File.separator + "training" + File.separator + "en-movie-categoryNew.train"));
        ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
        ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
        // define the training parameters
        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ITERATIONS_PARAM, 10 + "");
        params.put(TrainingParameters.CUTOFF_PARAM, 0 + "");
        params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
        // create a model from the training data
        DoccatModel model = DocumentCategorizerME.train("en", sampleStream, params, new DoccatFactory());
        System.out.println("\nModel is successfully trained.");
        // save the model locally (the file name says "maxent", but the model is trained with Naive Bayes);
        // try-with-resources closes the stream even if serialization fails
        try (BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream(path + File.separator + "model" + File.separator + "en-movie-classifier-maxent.bin"))) {
            model.serialize(modelOut);
        }
        System.out.println("\nTrained Model is saved locally at : " + "model" + File.separator + "en-movie-classifier-maxent.bin");
        // test the model by running a prediction on a new description
        DocumentCategorizer doccat = new DocumentCategorizerME(model);
        String[] docWords = "Afterwards Stuart and Charlie notice Kate in the photos Stuart took at Leopolds ball and realise that her destiny must be to go back and be with Leopold That night while Kate is accepting her promotion at a company banquet he and Charlie race to meet her and show her the pictures Kate initially rejects their overtures and goes on to give her acceptance speech but it is there that she sees Stuarts picture and realises that she truly wants to be with Leopold".replaceAll("[^A-Za-z]", " ").split(" ");
        double[] aProbs = doccat.categorize(docWords);
        // print the probabilities of the categories
        System.out.println("\n---------------------------------\nCategory : Probability\n---------------------------------");
        for (int i = 0; i < doccat.getNumberOfCategories(); i++) {
            System.out.println(doccat.getCategory(i) + " : " + aProbs[i]);
        }
        System.out.println("---------------------------------");
        System.out.println("\n" + doccat.getBestCategory(aProbs) + " : is the predicted category for the given sentence.");
    } catch (IOException e) {
        System.out.println("An exception in reading the training file. Please check.");
        e.printStackTrace();
    }
}
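One workaround I'm considering for the "more than one outcome" error is to keep the other genres in the training file but collapse them into a single negative label, so the trainer still sees two outcomes. This is only a rough sketch and I'm not sure it is the intended way to do it. The class `BinaryRelabel`, the helper `toBinary`, and the label `NotRomantic` are names I made up, not part of OpenNLP. I'm assuming each training line starts with the category, followed by whitespace and the description, as in my data sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BinaryRelabel {

    // Hypothetical helper (not an OpenNLP API): keeps the positive label as-is
    // and rewrites every other category to "Not" + positive.
    // Blank lines and lines without a description are skipped.
    static List<String> toBinary(List<String> lines, String positive) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // split into the leading category token and the rest of the line
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length < 2 || parts[0].isEmpty()) {
                continue;
            }
            String label = parts[0].equals(positive) ? positive : "Not" + positive;
            out.add(label + " " + parts[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // made-up sample lines, only to illustrate the relabeling
        List<String> lines = Arrays.asList(
                "Romantic A love story",
                "Thriller A heist goes wrong");
        for (String relabeled : toBinary(lines, "Romantic")) {
            System.out.println(relabeled);
        }
    }
}
```

The relabeled lines would then be written to the training file in place of the original ones, and the predicted category would be either `Romantic` or `NotRomantic`.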