I couldn't find any information about this in the RapidMiner documentation. I have a data set with the following attributes: a, b, c, d, e. Their types are: numerical, binomial, binomial, binomial, binomial. Binomial values are given as {true, false}.

The last one is the label I want to be able to predict. So the value to predict is a true/false decision. I understand that I have to use logistic regression for that.

My module chain looks like this.

Read CSV -> Set Role -> Nominal To Binary -> Classification by Regression

Set Role: I can only set one attribute as the predictor attribute. How can I set all the other input attributes as predictor attributes too?

Nominal to Binary: The binomial values are given as Strings "true" and "false". That's why I do the conversion here.

The output for MultiModelByRegression is:

MultiModelByRegression (prediction model for label is_helpful)

      16.889 * b ^ 1.000
    + 9.553 * c ^ 2.000
    + 0.102 * a ^ 1.000
    - 71.438
      76.078 * b ^ 4.000
    + 38.618 * c ^ 1.000
    + 0.082 * a ^ 1.000
    - 88.701

The Performance Vector Output is:

                 true false    true true    class precision
    pred. false  2706          129          95.45%
    pred. true   636           40           5.92%
    class recall 80.97%        23.67%

I know how to interpret the above.
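
As a cross-check, the percentages in the performance vector follow directly from its four counts. The following is a minimal Python sketch outside RapidMiner; the dictionary layout and the function names are illustrative assumptions, only the counts come from the table above.

```python
# Counts copied from the performance vector; keys are (predicted, actual).
counts = {
    ("false", "false"): 2706,
    ("false", "true"): 129,
    ("true", "false"): 636,
    ("true", "true"): 40,
}
classes = ["false", "true"]


def precision(cls):
    """Share of rows predicted as cls that really are cls (one table row)."""
    row_total = sum(counts[(cls, actual)] for actual in classes)
    return counts[(cls, cls)] / row_total


def recall(cls):
    """Share of rows that really are cls and were predicted as cls (one table column)."""
    column_total = sum(counts[(predicted, cls)] for predicted in classes)
    return counts[(cls, cls)] / column_total


# Overall accuracy: correctly classified rows divided by all rows.
accuracy = sum(counts[(c, c)] for c in classes) / sum(counts.values())

for cls in classes:
    print(f"{cls}: precision {precision(cls):.2%}, recall {recall(cls):.2%}")
print(f"accuracy {accuracy:.2%}")
```

The low precision and recall for the class true show that the model rarely identifies true examples correctly, even though the overall accuracy looks acceptable.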

All of this is done with the training set, which is already labelled with the correct classes. How do I apply the test set at this point? I suppose I need the test set to somehow evaluate the results of my classifier, right? Anyway, I am really confused here and I would appreciate any kind of help.

funkywon
  • Question: You say "I know how to interpret the above", but the title of the question states otherwise. Which is correct? So far I have answered only the questions appearing in the body. – mlwida Feb 20 '12 at 12:24

1 Answer

Regarding "Set Role"

The role "prediction" refers to the label predicted AFTER a model has been applied to an example set. By default, all attributes with the role "regular" are used as predictors, so Set Role is not needed for the input attributes here.

Regarding application of a model

You have to load the test set separately and apply the model to it. Something like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="443" width="636">
      <operator activated="true" class="retrieve" compatibility="5.2.000" expanded="true" height="60" name="load_train" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="naive_bayes" compatibility="5.2.000" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30"/>
      <operator activated="true" class="retrieve" compatibility="5.2.000" expanded="true" height="60" name="load_test" width="90" x="45" y="210">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="Apply Model" width="90" x="313" y="120">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="447" y="120"/>
      <connect from_op="load_train" from_port="output" to_op="Naive Bayes" to_port="training set"/>
      <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="load_test" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Performance" from_port="example set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Simply copy and paste the code into the XML tab of the Design perspective to make it work.
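
Conceptually, the process does three things: it learns a model from the training set, applies that model to a separate labelled test set, and compares the predictions with the known labels. The Python sketch below mirrors these steps outside RapidMiner. It is only an illustration: the toy data, the nearest-centroid learner and all function names are hypothetical stand-ins, not what the Naive Bayes, Apply Model or Performance operators do internally.

```python
# Hypothetical toy data: (numeric attribute, boolean label).
train = [(1.0, False), (2.0, False), (3.0, False),
         (7.0, True), (8.0, True), (9.0, True)]
test = [(1.5, False), (2.5, False), (6.5, True), (4.0, True)]


def fit_centroids(rows):
    """'Learner' step: store the mean attribute value of each class."""
    sums, sizes = {}, {}
    for value, label in rows:
        sums[label] = sums.get(label, 0.0) + value
        sizes[label] = sizes.get(label, 0) + 1
    return {label: sums[label] / sizes[label] for label in sums}


def apply_model(model, values):
    """'Apply Model' step: predict the class with the closest centroid."""
    return [min(model, key=lambda label: abs(v - model[label])) for v in values]


def accuracy(predicted, actual):
    """'Performance' step: fraction of predictions that match the true labels."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


model = fit_centroids(train)
predictions = apply_model(model, [value for value, _ in test])
test_accuracy = accuracy(predictions, [label for _, label in test])
print(predictions, test_accuracy)
```

The key point is that the test rows are never seen by the learner; they are used only to compare predictions against their known labels.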

However, to make a solid statement about the accuracy of your classifier, you should perform a cross-validation (X-Validation). See e.g. the process under /Samples/03_Validation/XValidation_Nominal
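
The idea behind cross-validation can be sketched in a few lines of Python: the data is split into k parts, and each part serves once as the test set while the model is trained on the rest. This is a simplified illustration outside RapidMiner. It assumes contiguous folds, whereas the X-Validation operator also offers shuffled and stratified sampling, and the majority-class learner and the toy data are hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(rows, k, fit, predict):
    """Return one accuracy per fold; each fold is the test part exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train = [row for i, row in enumerate(rows) if i not in held_out]
        test = [rows[i] for i in test_idx]
        model = fit(train)
        hits = sum(predict(model, x) == y for x, y in test)
        scores.append(hits / len(test))
    return scores


def fit_majority(train):
    """Hypothetical learner: remember the most frequent training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def predict_majority(model, x):
    """Always predict the remembered label, ignoring the attribute value."""
    return model


# Hypothetical toy data: (attribute, label) with an imbalanced label.
labels = [False, False, True, False, True, False, False, False, False]
rows = list(enumerate(labels))
scores = cross_validate(rows, 3, fit_majority, predict_majority)
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

Averaging over the folds gives a more stable estimate than a single split, because every row is used for testing exactly once.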

Additionally, to reduce the confusion about the two kinds of model application, i.e. X-Validation on the first dataset and application of the final model to the second (holdout) set, I recommend this question: Why only three partitions? (training, validation, test)

mlwida
  • Hi steffen! Thank you for your answer. Sorry for the ambiguity. I meant that I know how to interpret the performance vector output, but I don't know how to interpret the output of MultiModelByRegression. – funkywon Feb 23 '12 at 03:14
  • I tried your approach and adjusted it a little, but I constantly get the error "input example set does not have a predicted attribute." Could you take a look at the code to see what the issue might be? I'd really appreciate that. How can I add a larger piece of XML here? I wasn't able to do so. – funkywon Feb 23 '12 at 05:20
  • @funkywon 1. I suggest editing your question rather than adding an answer. 2. Just copy the XML code into the editor, select it and click the code symbol ({}). If this does not work, what is the actual error? Where does the XML insertion fail? – mlwida Feb 23 '12 at 07:15
  • Unfortunately it doesn't work; I have to insert 4 spaces before every line. The XML seems to be fine, and the Problems section doesn't show anything wrong. The error appears when I try to run the process. Could I email you the XML so that you could take a look at it? Inserting long text on Stack Exchange doesn't seem to work. – funkywon Feb 24 '12 at 00:14
  • @funkywon Hm, strange, but you can send me the file. I added my email address to my profile. – mlwida Feb 24 '12 at 08:31