0

I am trying to train a C&R Tree in IBM SPSS Modeler. When I open the model nugget (the generated model) and view the constructed tree, I see that the algorithm uses only 70% of my data to train the model. I have tried this on multiple data sets and got the same result each time. I also checked other classifiers in SPSS Modeler (e.g. CHAID) and did not see this issue with them. I would highly appreciate your help.

AliCivil
  • 133
  • 1
  • 6

2 Answers

1

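The C&R Tree node uses an "Overfit prevention set", with a default value of 30%. Those records are held out from model building, which is why only 70% of your data appears in the tree.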

The documentation in "ModelerModelingNodes.pdf" explains this as:

Overfit prevention set. The algorithm internally separates records into a model building set and an overfit prevention set, which is an independent set of data records used to track errors during training in order to prevent the method from modeling chance variation in the data. Specify a percentage of records. The default is 30.
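To make the arithmetic concrete, here is a minimal Python sketch of a 30% hold-out. It is only an illustration of the split described above, not SPSS Modeler's internal implementation: the function name `split_records`, the shuffling, and the seed are all assumptions made for this example.

```python
import random


def split_records(records, overfit_prevention_pct=30, seed=0):
    """Hypothetical helper: split records into a model building set and
    an overfit prevention set of the given percentage.

    This mimics the idea described in the documentation; how Modeler
    actually selects the records is not known here.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * overfit_prevention_pct / 100)
    overfit_prevention = shuffled[:n_holdout]
    model_building = shuffled[n_holdout:]
    return model_building, overfit_prevention


# 1000 made-up record IDs, split with the default of 30%.
records = list(range(1000))
building, holdout = split_records(records)
print(len(building), len(holdout))  # 700 300
```

With the default of 30, only 700 of 1000 records end up in the model building set, which matches the 70% shown in the tree viewer. Lowering the percentage in the node settings should leave more records for building, at the cost of a smaller set for tracking errors.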


tomaz
  • 206
  • 1
  • 4
0

In addition to the previous answer, please keep in mind that when constructing a regression, only complete records can be used, so if your data has empty cells, the algorithm most likely drops those records.
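One way to check whether this applies to your data is to count how many records are complete before training. The sketch below is plain Python with made-up rows, and it assumes that an empty cell shows up as `None` or an empty string; the helper name `complete_records` is hypothetical and unrelated to any SPSS Modeler API.

```python
def complete_records(records):
    """Return only the records in which every field has a value.

    Assumption: an empty cell is represented as None or "".
    """
    return [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]


# Made-up example rows; two of them contain an empty cell.
rows = [
    {"age": 34, "income": 52000, "churn": "no"},
    {"age": None, "income": 61000, "churn": "yes"},
    {"age": 45, "income": "", "churn": "no"},
    {"age": 29, "income": 48000, "churn": "yes"},
]

kept = complete_records(rows)
print(f"{len(kept)} of {len(rows)} records are complete")  # 2 of 4 records are complete
```

If the share of complete records is close to 100%, missing values are unlikely to explain a large gap between the data you supplied and the data the model used.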

Julian
  • 1
  • 3