1

I encountered a question that I really can't figure out:

Suppose your hypothesis class(H) consists of decision trees with 7 nodes that splits on only one feature. How to calculate the VC dimension of H?

I understand all the trees in the hypothesis class have 7 binary splits. But I really don't understand how to calculate the VC dimension of H?

Thank you!

1 Answers1

0

The first tree can assigns the elements of a set to its 8 leaves, so it can shatter a set of 8. With a larger set, it can subdivide it into 8 sets, some with a score of +1 and some with a score of -1.

Suppose without loss of generality that the first tree gave a score of +1 to at most 4 leaves. The second tree also contributes scores, so we end up with scores of -2, 0, or +2. We get better discriminating power by using a threshold of ">=0" so we are subdividing the smaller set. Then the second tree can again divide this into 8 subsets, giving us a shattering power of 12.

Continuing this reasoning, it seems to me that the VC dimension of n trees with 7 splits each is 4n+4.

chrishmorris
  • 820
  • 5
  • 5