
I am dealing with a dataset of 493 observations on 30 predictors. My intention is to fit a model that makes accurate predictions.

It seems to me that the ratio $\frac{n}{p}$ is too small to fit a regression model reliably (correct me if I'm wrong about this); therefore, I am trying to fit a tree-based model (bagging, random forest, or boosting) to the dataset, along the lines of the sketch below.
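For concreteness, here is a minimal sketch of the kind of fit I have in mind. The use of Python with scikit-learn is just an assumption for illustration, and `X` and `y` are placeholders standing in for my actual predictors and response:

```python
# Minimal sketch: fit a random forest and estimate predictive error by
# cross-validation. X stands in for the 493 x 30 predictor matrix and
# y for the response; random data are used here as placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.normal(size=(493, 30))   # placeholder for my 493 x 30 data
y = rng.normal(size=493)         # placeholder response

model = RandomForestRegressor(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, y, cv=5,
                         scoring="neg_mean_squared_error")
print("CV MSE: %.3f (+/- %.3f)" % (-scores.mean(), scores.std()))
```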

My question is: do tree-based models also suffer from the stability issues that result from a low $\frac{\text{Number of observations}}{\text{Number of predictors}}$ ratio, as regression models do? Is this ratio an important factor in tree-based methods (assuming it matters at all)? Why or why not?

Any suggestions for relevant reading/literature would also be appreciated.

Jack Shi
  • This could be a duplicate; you could benefit from searching earlier threads. Also, here is a very recent [related thread](http://stats.stackexchange.com/questions/225383/required-sample-size-and-degrees-of-freedom-for-a-var) (specific to VAR models but can be generalized). – Richard Hardy Jul 25 '16 at 08:43
  • Thanks for the suggestion. I guess I'm more interested in the role $\frac{\text{Number of observations}}{\text{Number of predictors}}$ plays in tree-based methods, for which I did not find any reference in earlier threads. I have already edited the question. – Jack Shi Jul 25 '16 at 08:54

0 Answers