
I want to predict the outcome of a particular treatment (remitted or not) using demographic, plasma biomarker, genetic, and clinical data. Is a neural network model the best way of doing this? What advantages does it have over traditional logistic regression model building? How limited am I with only 120 cases and up to 40 covariates, depending on collinearity? How do I pare these down? I would normally tend towards factor analysis, but will a neural net combine collinear variables sensibly? Any ideas on combining multimodal data like this would be helpful, or a starting point for reading; I already have Venables and Ripley's MASS.

chl
  • I really think you should post the follow-up as a separate question. You can link back to this one if you want. The second question is really a separate issue, and I have a lot I want to say about it, but it belongs in a separate topic. – Zach Dec 04 '11 at 01:30
  • Reposted the follow-up question as I had it written here, with minor tweaks. –  Dec 04 '11 at 15:35

2 Answers


It's often a good idea to do PCA before fitting a neural network, so your instinct could be right there. The only way you are going to determine which model is better for a given problem is to cross-validate both and compare out-of-sample error.

The caret package in R is a good way to compare models using this technique (specifically its train function). As a bonus, it includes a model, pcaNNet, which computes principal components before fitting a neural network.
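For instance, a minimal sketch of such a comparison, assuming a data frame `dat` whose factor outcome `remitted` has valid R level names (e.g. "yes"/"no"); twoClassSummary also requires the pROC package to be installed:

```r
library(caret)

## Repeated cross-validation, scoring models by cross-validated ROC
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

## PCA + neural net in one call (seed set so both models see the same folds)
set.seed(42)
fit_nnet <- train(remitted ~ ., data = dat, method = "pcaNNet",
                  trControl = ctrl, metric = "ROC",
                  preProcess = c("center", "scale"), trace = FALSE)

## ... versus plain logistic regression on the same resamples
set.seed(42)
fit_glm <- train(remitted ~ ., data = dat, method = "glm",
                 family = binomial, trControl = ctrl, metric = "ROC")

## Side-by-side out-of-sample performance
summary(resamples(list(pcaNNet = fit_nnet, logistic = fit_glm)))
```

The resamples summary shows the cross-validated ROC of both models on the same folds, which is exactly the out-of-sample comparison described above.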

Zach
  • Doing PCA before training a neural net can have two effects: (a) it makes training go faster and find better minima, for numerical reasons; this can be improved further by taking z-scores (centering and dividing by the standard deviation) after PCA. (b) If you don't keep all of the principal components (but, say, those accounting for 95% of the variance), you substantially reduce the risk of overfitting. I would definitely try this if you have so few samples. – bayerj Nov 29 '11 at 22:30
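A minimal sketch of that recipe in base R, assuming `X` is the numeric covariate matrix (a hypothetical name):

```r
## Keep the components explaining ~95% of the variance, then z-score them
pc    <- prcomp(X, center = TRUE, scale. = TRUE)
varex <- cumsum(pc$sdev^2) / sum(pc$sdev^2)   # cumulative variance explained
keep  <- which(varex >= 0.95)[1]              # smallest set reaching 95%
Z     <- scale(pc$x[, 1:keep, drop = FALSE])  # z-scored PC scores
```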

General rules for when to use a neural network:

1) You can tell, relatively easily, what the right answer is, but not describe how you know it's right. If you know what steps to take to get the right answer, code those steps rather than training a NN; and if you can't tell what the right answer is likely to be, a NN likely won't be able to either.

2) 90% accuracy is good enough (e.g. when other techniques give substantially less); NNs by their nature do not give watertight 100% accuracy.

3) You just need the right answer, not an understanding of how it was reached; NNs by their nature do not tend to give much insight into the workings of the system.

By the way, giving a NN both the raw data and transforms of it (averages, deltas, etc.) and letting the learning algorithm decide which are useful for prediction is better than figuring it all out yourself; if you determine which factors are important and how to encode them, you have done most (though not all) of the work a NN could do for you anyway.
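For example, a hypothetical sketch (all column names invented) of handing the net simple transforms alongside the raw covariates:

```r
## Add transforms next to the raw measures; the fit decides which carry signal
dat$crp_log   <- log1p(dat$crp)                      # biomarker on log scale
dat$sym_delta <- dat$sym_week4 - dat$sym_week0       # change score (a "delta")
dat$sym_mean  <- (dat$sym_week0 + dat$sym_week4) / 2 # an "average"
```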

p.s. Running a NN many times and taking the best result is a good idea; any good NN implementation is stochastic (random initial weights, for instance), and different runs may be better or worse by a substantial amount.
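A minimal sketch with nnet (the package used alongside MASS), assuming the same `dat` and `remitted` as above; refit from several random starts and keep the fit with the lowest final criterion value:

```r
library(nnet)

## Refit from 20 random starts; nnet stores its final fitting
## criterion (entropy plus any decay penalty) in $value
fits <- lapply(1:20, function(i) {
  set.seed(i)
  nnet(remitted ~ ., data = dat, size = 3, decay = 0.1,
       maxit = 500, trace = FALSE)
})
best <- fits[[which.min(sapply(fits, `[[`, "value"))]]
```

Note that $value is a training criterion, so for an honest comparison between architectures you would still cross-validate as in the other answer.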

rossdavidh