I have been designing a neural network to predict construction item costs. I've developed a core set of predictors that seem to describe the problem space well: they appear to be highly correlated with the dependent variable (the construction item cost) and generally uncorrelated with each other.
These predictors cover a range of features including:
- Quantity of work
- Time of year (month / quarter)
- Location (by county or larger regions)
- National construction cost index
- Union worker prevailing wages
- Material cost indices
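
For context, the correlation observations above come from a simple screening step along these lines (a pandas sketch; the file and column names are placeholders):

```python
import pandas as pd

# Hypothetical file: one row per construction item, predictors plus actual cost
df = pd.read_csv("items.csv")

corr = df.corr(numeric_only=True)               # pairwise Pearson correlations
print(corr["cost"].sort_values())               # each predictor vs. the target
print(corr.drop(index="cost", columns="cost"))  # predictor-vs-predictor block
```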
The network architecture is a multi-layer perceptron (MLP) trained using resilient back-propagation (RProp). Though I typically use fewer than ten raw predictors, the network has around 60-70 inputs because several of them are converted to 1-of-N form. I'm using only a single hidden layer and have generally kept the hidden neuron count between 9 and the number of inputs. I train for anywhere from 1,000 to 8,000 iterations, depending on the run. I typically split the data 85% / 15%, train / test. I haven't done cross-validation because I'm not yet comfortable that I really have the right number / combination of predictors to start comparing models for performance.
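
For reference, here is a minimal sketch of the kind of setup I've described, written in PyTorch (my actual implementation differs; the sizes, column names, and encoding step below are placeholders):

```python
import torch
import torch.nn as nn

def build_model(n_inputs, n_hidden):
    """Single-hidden-layer MLP: n_inputs -> n_hidden -> 1 (predicted cost)."""
    return nn.Sequential(
        nn.Linear(n_inputs, n_hidden),
        nn.Sigmoid(),
        nn.Linear(n_hidden, 1),
    )

def train(model, X, y, epochs=4000):
    """Full-batch training with PyTorch's built-in RProp variant."""
    optimizer = torch.optim.Rprop(model.parameters())
    loss_fn = nn.MSELoss(reduction="sum")  # sum of squared errors (SSE)
    for _ in range(epochs):                # 1,000-8,000 iterations in practice
        optimizer.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)
        loss.backward()
        optimizer.step()
    return model

# ~60-70 inputs after 1-of-N expansion of the categorical predictors, e.g.:
#   X_df = pd.get_dummies(df, columns=["month", "county"])  # hypothetical names
model = build_model(n_inputs=64, n_hidden=32)  # hidden count between 9 and n_inputs
```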
For evaluation, I use the SSE (sum of squared errors) to keep an eye on bias / variance changes, and an overall "success" metric that records how often the network predicts a cost within a given percentage of the actual value, typically 15%.
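
In code, the two metrics look roughly like this (a NumPy sketch; the 15% tolerance is a parameter):

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors; I watch this for bias / variance changes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sum((y_true - y_pred) ** 2))

def success_rate(y_true, y_pred, tol=0.15):
    """Fraction of predictions within +/- tol (here 15%) of the actual cost."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_pred - y_true) <= tol * np.abs(y_true)))
```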
So far, I've found a couple of "sweet spots" in learning rates and hidden node counts that yield good results on the test data without greatly overfitting the training data. However, I seem to have hit a limit.
Right now, I can achieve 65% "success" on the test data (i.e., the network predicts within 15% of the actual value 65% of the time), which is encouraging. I've achieved higher numbers, but only by overfitting: once the model hits about 75% success on the training data, generalization starts to degrade.
It seems I should be able to do better on the test data than this. But at this point, I have to admit that my testing has become a bit ad hoc, and I'm not really sure how to search more systematically for better parameters or how to identify performance characteristics that might give me a clue as to what's wrong...
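
To make "more systematically" concrete, the obvious next step I can think of is an exhaustive sweep over the knobs I've been adjusting by hand, something like the sketch below (reusing the placeholder build_model, train, and success_rate functions above), but I suspect there are smarter approaches:

```python
from itertools import product
import torch

# X_train, y_train, X_test, y_test: torch tensors from the 85% / 15% split
hidden_sizes = [9, 16, 32, 64]           # spanning 9 .. number of inputs
epoch_counts = [1000, 2000, 4000, 8000]  # the range I've been using

best = None
for n_hidden, epochs in product(hidden_sizes, epoch_counts):
    model = train(build_model(X_train.shape[1], n_hidden),
                  X_train, y_train, epochs=epochs)
    with torch.no_grad():
        preds = model(X_test).squeeze(-1).numpy()
    score = success_rate(y_test.numpy(), preds)
    if best is None or score > best[0]:
        best = (score, n_hidden, epochs)

print("best (success, hidden nodes, epochs):", best)
```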