Yes, the new data have to be pre-processed as well.
EDIT (based on your last comment):
For your first code block, I am not sure whether the new data are automatically pre-processed just because you used the preProc argument; check the predict.train() documentation to confirm.
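If I read the predict.train() documentation correctly, a model fitted with the preProc argument stores the pre-processing inside the train object and re-applies it when you call predict() on new data. A minimal sketch of that workflow (trainDF, y and newDF are hypothetical placeholders):

library(caret)

# the pre-processing is applied inside train() and stored in the fit object
fit <- train(y ~ ., data = trainDF,
             method     = "nnet",
             preProcess = c("center", "scale"),
             trace      = FALSE)

# predict.train re-applies the stored pre-processing,
# so newDF is passed in raw (un-processed) form
preds <- predict(fit, newdata = newDF)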
For your second code block, yes, nnet() itself does not provide any functionality to pre-process the data.
I would recommend using the preProcess() function of caret. In fact, when you supply the preProc argument to train(), the preProcess() function is called under the hood. You define the kind of pre-processing you need when you create the preProcess object (via its method argument), and you then apply it by passing the object and the data in question to predict(). The advantage of preProcess() is that the returned object stores the training statistics, so calling predict() on it later (with new data as the newdata argument) applies exactly the same transformation to anything you feed it. Refer to the documentation for more details.
Of course you can pre-process just a single observation. In your example you center and scale the training set: you compute the mean and standard deviation of each column of the training set, subtract the mean, and divide by the standard deviation, so that the transformed training set has mean 0 and standard deviation 1. To pre-process a single observation, you simply subtract the training mean from it and divide by the training standard deviation.
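A minimal sketch of doing this by hand (trainX is a hypothetical numeric training matrix, newX a single new observation with the same columns):

# statistics computed from the training set only
mu    <- colMeans(trainX)
sigma <- apply(trainX, 2, sd)

# center and scale the single new observation with the training statistics
newX_scaled <- (newX - mu) / sigma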
As a simple example of how to use the preProcess() function (taken from the documentation):
library(caret)

data(BloodBrain)
# learn the transformation (default: center and scale) from the first 100 rows
preProc <- preProcess(bbbDescr[1:100, -3])
# apply the same transformation to the training and the test rows
training <- predict(preProc, bbbDescr[1:100, -3])
test <- predict(preProc, bbbDescr[101:208, -3])
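The same preProc object can also transform a single observation, since the training statistics are stored inside it:

one_obs <- predict(preProc, bbbDescr[101, -3])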
One last thing; you mention:
If I do preprocessing by myself with the testdata and use that preprocessed testdata as input for fitting the nnet...
Just to make this clear: you fit the model on the (pre-processed) training data, and then use the predict() function to generate predictions for new data.
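With nnet(), the manual workflow then looks roughly like this (trainDF with a factor outcome column y and newDF are hypothetical placeholders; size = 5 is arbitrary):

library(caret)
library(nnet)

# learn the transformation from the training predictors only
pp     <- preProcess(trainDF[, names(trainDF) != "y"],
                     method = c("center", "scale"))
trainX <- predict(pp, trainDF[, names(trainDF) != "y"])

# fit on the pre-processed training data
fit <- nnet(y ~ ., data = cbind(trainX, y = trainDF$y), size = 5)

# pre-process new data with the SAME object, then predict
newX  <- predict(pp, newDF)
preds <- predict(fit, newX, type = "class")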
Hope it helps!