
First, I simulated some data satisfying the standard OLS assumptions:

n <- 500
x.ols <- runif(n, min = 0, max = 50)
y.ols <- (1/3) * x.ols + rnorm(n, 0, 1)   # linear signal plus Gaussian noise
train <- data.frame(x = x.ols, y = y.ols)
test <- data.frame(x = runif(100, min = 0, max = 50))

Then, I scaled the data and fit a neural network:

library(nnet)

# Scale y to (-1, 1), then invert the scaling after prediction.
range11 <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1
unrange <- function(x, train) 0.5 * ((x + 1) * (max(train) - min(train))) + min(train)

anntrain <- train
anntrain$y <- range11(anntrain$y)
d <- ncol(train) - 1              # number of predictors
annSize <- ceiling(2 * d / 3)     # hidden-layer size heuristic
ann.mod <- nnet(y ~ x, anntrain, size = annSize)
ann.pred <- predict(ann.mod, newdata = test)
ann.pred <- unrange(ann.pred, train$y)
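As a quick sanity check on the scaling pair (a self-contained sketch, re-stating the two functions and using a small made-up vector), `unrange` does invert `range11`:

```r
# unrange() written as the algebraic inverse of range11().
range11 <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1
unrange <- function(x, train) 0.5 * ((x + 1) * (max(train) - min(train))) + min(train)

y <- c(2, 5, 11, 17)
all.equal(unrange(range11(y), y), y)  # TRUE
```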

Unfortunately, the predictions are flat and don't capture the line. This code works really well for a non-linear pattern and so I'm confused as to why it won't work in this simpler case.

Interestingly, if I scale differently, it works like a charm here and is awful in the non-linear case.

anntrain <- train
anntrain$y <- anntrain$y / max(anntrain$y)   # scale y by its maximum
d <- ncol(train) - 1
annSize <- ceiling(2 * d / 3)
ann.mod <- nnet(y ~ x, anntrain, size = annSize)
ann.pred <- predict(ann.mod, newdata = test)
ann.pred <- ann.pred * max(train$y)   # undo the scaling; note max(anntrain$y) is 1 after rescaling

The reason I scaled to $(-1,1)$ was that I had an issue in a non-linear case and found this post helpful. Scaling to $(-1,1)$ did help in that case, but it hurts in this one. Is there a consistent way to scale that works "well" in most cases, or did I just happen upon a weird case?



Linearity vs. nonlinearity might be a red herring; the actual problem is likely that the predictions are bounded to lie within $(0,1)$.

The nnet function of the nnet package uses a sigmoid transformation in the output layer by default; see the argument linout = FALSE of nnet. This is what you need for modelling probabilities, which is what people often do when using a neural network for binary classification. This makes the fitted values lie within $(0,1)$.

But you are doing regression, so you do not want that. Try supplying linout = TRUE as an additional argument to the nnet function. This will remove the sigmoid transformation from the output layer and thus allow for fitted values outside the $(0,1)$ range.
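A minimal sketch of the fix (re-simulating the data as in the question; `size = 1` is an assumption of mine, but a single hidden unit is plenty for a linear signal):

```r
library(nnet)

set.seed(1)
n <- 500
train <- data.frame(x = runif(n, min = 0, max = 50))
train$y <- (1/3) * train$x + rnorm(n, 0, 1)
test <- data.frame(x = runif(100, min = 0, max = 50))

# linout = TRUE uses a linear output unit, so fitted values are not
# squashed into (0, 1) and no rescaling of y is needed at all.
ann.mod <- nnet(y ~ x, train, size = 1, linout = TRUE)
ann.pred <- predict(ann.mod, newdata = test)

# The predictions should now track the line y = x/3.
plot(test$x, ann.pred)
abline(0, 1/3)
```

With a linear output unit the network can produce any real value, so both of your scaling schemes become unnecessary for the target.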

Richard Hardy