I understand the concept of scaling the data matrix to use in a linear regression model. For example, in R you could use:

scaled.data <- scale(data, scale=TRUE)

My only question is, for new observations for which I want to predict the output values, how are they correctly scaled? Would it be, scaled.new <- (new - mean(data)) / std(data)?

gung - Reinstate Monica
SamuelNLP
  • To get the values back, just do `y = y_esc * sd(y) + mean(y)`, but that would mess with the model properties, I guess, so I'm also waiting for a more technical answer! – Fernando Mar 07 '14 at 15:29
  • I don't want the values back, I want to know how new instances can be correctly scaled in the same way. I've edited my question based on your comment. – SamuelNLP Mar 07 '14 at 15:31

2 Answers

The short answer to your question is, yes - that expression for scaled.new is correct (except you wanted sd instead of std).

It may be worth noting that scale has optional arguments which you could use:

scaled.new <- scale(new, center = mean(data), scale = sd(data))
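As a quick check for the single-variable case, here is a minimal sketch showing that the call above matches the manual formula from the question. The values in `data` and `new` are made up for illustration. Note that `mean(data)` and `sd(data)` return a single number each, so this form only suits a single column.

```r
# Hypothetical single-predictor training data and new observations
data <- c(2, 4, 6, 8, 10)
new  <- c(5, 12)

# Scale the new observations with the TRAINING mean and standard deviation
scaled.new <- scale(new, center = mean(data), scale = sd(data))

# The same computation written out by hand
manual <- (new - mean(data)) / sd(data)

# scale() returns a one-column matrix, so drop to a plain vector to compare
all.equal(as.numeric(scaled.new), manual)
```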

Also, the object returned by scale (scaled.data) has attributes holding the numeric centering and scalings used (if any), which you could use:

scaled.new <- scale(new, attr(scaled.data, "scaled:center"), attr(scaled.data, "scaled:scale"))

The advantage of that appears when the original data has more than one column, so there are multiple means and/or standard deviations to consider.
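To illustrate the multi-column case, here is a rough sketch that fits a linear model on scaled predictors and then scales new observations with the stored attributes before predicting. The simulated data, the column names `x1` and `x2`, and the coefficients are all assumptions made up for the example.

```r
set.seed(1)

# Hypothetical training data: two predictors on very different scales
data <- cbind(x1 = rnorm(20, mean = 50, sd = 10),
              x2 = rnorm(20, mean = 0.5, sd = 0.1))
y <- 3 + 0.2 * data[, "x1"] - 4 * data[, "x2"] + rnorm(20, sd = 0.5)

# Scale the training predictors and fit the model on the scaled columns
scaled.data <- scale(data)
fit <- lm(y ~ ., data = data.frame(y = y, scaled.data))

# New observations are scaled with the per-column TRAINING means and sds
new <- cbind(x1 = c(45, 60), x2 = c(0.4, 0.7))
scaled.new <- scale(new,
                    center = attr(scaled.data, "scaled:center"),
                    scale  = attr(scaled.data, "scaled:scale"))

# Cross-check against a column-wise manual computation
manual <- sweep(sweep(new, 2, colMeans(data), "-"), 2, apply(data, 2, sd), "/")
all.equal(scaled.new, manual, check.attributes = FALSE)

# Predict from the scaled new observations
predict(fit, newdata = data.frame(scaled.new))
```

The important point is that the means and standard deviations come from the training data, never from `new` itself.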

user20637

There are now simpler ways to do this. For example, the preProcess function of the caret package:

library(caret)
preproc <- preProcess(data, method = c("center", "scale"))
scaled.new <- predict(preproc, newdata = new)

or scale_by in the standardize package

or using the recipes package

library(recipes); library(dplyr)
rec <- recipe(~ ., data) %>% step_normalize(all_numeric()) %>% prep()
scaled.new <- rec %>% bake(new)
nigelhenry
  • Upvoted. You might want to correct case in "the preprocess function of the caret package" and spelling in "using the receipes package" – user20637 Jan 19 '21 at 20:43