Questions tagged [centering]

Centering involves subtracting the overall sample mean score from the original score; standardizing does the same followed by dividing by the overall sample standard deviation.

Centering is the act of positioning something at the midpoint of a space. For one-dimensional spaces (lines), this usually involves taking the length of the line, dividing it by two, and placing the object at the resulting point. For higher dimensions, the same technique is applied to each additional dimension, as each dimension can be treated separately.

In many statistical procedures, it is helpful to center a variable at its mean, so that the centered variable has a mean of zero. Examples include general linear models with interaction or quadratic terms.
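
To make the distinction between centering and standardizing concrete, here is a minimal sketch using only Python's standard library. The function names `center` and `standardize` and the sample `scores` are illustrative assumptions, not part of the tag description, and the sketch assumes the sample standard deviation uses the n - 1 denominator (the description does not say which convention is meant).

```python
from statistics import mean, stdev


def center(values):
    """Subtract the overall sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


def standardize(values):
    """Center, then divide by the sample standard deviation.

    Assumption: stdev() uses the n - 1 denominator.
    """
    s = stdev(values)
    return [c / s for c in center(values)]


# Hypothetical scores, chosen only for illustration.
scores = [2.0, 4.0, 6.0, 8.0]

centered = center(scores)            # mean becomes 0, spread is unchanged
standardized = standardize(scores)   # mean becomes 0, sample sd becomes 1

print(centered)
print(standardized)
```

Centering shifts the values without changing their units, while standardizing also rescales them, so the result is expressed in standard deviations rather than in the original units.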

151 questions
380
votes
7 answers

When conducting multiple regression, when should you center your predictor variables & when should you standardize them?

In some literature, I have read that a regression with multiple explanatory variables, if in different units, needed to be standardized. (Standardizing consists in subtracting the mean and dividing by the standard deviation.) In which other cases…
mathieu_r
  • 4,211
  • 3
  • 14
  • 5
51
votes
3 answers

How does centering make a difference in PCA (for SVD and eigen decomposition)?

What difference does centering (or de-meaning) your data make for PCA? I've heard that it makes the maths easier or that it prevents the first PC from being dominated by the variables' means, but I feel like I haven't been able to firmly grasp the…
Zenit
  • 1,586
  • 2
  • 17
  • 19
50
votes
1 answer

How does centering the data get rid of the intercept in regression and PCA?

I keep reading about instances where we center the data (e.g., with regularization or PCA) in order to remove the intercept (as mentioned in this question). I know it's simple, but I'm having a hard time intuitively understanding this. Could someone…
Alec
  • 2,185
  • 4
  • 17
  • 14
34
votes
3 answers

Why could centering independent variables change the main effects with moderation?

I have a question related to multiple regression and interaction, inspired by this CV thread: Interaction term using centered variables hierarchical regression analysis? What variables should we center? When checking for a moderation effect I do…
Marc Schubert
  • 341
  • 1
  • 4
  • 3
21
votes
2 answers

Does random forest need input variables to be scaled or centered?

My input variables have different dimensions. Some variables are decimal while some are hundreds. Is it essential to center (subtract mean) or scale (divide by standard deviation) these input variables in order to make the data dimensionless when…
YQ.Wang
  • 409
  • 1
  • 4
  • 11
21
votes
1 answer

Standardized VS centered variables

I have found many useful posts about standardized independent variables and centered independent variables on stats.stackexchange.com, but I am still a bit confused. I am asking you an evaluation of what I have understood. Also, if what follows is…
18
votes
2 answers

Converting standardized betas back to original variables

I realise this is probably a very simple question but after searching I can't find the answer I am looking for. I have a problem where I need to standardize the variables, then run the ridge regression to calculate the ridge estimates of the betas. I…
Baz
  • 1,583
  • 3
  • 13
  • 26
18
votes
3 answers

centering and scaling dummy variables

I have a data set that contains both categorical variables and continuous variables. I was advised to transform the categorical variables as binary variables for each level (ie, A_level1:{0,1}, A_level2:{0,1}) - I think some have called this "dummy…
user2300643
  • 741
  • 2
  • 5
  • 13
14
votes
2 answers

Imputation of missing data before or after centering and scaling?

I want to impute missing values of a dataset for machine learning (knn imputation). Is it better to scale and center the data before the imputation or afterwards? Since the scaling and centering might rely on min and max values, in the first case…
13
votes
1 answer

Is centering needed when bootstrapping the sample mean?

When reading about how to approximate the distribution of the sample mean I came across the nonparametric bootstrap method. Apparently one can approximate the distribution of $\bar{X}_n-\mu$ by the distribution of $\bar{X}_n^*-\bar{X}_n$, where…
Christin
  • 139
  • 4
10
votes
3 answers

Why standardization of the testing set has to be performed with the mean and sd of the training set?

In pre-processing the data set before applying a machine learning algorithm the data can be centered by subtracting the mean of the variable, and scaled by dividing by the standard deviation. This is a straightforward process in the training set,…
Antoni Parellada
  • 23,430
  • 15
  • 100
  • 197
10
votes
3 answers

Zero-centering the testing set after PCA on the training set

I have a training set of data on which I do principal components analysis (PCA) and save the loadings/eigenvectors/coefficient matrix. I want to use the eigenvectors to transform my testing data into the same principal component space, I know I just…
PatEugene
  • 203
  • 2
  • 5
9
votes
3 answers

How to include $x$ and $x^2$ into regression, and whether to center them?

I want to include the term $x$ and its square $x^2$ (predictor variables) into a regression because I assume that low values of $x$ have a positive effect on the dependent variable and high values have a negative effect. The $x^2$ should capture the…
Peter
  • 223
  • 2
  • 7
9
votes
2 answers

How to include a linear and quadratic term when also including interaction with those variables?

When adding a numeric predictor with categorical predictors and their interactions, it is usually considered necessary to center the variables at 0 beforehand. The reasoning is that the main effects are otherwise hard to interpret as they are…
Henrik
  • 13,314
  • 9
  • 63
  • 123
9
votes
1 answer

Interaction term using centered variables hierarchical regression analysis? What variables should we center?

I'm running a hierarchical regression analysis and I have some little doubts: Do we calculate the interaction term using the centered variables? Do we have to center ALL the continuous variables we have in the dataset, except the dependent…
PhDstudent
  • 93
  • 1
  • 3