9

If I have two variables that follow two different distributions and have different standard deviations, how do I need to transform them so that when I sum the two, the result is not "driven" by the more volatile one?

For example, variable A is more volatile than variable B: A ranges from 0 to 3000, while B goes from 300 to 350.

If I simply add the two variables together, the result will obviously be driven by A.

Macro
user333

1 Answer

15

A common practice is to standardize the two variables, $A,B$, to place them on the same scale by subtracting the sample mean and dividing by the sample standard deviation. Once you've done this, both variables will be on the same scale in the sense that they each have a sample mean of 0 and sample standard deviation of 1. Thus, they can be added without one variable having an undue influence due simply to scale.

That is, calculate

$$ \frac{ A - \overline{A} }{ {\rm SD}(A) }, \ \ \frac{ B - \overline{B} }{ {\rm SD}(B) } $$

where $\overline{A}, {\rm SD}(A)$ denote the sample mean and standard deviation of $A$, and similarly for $B$. Each standardized value is interpreted as the number of standard deviations that a particular observation lies above or below the mean.
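
As a rough illustration of the formula above, here is a minimal Python sketch using NumPy. The data are simulated (uniform draws chosen only to mimic the ranges in the question), and the helper name `standardize` is hypothetical, not part of any library. It compares how strongly each variable correlates with the raw sum versus the sum of standardized values.

```python
import numpy as np

def standardize(x):
    """Return z-scores: subtract the sample mean, divide by the sample SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 gives the sample SD

# Simulated data (an assumption) mimicking the ranges in the question.
rng = np.random.default_rng(0)
A = rng.uniform(0, 3000, size=1000)
B = rng.uniform(300, 350, size=1000)

raw_sum = A + B
z_sum = standardize(A) + standardize(B)

# Correlation of each input with the sum indicates which variable "drives" it.
corr_raw = (np.corrcoef(A, raw_sum)[0, 1], np.corrcoef(B, raw_sum)[0, 1])
corr_z = (np.corrcoef(A, z_sum)[0, 1], np.corrcoef(B, z_sum)[0, 1])
print("raw sum:         ", corr_raw)
print("standardized sum:", corr_z)
```

With the raw sum, $A$ should dominate; after standardizing, both variables correlate equally with the sum, since each contributes a term with unit variance.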

Macro
  • 1
    Will this work if the variables are not normally distributed? – user333 Jul 19 '11 at 21:25
  • 1
    standardizing has nothing to do with the normal distribution - it is merely a means of putting the variables on the same scale. So, yes. – Macro Jul 19 '11 at 21:31
  • If I divide by sd and not subtract the mean... I will get same volatilities, but different ranges right? – user333 Jul 20 '11 at 06:16
  • 1
    Yes - if you only scale them (divide by the standard deviations) then they will both end up with the same variance, but their means and ranges will be different. – Macro Jul 20 '11 at 14:02
  • @Macro What if I do not have data but only have sequential data for the variables. So, the sum of two variables acts more like a score. I believe there are some bad implications such as scores very early in the sequence. Do you know of another way? – tintinthong Jul 07 '17 at 16:47
  • @Macro: I have an index for which standardization is useful but negative values are a problem (see [here](https://stats.stackexchange.com/questions/526187/standardization-of-index-components-without-mean-centering)). I see from your answer that if I do not subtract the mean, then the scale is different. Does this mean that I cannot sum variables in an index? Any idea on how to standardize while keeping the final index with positive values (as my original variables)? Thanks!! – Forinstance May 28 '21 at 10:10