How to calculate Helmert Coding

Question

I am trying to understand how Helmert Coding works

I know it compares levels of a variable with the mean of the subsequent levels of the variable, but what are these levels and how can I calculate this mean?

This is the example I am using:

Can someone explain how the cells in yellow are calculated?

I answered your last question at the beginning of https://stats.stackexchange.com/a/259223/919. The yellow columns are the last three rows of the Helmert matrix in that answer, successively divided by -2, -3, and -4 (from right to left). — whuber, Jun 01 '19 at 16:16
Thanks @whuber but I still dont get it.. in the reference you mentioned in the previous question https://Shrinx.it/2nrm it says Pk' = [k(k-1)]^-0.5 (1,1,.., 1-k,0,..,0) which means in my question P1' = [1*0]^-0.5 (0,0,0,0) ==> P1' = 0,0,0,0 and P2'=[2*1]^-0.5 (1,-1,0,0) ==> P2' = 1/ Sqrt(2) (1,-1,0,0) = (1/ Sqrt(2), -1/ Sqrt(2), 0 , 0) and so on .. which is not same at all.. can you plz explain what i am doing wrong? — asmgx, Jun 02 '19 at 03:09
You may want to read https://stats.stackexchange.com/a/221868/3277. About C (contrast coding) and L (contrast coefficient) matrices, including for "Helmert contrasts" as a particular case. — ttnphns, Jun 06 '19 at 18:49

StatsStudent · Accepted Answer · 2020-10-15T14:34:29.123

I think you are generally trying to understand how Helmert Contrasts work. I think the answer provided by Peter Flom is great, but I'd like to take a bit of a different approach and show you how Helmert Contrasts end up comparing means of factor "levels." I think this should improve your understanding.

To begin, it's instructive to review the general model structure. We can assume the following standard multiple regression model:

\begin{eqnarray*} \hat{\mu}_{i}=E(Y_{i}) & = & \hat{\beta}_{0}+\hat{\beta}_{1}X_{1}+\hat{\beta}_{2}X_{2}+\hat{\beta}_{3}X_{3} \end{eqnarray*}

where $i=$ {$H$ for Hispanic, $A$ for Asian, $B$ for Black, and $W$ for White}.

Contrasts are purposefully chosen ways of coding, i.e. numerically representing, factor levels (e.g. Hispanic, Asian, Black, and White) so that when you regress your dependent variable on them, the estimated beta coefficients represent useful comparisons without any additional work. You may be familiar with traditional treatment contrasts (dummy coding), for example, which assign a value of 0 or 1 to each indicator depending on whether the observation is Hispanic, Asian, Black, or White. That coding appears as:

\begin{eqnarray*} \begin{array}{l|ccc} & X_{1} & X_{2} & X_{3}\\ \hline \text{Hispanic} & 0 & 0 & 0\\ \text{Asian} & 1 & 0 & 0\\ \text{Black} & 0 & 1 & 0\\ \text{White} & 0 & 0 & 1 \end{array} \end{eqnarray*}

So, if an observation corresponds to someone who is Hispanic, then $X_{1}=X_{2}=X_{3}=0$. If the observation corresponds to someone who is Black, then $X_{1}=0,\,X_{2}=1,\,X_{3}=0$. Recall that with this coding, $\hat{\beta}_{0}$ is the estimated mean response for Hispanics only. Then $\hat{\beta}_{1}$ represents the difference in the estimated mean response between Asian and Hispanic (i.e. $\hat{\mu}_{A}-\hat{\mu}_{H}$), $\hat{\beta}_{2}$ represents the difference in the estimated mean response between Black and Hispanic (i.e. $\hat{\mu}_{B}-\hat{\mu}_{H}$), and $\hat{\beta}_{3}$ represents the difference in the estimated mean response between White and Hispanic (i.e. $\hat{\mu}_{W}-\hat{\mu}_{H}$).
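The treatment coding and the resulting coefficients can be reproduced in R. This is only a minimal sketch: `contr.treatment` and `solve` are base R, but the object names are illustrative, and the group means are the rounded values printed in the `hsb2` output of this answer, hard-coded here as an assumption so the snippet runs without downloading the data.

```r
# Treatment (dummy) coding for a four-level factor; Hispanic is the reference level.
races <- c("Hispanic", "Asian", "African-Am", "Caucasian")
treatment <- contr.treatment(races)
print(treatment)

# Group means copied (rounded) from the hsb2 example; treated here as given.
mu <- c(Hispanic = 46.45833, Asian = 58.00000,
        `African-Am` = 48.20000, Caucasian = 54.05517)

# With one mean per group, the coefficients solve X %*% beta = mu,
# where X is the coding matrix with an intercept column prepended.
X <- cbind(Intercept = 1, treatment)
beta <- solve(X, mu)

# First entry: the Hispanic mean. Remaining entries: each group's mean minus the Hispanic mean.
print(beta)
```

Swapping `treatment` for a different coding matrix changes what each entry of `beta` means, which is the whole point of choosing a contrast.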

With this in mind, recall that we can use the same model as presented above, but with Helmert coding, to obtain other useful comparisons of the mean responses of the races. If we use Helmert contrasts instead of treatment contrasts, the estimated coefficients change meaning. Instead of $\hat{\beta}_{1}$ corresponding to the difference in the mean response between Asian and Hispanic, under the Helmert coding you presented it represents the difference between the mean response for Hispanic and the "mean of the means" for the Asian, Black and White groups (i.e. $\hat{\mu}_{H}-\frac{\hat{\mu}_{A}+\hat{\mu}_{B}+\hat{\mu}_{W}}{3}$).

To see how this coding "turns" into these estimates, we can simply set up the Helmert matrix (I'm going to include the constant column, which is sometimes excluded in texts), augment it with the estimated mean response for each race, $\hat{\mu}_{i}$, and then use Gauss-Jordan elimination to put the matrix in reduced row echelon form. This will allow us to simply read off the interpretation of each estimated parameter from the model. I'll demonstrate this below:

\begin{eqnarray*} \begin{bmatrix}1 & \frac{3}{4} & 0 & 0 & | & \mu_{H}\\ 1 & -\frac{1}{4} & \frac{2}{3} & 0 & | & \mu_{A}\\ 1 & -\frac{1}{4} & -\frac{1}{3} & \frac{1}{2} & | & \mu_{B}\\ 1 & -\frac{1}{4} & -\frac{1}{3} & -\frac{1}{2} & | & \mu_{W} \end{bmatrix} & \sim & \begin{bmatrix}1 & \frac{3}{4} & 0 & 0 & | & \mu_{H}\\ 0 & 1 & -\frac{2}{3} & 0 & | & \mu_{H}-\mu_{A}\\ 0 & -1 & -\frac{1}{3} & \frac{1}{2} & | & \mu_{B}-\mu_{H}\\ 0 & -1 & -\frac{1}{3} & -\frac{1}{2} & | & \mu_{W}-\mu_{H} \end{bmatrix}\\ & \sim & \begin{bmatrix}1 & \frac{3}{4} & 0 & 0 & | & \mu_{H}\\ 0 & 1 & -\frac{2}{3} & 0 & | & \mu_{H}-\mu_{A}\\ 0 & 0 & 1 & -\frac{1}{2} & | & \mu_{A}-\mu_{B}\\ 0 & 0 & -1 & -\frac{1}{2} & | & \mu_{W}-\mu_{A} \end{bmatrix}\\ & \sim & \begin{bmatrix}1 & \frac{3}{4} & 0 & 0 & | & \mu_{H}\\ 0 & 1 & -\frac{2}{3} & 0 & | & \mu_{H}-\mu_{A}\\ 0 & 0 & 1 & -\frac{1}{2} & | & \mu_{A}-\mu_{B}\\ 0 & 0 & 0 & 1 & | & \mu_{B}-\mu_{W} \end{bmatrix}\\ & \sim & \begin{bmatrix}1 & 0 & 0 & 0 & | & \mu_{H}-\frac{3}{4}\left\{ \mu_{H}-\mu_{A}+\frac{2}{3}\left[\mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\right]\right\} \\ 0 & 1 & 0 & 0 & | & \mu_{H}-\mu_{A}+\frac{2}{3}\left[\mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\right]\\ 0 & 0 & 1 & 0 & | & \mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\\ 0 & 0 & 0 & 1 & | & \mu_{B}-\mu_{W} \end{bmatrix} \end{eqnarray*}

So, now we simply read off the pivot positions. This implies that:

\begin{eqnarray*} \hat{\beta}_{0} & = & \mu_{H}-\frac{3}{4}\left\{ \mu_{H}-\mu_{A}+\frac{2}{3}\left[\mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\right]\right\} \\ & = & \frac{1}{4}\hat{\mu}{}_{H}+\frac{1}{4}\hat{\mu}{}_{A}+\frac{1}{4}\hat{\mu}{}_{B}+\frac{1}{4}\hat{\mu}{}_{W} \end{eqnarray*}

that:

\begin{eqnarray*} \hat{\beta}_{1} & = & \mu_{H}-\mu_{A}+\frac{2}{3}\left[\mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\right]\\ & = & \hat{\mu}{}_{H}-\hat{\mu}{}_{A}+\frac{2}{3}\hat{\mu}{}_{A}-\frac{1}{3}\left(\hat{\mu}{}_{B}+\hat{\mu}{}_{W}\right)\\ & = & \hat{\mu}{}_{H}-\frac{\hat{\mu}{}_{A}+\hat{\mu}{}_{B}+\hat{\mu}{}_{W}}{3} \end{eqnarray*}

that:

\begin{eqnarray*} \hat{\beta}_{2} & = & \mu_{A}-\mu_{B}+\frac{1}{2}\left(\mu_{B}-\mu_{W}\right)\\ & = & \mu_{A}-\frac{\mu_{B}+\mu_{W}}{2} \end{eqnarray*}

and finally that:

\begin{eqnarray*} \hat{\beta}_{3} & = & \hat{\mu}{}_{B}-\hat{\mu}{}_{W} \end{eqnarray*}

As you can see, by using the Helmert contrasts, we end up with betas that represent the difference between the estimated mean at the current level/race and the mean of the subsequent levels/races.
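The row reduction can also be checked numerically, because it is equivalent to inverting the design matrix: each row of the inverse holds the weights on the group means that make up one coefficient. A short sketch in R, assuming one mean per group and the same Helmert coding as above (the object names are illustrative):

```r
# Helmert coding from the example, with the intercept column prepended.
helmert <- matrix(c(3/4, -1/4, -1/4, -1/4,
                    0,    2/3, -1/3, -1/3,
                    0,    0,    1/2, -1/2), ncol = 3)
X <- cbind(1, helmert)
dimnames(X) <- list(c("H", "A", "B", "W"), c("b0", "b1", "b2", "b3"))

# Row k of solve(X) holds the weights on (mu_H, mu_A, mu_B, mu_W)
# whose weighted sum equals the k-th coefficient.
weights <- solve(X)
print(round(weights, 4))
```

The first row should come out as four equal weights of 1/4 (the mean of the means), and each later row as a weight of 1 on one level with equal negative weights on the levels after it.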

Let's take a look at this in R to drive the point home:

 hsb2 = read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header=TRUE, sep=",")
 hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))
 cellmeans = tapply(hsb2$write, hsb2$race.f, mean)
 cellmeans
  Hispanic      Asian African-Am  Caucasian 
  46.45833   58.00000   48.20000   54.05517 
 
 helmert2 = matrix(c(3/4, -1/4, -1/4, -1/4, 0, 2/3, -1/3, -1/3, 0, 0, 1/2,
 -1/2), ncol = 3)
 contrasts(hsb2$race.f) = helmert2
 model.helmert2 =lm(write ~ race.f, hsb2)
 model.helmert2

Call:
lm(formula = write ~ race.f, data = hsb2)

Coefficients:
(Intercept)      race.f1      race.f2      race.f3  
     51.678       -6.960        6.872       -5.855  

 
 #B0=51.678 should correspond to the mean of the means of the races:
 cellmeans = tapply(hsb2$write, hsb2$race.f, mean)
 mean(cellmeans)
[1] 51.67838
 
 #B1=-6.960 should correspond to the difference between the mean for Hispanics
 #and the mean for (Asian, Black, White):
 mean(cellmeans[c("Hispanic")]) - mean(cellmeans[c("Asian", "African-Am","Caucasian")])
[1] -6.960057
 
 #B2=6.872 should correspond to the difference between the mean for Asian and
 #the mean for (Black, White):
 mean(cellmeans[c("Asian")]) - mean(cellmeans[c("African-Am","Caucasian")])
[1] 6.872414
 
 #B3=-5.855 should correspond to the difference between the mean for Black
 #and the mean for (White):
 mean(cellmeans[c("African-Am")]) - mean(cellmeans[c("Caucasian")])
[1] -5.855172

If you are looking for a way to create a Helmert matrix, or are trying to understand how Helmert matrices are generated, you can also use this code that I put together:

#Example with Race Data from OPs example
hsb2 = read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv', header=TRUE, sep=",")
hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))
n.levels<-length(levels(hsb2$race.f))
#Divisors for the columns: 4, 3, 2
categories<-seq(n.levels, 2)
#Start from a matrix of -1 and put 3, 2, 1 on the superdiagonal
basematrix=matrix(-1, nrow=n.levels, ncol=n.levels)
diag(basematrix[1:n.levels, 2:n.levels])<-seq(n.levels-1, 1)
#Drop the first column and zero out the entries above the diagonal
sub.basematrix<-basematrix[,2:n.levels]
sub.basematrix[upper.tri(sub.basematrix)]<-0
#Scale each column by its divisor
helmert<-sub.basematrix %*% diag(1/categories)
rownames(helmert)<-levels(hsb2$race.f)
helmert
                [,1]       [,2] [,3]
    Hispanic    0.75  0.0000000  0.0
    Asian      -0.25  0.6666667  0.0
    African-Am -0.25 -0.3333333  0.5
    Caucasian  -0.25 -0.3333333 -0.5

Here is an example with five levels of a factor:

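n.levels<-5
#Divisors for the columns: 5, 4, 3, 2
categories<-seq(n.levels, 2)
#Start from a matrix of -1 and put 4, 3, 2, 1 on the superdiagonal
basematrix=matrix(-1, nrow=n.levels, ncol=n.levels)
diag(basematrix[1:n.levels, 2:n.levels])<-seq(n.levels-1, 1)
#Drop the first column and zero out the entries above the diagonal
sub.basematrix<-basematrix[,2:n.levels]
sub.basematrix[upper.tri(sub.basematrix)]<-0
#Scale each column by its divisor
helmert<-sub.basematrix %*% diag(1/categories)
helmert

     [,1]  [,2]       [,3] [,4]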
[1,]  0.8  0.00  0.0000000  0.0
[2,] -0.2  0.75  0.0000000  0.0
[3,] -0.2 -0.25  0.6666667  0.0
[4,] -0.2 -0.25 -0.3333333  0.5
[5,] -0.2 -0.25 -0.3333333 -0.5
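As a sanity check on matrices built this way, each column should sum to zero and distinct columns should be orthogonal. The sketch below wraps the construction in a function that rescales and reverses the output of base R's `contr.helmert`, then checks both properties. The name `forward_helmert` is chosen here for illustration and is not a base R function.

```r
# forward_helmert() is a hypothetical helper: it rescales the columns of
# stats::contr.helmert() and reverses the row and column order, so that each
# level is contrasted with the mean of the levels that follow it.
forward_helmert <- function(n) {
  scaled <- contr.helmert(n) %*% diag(1 / (2:n), nrow = n - 1)
  unname(scaled[n:1, (n - 1):1, drop = FALSE])
}

h5 <- forward_helmert(5)
print(h5)

# Each contrast column sums to zero ...
print(round(colSums(h5), 12))
# ... and distinct columns are orthogonal (off-diagonal cross-products are zero).
print(round(crossprod(h5), 12))
```

Orthogonality is what lets each coefficient be read as its own comparison, separate from the others, when the group sizes are equal.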

`function(n) contr.helmert(n) %*% diag(1/seq(n)[-1])` works, too. If you must have the output arranged exactly as in the question, `function(n) apply(apply(contr.helmert(n) %*% diag(1/seq(n)[-1]), 1, rev), 1, rev)` will do that. — whuber, Jun 06 '19 at 17:06
Right, but I wanted to provide more detailed steps so the OP could see the actual steps to create the matrices. — StatsStudent, Jun 06 '19 at 17:10
Okay, then if you want to do it from scratch, `function(n) {t((diag(seq(n-1, 0)) - upper.tri(matrix(1, n, n)))[-n,] / seq(n, 2))}` works pretty well ;-). — whuber, Jun 06 '19 at 17:12
Excellent. I admit, I didn't work on making this code the most efficient. I prefer your approach, of course! Nicely done, @whuber. — StatsStudent, Jun 06 '19 at 17:18
It's also instructive to inspect the code for `stats::contr.helmert`. It's short and straightforward. — whuber, Jun 06 '19 at 17:20
It occurs to me that for conveying the *concept* -- regardless of efficiency -- then an algorithm that *obviously* generates orthogonal contrasts corresponding to an ordered sequence of variables might be the most instructive. Here is one that makes it clear how the Helmert contrasts are obtained by successively orthogonalizing an "interesting" basis. (The last line removes the constant term and the final zero vector.) `f — whuber, Jun 06 '19 at 18:28
@whuber, I like that approach much better. It's easier to see what's going on when forming the matrices. BTW, I've gone ahead and added significantly more to my post in hopes of helping the OP understand. I find most of the explanations of Helmert Contrasts are quite lacking on the web (and elsewhere). — StatsStudent, Jun 06 '19 at 21:20
+1 Very nice. Here is an `R` function that carries out the operations you describe: `f — whuber, Jun 06 '19 at 22:01
You lost me twice.. 1) when you said "Black and White (i.e. μA − (μB+μW)/2)" how did this happen?! and 2) when you started building the matrix, in the 3rd matrix you said "μB−μH" shouldn't that be -0.25 - 0.75 = -1 ?! — asmgx, Jun 08 '19 at 09:20
Hi, @asmgx, I think I made a couple of typographical errors when I transferred my handwritten work to LaTeX on CV. Please check again and you'll see these mistakes have been corrected. Sorry about that and good catch. The fact that you were able to catch these likely indicates a pretty good understanding of how these contrast codings work! — StatsStudent, Jun 08 '19 at 16:27

score 5 · Answer 2 · edited Jun 06 '19 at 17:56

With Helmert coding, each level of the variable is compared to "later" levels of the variable.

The weights depend on the number of levels of the variable.

If there are $L$ levels then the first comparison is of the first level vs. the $(L-1)$ other levels. The weights are then $(L-1)/L$ for the first level and $-1/L$ for each of the other levels. In your case $L = 4$, so the weights are .75 and -.25 (3 times).

The next comparison has only $L-1$ levels (the first level is no longer part of the comparisons), so now the weights are $(L-2)/(L-1)$ for the first remaining level and $-1/(L-1)$ for the others (in your case, $2/3$ and $-1/3$). And so on.
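The weighting rule above can be turned into a few lines of R. This is a sketch under the assumption that comparison $k$ puts weight $(L-k)/(L-k+1)$ on level $k$, $-1/(L-k+1)$ on each later level, and 0 on the earlier levels. The function name `helmert_weights` is made up for illustration.

```r
# helmert_weights() is a hypothetical helper that applies the rule directly.
helmert_weights <- function(L) {
  w <- matrix(0, nrow = L, ncol = L - 1)
  for (k in seq_len(L - 1)) {
    remaining <- L - k + 1                 # levels still in the comparison
    w[k, k] <- (remaining - 1) / remaining # weight on the level being compared
    w[(k + 1):L, k] <- -1 / remaining      # weight on each later level
  }
  w
}

# Rows are levels, columns are comparisons.
print(helmert_weights(4))
```

For `L = 4` the first column should come out as .75 followed by three values of -.25, matching the weights described above.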

Why are you using Helmert coding here? As this page notes, Helmert coding and its inverse, difference coding, really only make sense when the variable is ordinal.

Clearly, this coding system does not make much sense with our example of race because it is a nominal variable. However, this system is useful when the levels of the categorical variable are ordered in a meaningful way. For example, if we had a categorical variable in which work-related stress was coded as low, medium or high, then comparing the means of the previous levels of the variable would make more sense.

Personally, I find them hard to interpret, even in that case. But, you are comparing "White" to the average of the other three groups. Is that what you want?
