What do the coefficients of the crossproduct of regression mean?

Question

How can I interpret the coefficients of the cross-product produced by each of the following pieces of code? What do they mean? How can I deduce that they correspond to our expectation? Also, which cross-product is correct, (1) or (2)? What's the difference? The data are at the bottom:

```
> fit.model = lm(formula = CO ~ ., data = cigarettes)
> fit.model

Call:
lm(formula = CO ~ ., data = cigarettes)

Coefficients:
(Intercept)          Tar     Nicotine       Weight  
     3.2022       0.9626      -2.6317      -0.1305  

> X = model.matrix(fit.model)
> X
                 (Intercept)  Tar Nicotine Weight
Alpine                     1 14.1     0.86 0.9853
Benson&Hedges              1 16.0     1.06 1.0938
BullDurham                 1 29.8     2.03 1.1650
CamelLights                1  8.0     0.67 0.9280
Carlton                    1  4.1     0.40 0.9462
Chesterfield               1 15.0     1.04 0.8885
GoldenLights               1  8.8     0.76 1.0267
Kent                       1 12.4     0.95 0.9225
Kool                       1 16.6     1.12 0.9372
L&M                        1 14.9     1.02 0.8858
LarkLights                 1 13.7     1.01 0.9643
Marlboro                   1 15.1     0.90 0.9316
Merit                      1  7.8     0.57 0.9705
MultiFilter                1 11.4     0.78 1.1240
NewportLights              1  9.0     0.74 0.8517
Now                        1  1.0     0.13 0.7851
OldGold                    1 17.0     1.26 0.9186
PallMallLight              1 12.8     1.08 1.0395
Raleigh                    1 15.8     0.96 0.9573
SalemUltra                 1  4.5     0.42 0.9106
Tareyton                   1 14.5     1.01 1.0070
True                       1  7.3     0.61 0.9806
ViceroyRichLight           1  8.6     0.69 0.9693
VirginiaSlims              1 15.2     1.02 0.9496
WinstonLights              1 12.0     0.82 1.1184
attr(,"assign")
[1] 0 1 2 3
```

(1)

```
> result = t(X) %*% X
> result
            (Intercept)       Tar  Nicotine    Weight
(Intercept)     25.0000  305.4000  21.91000  24.25710
Tar            305.4000 4501.2000 314.67100 302.17874
Nicotine        21.9100  314.6710  22.21050  21.63176
Weight          24.2571  302.1787  21.63176  23.72096
```

(2)

```
> XbyX <- crossprod(X)
> XbyX
            (Intercept)       Tar  Nicotine    Weight
(Intercept)     25.0000  305.4000  21.91000  24.25710
Tar            305.4000 4501.2000 314.67100 302.17874
Nicotine        21.9100  314.6710  22.21050  21.63176
Weight          24.2571  302.1787  21.63176  23.72096
```

```
> dput(cigarettes)
structure(list(Tar = c(14.1, 16, 29.8, 8, 4.1, 15, 8.8, 12.4, 
16.6, 14.9, 13.7, 15.1, 7.8, 11.4, 9, 1, 17, 12.8, 15.8, 4.5, 
14.5, 7.3, 8.6, 15.2, 12), Nicotine = c(0.86, 1.06, 2.03, 0.67, 
0.4, 1.04, 0.76, 0.95, 1.12, 1.02, 1.01, 0.9, 0.57, 0.78, 0.74, 
0.13, 1.26, 1.08, 0.96, 0.42, 1.01, 0.61, 0.69, 1.02, 0.82), 
    Weight = c(0.9853, 1.0938, 1.165, 0.928, 0.9462, 0.8885, 
    1.0267, 0.9225, 0.9372, 0.8858, 0.9643, 0.9316, 0.9705, 1.124, 
    0.8517, 0.7851, 0.9186, 1.0395, 0.9573, 0.9106, 1.007, 0.9806, 
    0.9693, 0.9496, 1.1184), CO = c(13.6, 16.6, 23.5, 10.2, 5.4, 
    15, 9, 12.3, 16.3, 15.4, 13, 14.4, 10, 10.2, 9.5, 1.5, 18.5, 
    12.6, 17.5, 4.9, 15.9, 8.5, 10.6, 13.9, 14.9)), .Names = c("Tar", 
"Nicotine", "Weight", "CO"), class = "data.frame", row.names = c("Alpine", 
"Benson&Hedges", "BullDurham", "CamelLights", "Carlton", "Chesterfield", 
"GoldenLights", "Kent", "Kool", "L&M", "LarkLights", "Marlboro", 
"Merit", "MultiFilter", "NewportLights", "Now", "OldGold", "PallMallLight", 
"Raleigh", "SalemUltra", "Tareyton", "True", "ViceroyRichLight", 
"VirginiaSlims", "WinstonLights"))
```

score 1 · Answer 1 · edited Apr 13 '17 at 12:44

This seems rather confused to me. It is probably best for you to think of $X'X$ simply as a computational step in calculating your beta estimates, $\hat\beta = (X'X)^{-1}X'y$, rather than as something with an independent meaning. Here are some answers to your specific questions:

- Your fitted model has no coefficients for the sums of squares and cross-products of your design matrix (i.e., `crossprod(X)`). You are fitting a model with an intercept and slopes for Tar, Nicotine, and Weight.
- Since there are no such coefficients, there can be no meaningful answer to what they mean.
- I do not understand what you mean by asking whether "they correspond to our expectation".
- There is no difference between (1) and (2); both are correct (for what they are). R documents `crossprod(X)` as formally equivalent to, but usually slightly faster than, `t(X) %*% X`:
```
> identical(crossprod(X), t(X) %*% X)
[1] TRUE
```
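To make the point that $X'X$ is only bookkeeping concrete, here is a minimal sketch. It rebuilds `cigarettes` from the `dput()` output in the question (row names omitted) so that it runs on its own, checks that the entries of `crossprod(X)` are plain sums (the top-left entry is $n = 25$, the rest of the first row holds column sums, and the other entries are sums of products such as $\sum_i \text{Tar}_i\,\text{Nicotine}_i$), and then shows that solving the normal equations $(X'X)\hat\beta = X'y$ reproduces `coef(fit.model)`. Note that `lm()` itself uses a QR decomposition rather than forming $X'X$, so the agreement is only up to floating-point tolerance.

```r
# Data copied from the question's dput() output (row names omitted)
cigarettes <- data.frame(
  Tar = c(14.1, 16, 29.8, 8, 4.1, 15, 8.8, 12.4, 16.6, 14.9, 13.7, 15.1,
          7.8, 11.4, 9, 1, 17, 12.8, 15.8, 4.5, 14.5, 7.3, 8.6, 15.2, 12),
  Nicotine = c(0.86, 1.06, 2.03, 0.67, 0.4, 1.04, 0.76, 0.95, 1.12, 1.02,
               1.01, 0.9, 0.57, 0.78, 0.74, 0.13, 1.26, 1.08, 0.96, 0.42,
               1.01, 0.61, 0.69, 1.02, 0.82),
  Weight = c(0.9853, 1.0938, 1.165, 0.928, 0.9462, 0.8885, 1.0267, 0.9225,
             0.9372, 0.8858, 0.9643, 0.9316, 0.9705, 1.124, 0.8517, 0.7851,
             0.9186, 1.0395, 0.9573, 0.9106, 1.007, 0.9806, 0.9693, 0.9496,
             1.1184),
  CO = c(13.6, 16.6, 23.5, 10.2, 5.4, 15, 9, 12.3, 16.3, 15.4, 13, 14.4,
         10, 10.2, 9.5, 1.5, 18.5, 12.6, 17.5, 4.9, 15.9, 8.5, 10.6, 13.9,
         14.9)
)

fit.model <- lm(CO ~ ., data = cigarettes)
X   <- model.matrix(fit.model)
y   <- cigarettes$CO
XtX <- crossprod(X)   # X'X, the same matrix as t(X) %*% X

# Every entry of X'X is a plain sum over the 25 rows:
n.rows  <- XtX["(Intercept)", "(Intercept)"]   # sum of 1 * 1, i.e. n
tar.sum <- XtX["(Intercept)", "Tar"]           # sum(Tar)
tar.nic <- XtX["Tar", "Nicotine"]              # sum(Tar * Nicotine)
stopifnot(
  n.rows == nrow(cigarettes),
  isTRUE(all.equal(tar.sum, sum(cigarettes$Tar))),
  isTRUE(all.equal(tar.nic, sum(cigarettes$Tar * cigarettes$Nicotine)))
)

# X'X matters for the fit only through the normal equations (X'X) b = X'y
b <- drop(solve(XtX, crossprod(X, y)))
all.equal(b, coef(fit.model))   # TRUE, up to floating-point tolerance
```

In other words, the numbers in (1) and (2) are inputs to the estimation, not estimates themselves.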

I wonder if you are trying to think through the nature of interactions in multiple regression. To form an interaction term, you would multiply two variables and enter their product as a new variable in the model. For example:

```
cigarettes <- within(cigarettes, new.var <- Tar * Nicotine)
fit.model2 = lm(CO ~ Weight + Tar + Nicotine + new.var, data = cigarettes)
# or, equivalently (the product term is then labelled Tar:Nicotine):
fit.model2 = lm(CO ~ Weight + Tar * Nicotine, data = cigarettes)
summary(fit.model2)
# ...
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# ... 
# Tar:Nicotine -0.20997    0.06211  -3.380  0.00297 ** 
# ...
```
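Here is a minimal sketch that checks the two specifications above give the same estimates, then computes the slope of Tar implied by the interaction model at a few fixed Nicotine values. With the interaction in the model, that slope is $\beta_{\text{Tar}} + \beta_{\text{Tar:Nicotine}} \times \text{Nicotine}$. The block rebuilds `cigarettes` from the question's `dput()` output so it runs on its own; `fit.a`, `fit.b`, and the choice of Nicotine quartiles as illustration levels are assumptions made for this example, not part of the original answer.

```r
# Data copied from the question's dput() output (row names omitted)
cigarettes <- data.frame(
  Tar = c(14.1, 16, 29.8, 8, 4.1, 15, 8.8, 12.4, 16.6, 14.9, 13.7, 15.1,
          7.8, 11.4, 9, 1, 17, 12.8, 15.8, 4.5, 14.5, 7.3, 8.6, 15.2, 12),
  Nicotine = c(0.86, 1.06, 2.03, 0.67, 0.4, 1.04, 0.76, 0.95, 1.12, 1.02,
               1.01, 0.9, 0.57, 0.78, 0.74, 0.13, 1.26, 1.08, 0.96, 0.42,
               1.01, 0.61, 0.69, 1.02, 0.82),
  Weight = c(0.9853, 1.0938, 1.165, 0.928, 0.9462, 0.8885, 1.0267, 0.9225,
             0.9372, 0.8858, 0.9643, 0.9316, 0.9705, 1.124, 0.8517, 0.7851,
             0.9186, 1.0395, 0.9573, 0.9106, 1.007, 0.9806, 0.9693, 0.9496,
             1.1184),
  CO = c(13.6, 16.6, 23.5, 10.2, 5.4, 15, 9, 12.3, 16.3, 15.4, 13, 14.4,
         10, 10.2, 9.5, 1.5, 18.5, 12.6, 17.5, 4.9, 15.9, 8.5, 10.6, 13.9,
         14.9)
)

# Specification 1: a hand-made product column
cigarettes <- within(cigarettes, new.var <- Tar * Nicotine)
fit.a <- lm(CO ~ Weight + Tar + Nicotine + new.var, data = cigarettes)
# Specification 2: let the formula build the product term (Tar:Nicotine)
fit.b <- lm(CO ~ Weight + Tar * Nicotine, data = cigarettes)

# Same estimates in the same order; only the product term's label differs
stopifnot(isTRUE(all.equal(unname(coef(fit.a)), unname(coef(fit.b)))))

# Slope of Tar at fixed Nicotine levels: b_Tar + b_Tar:Nicotine * Nicotine
b   <- coef(fit.b)
nic <- quantile(cigarettes$Nicotine, c(0.25, 0.50, 0.75))  # illustrative levels
tar.slope <- b["Tar"] + b["Tar:Nicotine"] * nic
tar.slope   # one Tar slope per chosen Nicotine level, named "25%", "50%", "75%"
```

Plotting predicted CO against Tar with Nicotine held at each of these levels would show the same thing graphically: lines whose slopes are the values in `tar.slope`.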

The coefficient on the interaction term (Tar:Nicotine, -0.20997) is difficult to interpret in isolation, and it may be best for you not to try. The fact that the interaction is significant implies that the effect of Tar, for example, depends on the level of Nicotine (and vice versa). Beyond that, to see how this plays out in a specific case, it is best to hold one of the two variables constant at a level that is meaningful in your situation and examine (e.g., plot) the relationship between the other interacting variable and the response. To understand interactions further, some of the information in my answers below may be helpful for you:
