
I am fitting a logistic model to data using the glm function in R. I have attempted to specify interaction terms in two ways:

fit1 <- glm(y ~ x*z, family = "binomial", data = myData) 
fit2 <- glm(y ~ x/z, family = "binomial", data = myData) 

I have 3 questions:

  1. What is the difference between specifying my interaction terms as x*z compared to x/z?

When I call summary(fit1) the report includes results for the intercept, x, z, and x:z while summary(fit2) only includes results for intercept, x, and x:z.

I did look at Section 11.1 in "An Introduction to R" but couldn't understand the meaning.

  2. How do I interpret the fit equation mathematically? Specifically, how do I interpret the interaction terms formulaically?

Moving to math instead of R, do I interpret the equation as:

logit(y) = (intercept) + (coeff_x)*x + (coeff_z)*z + (coeff_xz)*x*z
?

This interpretation may differ in the two specifications fit1 and fit2. What is the interpretation of each?

  3. Assuming the above interpretation is correct, how do I fit the model of x*(1/z) in R? Do I need to just create another column with these values?
kjetil b halvorsen
user1785104

2 Answers


x/z expands to x + x:z. So far I have used it only to model nested random effects.

set.seed(42)
x <- rnorm(100)
z <- rnorm(100)
y <- sample(c(0,1),100,TRUE)

fit2 <- glm(y ~ x/z, family = "binomial") 
fit3 <- glm(y ~ x + z %in% x, family = "binomial")
identical(summary(fit2)$coefficients,summary(fit3)$coefficients)
#TRUE
fit4 <- glm(y ~ x + x:z, family = "binomial")
identical(summary(fit2)$coefficients,summary(fit4)$coefficients)
#TRUE

fit5 <- glm(y ~ I(x/z), family = "binomial")    
a <- x/z
fit6 <- glm(y ~ a, family = "binomial")
all.equal(summary(fit5)$coefficients,summary(fit6)$coefficients)
#[1] "Attributes: < Component 2: Component 1: 1 string mismatch >"
#which means that only the rownames don't match, but values are identical
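
Written as equations, the specifications above correspond to the linear predictors below. This is only a sketch: it assumes x and z are numeric covariates (as in the simulated data), writes p for P(y = 1), and the β symbols are my own notation rather than names R prints.

```latex
% y ~ x*z   (same as y ~ x + z + x:z)
\operatorname{logit}(p) = \beta_0 + \beta_x x + \beta_z z + \beta_{xz}\, x z

% y ~ x/z   (same as y ~ x + x:z, no main effect for z)
\operatorname{logit}(p) = \beta_0 + \beta_x x + \beta_{xz}\, x z

% y ~ I(x/z)   (arithmetic division, a single covariate x/z)
\operatorname{logit}(p) = \beta_0 + \beta_1 \frac{x}{z}
```

In the second form the slope on x is still \beta_x + \beta_{xz} z, but z has no effect on the linear predictor when x = 0. Only the third form fits the ratio x*(1/z); wrapping it in I() avoids creating a separate column.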
Roland

I have never seen x/z in any formula. Can you give a link to such a page? The clearest way to specify a formula is with + and :. For example, to model y on x1, x2, and the interaction of x1 and x2, write y ~ x1 + x2 + x1:x2, or equivalently y ~ x1 * x2 (which is a shortcut).
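
One way to check how a formula expands, without fitting anything, is to ask R for its term labels. The snippet below is a minimal sketch using terms() from the stats package; y, x1 and x2 are placeholder names, and the / form from the question is included only for comparison.

```r
# Expand each formula symbolically; no data are needed for this step.
labels_star <- attr(terms(y ~ x1 * x2), "term.labels")
labels_long <- attr(terms(y ~ x1 + x2 + x1:x2), "term.labels")
labels_nest <- attr(terms(y ~ x1 / x2), "term.labels")

print(labels_star)  # both main effects plus the interaction
print(labels_nest)  # x1 and the interaction, no main effect for x2

# The * shortcut and the long form give the same terms.
identical(labels_star, labels_long)
```

The term labels are also what summary() uses as row names for numeric covariates, which is why the two fits in the question report different sets of coefficients.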

Now comes the question of interpreting the coefficients when you have interaction terms. Imagine a simple linear model: y ~ x1 + x2. The coefficient of x1 (or x2) is the increase in y for a unit increase in x1 (or x2), holding the other variable fixed.

However, the moment you add an interaction term, interpretation is not so easy. If you increase x1 by 1 unit in the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the increase in y is b1 + b3*x2. As you can see, the increase is not constant: it depends on the level of x2. One thing you can do is plot y against x1 at several levels of x2 to show how the response changes.
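
The b1 + b3*x2 slope can be checked numerically. The following is a rough sketch on simulated data (the true coefficients, the sample size and the chosen x2 levels are arbitrary assumptions of mine), using lm() to match the linear model above. For a logistic fit the same arithmetic would apply on the logit scale.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated response with a known interaction (values chosen arbitrarily)
y  <- 1 + 2 * x1 - x2 + 0.5 * x1 * x2 + rnorm(n)

fit <- lm(y ~ x1 * x2)
b   <- coef(fit)

# Estimated slope of y in x1 at a few levels of x2: b1 + b3 * x2
x2_levels <- c(-1, 0, 1)
slope_x1  <- b["x1"] + b["x1:x2"] * x2_levels

# Same quantity from predictions: change in fitted y when x1 goes from 0 to 1
pred0 <- predict(fit, newdata = data.frame(x1 = 0, x2 = x2_levels))
pred1 <- predict(fit, newdata = data.frame(x1 = 1, x2 = x2_levels))

all.equal(unname(slope_x1), unname(pred1 - pred0))
```

The two calculations agree, and the slope differs across the three x2 levels, which is the dependence on x2 described above.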

Hope this helps. I will try and answer the rest of the questions in another post.

Indrajit
    In section 11.1 of the "An Introduction to R" guide that comes with the installation, it says x*z specifies "cross classification" while x/z specifies "nested classification". I'm not really sure what that means. It also describes it in terms of matrix classifiers instead of covariates. I'm not sure if that would change the interpretation. –  Oct 30 '12 at 17:09