Question

What is the difference between lm(y ~ x*z) and lm(y ~ I(x*z)), assuming x and z are numeric?

  1. Equation 1 has an interaction term
  2. Equation 2 has an interaction term
  3. Both are the same; any difference is due to chance
  4. I don’t know

Answer

Equation 1 has an interaction term, whereas equation 2 doesn’t. \(x*z\) denotes the cross of x and z and is equivalent to \(x + z + x:z\), where \(x:z\) is the set of terms obtained by taking the interactions of all terms in the first with all terms in the second (source: lm documentation). In equation 2, \(I(x*z)\) is simply the arithmetic product of x and z, entered as a single predictor with no separate terms for x and z. In a formula, I() inhibits the interpretation of operators as formula operators, so they are used as arithmetic operators only.
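The expansion can be checked directly by asking R which terms each formula produces. The snippet below is a minimal sketch on simulated data; the variables x, z and y and the coefficients used to generate y are made up purely for illustration.

```r
# Simulated numeric data (hypothetical, for illustration only)
set.seed(1)
x <- rnorm(50)
z <- rnorm(50)
y <- 1 + 2 * x - z + 0.5 * x * z + rnorm(50)

# x * z is expanded by the formula parser into x + z + x:z
attr(terms(y ~ x * z), "term.labels")
## [1] "x"   "z"   "x:z"

# I(x * z) is kept as one arithmetic term
attr(terms(y ~ I(x * z)), "term.labels")
## [1] "I(x * z)"

# Writing the expansion out by hand gives the same fit
fit_star <- lm(y ~ x * z)
fit_long <- lm(y ~ x + z + x:z)
all.equal(coef(fit_star), coef(fit_long))
## [1] TRUE
```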

An Example

library(MASS)

model1 = lm(medv~lstat*age, data=Boston)
model2 = lm(medv~I(lstat*age), data=Boston)

Model 1 coefficients

summary(model1)$coefficients
##                  Estimate  Std. Error     t value     Pr(>|t|)
## (Intercept) 36.0885359346 1.469835463 24.55277263 4.907116e-88
## lstat       -1.3921168406 0.167455532 -8.31335236 8.780730e-16
## age         -0.0007208595 0.019879171 -0.03626205 9.710878e-01
## lstat:age    0.0041559518 0.001851795  2.24428275 2.524911e-02

Model 2 coefficients

summary(model2)$coefficients
##                    Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)    30.158863119 0.4828239803  62.46347 1.985512e-239
## I(lstat * age) -0.007714613 0.0003798643 -20.30886  1.922769e-67

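As the output shows, the two models have different coefficients because the formulas differ: model 1 estimates an intercept, a coefficient for lstat, a coefficient for age and a coefficient for the lstat:age interaction, while model 2 estimates only an intercept and a single coefficient for the product of lstat and age.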
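One way to see where the difference comes from is to compare the design matrices that lm builds from each formula. This is a rough sketch, assuming the MASS package is installed; X1 and X2 are names introduced here for illustration.

```r
library(MASS)  # provides the Boston data set

# Design matrices built from the two formulas
X1 <- model.matrix(medv ~ lstat * age, data = Boston)
X2 <- model.matrix(medv ~ I(lstat * age), data = Boston)

colnames(X1)
## [1] "(Intercept)" "lstat"       "age"         "lstat:age"
colnames(X2)
## [1] "(Intercept)"    "I(lstat * age)"

# For numeric variables the product column is the same in both matrices
all.equal(unname(X1[, "lstat:age"]), unname(X2[, "I(lstat * age)"]))
## [1] TRUE
```

So for numeric predictors the product column is numerically identical in both models; what model 2 lacks is the separate lstat and age columns, which is why its estimates differ.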
