The difference between `lm(y ~ x*z)` and `lm(y ~ I(x*z))`, assuming x and z are numeric variables, is as follows.
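The two formulas fit different models. In a model formula, `x*z` denotes the cross of x and z, which is equivalent to `x + z + x:z`. Here `x:z` indicates the set of terms obtained by taking the interactions of all terms in the first with all terms in the second (Source: lm documentation). When x and z are numeric, `x:z` is simply their elementwise product, so the first model contains an intercept, the main effects of x and z, and their interaction. In `I(x*z)`, the function `I()` inhibits the interpretation of operators as formula operators, so `*` acts as ordinary arithmetic multiplication and `I(x*z)` is a single variable equal to x times z. The second model therefore contains only an intercept and this product as its one predictor, with no main effects of x or z. It is equivalent to `lm(y ~ x:z)`.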
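Written as regression equations, the two formulas correspond to the models below. The symbols β, γ (the coefficients each model estimates) and ε (the error term) are notation introduced here for illustration.

\[
\begin{aligned}
y &= \beta_0 + \beta_1 x + \beta_2 z + \beta_3\, xz + \varepsilon && \text{(from } x*z\text{)}\\
y &= \gamma_0 + \gamma_1\, xz + \varepsilon && \text{(from } I(x*z)\text{)}
\end{aligned}
\]

Both models contain the product \(xz\), but only the first also contains \(x\) and \(z\) on their own. As a result, \(\beta_3\) and \(\gamma_1\) measure different things and need not be close in value.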
An Example
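```r
library(MASS)  # provides the Boston housing data

# Model 1: main effects of lstat and age plus their interaction
model1 <- lm(medv ~ lstat * age, data = Boston)
# Model 2: a single predictor equal to the product lstat * age
model2 <- lm(medv ~ I(lstat * age), data = Boston)
```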
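To see the difference directly, `model.matrix()` shows the predictor columns that each formula generates for the Boston example. This is a minimal sketch. The object names `X1` and `X2` are introduced here for illustration.

```r
library(MASS)  # provides the Boston housing data

# Design matrices that lm() builds internally from each formula
X1 <- model.matrix(medv ~ lstat * age, data = Boston)
X2 <- model.matrix(medv ~ I(lstat * age), data = Boston)

# Model 1: intercept, both main effects and the interaction column
print(colnames(X1))  # "(Intercept)" "lstat" "age" "lstat:age"
# Model 2: intercept and a single product column
print(colnames(X2))  # "(Intercept)" "I(lstat * age)"

# The interaction column and the I() column hold the same numbers
print(all.equal(unname(X1[, "lstat:age"]), unname(X2[, "I(lstat * age)"])))  # TRUE
```

The product values are identical in both matrices. Only the extra `lstat` and `age` columns in `X1` separate the two fits.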
Model 1 coefficients
```r
summary(model1)$coefficients
##                  Estimate  Std. Error     t value     Pr(>|t|)
## (Intercept) 36.0885359346 1.469835463 24.55277263 4.907116e-88
## lstat       -1.3921168406 0.167455532 -8.31335236 8.780730e-16
## age         -0.0007208595 0.019879171 -0.03626205 9.710878e-01
## lstat:age    0.0041559518 0.001851795  2.24428275 2.524911e-02
```
Model 2 coefficients
```r
summary(model2)$coefficients
##                    Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)    30.158863119 0.4828239803  62.46347 1.985512e-239
## I(lstat * age) -0.007714613 0.0003798643 -20.30886  1.922769e-67
```
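The estimates from the two tables can be written out as fitted equations, rounded from the printed values:

\[
\begin{aligned}
\text{Model 1:}\quad \widehat{\text{medv}} &= 36.089 - 1.392\,\text{lstat} - 0.00072\,\text{age} + 0.00416\,\text{lstat}\cdot\text{age}\\
\text{Model 2:}\quad \widehat{\text{medv}} &= 30.159 - 0.00771\,\text{lstat}\cdot\text{age}
\end{aligned}
\]

Differentiating with respect to lstat shows how each model lets the lstat slope depend on age. In model 1 the slope is \(-1.392 + 0.00416\,\text{age}\). In model 2 it is \(-0.00771\,\text{age}\), which is forced to be zero when age is zero because there is no separate lstat term.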
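As the output shows, the two models give different coefficients because the formulas specify different predictors. Model 1 estimates four coefficients (intercept, lstat, age and lstat:age), while model 2 estimates only an intercept and the coefficient on the product lstat * age. Even the coefficient on the product differs (0.00416 in model 1 versus -0.00771 in model 2), because in model 1 it is estimated alongside the main effects of lstat and age.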
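One way to confirm that the difference comes from the main effects, and not from how the product is computed, is to refit each model using the other notation and compare coefficients. This is a sketch that assumes the MASS package is available (it ships with standard R installations as a recommended package). The names `model2_colon` and `model1_explicit` are introduced here for illustration.

```r
library(MASS)  # provides the Boston housing data

model1 <- lm(medv ~ lstat * age, data = Boston)
model2 <- lm(medv ~ I(lstat * age), data = Boston)

# Model 2 written with ':' instead of I(); same product column, so same fit
model2_colon <- lm(medv ~ lstat:age, data = Boston)
# Model 1 written with I(); adding the main effects back reproduces the cross
model1_explicit <- lm(medv ~ lstat + age + I(lstat * age), data = Boston)

# Compare coefficient values, ignoring the differing term names
same1 <- isTRUE(all.equal(unname(coef(model1)), unname(coef(model1_explicit))))
same2 <- isTRUE(all.equal(unname(coef(model2)), unname(coef(model2_colon))))
print(c(same1 = same1, same2 = same2))  # both TRUE
```

Both comparisons match, so `I(x*z)` and `x:z` produce the same column for numeric variables. The models differ only because `x*z` also adds `x` and `z`.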