Question

Which of the following points, if removed from the data, could improve our linear model (represented by the straight line)?

  1. A
  2. B
  3. C

Answer

Correct answer: B

Linear regression models are sensitive to outliers. For a simple linear regression of the form \(y = \beta_0 + \beta_1 x\), the coefficient estimate \(\hat \beta_1\), which defines the slope of the line, is given by: \[\hat \beta_1 = \frac {\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}\] Notice that \(\hat \beta_1\) depends on the deviations \((y_i - \bar y)\). An outlier has a large \(|y_i - \bar y|\), so its term can dominate the numerator, especially when \(x_i\) is also far from \(\bar x\), and pull the slope away from the trend followed by the remaining points. Removing such a point, here B, lets the line fit the rest of the data better.
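To make the effect concrete, here is a minimal sketch that applies the slope formula above to a small made-up dataset. The points are hypothetical and are not the A, B, C from the question; the helper name `ols_slope` is also just an illustration. Five points lie exactly on \(y = 2x\), and a sixth point at \((6, 0)\) plays the role of the outlier.

```python
def ols_slope(xs, ys):
    """Slope estimate for simple linear regression:
    sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical data: five points on the line y = 2x ...
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# ... plus one made-up outlier at (6, 0), far below the trend.
outlier_x = clean_x + [6]
outlier_y = clean_y + [0]

slope_clean = ols_slope(clean_x, clean_y)
slope_with_outlier = ols_slope(outlier_x, outlier_y)

print(f"slope without the outlier: {slope_clean:.3f}")
print(f"slope with the outlier:    {slope_with_outlier:.3f}")
```

With this toy data, a single outlying point drags the estimated slope from 2 down to roughly 0.29, which is why dropping it brings the line back in agreement with the other points.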
