Question

Which of the following is true about the sums of residuals of the two linear regression models shown in the following graph? (s(X): sum of residuals of model X)

  1. s(A) > s(B)
  2. s(A) = s(B)
  3. s(A) < s(B)

Reference: StackExchange

Answer

The sum of residuals is zero for any ordinary least squares fit that includes an intercept term, so s(A) = s(B) = 0 and option 2 is correct.

Take, for example, a simple linear regression of the form \(\hat{y} = \hat\beta_0 + \hat\beta_1 x\). The unknown coefficients are estimated by minimizing the sum of squared residuals:
\[ (\hat\beta_0, \hat\beta_1) = \underset{\beta_0,\,\beta_1}{\operatorname{arg\,min}} \sum (y - \hat y)^2 \]
To minimize, partially differentiate the objective with respect to \(\hat\beta_0\) and \(\hat\beta_1\) and set each derivative to zero. Differentiating with respect to \(\hat\beta_0\):
\[\begin{aligned} \frac{\partial}{\partial\hat\beta_0}\sum(y - \hat y)^2 &= 0 \\ \frac{\partial}{\partial\hat\beta_0}\sum(y - \hat\beta_0 - \hat\beta_1 x)^2 &= 0 \\ -2\sum(y - \hat\beta_0 - \hat\beta_1 x) &= 0 \\ -2\sum(y - \hat y) &= 0 \\ -2\sum \text{residuals} &= 0 \;\Rightarrow\; \sum \text{residuals} = 0 \end{aligned} \]
This argument fails, i.e. the sum of residuals need not be zero, if there is no constant (intercept) term in the linear regression model, since the first normal equation above no longer applies.
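
As a quick numerical check, here is a minimal sketch (assuming NumPy and synthetic data of my own choosing, not anything from the original question) that fits ordinary least squares with and without an intercept and compares the sums of the residuals:

```python
import numpy as np

# Synthetic data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=100)

# With intercept: design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
residuals_with = y - X @ beta

# Without intercept: design matrix [x] only
beta_no = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0]
residuals_without = y - x * beta_no[0]

print("sum of residuals (with intercept):", residuals_with.sum())   # ~0 up to floating-point error
print("sum of residuals (no intercept):  ", residuals_without.sum())  # generally nonzero
```

Under these assumptions, the first sum comes out at essentially zero (floating-point noise) while the second does not, matching the derivation above.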

Thanks for reading. If you find a correction, hit me up on Twitter, and if you like the question, how about a tip: PayPal TipJar