Interlude: Robust Regression in R

This section explains how to use the robust regression methods discussed in sections 17, 18 and 19 in R.

M-Estimators

The rlm() function from the MASS package implements the iteratively reweighted least squares algorithm from section 18. The function takes a formula interface similar to lm(), and the psi argument specifies which \(\psi\)-function to use:

library(MASS)

rlm1 <- rlm(y ~ x, psi = psi.huber)      # Huber's method
rlm2 <- rlm(y ~ x, psi = psi.bisquare)   # Tukey's bisquare
rlm3 <- rlm(y ~ x, psi = psi.hampel)     # Hampel's method

The available psi functions correspond to the objective functions discussed in section 18:

  • psi.huber implements Huber’s \(t\)-function from section 18.3.3.
  • psi.bisquare implements Tukey’s bisquare method.
  • psi.hampel implements Hampel’s method.

Information can be extracted from the fitted model object:

  • coef(rlm1) returns the estimated regression coefficients \(\hat\beta\).
  • rlm1$resid returns the residuals.
  • rlm1$w returns the weights assigned to each observation.

The fitted line can be plotted using abline(rlm1).

LAV Estimation

The rq() function from the quantreg package computes the least absolute values (LAV) estimator from section 18.1:

library(quantreg)

rq1 <- rq(y ~ x)

The rq() function uses specialized algorithms (interior point methods) which are more efficient and have better convergence properties than the general-purpose optim() approach shown in section 18.1. For pedagogical purposes, section 18.1 shows the optim() implementation to illustrate what the LAV estimator actually minimises, but for practical work the rq() function should be used.

Coefficients can be extracted using coef(rq1) and residuals using resid(rq1). The fitted line can be plotted using abline(rq1).

High Breakdown Point Methods

The lqs() function from the MASS package implements both the least median of squares (LMS) and least trimmed squares (LTS) estimators from section 19:

library(MASS)

lms1 <- lqs(y ~ x, method = "lms")
lts1 <- lqs(y ~ x, method = "lts")

For the LTS estimator, the quantile argument can be used to specify the value of \(h\) (the number of observations to use in the objective function):

lts1 <- lqs(y ~ x, method = "lts", quantile = floor(n/2) + 1)

As discussed in section 19.2.2, the value of \(h\) controls the trade-off between robustness and efficiency. Smaller values of \(h\) give higher breakdown points but lower efficiency.

Coefficients can be extracted using coef(lms1) or coef(lts1), and the fitted lines can be plotted using abline(lms1) or abline(lts1).

Summary

The following table summarises the R functions for robust regression:

Method Function Package Section
LAV rq(y ~ x) quantreg 18.1
Huber rlm(y ~ x, psi = psi.huber) MASS 18.3.3
Bisquare rlm(y ~ x, psi = psi.bisquare) MASS 18.3.5
Hampel rlm(y ~ x, psi = psi.hampel) MASS 18.3.4
LMS lqs(y ~ x, method = "lms") MASS 19.2.1
LTS lqs(y ~ x, method = "lts") MASS 19.2.2

All of these functions return model objects that work with coef(), resid(), and abline().