Interlude: Robust Regression in R

This section explains how to use the robust regression methods discussed in sections 17, 18 and 19 in R.

M-Estimators in R

As discussed in section 18, M-estimators include methods such as Huber, Bisquare, Hampel, and also LAV (which uses \(\rho(\varepsilon) = |\varepsilon|\)). In R, these methods are implemented in two different packages: the MASS package provides rlm() for Huber, Bisquare and Hampel, whilst the quantreg package provides rq() for LAV.

Huber, Bisquare and Hampel

The rlm() function from the MASS package implements the iteratively reweighted least squares algorithm from section 18. The function takes a formula interface similar to lm(), and the psi argument specifies which \(\psi\)-function to use:

library(MASS)

rlm1 <- rlm(y ~ x, psi = psi.huber)      # Huber's method
rlm2 <- rlm(y ~ x, psi = psi.bisquare)   # Tukey's bisquare
rlm3 <- rlm(y ~ x, psi = psi.hampel)     # Hampel's method

The available psi functions correspond to the objective functions discussed in section 18:

  • psi.huber implements Huber’s \(t\)-function from section 18.3.3.
  • psi.bisquare implements Tukey’s bisquare method.
  • psi.hampel implements Hampel’s method.

Information can be extracted from the fitted model object:

  • coef(rlm1) returns the estimated regression coefficients \(\hat\beta\).
  • rlm1$resid returns the residuals.
  • rlm1$w returns the weights assigned to each observation.

The fitted line can be plotted using abline(rlm1).

LAV

The rq() function from the quantreg package computes the least absolute values (LAV) estimator from section 18.1:

library(quantreg)

rq1 <- rq(y ~ x)

The rq() function uses specialised algorithms (interior point methods) which are more efficient and have better convergence properties than the general-purpose optim() approach shown in section 18.1.

Coefficients can be extracted using coef(rq1) and residuals using resid(rq1). The fitted line can be plotted using abline(rq1).

High Breakdown Point Methods

The lqs() function from the MASS package implements both the least median of squares (LMS) and least trimmed squares (LTS) estimators from section 19:

library(MASS)

lms1 <- lqs(y ~ x, method = "lms")
lts1 <- lqs(y ~ x, method = "lts")

For the LTS estimator, the quantile argument can be used to specify the value of \(k\) (the number of observations to use in the objective function):

lts1 <- lqs(y ~ x, method = "lts", quantile = floor(n/2) + 1)

As discussed in section 19.2.2, the value of \(k\) controls the trade-off between robustness and efficiency. Smaller values of \(k\) give higher breakdown points but lower efficiency.

Coefficients can be extracted using coef(lms1) or coef(lts1), and the fitted lines can be plotted using abline(lms1) or abline(lts1).

Summary

The following table summarises the R functions for robust regression:

Method Section Function Package
LAV 18.3.2 rq(y ~ x) quantreg
Huber 18.3.3 rlm(y ~ x, psi = psi.huber) MASS
Bisquare 18.3.5 rlm(y ~ x, psi = psi.bisquare) MASS
Hampel 18.3.4 rlm(y ~ x, psi = psi.hampel) MASS
LMS 19.2.1 lqs(y ~ x, method = "lms") MASS
LTS 19.2.2 lqs(y ~ x, method = "lts") MASS

All of these functions return model objects that work with coef(), resid(), and abline().