Interlude: Robust Regression in R
This section explains how to use the robust regression methods discussed in sections 17, 18 and 19 in R.
M-Estimators
The rlm() function from the MASS package implements the iteratively
reweighted least squares algorithm from section 18. The
function takes a formula interface similar to lm(), and the psi argument
specifies which \(\psi\)-function to use:
library(MASS)
rlm1 <- rlm(y ~ x, psi = psi.huber) # Huber's method
rlm2 <- rlm(y ~ x, psi = psi.bisquare) # Tukey's bisquare
rlm3 <- rlm(y ~ x, psi = psi.hampel) # Hampel's methodThe available psi functions correspond to the objective functions discussed
in section 18:
psi.huberimplements Huber’s \(t\)-function from section 18.3.3.psi.bisquareimplements Tukey’s bisquare method.psi.hampelimplements Hampel’s method.
Information can be extracted from the fitted model object:
coef(rlm1)returns the estimated regression coefficients \(\hat\beta\).rlm1$residreturns the residuals.rlm1$wreturns the weights assigned to each observation.
The fitted line can be plotted using abline(rlm1).
LAV Estimation
The rq() function from the quantreg package computes the least absolute
values (LAV) estimator from section 18.1:
The rq() function uses specialized algorithms (interior point methods) which
are more efficient and have better convergence properties than the
general-purpose optim() approach shown in section 18.1. For
pedagogical purposes, section 18.1 shows the optim()
implementation to illustrate what the LAV estimator actually minimises, but for
practical work the rq() function should be used.
Coefficients can be extracted using coef(rq1) and residuals using resid(rq1).
The fitted line can be plotted using abline(rq1).
High Breakdown Point Methods
The lqs() function from the MASS package implements both the least median
of squares (LMS) and least trimmed squares (LTS) estimators from
section 19:
For the LTS estimator, the quantile argument can be used to specify
the value of \(h\) (the number of observations to use in the objective function):
As discussed in section 19.2.2, the value of \(h\) controls the trade-off between robustness and efficiency. Smaller values of \(h\) give higher breakdown points but lower efficiency.
Coefficients can be extracted using coef(lms1) or coef(lts1), and the
fitted lines can be plotted using abline(lms1) or abline(lts1).
Summary
The following table summarises the R functions for robust regression:
| Method | Function | Package | Section |
|---|---|---|---|
| LAV | rq(y ~ x) |
quantreg |
18.1 |
| Huber | rlm(y ~ x, psi = psi.huber) |
MASS |
18.3.3 |
| Bisquare | rlm(y ~ x, psi = psi.bisquare) |
MASS |
18.3.5 |
| Hampel | rlm(y ~ x, psi = psi.hampel) |
MASS |
18.3.4 |
| LMS | lqs(y ~ x, method = "lms") |
MASS |
19.2.1 |
| LTS | lqs(y ~ x, method = "lts") |
MASS |
19.2.2 |
All of these functions return model objects that work with coef(), resid(),
and abline().