Interlude: Robust Regression in R
This section explains how to use the robust regression methods discussed in sections 17, 18 and 19 in R.
M-Estimators
The rlm()
function from the MASS
package implements the iteratively
reweighted least squares algorithm from section 18. The
function takes a formula interface similar to lm()
, and the psi
argument
specifies which \(\psi\)-function to use:
library(MASS)
rlm1 <- rlm(y ~ x, psi = psi.huber) # Huber's method
rlm2 <- rlm(y ~ x, psi = psi.bisquare) # Tukey's bisquare
rlm3 <- rlm(y ~ x, psi = psi.hampel) # Hampel's method
The available psi
functions correspond to the objective functions discussed
in section 18:
psi.huber
implements Huber’s \(t\)-function from section 18.3.3.psi.bisquare
implements Tukey’s bisquare method.psi.hampel
implements Hampel’s method.
Information can be extracted from the fitted model object:
coef(rlm1)
returns the estimated regression coefficients \(\hat\beta\).rlm1$resid
returns the residuals.rlm1$w
returns the weights assigned to each observation.
The fitted line can be plotted using abline(rlm1)
.
LAV Estimation
The rq()
function from the quantreg
package computes the least absolute
values (LAV) estimator from section 18.1:
The rq()
function uses specialized algorithms (interior point methods) which
are more efficient and have better convergence properties than the
general-purpose optim()
approach shown in section 18.1. For
pedagogical purposes, section 18.1 shows the optim()
implementation to illustrate what the LAV estimator actually minimises, but for
practical work the rq()
function should be used.
Coefficients can be extracted using coef(rq1)
and residuals using resid(rq1)
.
The fitted line can be plotted using abline(rq1)
.
High Breakdown Point Methods
The lqs()
function from the MASS
package implements both the least median
of squares (LMS) and least trimmed squares (LTS) estimators from
section 19:
For the LTS estimator, the quantile
argument can be used to specify
the value of \(h\) (the number of observations to use in the objective function):
As discussed in section 19.2.2, the value of \(h\) controls the trade-off between robustness and efficiency. Smaller values of \(h\) give higher breakdown points but lower efficiency.
Coefficients can be extracted using coef(lms1)
or coef(lts1)
, and the
fitted lines can be plotted using abline(lms1)
or abline(lts1)
.
Summary
The following table summarises the R functions for robust regression:
Method | Function | Package | Section |
---|---|---|---|
LAV | rq(y ~ x) |
quantreg |
18.1 |
Huber | rlm(y ~ x, psi = psi.huber) |
MASS |
18.3.3 |
Bisquare | rlm(y ~ x, psi = psi.bisquare) |
MASS |
18.3.5 |
Hampel | rlm(y ~ x, psi = psi.hampel) |
MASS |
18.3.4 |
LMS | lqs(y ~ x, method = "lms") |
MASS |
19.2.1 |
LTS | lqs(y ~ x, method = "lts") |
MASS |
19.2.2 |
All of these functions return model objects that work with coef()
, resid()
,
and abline()
.