Section 20 Examples

In this section we compare the robust regression methods from sections 17, 18 and 19 by applying them to synthetic data. We will see how methods with different breakdown points behave in the presence of outliers, how the weights assigned by different M-estimators differ in practice, and how robust methods produce cleaner residual patterns than least squares regression.

20.1 Comparing Robust Methods

Example 20.1 We compare several robust estimators on a synthetic dataset with outliers. This example illustrates how methods with high breakdown points resist contamination, whilst methods with low breakdown points can be severely affected.

We start by generating data from a linear relationship with three outliers:

set.seed(20251006)

n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5*x + rnorm(n, sd = 0.5)

# Add three contaminants:
outliers <- c(5, 15, 25)
x[outliers] <- c(9, 8.5, 9.5)
y[outliers] <- c(-2, -1.5, -2.5)

Now we fit several different regression methods to this data:

library(MASS)     # for rlm() and lqs()
library(quantreg) # for rq()
lm1   <- lm(y ~ x)                           # LSQ
rq1   <- rq(y ~ x)                           # LAV
rlm1  <- rlm(y ~ x, psi = psi.huber)         # Huber
rlm2  <- rlm(y ~ x, psi = psi.bisquare)      # Bisquare
rlm3  <- rlm(y ~ x, psi = psi.hampel)        # Hampel
lms1  <- lqs(y ~ x, method = "lms")          # LMS
lts1  <- lqs(y ~ x, method = "lts")          # LTS

The left plot in figure 20.1 shows the fitted lines from all seven methods. As expected, only those methods with high breakdown point (LMS and LTS) have resisted the gross outliers. The least squares, LAV and M-estimator lines all pass between the bulk of the data and the outliers, demonstrating their susceptibility to contamination.
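The same comparison can be made numerically by tabulating the fitted coefficients. The following is a minimal sketch, not the code used to produce the figure: it regenerates the data so that it runs on its own, and it omits the LAV fit so that only MASS is needed (adding `LAV = rq(y ~ x)` to the list after loading quantreg should work in the same way). The names `fits` and `coef_table` are introduced here for illustration.

```r
library(MASS)  # for rlm() and lqs()

# Regenerate the data of Example 20.1 so that this block is self-contained.
set.seed(20251006)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5*x + rnorm(n, sd = 0.5)
outliers <- c(5, 15, 25)
x[outliers] <- c(9, 8.5, 9.5)
y[outliers] <- c(-2, -1.5, -2.5)

# Fit the methods and collect them in a named list.
fits <- list(
  LSQ      = lm(y ~ x),
  Huber    = rlm(y ~ x, psi = psi.huber),
  Bisquare = rlm(y ~ x, psi = psi.bisquare),
  Hampel   = rlm(y ~ x, psi = psi.hampel),
  LMS      = lqs(y ~ x, method = "lms"),
  LTS      = lqs(y ~ x, method = "lts")
)

# One row per method. The data were generated with intercept 2 and slope 0.5,
# so rows close to (2, 0.5) correspond to fits which resisted the outliers.
coef_table <- t(sapply(fits, coef))
print(round(coef_table, 2))
```

Comparing the slope column with the true value 0.5 gives a quick check of which lines have been pulled towards the outliers.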

The right plot shows how the LTS estimator behaves for different values of the quantile parameter \(h\) (called quantile in R). For smaller values of \(h\), the method uses only the observations with smallest residuals and has higher breakdown point. As \(h\) increases towards \(n = 30\), the method gradually converges to the least squares estimate.


Figure 20.1: On the left we compare several methods. As expected, only those with high breakdown point have resisted the gross outliers. On the right plot, we compare different values of the quantile parameter \(h\) for the LTS estimator.

We see that for small \(h\) the LTS line passes through the bulk of the data and ignores the outliers, whilst for larger \(h\) it is increasingly influenced by the outliers until it converges to the least squares line.
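The effect of the quantile parameter can be explored with a short loop. This is a sketch under some assumptions: the grid `h_values` is chosen here for illustration and need not match the values used in the figure, and the largest value is 29 rather than 30 because, as far as we are aware, `lqs()` only accepts `quantile` up to \(n - 1\).

```r
library(MASS)  # for lqs()

# Regenerate the data of Example 20.1 so that this block is self-contained.
set.seed(20251006)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5*x + rnorm(n, sd = 0.5)
outliers <- c(5, 15, 25)
x[outliers] <- c(9, 8.5, 9.5)
y[outliers] <- c(-2, -1.5, -2.5)

# Illustrative grid of h values (an assumption, not taken from the figure).
# h = 27 equals the number of uncontaminated observations.
h_values <- c(16, 20, 24, 27, 29)

# Fit LTS for each h and store intercept and slope, one row per h.
lts_coefs <- t(sapply(h_values, function(h) {
  coef(lqs(y ~ x, method = "lts", quantile = h))
}))
rownames(lts_coefs) <- paste0("h=", h_values)
print(round(lts_coefs, 2))

# Least squares coefficients, for comparison with the rows for large h.
lsq_coefs <- coef(lm(y ~ x))
print(round(lsq_coefs, 2))
```

Since there are only 27 uncontaminated observations, any \(h > 27\) forces at least one outlier into the trimmed sum of squares, which is where we would expect the fitted line to start moving.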

20.2 Residual Analysis

Example 20.2 Continuing with the same dataset, we now examine the residual patterns from different methods. When outliers influence the fitted line, the residuals from the remaining observations often exhibit systematic patterns.

Figure 20.2 shows the residuals from three methods, excluding the three known outliers. The least squares residuals (left plot) show a clear curved pattern: residuals are positive for small and large \(x\)-values, and negative for intermediate values. This pattern indicates that the outliers have pulled the regression line away from the bulk of the data.

The LAV and Huber methods (centre and right plots) show much less pronounced patterns. The residuals are more evenly scattered around zero, indicating that these methods have been less influenced by the outliers. This demonstrates one advantage of robust methods: they not only provide better parameter estimates in the presence of outliers, but also produce more reliable residual plots for model diagnostics.
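Residual plots of this kind can be produced along the following lines. This is a minimal sketch rather than the code behind the figure: it covers only the least squares and Huber fits so that MASS is the only package needed, and the names `keep`, `res_lsq`, `res_huber` and `trend` are introduced here for illustration. The correlation at the end is a crude numerical summary which only detects linear structure.

```r
library(MASS)  # for rlm()

# Regenerate the data of Example 20.1 so that this block is self-contained.
set.seed(20251006)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5*x + rnorm(n, sd = 0.5)
outliers <- c(5, 15, 25)
x[outliers] <- c(9, 8.5, 9.5)
y[outliers] <- c(-2, -1.5, -2.5)

lm1  <- lm(y ~ x)
rlm1 <- rlm(y ~ x, psi = psi.huber)

# Residuals of the non-outlier observations only.
keep      <- setdiff(seq_len(n), outliers)
res_lsq   <- resid(lm1)[keep]
res_huber <- resid(rlm1)[keep]

# Side-by-side residual plots on a common vertical scale.
op   <- par(mfrow = c(1, 2))
ylim <- range(res_lsq, res_huber)
plot(x[keep], res_lsq, ylim = ylim, xlab = "x", ylab = "residual", main = "LSQ")
abline(h = 0, lty = 2)
plot(x[keep], res_huber, ylim = ylim, xlab = "x", ylab = "residual", main = "Huber")
abline(h = 0, lty = 2)
par(op)

# Crude summary: correlation between x and the residuals of the clean points.
# Values far from zero indicate systematic (linear) structure.
trend <- c(LSQ = cor(x[keep], res_lsq), Huber = cor(x[keep], res_huber))
print(round(trend, 2))
```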


Figure 20.2: Residuals from non-outlier observations for three methods (left: LSQ; centre: LAV; right: Huber). The least squares residuals show a clear pattern due to the influence of the outliers.

20.3 Weight Comparisons

Example 20.3 We now examine the weights assigned by different M-estimators. Recall from section 18 that M-estimators assign weights to observations based on the magnitude of their residuals, with larger residuals receiving smaller weights.

Figure 20.3 shows the weights assigned by three different M-estimators: Huber, Bisquare and Hampel. The left plot shows the weights for all 30 observations. We can clearly see that observations 5, 15 and 25 (the outliers) receive markedly reduced weights from all three methods, showing that each method has flagged them as outliers.
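The weights themselves can be read off the fitted objects: `rlm()` stores the final weights of the iteratively reweighted least squares procedure in the component `w`. The following sketch regenerates the data and collects the weights in a matrix; the name `w` for this matrix is our own choice.

```r
library(MASS)  # for rlm()

# Regenerate the data of Example 20.1 so that this block is self-contained.
set.seed(20251006)
n <- 30
x <- runif(n, 0, 10)
y <- 2 + 0.5*x + rnorm(n, sd = 0.5)
outliers <- c(5, 15, 25)
x[outliers] <- c(9, 8.5, 9.5)
y[outliers] <- c(-2, -1.5, -2.5)

rlm1 <- rlm(y ~ x, psi = psi.huber)
rlm2 <- rlm(y ~ x, psi = psi.bisquare)
rlm3 <- rlm(y ~ x, psi = psi.hampel)

# One row per observation, one column per M-estimator.
w <- cbind(Huber = rlm1$w, Bisquare = rlm2$w, Hampel = rlm3$w)
rownames(w) <- seq_len(n)

# Weights of the three contaminated observations.
print(round(w[outliers, ], 3))

# Summary of the weights of the remaining observations, for comparison.
print(round(apply(w[-outliers, ], 2, summary), 3))
```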

The centre and right plots compare the weights pairwise between methods. Most observations lie on or near the diagonal line, indicating that different methods assign similar weights to these observations. However, for the outliers (points far from the diagonal), the methods differ:

  • The Bisquare and Hampel methods assign more extreme weights than the Huber method, with some observations receiving weights very close to zero.
  • The Huber method assigns intermediate weights even to the outliers, which reflects the fact that the Huber \(\psi\)-function does not completely down-weight large residuals.

These differences reflect the different shapes of the weight functions \(w(\varepsilon) = \psi(\varepsilon) / \varepsilon\) discussed in section 18. The Bisquare and Hampel methods are “redescending” M-estimators, meaning that sufficiently large residuals receive weight zero. The Huber method is not redescending, so all observations retain some influence.
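The difference between the weight functions can be seen by evaluating them directly. In MASS, the functions `psi.huber()`, `psi.bisquare()` and `psi.hampel()` return \(\psi(u) / u\), that is, the weight, when called with the default `deriv = 0`. The sketch below uses the default tuning constants of these functions; the grid `u` of standardised residuals is chosen for illustration.

```r
library(MASS)  # for psi.huber(), psi.bisquare() and psi.hampel()

# Illustrative grid of standardised residuals.
u <- c(0.5, 1, 2, 3, 5, 10)

# With deriv = 0 (the default), each function returns the weight psi(u)/u.
weights <- rbind(
  Huber    = psi.huber(u),
  Bisquare = psi.bisquare(u),
  Hampel   = psi.hampel(u)
)
colnames(weights) <- u
print(round(weights, 3))
```

With the default constants, the Bisquare and Hampel weights at \(u = 10\) are exactly zero, whilst the Huber weight is \(1.345 / 10 = 0.1345\). Note that `rlm()` divides the residuals by a robust scale estimate before applying these functions, so \(u\) here is on the standardised scale.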


Figure 20.3: Comparisons of weights assigned by different M-estimators. The left plot shows weights for all observations. The centre and right plots compare weights pairwise between methods, with outliers shown as red points.

Summary

  • We have compared several robust regression methods on synthetic data with outliers.
  • Methods with high breakdown points (LMS, LTS) resist outliers, whilst methods with low breakdown points (least squares, LAV, M-estimators) can be severely affected.
  • Robust methods produce cleaner residual patterns than least squares when outliers are present, making diagnostics more reliable.
  • M-estimators assign weights to observations based on residual magnitude, with different methods (Huber, Bisquare, Hampel) assigning weights differently.
  • Redescending M-estimators (Bisquare, Hampel) assign zero weight to sufficiently large residuals, whilst the Huber method retains some influence for all observations.
  • The choice of robust method involves a trade-off between breakdown point and efficiency, as discussed in sections 17 to 19.