Section 6 Estimating Coefficients Simultaneously

In this section we will study how to assess the uncertainty in two or more regression coefficients simultaneously. This is needed since the estimates for different coefficients will usually not be independent.

6.1 Linear Combinations of Coefficients

As a general setup which allows us to describe which coefficients we are interested in, we consider the image $K\beta$, where $K \in \mathbb{R}^{k \times (p+1)}$ with $k \leq p+1$.

Example 6.1 If $p = 3$ and
\[
K = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},
\]
then
\[
K\beta
= \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.
\]
Thus, this choice of $K$ would be appropriate if we are interested in $\beta_1$ and $\beta_2$ only.

The setup allows for more general questions than just selecting components of $\beta$. We can also derive statements about linear combinations of the $\beta_i$; e.g. we will be able to derive confidence intervals for quantities like $\beta_1 - \beta_2$.
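
For instance, a single-row $K$ picks out one linear combination. A minimal sketch in R (the name K.diff and the choice $p = 3$ are purely illustrative):

K.diff <- matrix(c(0, 1, -1, 0), nrow = 1)  # row giving beta_1 - beta_2 when p = 3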

Since $\hat\beta$ is an estimator for $\beta$, we can use $K\hat\beta$ as an estimator for $K\beta$. From lemma 4.1 we know that $\hat\beta \sim \mathcal{N}\bigl(\beta, \sigma^2 (X^\top X)^{-1}\bigr)$ and thus we get
\[
K\hat\beta \sim \mathcal{N}\bigl(K\beta, \sigma^2 K (X^\top X)^{-1} K^\top\bigr).
\]
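
This distributional statement is easy to check by simulation. The following sketch uses a made-up design matrix and made-up parameter values (none of them come from the text) and compares the empirical covariance of $K\hat\beta$ with $\sigma^2 K(X^\top X)^{-1}K^\top$:

set.seed(1)
n <- 50
X <- cbind(1, matrix(rnorm(n * 3), n, 3))   # arbitrary design with p = 3
beta <- c(1, 2, -1, 0.5)                    # arbitrary true coefficients
sigma <- 0.7                                # arbitrary error standard deviation
K <- rbind(c(0, 1, 0, 0),                   # the K from example 6.1
           c(0, 0, 1, 0))
samples <- replicate(10000, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  beta.hat <- solve(t(X) %*% X, t(X) %*% y)
  as.vector(K %*% beta.hat)
})
cov(t(samples))                             # empirical covariance of K beta.hat
sigma^2 * K %*% solve(t(X) %*% X) %*% t(K)  # theoretical covariance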

Given an invertible matrix $Q$, we define the shorthand notation $\|v\|_Q^2 := v^\top Q^{-1} v$. Using this notation for $Q = K(X^\top X)^{-1}K^\top$, we define
\[
F := \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2} \tag{6.1}
\]
as a measure for the distance between $K\hat\beta$ and $K\beta$. This quantity plays the role of $T$ (or more precisely, $T^2$) from the previous section. We also need to introduce the F-distribution, which will take the place of the t-distribution from the previous section.

Definition 6.1 The F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom is the distribution of
\[
X = \frac{S_1 / \nu_1}{S_2 / \nu_2},
\]
where $S_1$ and $S_2$ are independent random variables with chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
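
As a quick sanity check of this definition, one can simulate the ratio of two independent scaled chi-squared variables and compare the result with R's built-in F-distribution functions (degrees of freedom chosen arbitrarily):

nu1 <- 3
nu2 <- 17
x <- (rchisq(100000, nu1) / nu1) / (rchisq(100000, nu2) / nu2)
quantile(x, 0.95)   # empirical 95% quantile of the simulated ratio
qf(0.95, nu1, nu2)  # theoretical quantile of the F(nu1, nu2) distribution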

With these preparations in place, we can state the main result.

Lemma 6.1 Assume that the data follows the model (4.1) and that $Q := K(X^\top X)^{-1}K^\top$ is invertible. Then $F \sim F_{k, n-p-1}$, i.e. $F$ follows an F-distribution with $k$ and $n-p-1$ degrees of freedom.

Proof. We have
\[
\|K\hat\beta - K\beta\|_Q^2
= \|K(\hat\beta - \beta)\|_Q^2
= \|K(X^\top X)^{-1}X^\top \varepsilon\|_Q^2
= \varepsilon^\top X(X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} X^\top \varepsilon
=: \varepsilon^\top R \varepsilon.
\]
It is tedious but easy to check that $R$ is idempotent:
\[
\begin{aligned}
R^2
&= \bigl(X(X^\top X)^{-1}K^\top Q^{-1}K(X^\top X)^{-1}X^\top\bigr)\bigl(X(X^\top X)^{-1}K^\top Q^{-1}K(X^\top X)^{-1}X^\top\bigr) \\
&= X(X^\top X)^{-1}K^\top Q^{-1}K(X^\top X)^{-1}\bigl(X^\top X(X^\top X)^{-1}\bigr)K^\top Q^{-1}K(X^\top X)^{-1}X^\top \\
&= X(X^\top X)^{-1}K^\top \bigl(Q^{-1}K(X^\top X)^{-1}K^\top\bigr)Q^{-1}K(X^\top X)^{-1}X^\top \\
&= X(X^\top X)^{-1}K^\top Q^{-1}K(X^\top X)^{-1}X^\top \\
&= R.
\end{aligned}
\]
When we checked in the proof of Cochran’s theorem that $\varepsilon^\top H \varepsilon$ was chi-squared distributed, the only property of $H$ we used was that $H$ is idempotent. (If you don’t remember the details, it would be a good idea to re-read the proof before continuing.) Thus, the same argument gives that $\varepsilon^\top R \varepsilon / \sigma^2$ is chi-squared distributed, and as before the number of degrees of freedom of this chi-squared distribution equals the rank of $R$. Using the assumption that $Q$ is invertible, one can show (we skip this part of the proof again) that the rank of $R$ equals $\min(k, p+1) = k$ and thus we find that
\[
S_1 := \frac{1}{\sigma^2} \|K\hat\beta - K\beta\|_Q^2 \sim \chi^2(k).
\]

Similarly, from the direct application of Cochran’s theorem in equation (4.8), we know
\[
S_2 := \frac{1}{\sigma^2} (n-p-1) \hat\sigma^2 \sim \chi^2(n-p-1).
\]

Since $S_1$ is a function of $\hat\beta$ and $S_2$ is a function of $\hat\sigma^2$, we can use lemma 5.1 to conclude that $S_1$ and $S_2$ are independent. Combining these results we find
\[
F
= \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2}
= \frac{\frac{1}{\sigma^2}\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top} \big/ k}{\frac{1}{\sigma^2}(n-p-1)\hat\sigma^2 \big/ (n-p-1)}
= \frac{S_1 / k}{S_2 / (n-p-1)}
\sim F_{k, n-p-1}.
\]
This completes the proof.
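
Lemma 6.1 can also be illustrated numerically: we simulate data from the model, compute $F$ with the true $K\beta$ plugged in, and compare the empirical distribution of $F$ with $F_{k,n-p-1}$. A sketch, reusing X, beta, sigma, n and K from the simulation in section 6.1 (so $p = 3$ and $k = 2$ here):

k <- nrow(K)
Q <- K %*% solve(t(X) %*% X) %*% t(K)
F.sim <- replicate(10000, {
  y <- X %*% beta + rnorm(n, sd = sigma)
  fit <- lm.fit(X, y)
  sigma.hat2 <- sum(fit$residuals^2) / (n - 4)   # n - p - 1 with p = 3
  d <- as.vector(K %*% fit$coefficients - K %*% beta)
  as.numeric(t(d) %*% solve(Q, d)) / (k * sigma.hat2)
})
quantile(F.sim, 0.95)  # empirical 95% quantile
qf(0.95, k, n - 4)     # theoretical quantile of F(k, n - p - 1)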

6.2 Confidence Regions

Using $F$ as a distance between the unknown true $K\beta$ and the estimator $K\hat\beta$, it is easy to find a region of $\mathbb{R}^k$ which covers $K\beta$ with high probability. Since we now have a $k$-dimensional parameter vector, this region will no longer be an interval. Instead, we will get a $k$-dimensional ellipsoid.

6.2.1 Result

Define
\[
E := \Bigl\{ z \in \mathbb{R}^k \Bigm| \|z - K\hat\beta\|_{K(X^\top X)^{-1}K^\top} \leq \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)} \Bigr\},
\]
where $f_{k,n-p-1}(\alpha)$ is the $(1-\alpha)$-quantile of the $F_{k,n-p-1}$-distribution. This is a “ball” around $K\hat\beta$ in $\mathbb{R}^k$, where distance is measured using the norm $\|\cdot\|_{K(X^\top X)^{-1}K^\top}$ introduced above. The following lemma shows that $\sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)}$ is the correct choice of “radius” to make the ball cover the true value $K\beta$ with probability $1-\alpha$.

Lemma 6.2 We have $P(K\beta \in E) = 1 - \alpha$, i.e. the set $E$ is a $(1-\alpha)$-confidence region for $K\beta$.

Proof. We have
\[
\begin{aligned}
K\beta \in E
&\iff \|K\beta - K\hat\beta\|_{K(X^\top X)^{-1}K^\top} \leq \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)} \\
&\iff \|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top} \leq k \hat\sigma^2 f_{k,n-p-1}(\alpha) \\
&\iff \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2} \leq f_{k,n-p-1}(\alpha) \\
&\iff F \leq f_{k,n-p-1}(\alpha).
\end{aligned}
\]
Now the claim follows, since $f_{k,n-p-1}(\alpha)$ is the $(1-\alpha)$-quantile of the distribution of $F$.
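
In terms of the Monte Carlo sample F.sim from the sketch in section 6.1, the coverage statement of lemma 6.2 can be checked in one line (with alpha = 0.05, say):

alpha <- 0.05
mean(F.sim <= qf(1 - alpha, k, n - 4))  # fraction of runs with K beta inside E; approx. 1 - alpha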

6.2.2 Numerical Experiments

We start by fitting a linear model to the stackloss dataset as before:

m <- lm(stack.loss ~ ., data = stackloss)
summary(m)

Call:
lm(formula = stack.loss ~ ., data = stackloss)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.2377 -1.7117 -0.4551  2.3614  5.6978 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -39.9197    11.8960  -3.356  0.00375 ** 
Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
Water.Temp    1.2953     0.3680   3.520  0.00263 ** 
Acid.Conc.   -0.1521     0.1563  -0.973  0.34405    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.243 on 17 degrees of freedom
Multiple R-squared:  0.9136,    Adjusted R-squared:  0.8983 
F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09

Here I want to consider the two regression coefficients $\beta_1$ (Air.Flow) and $\beta_2$ (Water.Temp). For this we need to construct a matrix $K$ with two rows, where each row selects one of the two coefficients:

X <- model.matrix(m)
n <- nrow(X)
p <- ncol(X) - 1

K <- matrix(c(0, 1, 0, 0,   # indices in R start at 1, so beta_1 is col. 2
              0, 0, 1, 0),  # ... and beta_2 is column 3.
            byrow = TRUE,
            nrow = 2, ncol = 4)
k <- nrow(K)
K.beta.hat <- as.vector(K %*% coef(m))

As we have seen in the previous section, for this $K$ the covariance matrix for the ellipse is $Q = K(X^\top X)^{-1}K^\top$:

Q <- K %*% solve(t(X) %*% X) %*% t(K)
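
As a cross-check (not needed for the method itself): for an lm fit, vcov(m) equals $\hat\sigma^2 (X^\top X)^{-1}$, so the same $Q$ can be obtained from the estimated covariance matrix of the coefficients:

sigma.hat <- summary(m)$sigma
Q.check <- K %*% vcov(m) %*% t(K) / sigma.hat^2
all.equal(Q, Q.check)  # should be TRUE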

6.2.2.1 A Single Point

To try out the method, we first consider the test point $a = (1, 1)$ as a candidate value for $K\beta$, and we compute the $F$ value for this point to see whether the point is inside or outside the ellipse:

a <- c(1, 1)

sigma.hat <- summary(m)$sigma
d <- a - K.beta.hat
F <- t(d) %*% solve(Q, d) / (k * sigma.hat^2)
F
         [,1]
[1,] 2.834083

The $F$ value, measuring the distance from the centre of the ellipse, is 2.834 for this test point. This has to be compared to the critical value:

alpha <- 0.05
f.crit <- qf(1 - alpha, k, n - p - 1)
f.crit
[1] 3.591531

Since the $F$ value is less than the critical value, the point $(1, 1)$ is inside the ellipse. As the next step, we will plot a picture of the full ellipse.
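
Equivalently, we can convert the $F$ value into a $p$-value using the distribution function of the $F_{k,n-p-1}$-distribution; a value above $\alpha$ again indicates that the point lies inside the ellipse:

pf(F, k, n - p - 1, lower.tail = FALSE)  # p-value for the test point; here > alpha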

6.2.2.2 Points on a Grid

An easy way to show the ellipse is to repeat the above procedure for all points on a rectangular grid, and then colour the points depending on whether they are inside or outside the ellipse. We start by making a list of grid points.

x.min <- -1
x.max <- 3
y.min <- -0.5
y.max <- 3
L <- 200

xx <- seq(x.min, x.max, length.out = L)
yy <- seq(y.min, y.max, length.out = L)

Z <- as.matrix(expand.grid(x = xx - K.beta.hat[1],
                           y = yy - K.beta.hat[2],
                           KEEP.OUT.ATTRS = FALSE))
dim(Z)
[1] 40000     2

The matrix Z now has two columns, containing the $x$ and $y$ coordinates, respectively, of the $200 \times 200$ points in our grid. Now we need to compute the $F$ value for every grid point:

F <- rowSums(Z * t(solve(Q, t(Z)))) / (k * sigma.hat^2)
F <- matrix(F, byrow = TRUE, L, L)
dim(F)
[1] 200 200

The resulting matrix contains the F value for every grid point. Finally, we can mark all the points where F exceeds the critical value in a plot:

image(x = xx, y = yy, t(F > f.crit), asp = 1,
      col = c("green", "white"),
      xlab = expression(beta[1]), ylab = expression(beta[2]))
points(K.beta.hat[1], K.beta.hat[2], pch = "+")

The green ellipse in this plot is the 95% confidence ellipse for the vector $(\beta_1, \beta_2)$.
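
As an aside, since the grid of $F$ values is already available, base R's contour() can trace the boundary $F = f_{k,n-p-1}(\alpha)$ directly on top of the image (a sketch anticipating the exact outline computed in the next subsection):

contour(x = xx, y = yy, z = t(F), levels = f.crit,
        add = TRUE, drawlabels = FALSE)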

6.2.2.3 Coordinates of the Outline

This sub-section is non-examinable (but hopefully interesting).

Using some more linear algebra, we can find a formula for the coordinates of the points on the ellipse which forms the boundary of the confidence region. For this, we use the Singular Value Decomposition (SVD) of the matrix $Q$. This allows us to write $Q$ as $Q = U D V^\top$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix:

svd <- La.svd(Q)
svd
$d
[1] 0.0138678012 0.0007364967

$u
           [,1]      [,2]
[1,] -0.2749061 0.9614711
[2,]  0.9614711 0.2749061

$vt
           [,1]      [,2]
[1,] -0.2749061 0.9614711
[2,]  0.9614711 0.2749061

The R output shows the diagonal elements $d_{ii}$ of $D$ and the matrices $U$ and $V^\top$. Since $Q$ is symmetric, we have $U = V$ and thus $Q = U D U^\top$ in this case, i.e. we have found a diagonalisation of $Q$.

Using the SVD we can simplify the norm used in the definition of the ellipse. We write $D^{-1/2}$ for the diagonal matrix which has $1/\sqrt{d_{ii}}$ on the diagonal. Then we have
\[
Q \bigl(D^{-1/2} U^\top\bigr)^\top \bigl(D^{-1/2} U^\top\bigr)
= Q U D^{-1/2} D^{-1/2} U^\top
= U D U^\top U D^{-1} U^\top
= U D D^{-1} U^\top
= U U^\top
= I.
\]
Thus, $(D^{-1/2}U^\top)^\top (D^{-1/2}U^\top)$ is the inverse of $Q$ and we get
\[
\|z - K\hat\beta\|^2_Q
= (z - K\hat\beta)^\top Q^{-1} (z - K\hat\beta)
= (z - K\hat\beta)^\top \bigl(D^{-1/2}U^\top\bigr)^\top \bigl(D^{-1/2}U^\top\bigr) (z - K\hat\beta)
= \bigl\| D^{-1/2} U^\top (z - K\hat\beta) \bigr\|^2,
\]
where the norm on the right-hand side is the usual Euclidean norm. Thus, the boundary of the ellipse consists of all points of the form $K\hat\beta + d$, where $e := D^{-1/2} U^\top d$ is a vector of length $\|e\| = \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)}$. Finally, using polar coordinates, we find the points on the boundary to be
\[
d = U D^{1/2} e
= U D^{1/2} \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)}
\begin{pmatrix} \cos(\varphi) \\ \sin(\varphi) \end{pmatrix}
\]
with $\varphi \in [0, 2\pi]$. This allows us to plot the boundary line in R.

phi <- seq(0, 2*pi, length.out = 201)
circ <- rbind(cos(phi), sin(phi)) * sqrt(f.crit * k * sigma.hat^2)

ellipse <- svd$u %*% (circ * sqrt(svd$d)) + K.beta.hat

image(x = xx, y = yy, t(F > f.crit), asp = 1,
      col = c("green", "white"),
      xlab = expression(beta[1]), ylab = expression(beta[2]))
lines(ellipse[1,], ellipse[2,])

To verify that everything fits together, we have plotted the line on top of the image from the previous subsection. Since the black line exactly traces the boundary of the green area, the two approaches are consistent.

6.3 Hypothesis Tests

We can easily derive a hypothesis test for the hypothesis $H_0\colon K\beta = m$ against the alternative $H_1\colon K\beta \neq m$, where $m \in \mathbb{R}^k$.

We redefine $F$ as
\[
F := \frac{\|K\hat\beta - m\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2},
\]
using $m$ in place of the $K\beta$ above. Then the new definition of $F$ is the same as (6.1) if $H_0$ is true.

Lemma 6.3 The test which rejects $H_0$ if and only if $F > f_{k,n-p-1}(\alpha)$ has significance level $\alpha$.

Proof. Assume that $H_0$ is true. Then we have $K\beta = m$ and thus the $F$ defined in this section coincides with the expression from equation (6.1). From lemma 6.1 we know that $F \sim F_{k,n-p-1}$. Thus we have
\[
P(\text{type I error})
= P\bigl(F > f_{k,n-p-1}(\alpha)\bigr)
= 1 - P\bigl(F \leq f_{k,n-p-1}(\alpha)\bigr)
= 1 - (1-\alpha)
= \alpha.
\]
This completes the proof.
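
For the stackloss example this test is easy to carry out with the quantities already computed in section 6.2.2. A sketch testing the (arbitrarily chosen) hypothesis $H_0\colon (\beta_1, \beta_2) = (1, 1)$:

m0 <- c(1, 1)  # hypothesised value for K beta (illustrative choice)
d <- K.beta.hat - m0
F.test <- as.numeric(t(d) %*% solve(Q, d)) / (k * sigma.hat^2)
F.test > f.crit                               # FALSE: H0 is not rejected at level alpha
pf(F.test, k, n - p - 1, lower.tail = FALSE)  # the corresponding p-value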

Summary

  • We have learned how to measure the distance between a subset of coefficients and the corresponding estimates.
  • We have introduced the F-distribution.
  • We have learned how to compute confidence regions for subsets of the regression coefficients.
  • We have learned how to perform hypothesis tests for subsets of the regression coefficients.