Section 6 Estimating Coefficients Simultaneously
In this section we will study how to assess the uncertainty in two or more regression coefficients simultaneously. This is needed since the estimates for different coefficients will usually not be independent.
6.1 Linear Combinations of Coefficients
As a general setup which allows us to describe which coefficients we are interested in, we consider the image $K\beta$, where $K \in \mathbb{R}^{k \times (p+1)}$ with $k \leq p+1$.
Example 6.1 If $p = 3$ and
$$K = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$
then
$$K\beta = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.$$
Thus, this choice of $K$ would be appropriate if we are interested in $\beta_1$ and $\beta_2$ only.
The setup allows for more general questions than just selecting components of $\beta$. We can also derive statements about linear combinations of the $\beta_i$; for example, we will be able to derive confidence intervals for quantities like $\beta_1 - \beta_2$.
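For instance, still with $p = 3$, the single-row matrix $K = (0, 1, -1, 0) \in \mathbb{R}^{1 \times 4}$ gives $K\beta = \beta_1 - \beta_2$, so the results below cover this difference as the case $k = 1$.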
Since $\hat\beta$ is an estimator for $\beta$, we can use $K\hat\beta$ as an estimator for $K\beta$. From lemma 4.1 we know that $\hat\beta \sim \mathcal{N}\bigl(\beta, \sigma^2 (X^\top X)^{-1}\bigr)$ and thus we get $K\hat\beta \sim \mathcal{N}\bigl(K\beta, \sigma^2 K (X^\top X)^{-1} K^\top\bigr)$.
Given an invertible matrix $Q$, we define the shorthand notation $\|v\|_Q^2 := v^\top Q^{-1} v$. Using this notation for $Q = K (X^\top X)^{-1} K^\top$, we define
$$F := \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2} \tag{6.1}$$
as a measure for the distance between $K\hat\beta$ and $K\beta$. This quantity plays the role of $T$ (or more precisely, $T^2$) from the previous section. We also need to introduce the F-distribution, which will take the place of the t-distribution from the previous section.
Definition 6.1 The F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom is the distribution of
$$X = \frac{S_1 / \nu_1}{S_2 / \nu_2},$$
where $S_1$ and $S_2$ are independent random variables with chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
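We can illustrate this definition numerically. The following sketch (with $\nu_1 = 2$ and $\nu_2 = 17$, the values that will appear in the stackloss example below) builds samples of $X$ from two independent chi-squared samples and compares a simulated quantile with the exact one from qf():
# definition 6.1 in action: an F(nu1, nu2) sample built from two
# independent chi-squared samples, each scaled by its degrees of freedom
set.seed(1)
nu1 <- 2
nu2 <- 17
S1 <- rchisq(100000, df = nu1)
S2 <- rchisq(100000, df = nu2)
X <- (S1 / nu1) / (S2 / nu2)
quantile(X, 0.95)  # simulated 95% quantile ...
qf(0.95, nu1, nu2) # ... is close to the exact value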
With these preparations in place, we can state the main result.
Lemma 6.1 Assume that the data follows the model (4.1) and that $Q := K (X^\top X)^{-1} K^\top$ is invertible. Then $F \sim F_{k, n-p-1}$, i.e. $F$ follows an F-distribution with $k$ and $n-p-1$ degrees of freedom.
Proof. We have
$$\|K\hat\beta - K\beta\|_Q^2 = \|K(\hat\beta - \beta)\|_Q^2 = \|K (X^\top X)^{-1} X^\top \varepsilon\|_Q^2 = \varepsilon^\top X (X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} X^\top \varepsilon =: \varepsilon^\top R \varepsilon.$$
It is tedious but easy to check that $R$ is idempotent:
$$\begin{aligned}
R^2 &= \bigl( X (X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} X^\top \bigr) \bigl( X (X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} X^\top \bigr) \\
&= X (X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} \bigl( X^\top X (X^\top X)^{-1} \bigr) K^\top Q^{-1} K (X^\top X)^{-1} X^\top \\
&= X (X^\top X)^{-1} K^\top \bigl( Q^{-1} K (X^\top X)^{-1} K^\top \bigr) Q^{-1} K (X^\top X)^{-1} X^\top \\
&= X (X^\top X)^{-1} K^\top Q^{-1} K (X^\top X)^{-1} X^\top \\
&= R,
\end{aligned}$$
where the second-to-last equality uses $Q^{-1} K (X^\top X)^{-1} K^\top = Q^{-1} Q = I$. When we checked in the proof of Cochran's theorem that $\varepsilon^\top H \varepsilon$ was chi-squared distributed, the only property of $H$ we used was that $H$ is idempotent. (If you don't remember the details, it would be a good idea to re-read the proof before continuing.) Thus, the same argument gives that $\varepsilon^\top R \varepsilon / \sigma^2$ is chi-squared distributed, and as before the number of degrees of freedom of this chi-squared distribution equals the rank of $R$. Using the assumption that $Q$ is invertible, one can show (we skip this part of the proof again) that the rank of $R$ equals $k$ and thus we find
$$S_1 := \frac{1}{\sigma^2} \|K\hat\beta - K\beta\|_Q^2 \sim \chi^2(k).$$
Similarly, from the direct application of Cochran's theorem in equation (4.8), we know
$$S_2 := \frac{1}{\sigma^2} (n - p - 1) \hat\sigma^2 \sim \chi^2(n - p - 1).$$
Since $S_1$ is a function of $\hat\beta$ and $S_2$ is a function of $\hat\sigma^2$, we can use lemma 5.1 to conclude that $S_1$ and $S_2$ are independent. Combining these results we find
$$F = \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2} = \frac{\tfrac{1}{\sigma^2} \|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top} \big/ k}{\tfrac{1}{\sigma^2} (n-p-1) \hat\sigma^2 \big/ (n-p-1)} = \frac{S_1 / k}{S_2 / (n - p - 1)} \sim F_{k, n-p-1}.$$
This completes the proof.
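Lemma 6.1 can also be checked by simulation. The following sketch uses a simulated design X0, true coefficients beta0 and a matrix K0 (all chosen arbitrarily, purely for illustration), repeatedly generates data from the model, and compares the empirical distribution of $F$ with the $F_{k, n-p-1}$-distribution:
# Monte Carlo check of lemma 6.1 with simulated data
set.seed(1)
n0 <- 50
p0 <- 3
X0 <- cbind(1, matrix(rnorm(n0 * p0), n0, p0))  # design matrix with intercept
beta0 <- c(1, 2, 0, -1)                         # "true" coefficients
K0 <- rbind(c(0, 1, 0, 0),                      # select beta_1 ...
            c(0, 0, 1, 0))                      # ... and beta_2
k0 <- nrow(K0)
Q0 <- K0 %*% solve(crossprod(X0), t(K0))        # K (X'X)^{-1} K'
F.sim <- replicate(10000, {
  y <- as.vector(X0 %*% beta0) + rnorm(n0)      # model (4.1) with sigma = 1
  fit <- lm.fit(X0, y)
  sigma.hat2 <- sum(fit$residuals^2) / (n0 - p0 - 1)
  d <- as.vector(K0 %*% (fit$coefficients - beta0))
  sum(d * solve(Q0, d)) / (k0 * sigma.hat2)
})
ks.test(F.sim, pf, k0, n0 - p0 - 1)  # large p-value: consistent with lemma 6.1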
6.2 Confidence Regions
Using $F$ as a distance between the unknown true $K\beta$ and the estimator $K\hat\beta$, it is easy to find a region of $\mathbb{R}^k$ which covers $K\beta$ with high probability. Since we now have a $k$-dimensional parameter vector, this region will no longer be an interval. Instead, we will get a $k$-dimensional ellipsoid.
6.2.1 Result
Define
$$E := \Bigl\{ z \in \mathbb{R}^k \Bigm| \|z - K\hat\beta\|_{K(X^\top X)^{-1}K^\top} \leq \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)} \Bigr\},$$
where $f_{k,n-p-1}(\alpha)$ is the $(1-\alpha)$-quantile of the $F_{k,n-p-1}$-distribution. This is a "ball" around $K\hat\beta$ in $\mathbb{R}^k$, where distance is measured using the norm $\|\cdot\|_{K(X^\top X)^{-1}K^\top}$ introduced above. The following lemma shows that $\sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)}$ is the correct choice of "radius" to make the ball cover the true value $K\beta$ with probability $1 - \alpha$.
Lemma 6.2 We have $P(K\beta \in E) = 1 - \alpha$, i.e. the set $E$ is a $(1-\alpha)$-confidence region for $K\beta$.
Proof. We have
$$\begin{aligned}
K\beta \in E &\iff \|K\beta - K\hat\beta\|_{K(X^\top X)^{-1}K^\top} \leq \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)} \\
&\iff \|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top} \leq k \hat\sigma^2 f_{k,n-p-1}(\alpha) \\
&\iff \frac{\|K\hat\beta - K\beta\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2} \leq f_{k,n-p-1}(\alpha) \\
&\iff F \leq f_{k,n-p-1}(\alpha).
\end{aligned}$$
Now the claim follows, since $f_{k,n-p-1}(\alpha)$ is the $(1-\alpha)$-quantile of $F$.
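Since $K\beta \in E$ is, by the proof above, the same event as $F \leq f_{k,n-p-1}(\alpha)$, the simulated values F.sim from the sketch after lemma 6.1 also let us check the coverage probability directly:
# coverage check for lemma 6.2, re-using F.sim from the previous sketch
alpha <- 0.05
mean(F.sim <= qf(1 - alpha, k0, n0 - p0 - 1))  # should be close to 0.95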
6.2.2 Numerical Experiments
We start by fitting a linear model to the stackloss dataset as before:
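m <- lm(stack.loss ~ ., data = stackloss)
summary(m)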
Call:
lm(formula = stack.loss ~ ., data = stackloss)

Residuals:
    Min      1Q  Median      3Q     Max
-7.2377 -1.7117 -0.4551  2.3614  5.6978

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -39.9197    11.8960  -3.356  0.00375 **
Air.Flow      0.7156     0.1349   5.307  5.8e-05 ***
Water.Temp    1.2953     0.3680   3.520  0.00263 **
Acid.Conc.   -0.1521     0.1563  -0.973  0.34405
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.243 on 17 degrees of freedom
Multiple R-squared:  0.9136,	Adjusted R-squared:  0.8983
F-statistic:  59.9 on 3 and 17 DF,  p-value: 3.016e-09
Here I want to consider the two regression coefficients $\beta_1$ (Air.Flow) and $\beta_2$ (Water.Temp). For this we need to construct a matrix $K$ with two rows, where each row selects one of the two coefficients:
X <- model.matrix(m)
n <- nrow(X)
p <- ncol(X) - 1
K <- matrix(c(0, 1, 0, 0,  # indices in R start at 1, so beta_1 is col. 2
              0, 0, 1, 0), # ... and beta_2 is column 3.
            byrow = TRUE,
            nrow = 2, ncol = 4)
k <- nrow(K)
K.beta.hat <- as.vector(K %*% coef(m))
As we have seen in the previous section, for this $K$ the covariance matrix for the ellipse is $Q = K (X^\top X)^{-1} K^\top$:
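Q <- K %*% solve(crossprod(X), t(K))  # crossprod(X) computes X'X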
6.2.2.1 A Single Point
To try out the method, we first consider the test point $a = (1, 1)^\top$, a candidate value for $K\beta$, and we compute the F value for this point to see whether the point is inside or outside the ellipse:
a <- c(1, 1)                  # the test point
sigma.hat <- summary(m)$sigma # the estimate for sigma
d <- a - K.beta.hat
F <- t(d) %*% solve(Q, d) / (k * sigma.hat^2)  # ||d||_Q^2 / (k sigma.hat^2)
F
         [,1]
[1,] 2.834083
The F value, measuring the distance from the centre of the ellipse, is 2.834 for this test point. This has to be compared to the critical value:
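alpha <- 0.05
f.crit <- qf(1 - alpha, k, n - p - 1)  # (1-alpha)-quantile of F(k, n-p-1)
f.crit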
[1] 3.591531
Since the F value is less than the critical value, the point (1,1) is inside the ellipse. As the next step, we will plot a picture of the full ellipse.
6.2.2.2 Points on a Grid
An easy way to show the ellipse is to repeat the above procedure for all points on a rectangular grid, and then colour the points depending on whether they are inside or outside the ellipse. We start by making a list of grid points.
x.min <- -1
x.max <- 3
y.min <- -0.5
y.max <- 3
L <- 200
xx <- seq(x.min, x.max, length.out = L)
yy <- seq(y.min, y.max, length.out = L)
Z <- as.matrix(expand.grid(x = xx - K.beta.hat[1],
                           y = yy - K.beta.hat[2],
                           KEEP.OUT.ATTRS = FALSE))
dim(Z)
[1] 40000 2
The matrix Z now has two columns, containing the x and y coordinates respectively of the 200×200 points in our grid, already shifted so that they are relative to the centre $K\hat\beta$ of the ellipse. Now we need to compute the F value for every grid point:
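# quadratic form z' Q^{-1} z for every (already centred) grid point,
# arranged so that rows correspond to y values and columns to x values
F <- matrix(rowSums((Z %*% solve(Q)) * Z) / (k * sigma.hat^2),
            nrow = L, ncol = L, byrow = TRUE)
dim(F)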
[1] 200 200
The resulting matrix contains the F value for every grid point. Finally, we can mark all the points where F exceeds the critical value in a plot:
image(x = xx, y = yy, t(F > f.crit), asp = 1,
      col = c("green", "white"),
      xlab = expression(beta[1]), ylab = expression(beta[2]))
points(K.beta.hat[1], K.beta.hat[2], pch = "+")
The green ellipse in this plot is the 95% confidence ellipse for the vector $(\beta_1, \beta_2)$.
6.2.2.3 Coordinates of the Outline
This sub-section is non-examinable (but hopefully interesting).
Using some more linear algebra, we can find a formula for the coordinates of the points on the ellipse which forms the boundary of the confidence region. For this, we use the Singular Value Decomposition (SVD) of the matrix $Q$. This allows us to write $Q$ as $Q = U D V^\top$, where $U$ and $V$ are orthogonal matrices and $D$ is a diagonal matrix:
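svd <- La.svd(Q)  # La.svd() returns U, the d_ii, and V^T
svd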
$d
[1] 0.0138678012 0.0007364967

$u
           [,1]      [,2]
[1,] -0.2749061 0.9614711
[2,]  0.9614711 0.2749061

$vt
           [,1]      [,2]
[1,] -0.2749061 0.9614711
[2,]  0.9614711 0.2749061
The R output shows the diagonal elements $d_{ii}$ of $D$ and the matrices $U$ and $V^\top$. Since $Q$ is symmetric, we have $U = V$ and thus $Q = U D U^\top$ in this case, i.e. we have found a diagonalisation of $Q$.
Using the SVD we can simplify the norm used in the definition of the ellipse. We write $D^{-1/2}$ for the diagonal matrix which has $1/\sqrt{d_{ii}}$ on the diagonal. Then we have
$$Q (D^{-1/2} U^\top)^\top (D^{-1/2} U^\top) = Q U D^{-1/2} D^{-1/2} U^\top = U D U^\top U D^{-1} U^\top = U D D^{-1} U^\top = U U^\top = I.$$
Thus, $(D^{-1/2} U^\top)^\top (D^{-1/2} U^\top)$ is the inverse of $Q$ and we get
$$\|z - K\hat\beta\|_Q^2 = (z - K\hat\beta)^\top Q^{-1} (z - K\hat\beta) = (z - K\hat\beta)^\top (D^{-1/2} U^\top)^\top (D^{-1/2} U^\top) (z - K\hat\beta) = \|D^{-1/2} U^\top (z - K\hat\beta)\|^2,$$
where the norm on the right-hand side is the usual Euclidean norm. Thus, the boundary of the ellipse consists of all points of the form $K\hat\beta + d$, where $e := D^{-1/2} U^\top d$ is a vector of length $\|e\| = \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)}$. Finally, using polar coordinates, we find the points on the boundary to be
$$d = U D^{1/2} e = U D^{1/2} \sqrt{k \hat\sigma^2 f_{k,n-p-1}(\alpha)} \begin{pmatrix} \cos(\varphi) \\ \sin(\varphi) \end{pmatrix}$$
with $\varphi \in [0, 2\pi]$. This allows us to plot the boundary line in R.
phi <- seq(0, 2*pi, length.out = 201)
# points e on a circle with radius sqrt(k * sigma.hat^2 * f.crit) ...
circ <- rbind(cos(phi), sin(phi)) * sqrt(f.crit * k * sigma.hat^2)
# ... mapped onto the ellipse via d = U D^{1/2} e and shifted to the centre
ellipse <- svd$u %*% (circ * sqrt(svd$d)) + K.beta.hat
image(x = xx, y = yy, t(F > f.crit), asp = 1,
      col = c("green", "white"),
      xlab = expression(beta[1]), ylab = expression(beta[2]))
lines(ellipse[1,], ellipse[2,])
To verify that everything is consistent, we have plotted the line on top of the image from the previous subsection. Since the black line perfectly surrounds the green area, both approaches are consistent.
6.3 Hypothesis Tests
We can easily derive a test of the hypothesis $H_0\colon K\beta = m$ against the alternative $H_1\colon K\beta \neq m$, where $m \in \mathbb{R}^k$.
We redefine $F$ as
$$F := \frac{\|K\hat\beta - m\|^2_{K(X^\top X)^{-1}K^\top}}{k \hat\sigma^2},$$
using $m$ in place of the $K\beta$ above. Then the new definition of $F$ is the same as (6.1) if $H_0$ is true.
Lemma 6.3 The test which rejects $H_0$ if and only if $F > f_{k,n-p-1}(\alpha)$ has significance level $\alpha$. (Note that $F \geq 0$, so no absolute value is needed here.)
Proof. Assume that $H_0$ is true. Then we have $K\beta = m$ and thus the $F$ defined in this section coincides with the expression from equation (6.1). From lemma 6.1 we know that $F \sim F_{k,n-p-1}$. Thus we have
$$P(\text{type I error}) = P\bigl(F > f_{k,n-p-1}(\alpha)\bigr) = 1 - P\bigl(F \leq f_{k,n-p-1}(\alpha)\bigr) = 1 - (1 - \alpha) = \alpha.$$
This completes the proof.
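As an illustration, we can carry out this test for the stackloss example, re-using K, Q, k, sigma.hat and f.crit from above and testing the (arbitrarily chosen) hypothesis $H_0\colon K\beta = (1,1)^\top$ from the single-point example:
m0 <- c(1, 1)            # the hypothesised value m for K beta
d <- K.beta.hat - m0
F <- as.numeric(t(d) %*% solve(Q, d)) / (k * sigma.hat^2)
F > f.crit               # FALSE here, so H0 is not rejected at the 5% level
1 - pf(F, k, n - p - 1)  # the corresponding p-value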
Summary
- We have learned how to measure the distance between a subset of coefficients and the corresponding estimates.
- We have introduced the F-distribution.
- We have learned how to compute confidence regions for subsets of the regression coefficients.
- We have learned how to perform hypothesis tests for subsets of the regression coefficients.