Problem Sheet 1

You should attempt all these questions and write up your solutions in advance of your workshop in week 3 where the answers will be discussed.

1. Consider the simple linear regression model \(y_i = \beta_0 + x_{i} \beta_1 + \varepsilon_i\) for \(i \in \{1, 2, \ldots, n\}\) and let \(X\) be the design matrix.

  1. Show that \(\displaystyle X^\top X = \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix} \in \mathbb{R}^{2\times 2}\).

For simple linear regression, we have \(p=1\) and the design matrix is \[\begin{equation*} X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}. \end{equation*}\] Thus we have \[\begin{align*} X^\top X &= \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \\ &= \begin{pmatrix} \sum_{i=1}^n 1 & \sum_{i=1}^n 1 \cdot x_i \\ \sum_{i=1}^n x_i \cdot 1 & \sum_{i=1}^n x_i^2 \end{pmatrix} \\ &= \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix}. \end{align*}\]
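As a quick numerical check (only a sketch, using made-up values for the \(x_i\)), we can build this design matrix in R and confirm that crossprod() reproduces the entries derived above:

x <- c(1.2, 0.7, 3.1, 2.4, 1.9)                # illustrative inputs, not from any dataset
n <- length(x)
X <- cbind(1, x)                               # design matrix: column of ones and the x_i
crossprod(X)                                   # computes t(X) %*% X
matrix(c(n, sum(x), sum(x), sum(x^2)), 2, 2)   # the entries from the formula

Both commands should return the same \(2\times 2\) matrix (up to floating-point rounding).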

  2. Using the formula \[\begin{equation*} \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad-bc} \begin{pmatrix} \;d & -b \\ -c & \,a \end{pmatrix}, \end{equation*}\] find \((X^\top X)^{-1}\).

Using the formula for the inverse of a \(2\times 2\)-matrix, we find \[\begin{align*} (X^\top X)^{-1} &= \begin{pmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix}^{-1} \\ &= \frac{1}{n \sum_{i=1}^n x_i^2 - \bigl(\sum_{i=1}^n x_i\bigr)^2} \begin{pmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{pmatrix}. \end{align*}\]
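Again as a sketch (reusing the illustrative x from the previous snippet, redefined here so the lines run on their own), the explicit inverse can be compared with a direct numerical inversion via solve():

x <- c(1.2, 0.7, 3.1, 2.4, 1.9)
n <- length(x)
X <- cbind(1, x)
detXtX <- n * sum(x^2) - sum(x)^2                        # determinant of X^T X
matrix(c(sum(x^2), -sum(x), -sum(x), n), 2, 2) / detXtX  # explicit inverse from above
solve(crossprod(X))                                      # numerical inverse for comparison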

  3. Find \(X^\top y\) and use this to derive an explicit formula for the least squares estimate \(\hat\beta = (X^\top X)^{-1} X^\top y\).

Omitting the indices in the sums for brevity, we have \[\begin{equation*} X^\top y = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \end{equation*}\] and thus \[\begin{align*} \hat\beta &= (X^\top X)^{-1} X^\top y \\ &= \frac{1}{n \sum x_i^2 - \bigl(\sum x_i\bigr)^2} \begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \\ &= \frac{1}{n \sum x_i^2 - \bigl(\sum x_i\bigr)^2} \begin{pmatrix} \sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \\ -\sum x_i \sum y_i + n \sum x_i y_i \end{pmatrix} \\ &= \frac{1}{\frac1n \sum_{i=1}^n x_i^2 - \bigl(\frac1n \sum_{i=1}^n x_i\bigr)^2} \begin{pmatrix} \frac1n \sum_{i=1}^n x_i^2 \cdot \frac1n\sum_{i=1}^n y_i - \frac1n\sum_{i=1}^n x_i \cdot \frac1n\sum_{i=1}^n x_i y_i \\ \frac1n\sum_{i=1}^n x_i y_i - \frac1n\sum_{i=1}^n x_i \cdot \frac1n\sum_{i=1}^n y_i \end{pmatrix}. \end{align*}\] In the last step we divided both the numerator and the denominator by \(n^2\), so that every sum appears as a sample mean.

Inspection of the final result shows that we have recovered the traditional formula for the coefficients in simple linear regression, only written in slightly unusual form. For example, the term \(\frac1n \sum_{i=1}^n x_i^2 - \bigl(\frac1n \sum_{i=1}^n x_i\bigr)^2\) equals the sample variance of the \(x_i\) (up to a factor of \(n/(n-1)\)). The algebra in this answer could be slightly simplified by changing to new coordinates \(\tilde x_i = x_i - \bar x\) and \(\tilde y_i = y_i - \bar y\) before fitting the regression model.
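To illustrate both the explicit formula and the centring remark numerically (a sketch with simulated data; the true coefficients 2 and 3, the noise level and the sample size are arbitrary choices):

set.seed(1)
x <- runif(20)
y <- 2 + 3 * x + rnorm(20, sd = 0.1)
denom <- mean(x^2) - mean(x)^2
beta0_hat <- (mean(x^2) * mean(y) - mean(x) * mean(x * y)) / denom
beta1_hat <- (mean(x * y) - mean(x) * mean(y)) / denom
c(beta0_hat, beta1_hat)                 # explicit formula derived above
coef(lm(y ~ x))                         # should agree up to rounding

x_tilde <- x - mean(x)                  # centred coordinates
y_tilde <- y - mean(y)
coef(lm(y_tilde ~ x_tilde))             # same slope, intercept numerically zero

The centred fit recovers the same slope while its intercept vanishes, which is what makes the change of coordinates convenient.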

2. Let \(X\) be the design matrix of a model including an intercept, and let \(H = X (X^\top X)^{-1} X^\top \in\mathbb{R}^{n\times n}\) be the hat matrix. Finally, let \(\mathbf{1} = (1, 1, \ldots, 1) \in\mathbb{R}^n\). Show that \(H \mathbf{1} = \mathbf{1}\).

We already know that \(H X = X (X^\top X)^{-1} X^\top X = X\). Since the model includes an intercept, the first column of \(X\) equals \(\mathbf{1}\), so comparing the first columns on both sides of \(HX = X\) gives \(H\mathbf{1} = \mathbf{1}\). This completes the proof.
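A small numerical illustration (only a sketch; the non-intercept columns of X below are arbitrary random values):

set.seed(1)
X <- cbind(1, rnorm(6), runif(6))         # any design matrix with an intercept column
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
H %*% rep(1, 6)                           # every entry equals 1 (up to rounding error)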

3. For the stackloss dataset built into R, predict a value for stack.loss when the inputs are Air.Flow = 60, Water.Temp = 21 and Acid.Conc = 87.

We can fit the model using lm() as usual:

m <- lm(stack.loss ~ ., data = stackloss)
m

Call:
lm(formula = stack.loss ~ ., data = stackloss)

Coefficients:
(Intercept)     Air.Flow   Water.Temp   Acid.Conc.  
   -39.9197       0.7156       1.2953      -0.1521  

To predict a new value, we use the predict() command. Note that in the stackloss data frame the variable Acid.Conc. is spelled with a trailing dot, which we must also include when supplying the new input values.

predict(m, data.frame(Air.Flow = 60, Water.Temp = 21, Acid.Conc. = 87))
       1 
16.98509 

Thus, the predicted value for stack.loss is \(16.98509\).
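Equivalently (just to make explicit what predict() is doing here), we can take the inner product of the estimated coefficients with the new input vector, including a leading 1 for the intercept:

sum(coef(m) * c(1, 60, 21, 87))   # reproduces the value 16.98509 from predict()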

4. Let \(\varepsilon_1, \ldots, \varepsilon_n \sim \mathcal{N}(\mu, \sigma^2)\) be independent. Then \(\varepsilon= (\varepsilon_1, \ldots, \varepsilon_n)\) is a random vector. Determine \(\mathbb{E}(\varepsilon)\), \(\mathop{\mathrm{Cov}}(\varepsilon)\) and \(\mathbb{E}\bigl( \|\varepsilon\|^2 \bigr)\).

We immediately find \(\mathbb{E}(\varepsilon) = \mu \mathbf{1}\) where \(\mathbf{1} = (1, \ldots, 1) \in\mathbb{R}^n\). Since the \(\varepsilon_i\) are independent with common variance \(\sigma^2\), we have \(\mathop{\mathrm{Cov}}(\varepsilon) = \sigma^2 I\). Finally, since the \(\varepsilon_i\) are identically distributed, we have \[\begin{align*} \mathbb{E}\bigl(\|\varepsilon\|^2\bigr) &= \mathbb{E}\Bigl( \sum_{i=1}^n \varepsilon_i^2 \Bigr) \\ &= \sum_{i=1}^n \mathbb{E}(\varepsilon_i^2) \\ &= n \mathbb{E}(\varepsilon_1^2). \end{align*}\] Since \(\mathbb{E}(\varepsilon_1^2) = \mathop{\mathrm{Var}}(\varepsilon_1) + \mathbb{E}(\varepsilon_1)^2 = \sigma^2 + \mu^2\), we get \(\mathbb{E}\bigl(\|\varepsilon\|^2\bigr) = n (\sigma^2 + \mu^2)\).
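These calculations can be checked by simulation (a rough sketch; the values of \(\mu\), \(\sigma\), \(n\) and the number of replications are arbitrary):

set.seed(1)
mu <- 2; sigma <- 1.5; n <- 10
eps <- matrix(rnorm(5000 * n, mean = mu, sd = sigma), nrow = 5000)  # 5000 simulated vectors
mean(rowSums(eps^2))              # Monte Carlo estimate of E(||epsilon||^2)
n * (sigma^2 + mu^2)              # theoretical value derived above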