# A Linear Algebra Reminders

## A.1 Vectors

We write \(v \in \mathbb{R}^d\) if \(v = (v_1, \ldots, v_d)\) for numbers \(v_1, \ldots, v_d \in\mathbb{R}\). We say that \(v\) is a \(d\)-dimensional vector, and \(\mathbb{R}^d\) is the \(d\)-dimensional Euclidean space. Vectors are often graphically represented as “column vectors”: \[\begin{equation*} v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_d \end{pmatrix}. \end{equation*}\]

If \(u,v\in\mathbb{R}^d\) are two vectors, the **inner product** of \(u\) and \(v\)
is given by
\[\begin{equation}
u^\top v
= \sum_{i=1}^d u_i v_i. \tag{A.1}
\end{equation}\]
Note that the two vectors must have the same length for the inner product
to exist.

Using this notation, the **Euclidean length** of a vector \(v\)
can be written as
\[\begin{equation*}
\|v\|
= \sqrt{\sum_{i=1}^d v_i^2}
= \sqrt{v^\top v}.
\end{equation*}\]
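As a concrete illustration, the inner product and the Euclidean length can be computed directly from these definitions. The sketch below uses plain Python lists; the function names `inner` and `norm` are our own choices, not a standard API:

```python
import math

def inner(u, v):
    # u^T v = sum_i u_i v_i; both vectors must have the same length
    assert len(u) == len(v), "vectors must have the same dimension"
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def norm(v):
    # ||v|| = sqrt(v^T v)
    return math.sqrt(inner(v, v))

# For example, (1, 2, 3)^T (4, 5, 6) = 4 + 10 + 18 = 32,
# and the vector (3, 4) has Euclidean length 5.
```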

Vectors \(v_1, \ldots, v_n\) are said to be **orthogonal**
if \(v_i^\top v_j = 0\) for all \(i \neq j\). The vectors are said to be
**orthonormal** if they are orthogonal and satisfy \(\|v_i\| = 1\)
for all \(i\).

## A.2 Matrices

We write \(A \in \mathbb{R}^{m\times n}\) if \[\begin{equation*} A = \begin{pmatrix} a_{1,1} & \ldots & a_{1,n}\\ a_{2,1} & \ldots & a_{2,n}\\ \vdots & \ddots & \vdots\\ a_{m,1} & \ldots & a_{m,n} \end{pmatrix}, \end{equation*}\] where \(a_{i,j}\), sometimes also written as \(a_{ij}\), are numbers for \(i \in \{1, \ldots, m\}\) and \(j \in \{1, \ldots, n\}\).

### A.2.1 Transpose

If \(A \in \mathbb{R}^{m\times n}\), then the **transpose** of \(A\) is the matrix
\(A^\top \in \mathbb{R}^{n\times m}\), with \((A^\top)_{ij} = a_{ji}\) for all \(i \in \{1, \ldots, n\}\) and \(j \in \{1, \ldots, m\}\). Graphically,
this can be written as
\[\begin{equation*}
A^\top
= \begin{pmatrix}
a_{1,1} & a_{2,1} & \ldots & a_{m,1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1,n} & a_{2,n} & \ldots & a_{m,n}
\end{pmatrix}.
\end{equation*}\]

**Definition A.1** A matrix \(A\) is called **symmetric** if \(A^\top = A\).
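The transpose and the symmetry check translate directly into code; the following is a small sketch over nested Python lists (the helper names are ours):

```python
def transpose(A):
    # (A^T)_{ij} = a_{ji}: the rows of A become the columns of A^T
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    # Definition A.1: A is symmetric if A^T = A
    return transpose(A) == A

A = [[1, 2, 3],
     [4, 5, 6]]   # a 2x3 matrix; its transpose is 3x2
```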

### A.2.2 Matrix-vector Product

If \(A \in \mathbb{R}^{m \times n}\) and \(v \in \mathbb{R}^n\), then \(Av \in \mathbb{R}^m\) is the vector with \[\begin{equation*} (Av)_i = \sum_{j=1}^n a_{ij} v_j \end{equation*}\] for all \(i \in \{1, \ldots, m\}\).

If we consider \(v\) to be an \((n\times 1)\)-matrix instead of a vector, \(Av\) can also be interpreted as a matrix-matrix product between an \(m \times n\) and an \(n\times 1\) matrix. Using this convention, \(v^\top\) is then interpreted as a \(1 \times n\) matrix and if \(u\in\mathbb{R}^m\) we have \(u^\top A \in \mathbb{R}^{1 \times n} \cong \mathbb{R}^n\) with \[\begin{equation*} (u^\top A)_j = \sum_{i=1}^m u_i a_{ij} \end{equation*}\] for all \(j \in \{1, \ldots, n\}\). Going one step further, this notation also motivates the expression \(u^\top v\) in equation (A.1).
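Both products can be sketched in a few lines of Python (the function names are ours):

```python
def matvec(A, v):
    # (Av)_i = sum_j a_{ij} v_j; len(v) must equal the number of columns of A
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

def vecmat(u, A):
    # (u^T A)_j = sum_i u_i a_{ij}; the result is treated as a plain vector again
    return [sum(u_i * row[j] for u_i, row in zip(u, A)) for j in range(len(A[0]))]

A = [[1, 2],
     [3, 4],
     [5, 6]]   # a 3x2 matrix
```

For this \(A\), `matvec(A, [1, 1])` gives `[3, 7, 11]` and `vecmat([1, 0, 1], A)` gives `[6, 8]`.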

### A.2.3 Matrix-matrix Product

If \(A \in \mathbb{R}^{\ell \times m}\) and \(B \in \mathbb{R}^{m\times n}\), then
\(AB \in \mathbb{R}^{\ell \times n}\) is the matrix with
\[\begin{equation*}
(AB)_{ik}
= \sum_{j=1}^m a_{ij} b_{jk}
\end{equation*}\]
for all \(i \in \{1, \ldots, \ell\}\) and \(k \in \{1, \ldots, n\}\).
This is called the **matrix product** of \(A\) and \(B\). Note
that \(A\) and \(B\) must have compatible shapes for the product to exist.
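As a sketch, the defining formula translates into a triple loop over plain Python lists (a naive implementation for illustration, not an efficient one):

```python
def matmul(A, B):
    # (AB)_{ik} = sum_j a_{ij} b_{jk}; the number of columns of A must
    # equal the number of rows of B
    l, m, n = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "incompatible shapes"
    return [[sum(A[i][j] * B[j][k] for j in range(m)) for k in range(n)]
            for i in range(l)]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` gives `[[19, 22], [43, 50]]`.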

Properties:

- The matrix product is **associative**: if \(A\), \(B\) and \(C\) are matrices with shapes such that \(AB\) and \(BC\) exist, then we have \(A(BC) = (AB)C\). It does not matter in which order we perform the matrix products here.
- The matrix product is **distributive** over addition: if \(A\), \(B\) and \(C\) have the correct shapes, we have \(A(B+C) = AB + AC\).
- The matrix product is *not* **commutative**: if \(AB\) exists, in general \(A\) and \(B\) don't have the correct shapes for \(BA\) to also exist, and even if \(BA\) exists, in general we have \(AB \neq BA\).
- Taking the transpose swaps the order in a matrix product: we have \[\begin{equation} (AB)^\top = B^\top A^\top \tag{A.2} \end{equation}\]
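These properties are easy to check numerically; the snippet below verifies equation (A.2) and the failure of commutativity for one concrete pair of matrices (the helper functions are our own):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

# (AB)^T = B^T A^T, with the order of the factors swapped:
transpose_rule_holds = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# Even though both AB and BA exist here, they differ:
noncommutative = matmul(A, B) != matmul(B, A)
```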

### A.2.4 Rank

**Definition A.2** The **rank** of a matrix \(A \in \mathbb{R}^{m \times n}\) is the dimension of the
image space of \(A\):
\[\begin{equation*}
\mathop{\mathrm{rank}}(A)
= \dim \bigl\{ Av \bigm| v\in \mathbb{R}^n \bigr\}.
\end{equation*}\]

The rank can also be characterised as the largest number of linearly independent columns of \(A\), or the largest number of linearly independent rows of \(A\). Thus, for \(A \in \mathbb{R}^{m \times n}\) we always have \(\mathop{\mathrm{rank}}(A) \leq \min(m, n)\). The matrix \(A\) is said to have “full rank” if \(\mathop{\mathrm{rank}}(A) = \min(m, n)\).
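For instance, NumPy's `numpy.linalg.matrix_rank` computes the rank numerically (by counting nonzero singular values). Assuming NumPy is available, the examples below show a rank-deficient and a full-rank matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],     # = 2 * first row
              [3.0, 6.0]])    # = 3 * first row: only one independent row
rank_A = np.linalg.matrix_rank(A)   # 1 < min(3, 2), so A is rank-deficient

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])    # two linearly independent columns
rank_B = np.linalg.matrix_rank(B)   # min(3, 2) = 2, so B has full rank
```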

### A.2.5 Trace

**Definition A.3** The **trace** of a matrix \(A \in \mathbb{R}^{n\times n}\) is given by
\[\begin{equation*}
\mathop{\mathrm{tr}}(A) = \sum_{i=1}^n a_{ii}.
\end{equation*}\]

Properties:

- \(\mathop{\mathrm{tr}}(A+B) = \mathop{\mathrm{tr}}(A) + \mathop{\mathrm{tr}}(B)\)
- \(\mathop{\mathrm{tr}}(A^\top) = \mathop{\mathrm{tr}}(A)\)
- \(\mathop{\mathrm{tr}}(ABC) = \mathop{\mathrm{tr}}(BCA) = \mathop{\mathrm{tr}}(CAB)\). The individual matrices \(A, B, C\) don't need to be square for this relation to hold, but the relation only holds for cyclic permutations as shown. In general \(\mathop{\mathrm{tr}}(ABC) \neq \mathop{\mathrm{tr}}(ACB)\).
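The cyclic property can be checked numerically. In the sketch below the shapes \(2\times 3\), \(3\times 2\) and \(2\times 2\) are chosen so that \(ABC\), \(BCA\) and \(CAB\) are all square (the helper functions are ours):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    # tr(A) = sum of the diagonal entries of a square matrix
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 0, 2], [0, 1, 1]]    # 2x3
B = [[1, 1], [0, 2], [1, 0]]  # 3x2
C = [[2, 0], [1, 1]]          # 2x2

t_abc = trace(matmul(matmul(A, B), C))
t_bca = trace(matmul(matmul(B, C), A))
t_cab = trace(matmul(matmul(C, A), B))
# All three traces agree, even though ABC is 2x2 while BCA is 3x3.
```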

### A.2.6 Matrix Inverse

If \(A\) is a square matrix and if there is a matrix \(B\) such that
\(AB = I\), then \(A\) is called **invertible** and the matrix \(B\) is
called the **inverse** of \(A\), denoted by \(A^{-1} := B\).
Some important properties of the inverse:

- The inverse, if it exists, is unique.
- Left-inverse and right-inverse for matrices are the same: \(A^{-1} A = I\) holds if and only if \(A A^{-1} = I\).
- If \(A\) is symmetric and invertible, then \(A^{-1}\) is also symmetric. This is true because \(A (A^{-1})^\top = A^\top (A^{-1})^\top = (A^{-1} A)^\top = I^\top = I\), using \(A = A^\top\) and equation (A.2), and thus \((A^{-1})^\top\) is an inverse of \(A\). Since the inverse is unique, \((A^{-1})^\top = A^{-1}\).
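For a \(2\times 2\) matrix the inverse can be written down explicitly; the sketch below uses this closed form and checks \(AB = I\) numerically (the function names are ours):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(A):
    # Explicit inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    return [[d / det, -b / det],
            [-c / det, a / det]]

A = [[4.0, 7.0],
     [2.0, 6.0]]
I = matmul(A, inv2(A))   # equals the identity up to floating-point rounding
```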

**Theorem A.1** Let \(A \in \mathbb{R}^{n\times n}\). Then the following statements are equivalent:

- \(A\) is invertible.
- \(A\) has full rank, *i.e.* \(\mathop{\mathrm{rank}}(A) = n\).
- The equation \(Ax = 0\) has \(x = 0\) as its only solution.
- The equation \(Ax = b\) has exactly one solution \(x\) for every \(b\in\mathbb{R}^n\).

### A.2.7 Orthogonal Matrices

**Definition A.4** A matrix \(U\) is called **orthogonal** if \(U^\top U = I = U U^\top\).

Some important properties of orthogonal matrices:

- If \(U\) is orthogonal, the inverse and the transpose are the same: \(U^\top = U^{-1}\).
- We have \(\| U x \|^2 = x^\top U^\top U x = x^\top x = \| x \|^2\). Thus, multiplying a vector \(x\) by an orthogonal matrix \(U\) does not change its length.
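Rotation matrices are the standard example of orthogonal matrices; the sketch below checks the length-preservation property for a \(2\times 2\) rotation (the helper names are ours):

```python
import math

def rotation(theta):
    # 2x2 rotation by angle theta; U^T U = I holds for every theta
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

U = rotation(0.7)
x = [3.0, 4.0]
# ||Ux|| = ||x||: the rotation does not change the length of x
length_preserved = abs(norm(matvec(U, x)) - norm(x)) < 1e-12
```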

### A.2.8 Positive Definite Matrices

**Definition A.5** A symmetric matrix \(A \in \mathbb{R}^{n\times n}\) is called **positive definite** if
\[\begin{equation*}
x^\top A x > 0
\end{equation*}\]
for all \(x \in \mathbb{R}^n\) with \(x\neq 0\). The matrix is called
**positive semi-definite** if
\[\begin{equation*}
x^\top A x \geq 0
\end{equation*}\]
for all \(x \in \mathbb{R}^n\).
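The quadratic form \(x^\top A x\) is easy to evaluate directly. The sketch below checks positivity for a handful of sample directions; note that finitely many samples illustrate, but do not prove, positive definiteness:

```python
def quadratic_form(A, x):
    # x^T A x = sum over i, j of x_i * a_{ij} * x_j
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[2.0, -1.0],
     [-1.0, 2.0]]   # symmetric; this particular matrix is positive definite

# A few nonzero directions; the quadratic form is strictly positive on each
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [-2.0, 3.0]]
all_positive = all(quadratic_form(A, x) > 0 for x in samples)
```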

## A.3 Eigenvalues

**Definition A.7** Let \(A \in\mathbb{R}^{n\times n}\) be a square matrix and \(\lambda\in\mathbb{R}\).
The number \(\lambda\) is called an **eigenvalue** of \(A\) if there
exists a vector \(x \neq 0\) such that \(A x = \lambda x\). Any
such vector \(x\) is called an **eigenvector** of \(A\) with eigenvalue \(\lambda\).

While there are many results about eigenvalues and eigenvectors in linear algebra, here we will only use a small number of them. We summarise what we need for this module:

- If \(A\) is idempotent, *i.e.* \(A^2 = A\), and \(x\) is an eigenvector with eigenvalue \(\lambda\), then we have \(\lambda x = A x = A^2 x = \lambda Ax = \lambda^2 x\). Thus we have \(\lambda^2 = \lambda\). This shows that the only eigenvalues possible for idempotent matrices are \(0\) and \(1\).

**Theorem A.2** Let \(A\in\mathbb{R}^{n\times n}\) be symmetric. Then there is an orthogonal
matrix \(U\) such that \(D := U A U^\top\) is diagonal. The diagonal
elements of \(D\) are the eigenvalues of \(A\) and the rows of \(U\)
are corresponding eigenvectors.
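Assuming NumPy is available, `numpy.linalg.eigh` computes this decomposition for symmetric matrices. Note that NumPy returns the eigenvectors as the *columns* of its matrix, so that matrix plays the role of \(U^\top\) in the theorem's notation:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric

eigenvalues, V = np.linalg.eigh(A)  # columns of V are eigenvectors of A

# V is orthogonal, and V^T A V is diagonal with the eigenvalues of A on
# the diagonal (so U = V^T in the notation of Theorem A.2):
D = V.T @ A @ V
```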