Problem Sheet 1
This problem sheet is for self-study only. It is not assessed.
1. Consider the following function: \[\begin{equation*} K(x) = \begin{cases} \frac23 (1 - |x|^3) & \text{if $|x|\leq 1$, and} \\ 0 & \text{otherwise}. \end{cases} \end{equation*}\]
- Show that this function integrates to 1 over its domain.
We have \[\begin{align*} \int_{-\infty}^{\infty} K(x) \, dx &= \frac23 \int_{-1}^1 (1 - |x|^3) \, dx \\ &= \frac43 \int_0^1 (1 - x^3) \, dx \\ &= \frac43 \left[ x - \frac{x^4}{4} \right]_0^1 \\ &= \frac43 \left( 1 - \frac{1}{4} \right) \\ &= 1. \end{align*}\]
- Show that \(K\) satisfies the conditions of a kernel.
We have already seen that \(K\) integrates to 1 over its domain. Since \(|x| = |-x|\) for all \(x\in\mathbb{R}\), the function \(K\) is symmetric. Finally, since \(|x|^3 \leq 1\) for all \(x\in\mathbb{R}\) with \(|x|\leq 1\), we have \(K(x) \geq 0\) for all \(x\in\mathbb{R}\).
- Compute the moments \(\mu_0(K)\), \(\mu_1(K)\) and \(\mu_2(K)\) of \(K\).
The \(k\)th moment of \(K\) is given by \[\begin{equation*} \mu_k(K) = \int_{-\infty}^\infty x^k K(x) \,dx = \frac23 \int_{-1}^1 x^k \bigl( 1 - |x|^3 \bigr) \,dx. \end{equation*}\] For \(k = 0\), we know \(\mu_0(K) = 1\), from part a. For \(k = 1\), we have \(\mu_1(K) = 0\), since \(K\) is symmetric. For \(k = 2\), we find \[\begin{align*} \mu_2(K) &= \frac23 \int_{-1}^1 x^2 (1 - |x|^3) \,dx \\ &= \frac43 \int_0^1 x^2 (1 - x^3) \,dx \\ &= \frac43 \left[ \frac{x^3}{3} - \frac{x^6}{6} \right]_0^1 \\ &= \frac43 \left( \frac13 - \frac16 \right) \\ &= \frac29. \end{align*}\]
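The integrals above can also be checked numerically. The following is a minimal sketch, not part of the original solution: the function names `kernel` and `moment` are chosen here for illustration, and the midpoint rule is just one simple way to approximate the integrals.

```python
def kernel(x):
    """K(x) = (2/3)(1 - |x|^3) for |x| <= 1, and 0 otherwise."""
    return 2.0 / 3.0 * (1.0 - abs(x) ** 3) if abs(x) <= 1.0 else 0.0


def moment(k, steps=20_000):
    """Approximate the k-th moment of K with the midpoint rule on [-1, 1]."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += x ** k * kernel(x)
    return total * width


# The three values should be close to 1, 0 and 2/9, respectively.
for k in range(3):
    print(k, moment(k))
```

Since \(K\) vanishes outside \([-1, 1]\), it is enough to integrate over this interval.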
2. Consider a normal density with mean \(\mu\) and variance \(\sigma^2\), given by: \[\begin{equation*} f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \end{equation*}\]
- Calculate \(f'(x)\) and \(f''(x)\) for this density.
Using the chain rule: \[\begin{align*} f'(x) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \cdot \left(-\frac{x-\mu}{\sigma^2}\right) \\ &= -f(x)\cdot\frac{x-\mu}{\sigma^2} \end{align*}\] Taking derivatives again, using the product rule: \[\begin{align*} f''(x) &= -f'(x)\cdot\frac{x-\mu}{\sigma^2} - f(x)\cdot\frac{1}{\sigma^2} \\ &= f(x)\cdot\frac{(x-\mu)^2}{\sigma^4} - f(x)\cdot\frac{1}{\sigma^2} \\ &= f(x)\cdot\left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) \end{align*}\]
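As a sanity check, the closed-form derivatives can be compared with central finite differences. This is only a sketch: the helper names (`f`, `f1`, `f2`, `central_diff`) and the parameter values \(\mu = 1.5\), \(\sigma = 2\) are arbitrary choices made here, not part of the problem.

```python
import math


def f(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def f1(x, mu, sigma):
    """First derivative, from the chain rule."""
    return -f(x, mu, sigma) * (x - mu) / sigma ** 2


def f2(x, mu, sigma):
    """Second derivative, from the product rule."""
    return f(x, mu, sigma) * ((x - mu) ** 2 / sigma ** 4 - 1 / sigma ** 2)


def central_diff(g, x, eps=1e-4):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + eps) - g(x - eps)) / (2 * eps)


mu, sigma = 1.5, 2.0  # arbitrary illustrative parameters
for x in (-1.0, 1.5, 4.0):
    err1 = abs(central_diff(lambda t: f(t, mu, sigma), x) - f1(x, mu, sigma))
    err2 = abs(central_diff(lambda t: f1(t, mu, sigma), x) - f2(x, mu, sigma))
    print(x, err1, err2)  # both errors should be tiny
```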
- Using the formula \[\begin{equation*} \mathop{\mathrm{bias}}(\hat f_h(x)) \approx \frac{\mu_2(K)}{2} f''(x) h^2, \end{equation*}\] show that for fixed \(K\) and \(h\), the bias satisfies the following proportionality: \[\begin{equation*} \mathop{\mathrm{bias}}(\hat f_h(x)) \propto \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right). \end{equation*}\]
Substituting our expression for \(f''(x)\) into the bias formula we get \[\begin{align*} \mathop{\mathrm{bias}}(\hat f_h(x)) &\approx \frac{\mu_2(K)}{2} f''(x) h^2 \\ &= \frac{\mu_2(K)}{2} \cdot f(x)\cdot\left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) h^2 \\ &= \frac{\mu_2(K)h^2}{2\sqrt{2\pi\sigma^2}} \cdot \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right). \end{align*}\]
- For which values of \(x\) is the bias positive, negative, or zero? Why does this make sense intuitively?
The exponential term is always positive, so the sign of the bias is determined by the sign of the term \((x-\mu)^2 / \sigma^4 - 1/\sigma^2\). This term equals zero when \((x-\mu)^2 = \sigma^2\), that is, when \(x = \mu \pm \sigma\). The bias is negative when \(|x-\mu| < \sigma\) and positive when \(|x-\mu| > \sigma\).
This makes sense because a positive bandwidth \(h\) smooths the density: the estimate is too low on average where the density is concave (near the maximum) and too high where it is convex (towards the tails).
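The sign pattern can be illustrated by evaluating the right-hand side of the proportionality at a few points. This is a small sketch with an arbitrary choice of \(\mu = 0\) and \(\sigma = 2\); the positive constant \(\mu_2(K) h^2 / (2\sqrt{2\pi\sigma^2})\) is omitted because it does not affect the sign, and the names `bias_shape` and `sign` are introduced here for illustration only.

```python
import math


def bias_shape(x, mu, sigma):
    """Right-hand side of the proportionality (positive constant omitted)."""
    z2 = (x - mu) ** 2
    return math.exp(-z2 / (2 * sigma ** 2)) * (z2 / sigma ** 4 - 1 / sigma ** 2)


def sign(v, tol=1e-12):
    """Return -1, 0 or 1, treating values within tol of zero as zero."""
    if abs(v) < tol:
        return 0
    return 1 if v > 0 else -1


mu, sigma = 0.0, 2.0  # arbitrary illustrative parameters
points = (mu, mu + 0.5 * sigma, mu + sigma, mu + 2 * sigma, mu - 3 * sigma)
for x in points:
    # Expect -1 inside (mu - sigma, mu + sigma), 0 at mu +/- sigma, +1 outside.
    print(x, sign(bias_shape(x, mu, sigma)))
```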
3. Consider a kernel density estimate \(\hat f_h(x)\) based on a sample \(X_1, \ldots, X_n\) of independent random variables with common density \(f\). Its variance is \(\mathop{\mathrm{Var}}(\hat f_h(x)) = \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) - \mathbb{E}\bigl( \hat f_h(x) \bigr)^2\), where the second moment can be written as \[\begin{equation*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) = \frac{1}{n^2} \sum_{i,j=1}^n \mathbb{E}\Bigl( K_h(x - X_i)K_h(x - X_j) \Bigr). \end{equation*}\]
- Let \(n=3\). Write out all terms in the sum explicitly and identify which terms involve only one random variable \(X_i\) and which involve more than one random variable.
For \(n=3\), expanding the double sum gives 9 terms:
Terms with the same random variable (\(i=j\)): \[\begin{align*} & \mathbb{E}(K_h(x - X_1)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_2)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_3)) \end{align*}\]
Terms with different random variables (\(i\neq j\)): \[\begin{align*} & \mathbb{E}(K_h(x - X_1)K_h(x - X_2)) \\ & \mathbb{E}(K_h(x - X_1)K_h(x - X_3)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_3)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_2)) \end{align*}\]
- For general \(n\), how many terms in the sum have \(i=j\) and how many have \(i\neq j\)?
When \(i=j\), we choose a single index from \(\{1,\ldots,n\}\), giving \(n\) terms. When \(i\neq j\), we choose an ordered pair of different indices from \(\{1,\ldots,n\}\), giving \(n(n-1)\) terms. The total number of terms is thus \(n + n(n-1) = n^2\), as expected.
- Using these counts and the independence of the \(X_i\), show that: \[\begin{equation*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) = \frac{1}{n} \mathbb{E}\Bigl( K_h(x - X_1)^2 \Bigr) + \frac{n-1}{n} \mathbb{E}\Bigl( \hat f_h(x) \Bigr)^2. \end{equation*}\]
Using part b, and the fact that the \(X_i\) are identically distributed, we can split the sum into two parts: \[\begin{align*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) &= \frac{1}{n^2} \Bigl( n \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) + n(n-1) \mathbb{E}\bigl( K_h(x - X_1)K_h(x - X_2) \bigr) \Bigr) \\ &= \frac{1}{n} \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) + \frac{n-1}{n} \mathbb{E}\bigl( K_h(x - X_1) \bigr) \, \mathbb{E}\bigl( K_h(x - X_2) \bigr), \end{align*}\] where we used the independence of the \(X_i\) in the last step. Since \(\mathbb{E}\bigl( K_h(x - X_1) \bigr) = \mathbb{E}\bigl( K_h(x - X_2) \bigr) = \mathbb{E}(\hat f_h(x))\), this gives the required result. Subtracting \(\mathbb{E}(\hat f_h(x))^2\) then yields \(\mathop{\mathrm{Var}}(\hat f_h(x)) = \frac1n \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) - \frac1n \mathbb{E}\bigl( \hat f_h(x) \bigr)^2\).
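The splitting of the double sum can be verified exactly on a toy example. The sketch below makes several assumptions that are not part of the problem: the \(X_i\) are taken to be i.i.d. uniform on a small finite set (so expectations become finite averages; the argument only uses that the \(X_i\) are independent and identically distributed), the kernel is the triangular kernel, and \(K_h(u) = K(u/h)/h\). It averages \(\hat f_h(x)^2 = \frac{1}{n^2} \sum_{i,j} K_h(x - X_i) K_h(x - X_j)\) over all possible samples and compares the result with the split expression.

```python
from itertools import product


def triangular_kernel(u):
    """K(u) = 1 - |u| for |u| <= 1, and 0 otherwise."""
    return max(0.0, 1.0 - abs(u))


def k_h(u, h):
    """Rescaled kernel, assuming the convention K_h(u) = K(u / h) / h."""
    return triangular_kernel(u / h) / h


def check_split(support, n, x, h):
    """Compare the double sum with its split form for X_i i.i.d. uniform on support."""
    m = len(support)
    values = [k_h(x - s, h) for s in support]
    # Left-hand side: average of f_hat(x)^2 over all m^n equally likely samples.
    lhs = sum((sum(sample) / n) ** 2 for sample in product(values, repeat=n)) / m ** n
    # Right-hand side: (1/n) E(K_h^2) + ((n-1)/n) (E K_h)^2.
    e_k = sum(values) / m
    e_k2 = sum(v * v for v in values) / m
    rhs = e_k2 / n + (n - 1) / n * e_k ** 2
    return lhs, rhs


# Hypothetical support points, evaluation point and bandwidth.
lhs, rhs = check_split(support=[-1.0, 0.2, 0.5, 1.3], n=3, x=0.4, h=1.0)
print(lhs, rhs)  # the two numbers should agree up to rounding error
```

For \(n = 3\) and four support points, the enumeration runs over \(4^3 = 64\) samples, so the check is exact rather than a simulation.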
4. Consider the following data:
- Determine the sample standard deviation of these data.
- Using the “plug-in rule” from section 4.3, what bandwidth would you choose for a kernel density estimate of these data using the triangular kernel?
The plug-in rule assumes that the density is normal, to get \[\begin{equation*} R(f'') = \frac{3}{8\sigma^5\sqrt{\pi}}. \end{equation*}\] Substituting the sample standard deviation into this formula gives
## [1] 1.954895e-05
We also need the sample size, and the roughness and second moment of the kernel:
Using these quantities and the formula from section 4.1 gives the result:
## [1] 9.607254
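The computation can be sketched in Python as follows, assuming that the formula from section 4.1 is the usual asymptotically optimal bandwidth \(h = \bigl( R(K) / (n \, \mu_2(K)^2 R(f'')) \bigr)^{1/5}\). The function names are chosen here for illustration. The triangular kernel \(K(x) = 1 - |x|\) on \([-1, 1]\) has roughness \(R(K) = 2/3\) and second moment \(\mu_2(K) = 1/6\). The sample size \(n = 15\) is an assumption made for this sketch, since the data are not reproduced above.

```python
import math


def normal_roughness_f2(sigma):
    """R(f'') = 3 / (8 sigma^5 sqrt(pi)) for a normal density."""
    return 3.0 / (8.0 * sigma ** 5 * math.sqrt(math.pi))


def optimal_bandwidth(n, roughness_k, mu2_k, roughness_f2):
    """h = (R(K) / (n * mu_2(K)^2 * R(f'')))^(1/5) (assumed form of the formula)."""
    return (roughness_k / (n * mu2_k ** 2 * roughness_f2)) ** 0.2


# Constants of the triangular kernel K(x) = 1 - |x| on [-1, 1].
R_K = 2.0 / 3.0     # integral of K(x)^2
MU2_K = 1.0 / 6.0   # integral of x^2 K(x)

R_F2 = 1.954895e-05  # value of R(f'') reported in the text
N = 15               # ASSUMED sample size, not stated in the text

h = optimal_bandwidth(N, R_K, MU2_K, R_F2)
print(h)
```

With this assumed sample size, the sketch gives a bandwidth close to the value 9.607254 reported above. With the actual data, `normal_roughness_f2` would be called with the sample standard deviation in place of the hard-coded value of \(R(f'')\).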