Problem Sheet 1

This problem sheet is for self-study only. It is not assessed.

1. Consider the following function: \[\begin{equation*} K(x) = \begin{cases} \frac23 (1 - |x|^3) & \mbox{if $|x|\leq 1$, and} \\ 0 & \text{otherwise}. \end{cases} \end{equation*}\]

Show that this function integrates to 1 over its domain.

We have \[\begin{align*} \int_{-\infty}^{\infty} K(x) \, dx &= \frac23 \int_{-1}^1 (1 - |x|^3) \, dx \\ &= \frac43 \int_0^1 (1 - x^3) \, dx \\ &= \frac43 \left[ x - \frac{x^4}{4} \right]_0^1 \\ &= \frac43 \left( 1 - \frac{1}{4} \right) \\ &= 1. \end{align*}\]

Show that $K$ satisfies the conditions of a kernel.

We have already seen that $K$ integrates to 1 over its domain. Since $|x| = |-x|$ for all $x\in\mathbb{R}$, the function $K$ is symmetric. Finally, since $|x|^3 \leq 1$ whenever $|x|\leq 1$, and $K(x) = 0$ whenever $|x| > 1$, we have $K(x) \geq 0$ for all $x\in\mathbb{R}$.
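The three kernel conditions can also be checked numerically in R. This is only a sanity check, not a proof: the function name `K` and the grid `xs` are choices made here for illustration, and `integrate()` returns a numerical approximation.

```r
# Kernel from question 1 as a vectorised R function
K <- function(x) ifelse(abs(x) <= 1, 2/3 * (1 - abs(x)^3), 0)

# (1) Integral over the support [-1, 1]; K is zero elsewhere
total <- integrate(K, lower = -1, upper = 1)$value

# (2) Symmetry and (3) non-negativity, checked on a grid of test points
xs <- seq(-2, 2, by = 0.01)
is.symmetric <- isTRUE(all.equal(K(xs), K(-xs)))
is.nonneg <- all(K(xs) >= 0)

c(total = total, symmetric = is.symmetric, nonneg = is.nonneg)
```

The value of `total` should be close to 1, and both logical checks should come out as true (printed as 1 in the combined vector).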

Compute the moments $\mu_0(K)$, $\mu_1(K)$ and $\mu_2(K)$ of $K$.

The $k$th moment of $K$ is given by \[\begin{equation*} \mu_k(K) = \int_{-\infty}^\infty x^k K(x) \,dx = \frac23 \int_{-1}^1 x^k \bigl( 1 - |x|^3 \bigr) \,dx. \end{equation*}\] For $k = 0$, we know $\mu_0(K) = 1$ from the first part of this question. For $k = 1$, we have $\mu_1(K) = 0$, since $K$ is symmetric and the integrand $x K(x)$ is therefore odd. For $k = 2$, we find \[\begin{align*} \mu_2(K) &= \frac23 \int_{-1}^1 x^2 (1 - |x|^3) \,dx \\ &= \frac43 \int_0^1 x^2 (1 - x^3) \,dx \\ &= \frac43 \left[ \frac{x^3}{3} - \frac{x^6}{6} \right]_0^1 \\ &= \frac43 \left( \frac13 - \frac16 \right) \\ &= \frac29. \end{align*}\]
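The moments computed above can be cross-checked by numerical integration. The helper name `kernel.moment` below is hypothetical and introduced only for this sketch.

```r
# Kernel from question 1 as a vectorised R function
K <- function(x) ifelse(abs(x) <= 1, 2/3 * (1 - abs(x)^3), 0)

# k-th moment of K, integrating over the support [-1, 1]
kernel.moment <- function(k) {
  integrate(function(x) x^k * K(x), lower = -1, upper = 1)$value
}

moments <- sapply(0:2, kernel.moment)
moments   # approximately 1, 0 and 2/9
```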

2. Consider a normal density with mean $\mu$ and variance $\sigma^2$, given by: \[\begin{equation*} f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \end{equation*}\]

Calculate $f'(x)$ and $f''(x)$ for this density.

Using the chain rule: \[\begin{align*} f'(x) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \cdot \left(-\frac{x-\mu}{\sigma^2}\right) \\ &= -f(x)\cdot\frac{x-\mu}{\sigma^2} \end{align*}\] Taking derivatives again, using the product rule: \[\begin{align*} f''(x) &= -f'(x)\cdot\frac{x-\mu}{\sigma^2} - f(x)\cdot\frac{1}{\sigma^2} \\ &= f(x)\cdot\frac{(x-\mu)^2}{\sigma^4} - f(x)\cdot\frac{1}{\sigma^2} \\ &= f(x)\cdot\left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) \end{align*}\]
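The two derivatives can be double-checked with R's symbolic differentiation function `D()`. In this sketch the names `f.expr`, `vals`, `closed1` and `closed2` and the numerical values of `x`, `mu` and `sigma` are arbitrary illustrative choices, not part of the problem sheet.

```r
# Normal density as an unevaluated R expression in x, mu and sigma
f.expr <- expression(exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2))

d1 <- D(f.expr, "x")   # symbolic first derivative
d2 <- D(d1, "x")       # symbolic second derivative

# Arbitrary illustrative values (assumptions)
vals <- list(x = c(-1, 0.5, 2, 4), mu = 1, sigma = 2)

# Closed forms derived by hand above
f <- eval(f.expr[[1]], vals)
closed1 <- with(vals, -f * (x - mu) / sigma^2)
closed2 <- with(vals, f * ((x - mu)^2 / sigma^4 - 1 / sigma^2))

all.equal(eval(d1, vals), closed1)
all.equal(eval(d2, vals), closed2)
```

Both comparisons should report agreement up to floating-point tolerance.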

Using the formula \[\begin{equation*} \mathop{\mathrm{bias}}(\hat f_h(x)) \approx \frac{\mu_2(K)}{2} f''(x) h^2, \end{equation*}\] show that for fixed $K$ and $h$, the bias satisfies the following proportionality: \[\begin{equation*} \mathop{\mathrm{bias}}(\hat f_h(x)) \propto \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right). \end{equation*}\]

Substituting our expression for $f''(x)$ into the bias formula we get \[\begin{align*} \mathop{\mathrm{bias}}(\hat f_h(x)) &\approx \frac{\mu_2(K)}{2} f''(x) h^2 \\ &= \frac{\mu_2(K)}{2} \cdot f(x)\cdot\left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) h^2 \\ &= \frac{\mu_2(K)h^2}{2\sqrt{2\pi\sigma^2}} \cdot \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right). \end{align*}\]

For which values of $x$ is the bias positive, negative, or zero? Why does this make sense intuitively?

The exponential term is always positive, so the sign of the bias is determined by the sign of the term $(x-\mu)^2 / \sigma^4 - 1/\sigma^2$. This term equals zero when $(x-\mu)^2 = \sigma^2$, that is, when $x = \mu \pm \sigma$. The bias is negative when $|x-\mu| < \sigma$ and positive when $|x-\mu| > \sigma$.

This makes sense because smoothing with a positive bandwidth $h$ averages the density over a neighbourhood of $x$: the estimate is too low where the density is concave (near the maximum) and too high where it is convex (towards the tails).
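The sign pattern can be illustrated by evaluating the sign-determining factor at a few points. The values `mu0 = 1` and `sigma0 = 2` are assumptions chosen for illustration, and `sign.factor` is a hypothetical helper name.

```r
# Illustrative parameter values (assumptions, not from the problem sheet)
mu0 <- 1
sigma0 <- 2

# Factor of the approximate bias that determines its sign
sign.factor <- function(x) (x - mu0)^2 / sigma0^4 - 1 / sigma0^2

# Evaluate at mu0 - 2*sigma0, mu0 - sigma0, mu0, mu0 + sigma0, mu0 + 2*sigma0
pts <- mu0 + sigma0 * c(-2, -1, 0, 1, 2)
data.frame(x = pts, factor = sign.factor(pts), sign = sign(sign.factor(pts)))
```

The factor is positive two standard deviations away from the mean, zero at $\mu \pm \sigma$, and negative at the mean itself, in line with the argument above.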

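3. Consider the second moment of a kernel density estimate $\hat f_h(x)$ based on a sample $X_1, \ldots, X_n$ of independent random variables with common density $f$. This quantity is needed for the variance, since $\mathop{\mathrm{Var}}(\hat f_h(x)) = \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) - \mathbb{E}\bigl( \hat f_h(x) \bigr)^2$: \[\begin{equation*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) = \frac{1}{n^2} \sum_{i,j=1}^n \mathbb{E}\Bigl( K_h(x - X_i)K_h(x - X_j) \Bigr). \end{equation*}\]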

Let $n=3$. Write out all terms in the sum explicitly and identify which terms involve only one random variable $X_i$ and which involve more than one random variable.

For $n=3$, expanding the double sum gives 9 terms:

Terms with the same random variable ($i=j$): \[\begin{align*} & \mathbb{E}(K_h(x - X_1)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_2)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_3)) \end{align*}\]

Terms with different random variables ($i\neq j$): \[\begin{align*} & \mathbb{E}(K_h(x - X_1)K_h(x - X_2)) \\ & \mathbb{E}(K_h(x - X_1)K_h(x - X_3)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_2)K_h(x - X_3)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_1)) \\ & \mathbb{E}(K_h(x - X_3)K_h(x - X_2)) \end{align*}\]

For general $n$, how many terms in the sum have $i=j$ and how many have $i\neq j$?

When $i=j$, we are choosing the same index from $\{1,\ldots,n\}$ once, giving $n$ terms. When $i\neq j$, we are choosing two different indices from $\{1,\ldots,n\}$, giving $n(n-1)$ terms. The total number of terms is thus $n + n(n-1) = n^2$, as expected.
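The counting argument can be confirmed by enumerating all index pairs. The helper `count.terms` is a hypothetical name used only in this sketch.

```r
# Count the index pairs (i, j) with i == j and with i != j
count.terms <- function(n) {
  pairs <- expand.grid(i = 1:n, j = 1:n)   # all n^2 index pairs
  c(same = sum(pairs$i == pairs$j),
    different = sum(pairs$i != pairs$j))
}

count.terms(3)    # 3 terms with i = j and 6 terms with i != j
count.terms(10)   # 10 terms with i = j and 90 terms with i != j
```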

Using these counts and the independence of the $X_i$, show that the second moment of the estimate satisfies \[\begin{equation*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) = \frac{1}{n} \mathbb{E}\Bigl( K_h(x - X_1)^2 \Bigr) + \frac{n-1}{n} \mathbb{E}\Bigl( \hat f_h(x) \Bigr)^2. \end{equation*}\]

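Using the counts from the previous part, and the fact that the $X_i$ are identically distributed, we can split the double sum $\frac{1}{n^2} \sum_{i,j=1}^n \mathbb{E}\bigl( K_h(x - X_i)K_h(x - X_j) \bigr) = \mathbb{E}\bigl( \hat f_h(x)^2 \bigr)$ into two parts: \[\begin{align*} \mathbb{E}\bigl( \hat f_h(x)^2 \bigr) &= \frac{1}{n^2} \Bigl( n \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) + n(n-1) \mathbb{E}\bigl( K_h(x - X_1)K_h(x - X_2) \bigr) \Bigr) \\ &= \frac{1}{n} \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) + \frac{n-1}{n} \mathbb{E}\bigl( K_h(x - X_1) \bigr) \, \mathbb{E}\bigl( K_h(x - X_2) \bigr), \end{align*}\] where we used the independence of the $X_i$ in the last step. Since $\mathbb{E}\bigl( K_h(x - X_1) \bigr) = \mathbb{E}\bigl( K_h(x - X_2) \bigr) = \mathbb{E}(\hat f_h(x))$, this gives the stated identity. Subtracting $\mathbb{E}(\hat f_h(x))^2$ then yields $\mathop{\mathrm{Var}}(\hat f_h(x)) = \frac{1}{n} \Bigl( \mathbb{E}\bigl( K_h(x - X_1)^2 \bigr) - \mathbb{E}\bigl( \hat f_h(x) \bigr)^2 \Bigr)$.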
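As a rough plausibility check, the split of the double sum $\frac{1}{n^2}\sum_{i,j} \mathbb{E}\bigl(K_h(x - X_i)K_h(x - X_j)\bigr)$, which is the mean of $\hat f_h(x)^2$, can be compared against a Monte Carlo estimate. Everything specific in this sketch is an assumption made for illustration: standard normal data, a Gaussian kernel, the bandwidth `h = 0.5`, the evaluation point `x0 = 0` and the number of replications `N`.

```r
set.seed(1)              # for reproducibility
n <- 3                   # sample size
h <- 0.5                 # illustrative bandwidth (assumption)
x0 <- 0                  # point at which the estimate is evaluated
Kh <- function(u) dnorm(u, sd = h)   # Gaussian kernel rescaled by h

# Monte Carlo estimate of E(fhat(x0)^2) for standard normal data
N <- 20000
fhat <- replicate(N, mean(Kh(x0 - rnorm(n))))
lhs <- mean(fhat^2)

# (1/n) E(K_h(x0 - X_1)^2) + ((n-1)/n) E(K_h(x0 - X_1))^2
EK2 <- integrate(function(y) Kh(x0 - y)^2 * dnorm(y), -Inf, Inf)$value
EK <- integrate(function(y) Kh(x0 - y) * dnorm(y), -Inf, Inf)$value
rhs <- EK2 / n + (n - 1) / n * EK^2

c(monte.carlo = lhs, formula = rhs)
```

The two numbers should agree up to Monte Carlo error, which shrinks as `N` grows.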

4. Consider the following data:

x <- c(89.6, 82.5, 70.9, 83.8, 92.4, 86.5, 77.3, 89.2,
       93.1, 84.7, 78.5, 88.3, 85.6, 90.4, 76.8)

Determine the sample standard deviation of these data.

sigma <- sd(x)
sigma

## [1] 6.410126

The R output shows that the sample standard deviation is $6.410126$.

Using the “plug-in rule” from section 4.3, what bandwidth would you choose for a kernel density estimate of these data using the triangular kernel?

The plug-in rule assumes that the density is normal, which gives \[\begin{equation*} R(f'') = \frac{3}{8\sigma^5\sqrt{\pi}}. \end{equation*}\] Substituting the sample standard deviation into this formula gives

R.fpp <- 3 / (8 * sigma^5 * sqrt(pi))
R.fpp

## [1] 1.954895e-05
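The closed form for $R(f'')$ under the normal assumption can be checked by integrating $f''(x)^2$ numerically. The value `sigma.test = 2` is an arbitrary illustrative choice (not the sample value), and `fpp`, `R.numeric` and `R.closed` are names introduced only for this sketch.

```r
sigma.test <- 2   # illustrative value (assumption)

# Second derivative of the N(0, s^2) density
fpp <- function(x, s) dnorm(x, sd = s) * (x^2 / s^4 - 1 / s^2)

# Roughness of f'': numerical integral versus closed form
R.numeric <- integrate(function(x) fpp(x, sigma.test)^2, -Inf, Inf)$value
R.closed <- 3 / (8 * sigma.test^5 * sqrt(pi))

c(numeric = R.numeric, closed = R.closed)
```

The mean of the normal density is set to zero here because the roughness does not depend on it.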

We also need the sample size, and the roughness and second moment of the kernel:

n <- length(x)
R.K <- 2/3
mu2.K <- 1/6
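The two kernel constants can be verified numerically, assuming the triangular kernel is parametrised as $K(x) = 1 - |x|$ for $|x| \leq 1$ and $0$ otherwise (an assumption of this sketch that is consistent with the values $2/3$ and $1/6$ used above). The names `K.tri`, `R.K.check` and `mu2.K.check` are hypothetical.

```r
# Triangular kernel on [-1, 1] (assumed parametrisation)
K.tri <- function(x) pmax(1 - abs(x), 0)

# Roughness: integral of K^2; second moment: integral of x^2 K(x)
R.K.check <- integrate(function(x) K.tri(x)^2, -1, 1)$value
mu2.K.check <- integrate(function(x) x^2 * K.tri(x), -1, 1)$value

c(R.K = R.K.check, mu2.K = mu2.K.check)   # close to 2/3 and 1/6
```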

Using these quantities and the formula from section 4.1 gives the result:

h <- (R.K / (n * mu2.K^2 * R.fpp))^(1/5)
h

## [1] 9.607254
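To see what this bandwidth does in practice, the estimate can be evaluated directly from its definition. This is a minimal sketch, assuming the triangular kernel is parametrised as $K(u) = 1 - |u|$ on $[-1, 1]$; the names `K.tri`, `fhat`, `grid` and `dens` are introduced here for illustration.

```r
x <- c(89.6, 82.5, 70.9, 83.8, 92.4, 86.5, 77.3, 89.2,
       93.1, 84.7, 78.5, 88.3, 85.6, 90.4, 76.8)
h <- 9.607254   # bandwidth from the plug-in rule above

# Triangular kernel on [-1, 1] (assumed parametrisation)
K.tri <- function(u) pmax(1 - abs(u), 0)

# Kernel density estimate: fhat(t) = (1/n) sum_i K((t - x_i) / h) / h
fhat <- function(t) sapply(t, function(ti) mean(K.tri((ti - x) / h)) / h)

# Evaluate on a grid covering the support of the estimate
grid <- seq(min(x) - h, max(x) + h, length.out = 2001)
dens <- fhat(grid)

# Riemann-sum check that the estimate integrates to approximately 1
total <- sum(dens) * (grid[2] - grid[1])
total
```

The curve can then be drawn with `plot(grid, dens, type = "l")`. Note that R's built-in `density()` interprets its `bw` argument as the standard deviation of the kernel, so `h` would need to be rescaled (by $1/\sqrt{6}$ for the triangular kernel) before being passed to it.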