Kernel density estimation

Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function of a random variable from a sample.

This page covers its unconditional (standard) and conditional forms.

Unconditional case

The unconditional kernel density estimator with a product kernel is:

\[\hat{f}(x) = \sum_{i=1}^m w_{i} \prod_{j=1}^n \frac{1}{h_j} K \left( \frac{x_{j} - x_{i, j}}{h_j} \right) \text{,} \quad x \in \mathbb{R}^n\]
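  • \(m\) - size of the dataset

  • \(w_i\) - weight of the \(i\)-th data point; the weights are non-negative and sum to 1 so that \(\hat{f}\) integrates to 1

  • \(h_j\) - bandwidth (smoothing parameter) of the \(j\)-th dimension

  • \(n\) - dimensionality

  • \(x_{i, j}\) - \(j\)-th coordinate of the \(i\)-th data point

  • \(K\) - kernel function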
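The formula above can be sketched in plain Python. This is only an illustrative sketch, not the library's implementation: the names `gaussian_kernel` and `kde_product` are hypothetical, the data values are made up, and equal weights are assumed when none are given.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_product(x, data, bandwidth, weights=None, kernel=gaussian_kernel):
    """Evaluate the product-kernel estimate at a single point x.

    x         -- sequence of n coordinates
    data      -- m data points, each a sequence of n coordinates
    bandwidth -- n positive bandwidths h_j, one per dimension
    weights   -- m non-negative weights; normalized here to sum to 1
    """
    m = len(data)
    if weights is None:
        weights = [1.0] * m  # assumption: equal weights by default
    total = sum(weights)
    density = 0.0
    for x_i, w_i in zip(data, weights):
        # Product over dimensions j = 1..n of K((x_j - x_ij) / h_j) / h_j
        prod = 1.0
        for x_j, x_ij, h_j in zip(x, x_i, bandwidth):
            prod *= kernel((x_j - x_ij) / h_j) / h_j
        density += (w_i / total) * prod
    return density


# Two-dimensional toy dataset (made-up values, m = 3, n = 2)
data = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0)]
print(kde_product((0.0, 0.0), data, bandwidth=(0.5, 0.5)))
```

The explicit loops mirror the sum over data points and the product over dimensions. A practical implementation would normally vectorize both.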

See the Kernels section for the available kernel functions.

Example of constructing a kernel density estimate on a small dataset (\(m=9\)) with the Gaussian kernel:

_images/kde_construction.png

Kernels

There are four available kernel functions. See formulas and plot below:

Formulas of available kernel functions

| Kernel name | Formula |
|---|---|
| Gaussian | \(\frac{1}{\sqrt{2 \pi}} \exp \left( -\frac{x^2}{2} \right)\) |
| Uniform | \(0.5 \quad \text{if } \lvert x \rvert \leq 1 \quad \text{otherwise } 0\) |
| Epanechnikov | \(\frac{3}{4} (1-x^2) \quad \text{if } \lvert x \rvert \leq 1 \quad \text{otherwise } 0\) |
| Cauchy | \(\frac{2}{\pi (x^2 + 1)^2}\) |
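As a sanity check on the formulas, the four kernels can be written as plain Python functions and integrated numerically; each should integrate to approximately 1. The function names and the `integrate` helper are hypothetical and chosen for illustration only. The integration range of [-50, 50] is an assumption that is wide enough for the heavier Cauchy tails.

```python
import math


def gaussian(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def uniform(x):
    return 0.5 if abs(x) <= 1.0 else 0.0


def epanechnikov(x):
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def cauchy(x):
    return 2.0 / (math.pi * (x * x + 1.0) ** 2)


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (k + 0.5) * width) for k in range(steps))


# Each kernel is a probability density, so its integral should be close to 1.
for kernel in (gaussian, uniform, epanechnikov, cauchy):
    print(kernel.__name__, integrate(kernel, -50.0, 50.0))
```

Note the negative sign in the Gaussian exponent: without it the function grows without bound and cannot be normalized.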

_images/kernels.png

Weighted data

Example of constructing a kernel density estimate from weighted data points.

Notice that the rightmost data points have more impact on the estimated density than the others.

_images/kde_construction_weighted.png
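The effect of weights can also be reproduced numerically. The sketch below is a one-dimensional Gaussian estimate with a hypothetical helper `weighted_kde_1d`; the sample, the weights and the bandwidth of 0.5 are made-up values, and the weights are normalized inside the function as an assumption.

```python
import math


def weighted_kde_1d(x, data, weights, h):
    """One-dimensional Gaussian KDE with weights normalized to sum to 1."""
    total = sum(weights)
    norm = math.sqrt(2.0 * math.pi)
    return sum(
        (w / total) * math.exp(-0.5 * ((x - x_i) / h) ** 2) / (h * norm)
        for x_i, w in zip(data, weights)
    )


data = [-2.0, -1.0, 0.0, 1.0, 2.0]       # made-up sample
equal = [1.0, 1.0, 1.0, 1.0, 1.0]
right_heavy = [1.0, 1.0, 1.0, 3.0, 3.0]  # rightmost points weigh more

# Compare both weightings at the left and right ends of the sample.
for x in (-2.0, 2.0):
    print(x, weighted_kde_1d(x, data, equal, 0.5),
          weighted_kde_1d(x, data, right_heavy, 0.5))
```

With the heavier right-hand weights, the estimated density rises near the rightmost points and falls near the leftmost ones, while the total mass stays equal to 1.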

Bandwidth selection

There are four available bandwidth selection methods:

  • normal reference

  • direct plugin

  • solve-the-equation plugin

  • maximum likelihood cross-validation
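Two of the methods listed above are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the library's implementation: `normal_reference` uses the common textbook rule of thumb for a Gaussian product kernel, \(h_j = \sigma_j \left( \frac{4}{(n+2) m} \right)^{1/(n+4)}\), and `ml_cv_bandwidth` picks, from a user-supplied grid, the bandwidth that maximizes the leave-one-out log-likelihood of a one-dimensional Gaussian estimate. All function names, the sample and the candidate grid are hypothetical.

```python
import math


def normal_reference(data):
    """Rule-of-thumb bandwidth per dimension for a Gaussian product kernel.

    h_j = sigma_j * (4 / ((n + 2) * m)) ** (1 / (n + 4))
    """
    m, n = len(data), len(data[0])
    factor = (4.0 / ((n + 2) * m)) ** (1.0 / (n + 4))
    bandwidth = []
    for j in range(n):
        column = [row[j] for row in data]
        mean = sum(column) / m
        # Sample standard deviation of the j-th coordinate
        sigma = math.sqrt(sum((v - mean) ** 2 for v in column) / (m - 1))
        bandwidth.append(sigma * factor)
    return bandwidth


def loo_log_likelihood(sample, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    m = len(sample)
    norm = h * math.sqrt(2.0 * math.pi) * (m - 1)
    total = 0.0
    for i, x_i in enumerate(sample):
        # Density at x_i estimated from all points except x_i itself
        density = sum(
            math.exp(-0.5 * ((x_i - x_k) / h) ** 2)
            for k, x_k in enumerate(sample) if k != i
        ) / norm
        total += math.log(max(density, 1e-300))  # floor avoids log(0)
    return total


def ml_cv_bandwidth(sample, candidates):
    """Return the candidate bandwidth with the highest leave-one-out score."""
    return max(candidates, key=lambda h: loo_log_likelihood(sample, h))


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5, 2.2]  # made-up 1-D data
candidates = [0.1 * k for k in range(1, 21)]    # grid 0.1, 0.2, ..., 2.0

print(normal_reference([[v] for v in sample]))
print(ml_cv_bandwidth(sample, candidates))
```

The normal reference rule is cheap but assumes roughly Gaussian data, so it tends to oversmooth multimodal densities. Cross-validation adapts to the data at the cost of evaluating the estimate for every candidate bandwidth. The plug-in methods are more involved and are not sketched here.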

Kernel density estimates obtained with different bandwidth selection methods, computed on data drawn from a Gaussian mixture (blue curve):

_images/bandwidth_selection.png

Conditional case