Unconditional case

KDE

class kdelearn.kde.KDE(kernel_name: str = 'gaussian')

Kernel density estimator with product kernel:

\[\hat{f}(x) = \sum_{i=1}^m w_{i} \prod_{j=1}^n \frac{1}{h_j} K \left( \frac{x_{j} - x_{i, j}}{h_j} \right), \quad x \in \mathbb{R}^n\]

Read more here.

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> # Fit
>>> kde = KDE("gaussian").fit(x_train)
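
For illustration, the product-kernel formula can also be evaluated directly with NumPy. The sketch below assumes a Gaussian kernel; `gaussian_kernel` and `product_kernel_density` are hypothetical helper names, not part of kdelearn, and the library's own implementation may differ.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the one-dimensional kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def product_kernel_density(x_test, x_train, weights, bandwidth):
    """Evaluate the weighted product-kernel estimate at each row of x_test.

    Hypothetical helper for illustration only (not a kdelearn function).
    """
    # u[k, i, j] = (x_test[k, j] - x_train[i, j]) / h_j
    u = (x_test[:, None, :] - x_train[None, :, :]) / bandwidth
    # Product over the n coordinates of K(u) / h_j, shape (m_test, m_train)
    kernel_values = np.prod(gaussian_kernel(u) / bandwidth, axis=2)
    # Weighted sum over the m training points, shape (m_test,)
    return kernel_values @ weights


rng = np.random.default_rng(0)
m_train, n = 100, 2
x_train = rng.normal(0, 1, size=(m_train, n))
weights_train = np.full((m_train,), 1 / m_train)
bandwidth = np.full((n,), 0.5)
x_test = np.array([[0.0, 0.0], [3.0, 3.0]])
scores = product_kernel_density(x_test, x_train, weights_train, bandwidth)  # shape: (2,)
```

Broadcasting builds an array of shape (m_test, m_train, n), so this form is only practical for small data sets.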

References

[1] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.

[2] Wand, M. P., Jones, M. C. Kernel Smoothing. Chapman and Hall, 1995.

Methods

`fit`(x_train[, weights_train, bandwidth, ...])	Fit the estimator.
`pdf`(x_test)	Compute probability density.
`sample`()

fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)

Fit the estimator.

Parameters

x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the estimator.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.

Returns

self – Fitted self instance of KDE.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> bandwidth = np.full((n,), 1.0)
>>> # Fit the estimator
>>> kde = KDE().fit(x_train, weights_train, bandwidth)
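
When `bandwidth` is not passed, it is computed by `bandwidth_method`. As a rough guide to what a normal-reference rule looks like, the sketch below implements one common form for the Gaussian kernel (see Silverman [1]). Whether kdelearn's 'normal_reference' option uses exactly this formula is an assumption, and `normal_reference_bandwidth` is a hypothetical helper name.

```python
import numpy as np


def normal_reference_bandwidth(x_train):
    """One common normal-reference rule: h_j = sigma_j * (4 / ((n + 2) * m)) ** (1 / (n + 4)).

    Hypothetical helper; not necessarily the rule kdelearn applies.
    """
    m_train, n = x_train.shape
    # Per-coordinate sample standard deviation, shape (n,)
    sigma = np.std(x_train, axis=0, ddof=1)
    return sigma * (4.0 / ((n + 2) * m_train)) ** (1.0 / (n + 4))


x_train = np.arange(10, dtype=float).reshape(-1, 1)
bandwidth = normal_reference_bandwidth(x_train)  # shape: (1,)
```

For n = 1 the constant reduces to (4 / 3) ** 0.2, roughly 1.06, which is the familiar one-dimensional rule of thumb.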

pdf(x_test: ndarray) → ndarray

Compute probability density.

Parameters: x_test (ndarray of shape (m_test, n)) – Data points at which the density is evaluated, as an array of float type.
Returns: scores – Estimated probability density at each data point in x_test.
Return type: ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n))
>>> x_test = np.linspace(-3, 3, m_test).reshape(-1, 1)
>>> # Fit the estimator
>>> kde = KDE().fit(x_train)
>>> # Compute pdf
>>> scores = kde.pdf(x_test)  # shape of scores: (10,)

KDEClassification

class kdelearn.kde_tasks.KDEClassification(kernel_name: str = 'gaussian')

Bayes’ classifier based on kernel density estimation.

Probability that \(x\) belongs to class \(c\):

\[P(C=c|X=x) \propto \pi_c \hat{f}_c(X=x)\]

The predicted class label for \(x\) is the class \(c\) with the highest posterior probability:

\[\underset{c}{\mathrm{argmax}} \quad P(C=c|X=x)\]

Read more here.

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Fit
>>> classifier = KDEClassification("gaussian").fit(x_train, labels_train)
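
The decision rule can be illustrated without the library. The sketch below takes per-class density values (made-up numbers with the same layout as the output of `pdfs`) and prior probabilities, normalizes their product into posteriors, and picks the most probable class. `posterior_from_densities` is a hypothetical helper, not a kdelearn function.

```python
import numpy as np


def posterior_from_densities(class_densities, prior_prob):
    """Turn class densities f_c(x) and priors pi_c into posteriors P(C=c | X=x).

    Hypothetical helper; assumes at least one class has nonzero density per row.
    """
    # Unnormalized posterior pi_c * f_c(x), shape (m_test, n_classes)
    unnormalized = class_densities * prior_prob
    # Normalize each row so that the posteriors sum to 1
    return unnormalized / unnormalized.sum(axis=1, keepdims=True)


# Illustrative density values for three test points and two classes
class_densities = np.array([[0.30, 0.10], [0.05, 0.20], [0.20, 0.10]])
prior_prob = np.array([0.3, 0.7])
class_labels = np.array([1, 2])

posterior = posterior_from_densities(class_densities, prior_prob)
labels_pred = class_labels[np.argmax(posterior, axis=1)]  # array([1, 2, 2])
```

The third point has a higher density under class 1, yet the larger prior of class 2 tips the decision, which is the effect `prior_prob` is meant to control.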

References

[1] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.

Methods

`fit`(x_train, labels_train[, weights_train, ...])	Fit the classifier.
`pdfs`(x_test)	Compute pdf of each class.
`predict`(x_test)	Predict class labels.

fit(x_train: ndarray, labels_train: ndarray, weights_train: Optional[ndarray] = None, bandwidths: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', share_bandwidth: bool = False, prior_prob: Optional[ndarray] = None, **kwargs)

Fit the classifier.

Parameters

x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the classifier.
labels_train (ndarray of shape (m_train,)) – Class labels of x_train containing data with int type.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all data points are equally weighted.
bandwidths (ndarray of shape (n_classes, n), optional) – Smoothing parameters for scaling the estimators of each class. If None, bandwidth_method is used to compute the bandwidths.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidths when they are not given explicitly.
share_bandwidth (bool, default=False) – Determines whether all classes should have common bandwidth. If False, estimator of each class gets its own bandwidth.
prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.

Returns

self – Fitted self instance of KDEClassification.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full((m_train // 2,), 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full((m_train // 2,), 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, labels_train, weights_train)
>>> classifier = KDEClassification().fit(*params, prior_prob=prior_prob)

predict(x_test: ndarray) → ndarray

Predict class labels.

Parameters: x_test (ndarray of shape (m_test, n)) – Data points to classify - array containing data points with float type.
Returns: labels_pred – Predicted class labels containing data with int type.
Return type: ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> x_test = np.linspace(-3, 6, m_test).reshape(-1, 1)
>>> # Fit the classifier
>>> classifier = KDEClassification().fit(x_train, labels_train)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # shape: (10,)

pdfs(x_test: ndarray) → ndarray

Compute pdf of each class.

Parameters: x_test (ndarray of shape (m_test, n)) – Data points at which each class estimator is evaluated, as an array of float type.
Returns: scores – Estimated density of each class at each data point in x_test, as an array of float type.
Return type: ndarray of shape (m_test, n_classes)

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> x_test = np.linspace(-3, 6, m_test).reshape(-1, 1)
>>> # Fit the classifier
>>> classifier = KDEClassification().fit(x_train, labels_train)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # shape: (10, 2)

KDEOutliersDetection

class kdelearn.kde_tasks.KDEOutliersDetection(kernel_name: str = 'gaussian')

Outlier detection based on kernel density estimation.

Read more here.

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> # Fit the outliers detector
>>> outliers_detector = KDEOutliersDetection("gaussian").fit(x_train)

Methods

`fit`(x_train[, weights_train, bandwidth, ...])	Fit the outliers detector.
`predict`(x_test)	Predict labels.

fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', r: float = 0.1, **kwargs)

Fit the outliers detector.

Parameters

x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the detector.
weights_train (ndarray of shape (m_train,), default=None) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.
r (float, default=0.1) – Threshold separating outliers and inliers.

Returns

self – Fitted self instance of KDEOutliersDetection.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit the outliers detector
>>> params = (x_train, weights_train)
>>> outliers_detector = KDEOutliersDetection().fit(*params, r=0.1)
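
This page does not spell out how `r` is turned into a decision rule. One plausible reading, shown below purely as an assumption, is that `r` is a quantile level: the threshold is the r-quantile of the densities estimated at the training points, and any point whose density does not exceed it is labeled an outlier. The helper names and the density values are hypothetical, and kdelearn may define the threshold differently.

```python
import numpy as np


def fit_threshold(train_scores, r=0.1):
    """Assumed rule: the threshold is the r-quantile of the training densities."""
    return np.quantile(train_scores, r)


def predict_outliers(test_scores, threshold):
    """Label 1 (outlier) when the density is at or below the threshold, else 0 (inlier)."""
    return (test_scores <= threshold).astype(int)


# Illustrative density values at ten training points
train_scores = np.array([0.01, 0.05, 0.20, 0.25, 0.30, 0.31, 0.35, 0.38, 0.39, 0.40])
threshold = fit_threshold(train_scores, r=0.1)

# Densities at two test points: one in a sparse region, one in a dense region
labels_pred = predict_outliers(np.array([0.005, 0.30]), threshold)  # array([1, 0])
```

Under this reading, roughly a fraction `r` of the training points themselves would fall at or below the threshold.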

predict(x_test: ndarray) → ndarray

Predict labels.

Parameters: x_test (ndarray of shape (m_test, n)) – Data points to label as inliers or outliers, as an array of float type.
Returns: labels_pred – Predicted labels (0 - inlier, 1 - outlier) containing data with int type.
Return type: ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> x_test = np.linspace(-3, 3, m_test).reshape(-1, 1)
>>> # Fit the outliers detector
>>> outliers_detector = KDEOutliersDetection().fit(x_train, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # shape: (10,)

KDEClustering

class kdelearn.kde_tasks.KDEClustering

Clustering based on kernel density estimation.

Read more here.

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> # Fit
>>> clustering = KDEClustering().fit(x_train)

Methods

`fit`(x_train[, weights_train, bandwidth, ...])	Fit the model.
`predict`(x_test[, algorithm, epsilon, delta])	Predict cluster labels.

fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)

Fit the model.

Parameters

x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the model.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.

Returns

self – Fitted self instance of KDEClustering.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit
>>> clustering = KDEClustering().fit(x_train, weights_train)

predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)

Predict cluster labels.

Parameters

x_test (ndarray of shape (m_test, n)) – Data points to be grouped - array containing data points with float type.
algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.
epsilon (float, default=1e-8) – Threshold for difference (euclidean distance) of data point position while shifting. When the difference is less than epsilon, data point is no longer shifted.
delta (float, default=1e-3) – Acceptance error (euclidean distance) between shifted data point and representative of cluster. If the error is less than delta, data point is assigned to cluster represented by cluster representative.

Returns

labels_pred – Predicted cluster labels containing data with int type.

Return type

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> # Fit the model
>>> clustering = KDEClustering().fit(x_train)
>>> # Predict cluster labels
>>> labels_pred = clustering.predict(x_train)
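
The roles of `epsilon` and `delta` can be seen in a small stand-alone mean shift sketch. It assumes a Gaussian product kernel: each point is shifted to the kernel-weighted mean of the training data until it moves by less than `epsilon`, then it joins the first cluster whose representative lies within `delta`, or starts a new cluster. `mean_shift_labels` and `max_iter` are hypothetical names introduced here; this is not kdelearn's implementation.

```python
import numpy as np


def mean_shift_labels(x_test, x_train, weights, bandwidth,
                      epsilon=1e-8, delta=1e-3, max_iter=1000):
    """Assign cluster labels by shifting each point towards a density mode.

    Hypothetical helper for illustration only (Gaussian kernel assumed).
    """
    representatives = []  # one converged position per cluster found so far
    labels_pred = np.empty(len(x_test), dtype=int)
    for k, start in enumerate(x_test):
        x = np.array(start, dtype=float)
        for _ in range(max_iter):
            # Gaussian kernel weight of every training point relative to x
            u = (x - x_train) / bandwidth
            g = weights * np.exp(-0.5 * np.sum(u**2, axis=1))
            # Mean shift update: kernel-weighted mean of the training points
            x_new = (g @ x_train) / g.sum()
            step = np.linalg.norm(x_new - x)
            x = x_new
            if step < epsilon:  # position no longer changes noticeably
                break
        for label, representative in enumerate(representatives):
            if np.linalg.norm(x - representative) < delta:
                labels_pred[k] = label  # close enough to an existing cluster
                break
        else:
            representatives.append(x)  # no match: start a new cluster
            labels_pred[k] = len(representatives) - 1
    return labels_pred


x_train = np.array([[-0.1], [0.0], [0.1], [4.9], [5.0], [5.1]])
weights_train = np.full((len(x_train),), 1 / len(x_train))
bandwidth = np.array([0.5])
labels_pred = mean_shift_labels(x_train, x_train, weights_train, bandwidth)
# labels_pred: array([0, 0, 0, 1, 1, 1])
```

A `delta` much larger than `epsilon` keeps points that converge to the same mode from being split by small numerical differences.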