Conditional case

CKDE

class kdelearn.ckde.CKDE(kernel_name: str = 'gaussian')[source]

Conditional kernel density estimator with product kernel:

TODO: <MATH FORMULA and READ MORE and REFERENCES>
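
Until the formula is filled in here, the idea of a conditional estimator with a product kernel can be sketched in plain Python. This is a textbook form, not taken from the kdelearn source: it assumes scalar x and y, a Gaussian kernel, and hypothetical helper names (gaussian_kernel, conditional_kde). Whether the normalized weights computed below match what pdf returns as cond_weights_train is also an assumption.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the one-dimensional kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_kde(x, x_train, y_train, y_star, h_x, h_y, weights=None):
    """Estimate f(x | y = y_star) for scalar x and y (illustrative only)."""
    m = len(x_train)
    if weights is None:
        weights = [1.0 / m] * m
    # Kernel in the conditioning variable: how close each y_i is to y_star.
    k_y = [w * gaussian_kernel((y_star - y_i) / h_y)
           for w, y_i in zip(weights, y_train)]
    total = sum(k_y)
    # Conditional weights, normalized so that they sum to one.
    cond_weights = [k / total for k in k_y]
    # Weighted kernel estimate in the describing variable.
    return sum(d * gaussian_kernel((x - x_i) / h_x) / h_x
               for d, x_i in zip(cond_weights, x_train))


# Training points whose y_i lies close to y_star dominate the estimate.
density_at_zero = conditional_kde(0.0, [0.0, 1.0, 2.0], [0.0, 0.5, 3.0],
                                  y_star=0.0, h_x=0.5, h_y=0.5)
```

Because the conditional weights sum to one and each kernel term integrates to one, the sketch integrates to one over x, as a density should.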

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> ckde = CKDE("gaussian").fit(x_train, y_train, y_star)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the estimator.

pdf(x_test)

Compute conditional probability density function.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)[source]

Fit the estimator.

Parameters:
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

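  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the estimate is conditioned (the estimator describes x given y = y_star).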

  • weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when a bandwidth is not given explicitly.
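
To give a feel for what a bandwidth selection method produces, here is a sketch of the textbook normal reference rule for a Gaussian kernel. It is an assumption that 'normal_reference' in kdelearn resembles this; the library may use kernel-specific constants or a different dimension count, and the function name is hypothetical.

```python
import math
import statistics


def normal_reference_bandwidth(column, n_dims=1):
    """Rule-of-thumb bandwidth for one variable of an n_dims-dimensional sample."""
    m = len(column)
    sigma = statistics.stdev(column)  # sample standard deviation
    # Gaussian-kernel constant: (4 / ((n + 2) * m)) ** (1 / (n + 4)).
    return sigma * (4.0 / ((n_dims + 2) * m)) ** (1.0 / (n_dims + 4))


# One bandwidth per variable; larger samples give smaller bandwidths.
h = normal_reference_bandwidth([0.0, 1.0, 2.0, 3.0, 4.0])
```

A value computed this way could be passed explicitly through bandwidth_x or bandwidth_y (as a one-element array per variable) instead of relying on bandwidth_method.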

Returns:

self – Fitted self instance of CKDE.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> weights_train = np.random.randint(1, 10, size=(m_train,))
>>> y_star = np.array([0.0] * n_y)
>>> bandwidth_x = np.array([0.5] * n_x)
>>> bandwidth_y = np.array([0.5] * n_y)
>>> # Fit the estimator
>>> params = (x_train, y_train, y_star, weights_train, bandwidth_x, bandwidth_y)
>>> ckde = CKDE().fit(*params)

pdf(x_test: ndarray) → Tuple[ndarray, ndarray][source]

Compute conditional probability density function.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

  • scores (ndarray of shape (m_test,)) – Values of kernel density estimator.

  • cond_weights_train (ndarray of shape (m_train,)) – TODO: complete.

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n_x))
>>> y_train = np.random.normal(0, 1, (m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, (m_test, n_x))
>>> # Fit the estimator
>>> ckde = CKDE().fit(x_train, y_train, y_star)
>>> # Compute pdf; pdf returns a (scores, cond_weights_train) tuple
>>> scores, cond_weights_train = ckde.pdf(x_test)  # scores shape (10,)

CKDEClassification

class kdelearn.ckde_tasks.CKDEClassification(kernel_name: str = 'gaussian')[source]

Bayes’ classifier based on conditional kernel density estimation.

TODO: <MATH FORMULA and READ MORE and REFERENCES>

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> classifier = CKDEClassification().fit(x_train, y_train, y_star, labels_train)

Methods

fit(x_train, y_train, y_star, labels_train)

Fit the classifier.

pdfs(x_test)

Compute pdf of each class.

predict(x_test)

Predict class labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, labels_train: ndarray, weights_train: ndarray | None = None, share_bandwidth: bool = False, bandwidths_x: ndarray | None = None, bandwidths_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', prior_prob: ndarray | None = None, **kwargs)[source]

Fit the classifier.

Parameters:
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the class densities are conditioned.

  • labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.

  • weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all points are equally weighted.

  • share_bandwidth (bool, default=False) – Determines whether all classes should have common bandwidth. If False, estimator of each class gets its own bandwidth.

  • bandwidths_x (ndarray of shape (n_classes, n_x), optional) – Smoothing parameter of describing variables for each class.

  • bandwidths_y (ndarray of shape (n_classes, n_y), optional) – Smoothing parameter of conditioning variables for each class.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when bandwidths are not given explicitly.

  • prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.
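
The role of prior_prob can be illustrated with the usual Bayes decision rule: each class density is multiplied by its prior, and the class with the largest product wins. The sketch below works on precomputed density values for a single test point; the function name and the sample numbers are hypothetical, and it is an assumption that predict follows this rule exactly.

```python
def bayes_predict(class_densities, prior_prob, labels):
    """Pick the label maximizing prior * conditional density for one test point."""
    scores = [p * f for p, f in zip(prior_prob, class_densities)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# With equal priors the class with the higher density wins ...
equal = bayes_predict([0.30, 0.20], [0.5, 0.5], labels=[1, 2])    # -> 1
# ... but a strong prior for the second class can flip the decision.
skewed = bayes_predict([0.30, 0.20], [0.3, 0.7], labels=[1, 2])   # -> 2
```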

Returns:

self – Fitted self instance of CKDEClassification.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, y_train, y_star, labels_train, weights_train)
>>> classifier = CKDEClassification().fit(*params, prior_prob=prior_prob)

predict(x_test: ndarray) → ndarray[source]

Predict class labels.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

labels_pred – Predicted labels as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # labels_pred shape (10,)

pdfs(x_test: ndarray) → ndarray[source]

Compute pdf of each class.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

scores – Predicted scores as an array containing data with float type.

Return type:

ndarray of shape (m_test, n_classes)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # scores shape (10, 2)

CKDEOutliersDetection

class kdelearn.ckde_tasks.CKDEOutliersDetection(kernel_name: str = 'gaussian')[source]

Outliers detection based on conditional kernel density estimation.

TODO: <READ MORE>

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection("gaussian").fit(*params)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the outliers detector.

predict(x_test)

Predict the labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', r: float = 0.05, **kwargs)[source]

Fit the outliers detector.

Parameters:
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density estimate is conditioned.

  • weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None is passed, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when a bandwidth is not given explicitly.

  • r (float, default=0.05) – Threshold separating outliers and inliers.
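
One plausible reading of r, stated here as an assumption rather than as the library's documented behaviour, is a quantile level: the density value below which a fraction r of the training scores fall becomes the threshold, and test points with a lower density are flagged. The helper names below are hypothetical and operate on precomputed density scores.

```python
def quantile_threshold(train_scores, r):
    """Density value below which roughly a fraction r of training scores fall."""
    ordered = sorted(train_scores)
    # Simple empirical quantile (assumption: the library may interpolate).
    index = max(0, min(len(ordered) - 1, int(r * len(ordered))))
    return ordered[index]


def label_outliers(test_scores, threshold):
    """Return 1 for outliers (density below the threshold), 0 for inliers."""
    return [1 if s < threshold else 0 for s in test_scores]


train_scores = [0.05, 0.4, 0.3, 0.2, 0.35, 0.25, 0.1, 0.45]
threshold = quantile_threshold(train_scores, r=0.25)      # -> 0.2
labels = label_outliers([0.02, 0.3, 0.2], threshold)      # -> [1, 0, 0]
```

Under this reading, raising r widens the low-density region treated as outlying, so more points are labeled 1.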

Returns:

self – Fitted self instance of CKDEOutliersDetection.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star, weights_train)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)

predict(x_test: ndarray) → ndarray[source]

Predict the labels.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as a 2D array containing data with float type.

Returns:

labels_pred – Predicted labels (0 - inlier, 1 - outlier) as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, size=(m_test, n_x))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # labels_pred shape (10,)

CKDEClustering

class kdelearn.ckde_tasks.CKDEClustering[source]

Clustering based on conditional kernel density estimation.

TODO: <READ MORE>

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the model.

predict(x_test[, algorithm, epsilon, delta])

Predict cluster labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)[source]

Fit the model.

Parameters:
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density estimate is conditioned.

  • weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when a bandwidth is not given explicitly.

Returns:

self – Fitted self instance of CKDEClustering.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star, weights_train)

predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)[source]

Predict cluster labels.

Parameters:
  • x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

  • algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.

  • epsilon (float, default=1e-8) – Threshold for difference (euclidean distance) of data point position while shifting. When the difference is less than epsilon, data point is no longer shifted.

  • delta (float, default=1e-3) – Acceptance error (euclidean distance) between shifted data point and representative of cluster. If the error is less than delta, data point is assigned to cluster represented by cluster representative.
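
How epsilon and delta interact can be sketched with a one-dimensional mean shift in plain Python. This is a generic illustration with a Gaussian kernel and hypothetical function names, not the kdelearn implementation; in the conditional setting the weights would presumably be the conditional weights of the training points, which is an assumption.

```python
import math


def mean_shift_point(x, x_train, weights, h, epsilon=1e-8, max_iter=1000):
    """Shift a scalar point uphill until it moves by less than epsilon."""
    for _ in range(max_iter):
        k = [w * math.exp(-0.5 * ((x - x_i) / h) ** 2)
             for w, x_i in zip(weights, x_train)]
        new_x = sum(k_i * x_i for k_i, x_i in zip(k, x_train)) / sum(k)
        if abs(new_x - x) < epsilon:
            return new_x
        x = new_x
    return x


def assign_clusters(shifted, delta=1e-3):
    """Give each shifted point the label of the first representative within delta."""
    representatives, labels = [], []
    for s in shifted:
        for label, rep in enumerate(representatives):
            if abs(s - rep) < delta:
                labels.append(label)
                break
        else:
            # No representative is close enough: start a new cluster.
            representatives.append(s)
            labels.append(len(representatives) - 1)
    return labels


x_train = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
weights = [1.0 / len(x_train)] * len(x_train)
shifted = [mean_shift_point(x, x_train, weights, h=0.5) for x in x_train]
labels = assign_clusters(shifted)  # -> [0, 0, 0, 1, 1, 1]
```

A smaller epsilon makes each point travel closer to its mode before stopping, while delta controls how far apart two stopping positions may be and still count as the same cluster.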

Returns:

labels_pred – Predicted labels as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
>>> # Predict cluster labels; x_test is required, here the training points are reused
>>> labels_pred = clustering.predict(x_train)