Conditional case

CKDE

class kdelearn.ckde.CKDE(kernel_name: str = 'gaussian')[source]

Conditional kernel density estimator with product kernel:

TODO: <MATH FORMULA and READ MORE and REFERENCES>
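
The formula is not given here. As a hedged sketch only, a conditional kernel density estimator with a product kernel is commonly written as below. In this notation, m stands for m_train, w_i are the normalized weights, h_{x,j} and h_{y,k} are the entries of bandwidth_x and bandwidth_y, K is the one-dimensional kernel selected by kernel_name, and d_i are conditional weights. The symbols are chosen for illustration, and this textbook form is an assumption that may differ in detail from what kdelearn implements.

```latex
\hat{f}(x \mid y^*) = \sum_{i=1}^{m} d_i \prod_{j=1}^{n_x} \frac{1}{h_{x,j}}
    K\!\left(\frac{x_j - x_{i,j}}{h_{x,j}}\right),
\qquad
d_i = \frac{w_i \prod_{k=1}^{n_y} K\!\left(\frac{y^*_k - y_{i,k}}{h_{y,k}}\right)}
           {\sum_{l=1}^{m} w_l \prod_{k=1}^{n_y} K\!\left(\frac{y^*_k - y_{l,k}}{h_{y,k}}\right)}
```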

Parameters

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> ckde = CKDE("gaussian").fit(x_train, y_train, y_star)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the estimator.

pdf(x_test)

Compute conditional probability density function.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)[source]

Fit the estimator.

Parameters
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables at which the density is conditioned.

  • weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.

Returns

self – Fitted self instance of CKDE.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> weights_train = np.random.randint(1, 10, size=(m_train,))
>>> y_star = np.array([0.0] * n_y)
>>> bandwidth_x = np.array([0.5] * n_x)
>>> bandwidth_y = np.array([0.5] * n_y)
>>> # Fit the estimator
>>> params = (x_train, y_train, y_star, weights_train, bandwidth_x, bandwidth_y)
>>> ckde = CKDE().fit(*params)
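
When no bandwidth is passed, bandwidth_method selects how it is computed. As a rough illustration of what a 'normal_reference' rule usually means, the sketch below applies the common rule of thumb for a Gaussian kernel, one bandwidth per dimension. The helper name normal_reference_bandwidth is hypothetical, and whether kdelearn uses exactly this constant, or treats x and y jointly, is an assumption not confirmed by this page.

```python
import numpy as np


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth per dimension for a Gaussian kernel.

    h_j = sigma_j * (4 / ((n + 2) * m)) ** (1 / (n + 4)),
    where m is the number of points and n the number of dimensions.
    Illustrative helper, not part of kdelearn.
    """
    m, n = data.shape
    sigma = np.std(data, axis=0, ddof=1)  # sample standard deviation per dimension
    return sigma * (4.0 / ((n + 2) * m)) ** (1.0 / (n + 4))


rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
bandwidth_x = normal_reference_bandwidth(x_train)  # shape (n_x,)
```

An array computed this way has the shape expected by the bandwidth_x and bandwidth_y arguments of fit.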

pdf(x_test: ndarray) → Tuple[ndarray, ndarray][source]

Compute conditional probability density function.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns

  • scores (ndarray of shape (m_test,)) – Values of kernel density estimator.

  • cond_weights_train (ndarray of shape (m_train,)) – Conditional weights of the training data points given y_star.

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n_x))
>>> y_train = np.random.normal(0, 1, (m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, (m_test, n_x))
>>> # Fit the estimator.
>>> ckde = CKDE().fit(x_train, y_train, y_star)
>>> # Compute pdf
>>> scores, d = ckde.pdf(x_test)  # scores shape (10,)
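
To make the two return values more concrete, here is a minimal NumPy sketch of a conditional KDE with a Gaussian product kernel. It is not kdelearn's implementation: the function names gaussian_kernel and conditional_kde_pdf are hypothetical, and the way the conditional weights are normalized is an assumption based on the usual textbook form.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density evaluated elementwise."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def conditional_kde_pdf(x_train, y_train, y_star, x_test, bandwidth_x, bandwidth_y,
                        weights_train=None):
    """Illustrative conditional KDE with a Gaussian product kernel (not kdelearn code)."""
    m_train = x_train.shape[0]
    if weights_train is None:
        weights_train = np.full(m_train, 1.0 / m_train)
    weights_train = weights_train / weights_train.sum()

    # Kernel of the conditioning variables evaluated at y_star, shape (m_train,)
    k_y = np.prod(gaussian_kernel((y_star - y_train) / bandwidth_y) / bandwidth_y, axis=1)
    cond_weights = weights_train * k_y
    cond_weights = cond_weights / cond_weights.sum()

    # Product kernel of the describing variables, shape (m_test, m_train)
    u = (x_test[:, None, :] - x_train[None, :, :]) / bandwidth_x
    k_x = np.prod(gaussian_kernel(u) / bandwidth_x, axis=2)
    scores = k_x @ cond_weights  # weighted sum over training points
    return scores, cond_weights


rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
y_train = rng.normal(0, 1, size=(100, 1))
y_star = np.array([0.0])
x_test = np.linspace(-8, 8, 801).reshape(-1, 1)
scores, cond_weights = conditional_kde_pdf(
    x_train, y_train, y_star, x_test, np.array([0.5]), np.array([0.5])
)
```

In this sketch, training points whose y value lies close to y_star receive larger conditional weights, and the weights sum to one so that the resulting scores form a density over x.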

CKDEClassification

class kdelearn.ckde_tasks.CKDEClassification(kernel_name: str = 'gaussian')[source]

Bayes’ classifier based on conditional kernel density estimation.

TODO: <MATH FORMULA and READ MORE and REFERENCES>

Parameters

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> classifier = CKDEClassification().fit(x_train, y_train, y_star, labels_train)

Methods

fit(x_train, y_train, y_star, labels_train)

Fit the classifier.

pdfs(x_test)

Compute pdf of each class.

predict(x_test)

Predict class labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, labels_train: ndarray, weights_train: Optional[ndarray] = None, share_bandwidth: bool = False, bandwidths_x: Optional[ndarray] = None, bandwidths_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', prior_prob: Optional[ndarray] = None, **kwargs)[source]

Fit the classifier.

Parameters
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables at which the density is conditioned.

  • labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.

  • weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all points are equally weighted.

  • share_bandwidth (bool, default=False) – Whether all classes share a common bandwidth. If False, the estimator of each class gets its own bandwidth.

  • bandwidths_x (ndarray of shape (n_classes, n_x), optional) – Smoothing parameter of describing variables for each class.

  • bandwidths_y (ndarray of shape (n_classes, n_y), optional) – Smoothing parameter of conditioning variables for each class.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter.

  • prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.

Returns

self – Fitted self instance of CKDEClassification.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, y_train, y_star, labels_train, weights_train)
>>> classifier = CKDEClassification().fit(*params, prior_prob=prior_prob)

predict(x_test: ndarray) → ndarray[source]

Predict class labels.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns

labels_pred – Predicted labels as an array containing data with int type.

Return type

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # labels_pred shape (10,)

pdfs(x_test: ndarray) → ndarray[source]

Compute pdf of each class.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns

scores – Predicted scores as an array containing data with float type.

Return type

ndarray of shape (m_test, n_classes)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # scores shape (10, 2)
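
The link between per-class densities and predicted labels in a Bayes classifier can be sketched as follows: each test point goes to the class with the largest product of prior probability and class density. The helper bayes_predict and the toy numbers are hypothetical. Whether the values returned by pdfs already include the priors is not stated on this page, so the sketch assumes they do not.

```python
import numpy as np


def bayes_predict(scores, prior_prob, class_labels):
    """Pick, for each test point, the class maximizing prior * class density."""
    posterior_unnormalized = scores * prior_prob  # broadcast priors over rows
    return class_labels[np.argmax(posterior_unnormalized, axis=1)]


# Hypothetical class densities for 3 test points and 2 classes
scores = np.array([[0.30, 0.05], [0.10, 0.10], [0.02, 0.25]])
prior_prob = np.array([0.3, 0.7])
class_labels = np.array([1, 2])
labels_pred = bayes_predict(scores, prior_prob, class_labels)
```

In the second row both classes have the same density, so the larger prior of class 2 decides the label.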

CKDEOutliersDetection

class kdelearn.ckde_tasks.CKDEOutliersDetection(kernel_name: str = 'gaussian')[source]

Outliers detection based on conditional kernel density estimation.

TODO: <READ MORE>

Parameters

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection("gaussian").fit(*params)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the outliers detector.

predict(x_test)

Predict the labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', r: float = 0.1, **kwargs)[source]

Fit the outliers detector.

Parameters
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables at which the density is conditioned.

  • weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None is passed, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.

  • r (float, default=0.1) – Threshold separating outliers and inliers.

Returns

self – Fitted self instance of CKDEOutliersDetection.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star, weights_train)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)

predict(x_test: ndarray) → ndarray[source]

Predict the labels.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as a 2D array containing data with float type.

Returns

labels_pred – Predicted labels (0 - inlier, 1 - outlier) as an array containing data with int type.

Return type

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, size=(m_test, n_x))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # labels_pred shape (10,)
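
One common way to turn density values into outlier labels is to threshold them at a low quantile of the training densities. The sketch below assumes that r plays the role of that quantile order. This is an assumption, since the page only calls r a threshold. The helper label_outliers and the density values are hypothetical.

```python
import numpy as np


def label_outliers(scores_train, scores_test, r=0.1):
    """Flag test points whose density does not exceed the r-quantile of training densities."""
    threshold = np.quantile(scores_train, r)
    return (scores_test <= threshold).astype(int)  # 1 - outlier, 0 - inlier


scores_train = np.linspace(0.01, 1.0, 100)  # hypothetical density values at training points
scores_test = np.array([0.005, 0.05, 0.5, 0.9])  # hypothetical density values at test points
labels_pred = label_outliers(scores_train, scores_test, r=0.1)
```

Under this reading, a larger r marks a larger share of low-density points as outliers.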

CKDEClustering

class kdelearn.ckde_tasks.CKDEClustering[source]

Clustering based on conditional kernel density estimation.

TODO: <READ MORE>

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)

Methods

fit(x_train, y_train, y_star[, ...])

Fit the model.

predict(x_test[, algorithm, epsilon, delta])

Predict cluster labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)[source]

Fit the model.

Parameters
  • x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.

  • y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.

  • y_star (ndarray of shape (n_y,)) – Value of the conditioning variables at which the density is conditioned.

  • weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.

  • bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.

  • bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.

  • bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.

Returns

self – Fitted self instance of CKDEClustering.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star, weights_train)

predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)[source]

Predict cluster labels.

Parameters
  • x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

  • algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.

  • epsilon (float, default=1e-8) – Threshold on the Euclidean distance a data point moves in a single shift. When the distance is less than epsilon, the data point is no longer shifted.

  • delta (float, default=1e-3) – Acceptable error (Euclidean distance) between a shifted data point and a cluster representative. If the error is less than delta, the data point is assigned to the cluster of that representative.

Returns

labels_pred – Predicted labels as an array containing data with int type.

Return type

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
>>> # Predict cluster labels
>>> x_test = np.random.uniform(-1, 4, size=(10, n_x))
>>> labels_pred = clustering.predict(x_test)
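
The roles of epsilon and delta described above can be illustrated with a small mean-shift sketch using a Gaussian kernel. This is not kdelearn's code: the function mean_shift_labels, its max_iter safeguard, and the toy data are hypothetical. In the conditional setting the weights would presumably be the conditional weights of the training points, which is an assumption here.

```python
import numpy as np


def mean_shift_labels(x_test, x_train, weights, bandwidth, epsilon=1e-8, delta=1e-3,
                      max_iter=1000):
    """Illustrative Gaussian mean shift followed by grouping of the shifted points."""
    shifted = x_test.astype(float).copy()
    for idx in range(shifted.shape[0]):
        point = shifted[idx]
        for _ in range(max_iter):
            u = (point - x_train) / bandwidth
            kernel = weights * np.exp(-0.5 * np.sum(u**2, axis=1))
            new_point = kernel @ x_train / kernel.sum()  # weighted mean of training points
            moved = np.linalg.norm(new_point - point)
            point = new_point
            if moved < epsilon:  # stop shifting once the move is below epsilon
                break
        shifted[idx] = point

    representatives = []
    labels = np.empty(shifted.shape[0], dtype=int)
    for idx, point in enumerate(shifted):
        for label, representative in enumerate(representatives):
            if np.linalg.norm(point - representative) < delta:  # close enough to a cluster
                labels[idx] = label
                break
        else:
            representatives.append(point)  # no match: the point starts a new cluster
            labels[idx] = len(representatives) - 1
    return labels


x_train = np.array([[-0.1], [0.0], [0.1], [4.9], [5.0], [5.1]])
weights = np.full(6, 1.0 / 6)
x_test = np.array([[0.2], [-0.3], [4.7], [5.2]])
labels_pred = mean_shift_labels(x_test, x_train, weights, bandwidth=np.array([0.5]))
```

With these two well-separated groups, the first two test points drift to the mode near 0 and the last two to the mode near 5, so they end up with two distinct labels.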