Conditional case

CKDE

class kdelearn.ckde.CKDE(kernel_name: str = 'gaussian')[source]

Conditional kernel density estimator with product kernel:

TODO: <MATH FORMULA and READ MORE and REFERENCES>
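
For orientation, a conditional kernel density estimator with a product kernel is commonly written as below. This is the textbook form and is an assumption here, not a statement of kdelearn's exact implementation. The symbols w_i (normalized weights of the training points), d_i (conditional weights), h_{x,j} and h_{y,k} (per-dimension bandwidths), and m (the number of training points, m_train) are introduced only for this sketch.

```latex
\hat{f}(x \mid y^{*})
  = \sum_{i=1}^{m} d_i \prod_{j=1}^{n_x} \frac{1}{h_{x,j}}
    K\!\left(\frac{x_j - x_{i,j}}{h_{x,j}}\right),
\qquad
d_i = \frac{w_i \prod_{k=1}^{n_y} K\!\left(\frac{y^{*}_k - y_{i,k}}{h_{y,k}}\right)}
           {\sum_{l=1}^{m} w_l \prod_{k=1}^{n_y} K\!\left(\frac{y^{*}_k - y_{l,k}}{h_{y,k}}\right)}
```

In this form the conditional weights d_i sum to one, so training points whose conditioning variables lie close to y_star contribute most to the estimate.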

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> ckde = CKDE("gaussian").fit(x_train, y_train, y_star)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the estimator.
`pdf`(x_test)	Compute conditional probability density function.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)[source]

Fit the estimator.

Parameters:

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density is conditioned.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.
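
To illustrate what a bandwidth selection method computes, here is a minimal sketch of the classical normal reference rule of thumb, which returns one smoothing parameter per dimension. The helper name `normal_reference_bandwidth` is hypothetical and the formula is the standard textbook rule; kdelearn's own 'normal_reference' method may differ in details.

```python
import numpy as np


def normal_reference_bandwidth(x_train):
    """Rule-of-thumb bandwidth per dimension (hypothetical helper, not kdelearn API)."""
    m_train, n = x_train.shape
    # Sample standard deviation of each variable
    std = np.std(x_train, axis=0, ddof=1)
    # Normal reference factor; for n = 1 this is about 1.06 * m_train ** (-1 / 5)
    factor = (4.0 / ((n + 2) * m_train)) ** (1.0 / (n + 4))
    return factor * std


rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
bandwidth_x = normal_reference_bandwidth(x_train)  # shape (1,)
```

An array computed this way has the shape expected by the `bandwidth_x` and `bandwidth_y` parameters above.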

Returns:

self – Fitted self instance of CKDE.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> weights_train = np.random.randint(1, 10, size=(m_train,))
>>> y_star = np.array([0.0] * n_y)
>>> bandwidth_x = np.array([0.5] * n_x)
>>> bandwidth_y = np.array([0.5] * n_y)
>>> # Fit the estimator
>>> params = (x_train, y_train, y_star, weights_train, bandwidth_x, bandwidth_y)
>>> ckde = CKDE().fit(*params)

pdf(x_test: ndarray) → Tuple[ndarray, ndarray][source]

Compute conditional probability density function.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

scores (ndarray of shape (m_test,)) – Values of kernel density estimator.
cond_weights_train (ndarray of shape (m_train,)) – Conditional weights of the training data points.
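
The two return values can be pictured with a small NumPy sketch of a conditional kernel density estimate using a Gaussian product kernel. It is an illustration under assumptions, not kdelearn's implementation: the function name `conditional_kde_pdf` and the way weights are normalized are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, applied elementwise."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def conditional_kde_pdf(x_test, x_train, y_train, y_star,
                        bandwidth_x, bandwidth_y, weights_train=None):
    """Sketch of a conditional KDE (hypothetical helper, not kdelearn API)."""
    m_train = x_train.shape[0]
    if weights_train is None:
        weights_train = np.full(m_train, 1.0 / m_train)
    weights_train = weights_train / weights_train.sum()

    # Kernel of each training point's conditioning variables at y_star
    ky = np.prod(gaussian_kernel((y_star - y_train) / bandwidth_y), axis=1)
    # Conditional weights: large when y_train[i] is close to y_star
    cond_weights_train = weights_train * ky
    cond_weights_train = cond_weights_train / cond_weights_train.sum()

    # Product kernel over the describing variables, shape (m_test, m_train)
    u = (x_test[:, None, :] - x_train[None, :, :]) / bandwidth_x
    kx = np.prod(gaussian_kernel(u) / bandwidth_x, axis=2)

    scores = kx @ cond_weights_train
    return scores, cond_weights_train


rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
y_train = rng.normal(0, 1, size=(100, 1))
grid = np.linspace(-8, 8, 2001).reshape(-1, 1)
scores, cond_weights_train = conditional_kde_pdf(
    grid, x_train, y_train, np.array([0.0]), np.array([0.5]), np.array([0.5])
)
```

In this sketch `scores` holds one density value per test point and `cond_weights_train` holds one weight per training point, matching the shapes listed above.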

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n_x))
>>> y_train = np.random.normal(0, 1, (m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, (m_test, n_x))
>>> # Fit the estimator.
>>> ckde = CKDE().fit(x_train, y_train, y_star)
>>> # Compute pdf
>>> scores, cond_weights_train = ckde.pdf(x_test)  # scores shape (10,)

CKDEClassification

class kdelearn.ckde_tasks.CKDEClassification(kernel_name: str = 'gaussian')[source]

Bayes’ classifier based on conditional kernel density estimation.

TODO: <MATH FORMULA and READ MORE and REFERENCES>

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> classifier = CKDEClassification().fit(x_train, y_train, y_star, labels_train)

Methods

`fit`(x_train, y_train, y_star, labels_train[, ...])	Fit the classifier.
`pdfs`(x_test)	Compute pdf of each class.
`predict`(x_test)	Predict class labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, labels_train: ndarray, weights_train: ndarray | None = None, share_bandwidth: bool = False, bandwidths_x: ndarray | None = None, bandwidths_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', prior_prob: ndarray | None = None, **kwargs)[source]

Fit the classifier.

Parameters:

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density is conditioned.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all points are equally weighted.
share_bandwidth (bool, default=False) – Whether all classes share a common bandwidth. If False, the estimator of each class gets its own bandwidth.
bandwidths_x (ndarray of shape (n_classes, n_x), optional) – Smoothing parameter of describing variables for each class.
bandwidths_y (ndarray of shape (n_classes, n_y), optional) – Smoothing parameter of conditioning variables for each class.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.
prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.
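
The role of `prior_prob` can be sketched with the usual Bayes decision rule: each class density is multiplied by its prior probability and the class with the largest product wins. The helper `bayes_predict` and the `classes` array are hypothetical names used only for this illustration; this is not kdelearn's code.

```python
import numpy as np


def bayes_predict(scores, prior_prob, classes):
    """Pick the class maximizing prior * density (hypothetical helper)."""
    # scores has shape (m_test, n_classes); prior_prob broadcasts over rows
    weighted = scores * prior_prob
    return classes[np.argmax(weighted, axis=1)]


# Per-class density values for three test points and two classes
scores = np.array([[0.3, 0.1],
                   [0.1, 0.2],
                   [0.2, 0.2]])
prior_prob = np.array([0.3, 0.7])
classes = np.array([1, 2])
labels_pred = bayes_predict(scores, prior_prob, classes)  # array([1, 2, 2])
```

The third test point has equal densities in both classes, so the prior alone decides in favor of class 2.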

Returns:

self – Fitted self instance of CKDEClassification.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, y_train, y_star, labels_train, weights_train)
>>> classifier = CKDEClassification().fit(*params, prior_prob=prior_prob)

predict(x_test: ndarray) → ndarray[source]

Predict class labels.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

labels_pred – Predicted labels as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # labels_pred shape (10,)

pdfs(x_test: ndarray) → ndarray[source]

Compute pdf of each class.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns:

scores – Predicted scores as an array containing data with float type.

Return type:

ndarray of shape (m_test, n_classes)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # scores shape (10, 2)

CKDEOutliersDetection

class kdelearn.ckde_tasks.CKDEOutliersDetection(kernel_name: str = 'gaussian')[source]

Outlier detection based on conditional kernel density estimation.

TODO: <READ MORE>

Parameters:

kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection("gaussian").fit(*params)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the outliers detector.
`predict`(x_test)	Predict the labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', r: float = 0.05, **kwargs)[source]

Fit the outliers detector.

Parameters:

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density is conditioned.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None is passed, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='direct_plugin') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.
r (float, default=0.05) – Threshold separating outliers and inliers.
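
How `r` separates outliers from inliers is not spelled out here. One common reading, assumed in this sketch, is that the threshold is the r-quantile of the density values at the training points, and that a test point whose density does not exceed it is flagged as an outlier. The helpers `fit_threshold` and `predict_outliers` are hypothetical, and kdelearn may compute its threshold differently (for example with a weighted quantile estimator).

```python
import numpy as np


def fit_threshold(scores_train, r=0.05):
    """Density value below which roughly a fraction r of training points fall."""
    return np.quantile(scores_train, r)


def predict_outliers(scores_test, threshold):
    """Return 1 for outliers (low density) and 0 for inliers."""
    return (scores_test <= threshold).astype(int)


# Density values at 100 training points (synthetic, for illustration only)
scores_train = np.linspace(0.01, 1.0, 100)
threshold = fit_threshold(scores_train, r=0.1)

scores_test = np.array([0.05, 0.5])
labels_pred = predict_outliers(scores_test, threshold)  # array([1, 0])
```

Under this assumption a larger `r` raises the threshold, so more points are labeled as outliers.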

Returns:

self – Fitted self instance of CKDEOutliersDetection.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star, weights_train)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)

predict(x_test: ndarray) → ndarray[source]

Predict the labels.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as a 2D array containing data with float type.

Returns:

labels_pred – Predicted labels (0 - inlier, 1 - outlier) as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, size=(m_test, n_x))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # labels_pred shape (10,)

CKDEClustering

class kdelearn.ckde_tasks.CKDEClustering[source]

Clustering based on conditional kernel density estimation.

TODO: <READ MORE>

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the model.
`predict`(x_test[, algorithm, epsilon, delta])	Predict cluster labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)[source]

Fit the model.

Parameters:

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Value of the conditioning variables on which the density is conditioned.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.

Returns:

self – Fitted self instance of CKDEClustering.

Return type:

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star, weights_train)

predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)[source]

Predict cluster labels.

Parameters:

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.
epsilon (float, default=1e-8) – Stopping threshold for the shift (Euclidean distance) of a data point between iterations. When the shift is less than epsilon, the data point is no longer moved.
delta (float, default=1e-3) – Acceptance error (Euclidean distance) between a shifted data point and a cluster representative. If the error is less than delta, the data point is assigned to the cluster of that representative.
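
The interplay of `epsilon` and `delta` can be sketched with a plain Gaussian mean shift in NumPy. This is a simplified illustration, not kdelearn's implementation: the function `mean_shift_labels`, its `max_iter` safeguard, and the rule that the first point reaching a new mode becomes the cluster representative are all assumptions.

```python
import numpy as np


def mean_shift_labels(x_test, x_train, weights, bandwidth,
                      epsilon=1e-8, delta=1e-3, max_iter=1000):
    """Shift each point to a density mode, then group points by mode."""
    shifted = x_test.astype(float).copy()
    for i in range(shifted.shape[0]):
        point = shifted[i]
        for _ in range(max_iter):
            # Gaussian kernel values between the point and all training points
            u = (point - x_train) / bandwidth
            k = weights * np.exp(-0.5 * np.sum(u**2, axis=1))
            new_point = k @ x_train / k.sum()
            moved = np.linalg.norm(new_point - point)
            point = new_point
            # Stop once the point moves by less than epsilon
            if moved < epsilon:
                break
        shifted[i] = point

    representatives = []
    labels = np.empty(shifted.shape[0], dtype=int)
    for i, point in enumerate(shifted):
        for label, rep in enumerate(representatives):
            # Join an existing cluster if within delta of its representative
            if np.linalg.norm(point - rep) < delta:
                labels[i] = label
                break
        else:
            representatives.append(point)
            labels[i] = len(representatives) - 1
    return labels


# Two well-separated groups of training points
x_train = np.concatenate((np.linspace(-1, 1, 21), np.linspace(9, 11, 21))).reshape(-1, 1)
weights = np.full(x_train.shape[0], 1.0 / x_train.shape[0])
x_test = np.array([[-0.5], [0.3], [10.2]])
labels_pred = mean_shift_labels(x_test, x_train, weights, bandwidth=np.array([0.5]))
```

In a conditional setting the `weights` would be the conditional weights of the training points rather than uniform ones. A smaller `epsilon` moves points closer to the mode before stopping, while `delta` must stay larger than the remaining distance for points of the same mode to be merged.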

Returns:

labels_pred – Predicted labels as an array containing data with int type.

Return type:

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
>>> # Predict cluster labels
>>> labels_pred = clustering.predict(x_train)  # labels_pred shape (100,)