Conditional case

CKDE

class kdelearn.ckde.CKDE(kernel_name: str = 'gaussian')

Conditional kernel density estimator with product kernel.
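
For reference, a standard form of the conditional kernel density estimator with a product kernel is given below. It is an assumption about the estimator this class implements, not taken from its source. Here m = m_train, w_i are the entries of weights_train (all equal when None), h_{x,j} and h_{y,k} are the entries of bandwidth_x and bandwidth_y, K is the kernel selected by kernel_name, and y^* is y_star. The factors 1/h_{y,k} of the joint and marginal estimators cancel, which leaves a weighted estimator of the describing variables with conditional weights d_i.

```latex
\hat{f}(x \mid y^*) = \frac{\hat{f}(x, y^*)}{\hat{f}(y^*)}
  = \frac{\sum_{i=1}^{m} d_i \prod_{j=1}^{n_x} \frac{1}{h_{x,j}}
          K\!\left(\frac{x_j - x_{i,j}}{h_{x,j}}\right)}
         {\sum_{i=1}^{m} d_i},
\qquad
d_i = w_i \prod_{k=1}^{n_y} K\!\left(\frac{y^*_k - y_{i,k}}{h_{y,k}}\right)
```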

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
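
The four kernel names correspond to standard one-dimensional kernels, combined across variables as a product. A minimal NumPy sketch of these kernels follows. The function names are hypothetical, and the exact forms (in particular the Cauchy-type kernel 2 / (π(1 + x²)²)) are assumptions, so the library's own definitions may differ.

```python
import numpy as np

# Hypothetical standalone kernels; kdelearn's own implementations
# (including the exact Cauchy-type kernel) may differ.
def gaussian(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def uniform(x):
    # Constant 1/2 on [-1, 1], zero elsewhere
    return np.where(np.abs(x) <= 1.0, 0.5, 0.0)

def epanechnikov(x):
    # 3/4 * (1 - x^2) on [-1, 1], zero elsewhere
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def cauchy(x):
    # Cauchy-type kernel 2 / (pi * (1 + x^2)^2), which integrates to 1
    return 2.0 / (np.pi * (1.0 + x**2) ** 2)

KERNELS = {
    "gaussian": gaussian,
    "uniform": uniform,
    "epanechnikov": epanechnikov,
    "cauchy": cauchy,
}

def product_kernel(u, kernel_name="gaussian"):
    """Product kernel: u has shape (m, n); returns prod_j K(u[:, j]) with shape (m,)."""
    return np.prod(KERNELS[kernel_name](u), axis=1)
```

Each one-dimensional kernel integrates to one, so the product over variables is a valid multivariate kernel.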

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> ckde = CKDE("gaussian").fit(x_train, y_train, y_star)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the estimator.
`pdf`(x_test)	Compute conditional probability density function.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)

Fit the estimator.

Parameters

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioning value, i.e. the fixed value of the conditioning variables.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.
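
When no bandwidth is passed, 'normal_reference' derives one from a rule of thumb. Below is a minimal sketch of one common normal-reference rule for a Gaussian product kernel, h_j = σ_j (4 / ((n + 2) m))^(1 / (n + 4)), where σ_j is the sample standard deviation of variable j, n the number of variables and m the number of data points. The helper name is hypothetical, and the library's rule (kernel-specific constants, use of weights) may differ.

```python
import numpy as np

def normal_reference_bandwidth(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: normal-reference bandwidth for a Gaussian product kernel.

    data has shape (m, n); returns one bandwidth per column, shape (n,).
    """
    m, n = data.shape
    std = np.std(data, axis=0, ddof=1)  # per-variable sample standard deviation
    return std * (4.0 / ((n + 2) * m)) ** (1.0 / (n + 4))

# Usage with the shapes expected by fit()
rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
y_train = rng.normal(0, 1, size=(100, 1))
bandwidth_x = normal_reference_bandwidth(x_train)  # shape (n_x,)
bandwidth_y = normal_reference_bandwidth(y_train)  # shape (n_y,)
```

In one dimension the factor (4/3)^(1/5) ≈ 1.06 gives the familiar 1.06 σ m^(-1/5). The sketch treats the describing and conditioning variables separately; applying the rule to both blocks jointly (n = n_x + n_y) is another possible choice.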

Returns

self – Fitted instance of CKDE.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> weights_train = np.random.randint(1, 10, size=(m_train,))
>>> y_star = np.array([0.0] * n_y)
>>> bandwidth_x = np.array([0.5] * n_x)
>>> bandwidth_y = np.array([0.5] * n_y)
>>> # Fit the estimator
>>> params = (x_train, y_train, y_star, weights_train, bandwidth_x, bandwidth_y)
>>> ckde = CKDE().fit(*params)

pdf(x_test: ndarray) → Tuple[ndarray, ndarray]

Compute conditional probability density function.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.

Returns

scores (ndarray of shape (m_test,)) – Values of the conditional kernel density estimator at x_test.
cond_weights_train (ndarray of shape (m_train,)) – Conditional weights of the training data points, determined by the conditioning value y_star.
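
The two return values can be illustrated with a short NumPy sketch of the conditional estimator for a Gaussian product kernel. It assumes that the conditional weights are weights_train multiplied by the kernel of the conditioning variables at y_star and normalized to sum to one; the function name and this normalization are assumptions, not the library's code.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def conditional_pdf(x_test, x_train, y_train, y_star, bandwidth_x, bandwidth_y,
                    weights_train=None):
    """Hypothetical sketch of a conditional KDE with a Gaussian product kernel."""
    m_train = x_train.shape[0]
    if weights_train is None:
        weights_train = np.ones(m_train)
    # Conditional weights: w_i * prod_k K((y*_k - y_ik) / h_y,k), normalized to sum 1
    ky = np.prod(gaussian((y_star - y_train) / bandwidth_y), axis=1)
    cond_weights_train = weights_train * ky
    cond_weights_train = cond_weights_train / cond_weights_train.sum()
    # Scaled differences, shape (m_test, m_train, n_x)
    u = (x_test[:, None, :] - x_train[None, :, :]) / bandwidth_x
    kx = np.prod(gaussian(u) / bandwidth_x, axis=2)  # shape (m_test, m_train)
    scores = kx @ cond_weights_train  # weighted KDE of x, shape (m_test,)
    return scores, cond_weights_train

rng = np.random.default_rng(0)
x_train = rng.normal(0, 1, size=(100, 1))
y_train = rng.normal(0, 1, size=(100, 1))
y_star = np.array([0.0])
x_test = np.linspace(-3, 3, 10).reshape(-1, 1)
scores, cond_weights_train = conditional_pdf(
    x_test, x_train, y_train, y_star, np.array([0.5]), np.array([0.5]))
```

Training points whose conditioning values lie far from y_star receive small weights, so they contribute little to the density of the describing variables.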

Examples

>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n_x))
>>> y_train = np.random.normal(0, 1, (m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, (m_test, n_x))
>>> # Fit the estimator.
>>> ckde = CKDE().fit(x_train, y_train, y_star)
>>> # Compute pdf
>>> scores, cond_weights_train = ckde.pdf(x_test)  # shapes (10,) and (100,)

CKDEClassification

class kdelearn.ckde_tasks.CKDEClassification(kernel_name: str = 'gaussian')

Bayes’ classifier based on conditional kernel density estimation.

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
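
Bayes' decision rule assigns a point x to the class c that maximizes prior_c · f_c(x | y*). A minimal sketch of that rule, assuming the per-class conditional densities are already available as an (m_test, n_classes) array (as returned by pdfs) and that prior probabilities are applied at prediction time; the helper name is hypothetical.

```python
import numpy as np

def bayes_predict(scores, classes, prior_prob=None):
    """Hypothetical sketch of Bayes' rule on per-class density values.

    scores: (m_test, n_classes) conditional density of each class at each point.
    classes: (n_classes,) class labels in the same column order as scores.
    prior_prob: (n_classes,) prior probabilities; uniform if None.
    """
    n_classes = scores.shape[1]
    if prior_prob is None:
        prior_prob = np.full(n_classes, 1.0 / n_classes)
    weighted = scores * prior_prob  # prior_c * f_c(x | y*)
    # Posterior probabilities; rows sum to 1 wherever some density is positive
    posterior = weighted / weighted.sum(axis=1, keepdims=True)
    labels_pred = classes[np.argmax(weighted, axis=1)]
    return labels_pred, posterior

scores = np.array([[0.30, 0.10],
                   [0.05, 0.20],
                   [0.20, 0.15]])
labels_pred, posterior = bayes_predict(scores, np.array([1, 2]),
                                       prior_prob=np.array([0.3, 0.7]))
```

With prior_prob=[0.3, 0.7] the third point goes to class 2 (0.105 > 0.06); with equal priors it would go to class 1, which shows how prior_prob shifts the decision boundary.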

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> classifier = CKDEClassification().fit(x_train, y_train, y_star, labels_train)

Methods

`fit`(x_train, y_train, y_star, labels_train)	Fit the classifier.
`pdfs`(x_test)	Compute pdf of each class.
`predict`(x_test)	Predict class labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, labels_train: ndarray, weights_train: Optional[ndarray] = None, share_bandwidth: bool = False, bandwidths_x: Optional[ndarray] = None, bandwidths_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', prior_prob: Optional[ndarray] = None, **kwargs)

Fit the classifier.

Parameters

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioning value, i.e. the fixed value of the conditioning variables.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all points are equally weighted.
share_bandwidth (bool, default=False) – Determines whether all classes share a common bandwidth. If False, the estimator of each class gets its own bandwidth.
bandwidths_x (ndarray of shape (n_classes, n_x), optional) – Smoothing parameter of describing variables for each class.
bandwidths_y (ndarray of shape (n_classes, n_y), optional) – Smoothing parameter of conditioning variables for each class.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameters when bandwidths are not given explicitly.
prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.

Returns

self – Fitted instance of CKDEClassification.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, y_train, y_star, labels_train, weights_train)
>>> classifier = CKDEClassification().fit(*params, prior_prob=prior_prob)

predict(x_test: ndarray) → ndarray

Predict class labels.

Parameters: x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
Returns: labels_pred – Predicted labels as an array containing data with int type.
Return type: ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> # Fit the classifier
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # labels_pred shape (10,)

pdfs(x_test: ndarray) → ndarray

Compute pdf of each class.

Parameters: x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
Returns: scores – Values of the pdf of each class at x_test as an array containing data with float type.
Return type: ndarray of shape (m_test, n_classes)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> # Fit the classifier
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # scores shape (10, 2)

CKDEOutliersDetection

class kdelearn.ckde_tasks.CKDEOutliersDetection(kernel_name: str = 'gaussian')

Outlier detection based on conditional kernel density estimation.

Parameters: kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection("gaussian").fit(*params)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the outliers detector.
`predict`(x_test)	Predict the labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', r: float = 0.1, **kwargs)

Fit the outliers detector.

Parameters

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioning value, i.e. the fixed value of the conditioning variables.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None is passed, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.
r (float, default=0.1) – Threshold separating outliers and inliers.
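
How r becomes a decision is not spelled out above. One common construction, used here purely as an assumption, treats r as the expected fraction of outliers: the threshold is the r-quantile of the estimated density at the training points, and test points whose density falls below it are labeled 1 (outlier). If r is instead used directly as a density level, the quantile step is skipped. The helper names are hypothetical, and uniform random numbers stand in for density values.

```python
import numpy as np

def quantile_threshold(scores_train, r=0.1):
    """Hypothetical: density level below which a fraction r of training points fall."""
    return np.quantile(scores_train, r)

def predict_outliers(scores_test, threshold):
    """Label 1 (outlier) where the density is below the threshold, else 0 (inlier)."""
    return (scores_test < threshold).astype(int)

rng = np.random.default_rng(0)
scores_train = rng.uniform(0.0, 1.0, size=100)  # stand-in for density values
threshold = quantile_threshold(scores_train, r=0.1)
labels_pred = predict_outliers(np.array([0.0, threshold, 1.0]), threshold)
```

Under this reading, roughly a fraction r of the training points is flagged as outliers by construction.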

Returns

self – Fitted instance of CKDEOutliersDetection.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star, weights_train)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)

predict(x_test: ndarray) → ndarray

Predict the labels.

Parameters: x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as a 2D array containing data with float type.
Returns: labels_pred – Predicted labels (0 - inlier, 1 - outlier) as an array containing data with int type.
Return type: ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, size=(m_test, n_x))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # labels_pred shape (10,)

CKDEClustering

class kdelearn.ckde_tasks.CKDEClustering

Clustering based on conditional kernel density estimation.

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)

Methods

`fit`(x_train, y_train, y_star[, ...])	Fit the model.
`predict`(x_test[, algorithm, epsilon, delta])	Predict cluster labels.

fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: Optional[ndarray] = None, bandwidth_x: Optional[ndarray] = None, bandwidth_y: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)

Fit the model.

Parameters

x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioning value, i.e. the fixed value of the conditioning variables.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='normal_reference') – Name of bandwidth selection method used to compute smoothing parameter when bandwidth is not given explicitly.

Returns

self – Fitted instance of CKDEClustering.

Return type

object

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star, weights_train)

predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)

Predict cluster labels.

Parameters

x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.
epsilon (float, default=1e-8) – Convergence threshold (Euclidean distance) for shifting: when a data point moves by less than epsilon in one step, it is no longer shifted.
delta (float, default=1e-3) – Acceptance error (Euclidean distance) between a shifted data point and a cluster representative: if the distance is less than delta, the data point is assigned to the cluster of that representative.
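
The roles of epsilon and delta can be seen in a minimal sketch of the 'mean_shift' option with a Gaussian kernel. It assumes the standard weighted-mean update of mean shift, shifts each test point until it moves less than epsilon, and gives converged points within delta of an existing representative that representative's label. The function name, the fixed bandwidth and the max_iter cap are hypothetical, and conditioning is reduced to the per-point weights passed in (for example, conditional weights).

```python
import numpy as np

def mean_shift_labels(x_test, x_train, weights, bandwidth,
                      epsilon=1e-8, delta=1e-3, max_iter=500):
    """Hypothetical mean-shift clustering sketch with a Gaussian kernel.

    x_test: (m_test, n) points to cluster; x_train: (m_train, n) data;
    weights: (m_train,) non-negative weights; bandwidth: (n,) smoothing parameters.
    """
    representatives = []  # converged positions that define the clusters
    labels_pred = np.empty(x_test.shape[0], dtype=int)
    for i, point in enumerate(x_test):
        for _ in range(max_iter):
            # Gaussian kernel weights of all training points around `point`
            u = (point - x_train) / bandwidth
            k = weights * np.exp(-0.5 * np.sum(u**2, axis=1))
            new_point = k @ x_train / k.sum()  # mean-shift step: weighted mean
            shift = np.linalg.norm(new_point - point)
            point = new_point
            if shift < epsilon:  # stop once the move is below epsilon
                break
        # Join the first representative closer than delta, else open a new cluster
        for label, rep in enumerate(representatives):
            if np.linalg.norm(point - rep) < delta:
                labels_pred[i] = label
                break
        else:
            representatives.append(point)
            labels_pred[i] = len(representatives) - 1
    return labels_pred

# Two well-separated groups of training points, clustered on themselves
rng = np.random.default_rng(0)
x_train = np.concatenate([rng.normal(0.0, 0.1, size=(50, 1)),
                          rng.normal(5.0, 0.1, size=(50, 1))])
labels = mean_shift_labels(x_train, x_train, np.ones(100), np.array([0.3]))
```

A smaller epsilon makes the converged positions more precise at the cost of more iterations, while delta only needs to exceed the remaining spread of points that converge to the same mode.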

Returns

labels_pred – Predicted labels as an array containing data with int type.

Return type

ndarray of shape (m_test,)

Examples

>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
>>> # Predict cluster labels of the training points
>>> labels_pred = clustering.predict(x_train)  # labels_pred shape (100,)