Conditional case
CKDE
- class kdelearn.ckde.CKDE(kernel_name: str = 'gaussian')
Conditional kernel density estimator with product kernel:
TODO: <MATH FORMULA and READ MORE and REFERENCES>
- Parameters:
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
Examples
>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> ckde = CKDE("gaussian").fit(x_train, y_train, y_star)
Methods
fit(x_train, y_train, y_star[, ...]) – Fit the estimator.
pdf(x_test) – Compute conditional probability density function.
- fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)
Fit the estimator.
- Parameters:
x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioned value.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when bandwidth_x or bandwidth_y is not given explicitly.
- Returns:
self – Fitted self instance of CKDE.
- Return type:
object
Examples
>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> weights_train = np.random.randint(1, 10, size=(m_train,))
>>> y_star = np.array([0.0] * n_y)
>>> bandwidth_x = np.array([0.5] * n_x)
>>> bandwidth_y = np.array([0.5] * n_y)
>>> # Fit the estimator
>>> params = (x_train, y_train, y_star, weights_train, bandwidth_x, bandwidth_y)
>>> ckde = CKDE().fit(*params)
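The 'normal_reference' option presumably refers to the classical normal-reference rule of thumb. The sketch below shows that rule for a single variable and a Gaussian kernel, using only the standard library. The function name normal_reference_bandwidth is hypothetical, and the constants or the handling of weights inside kdelearn may differ.

```python
import math
import statistics


def normal_reference_bandwidth(values):
    """Rule-of-thumb bandwidth for one variable and a Gaussian kernel.

    h = (4 / 3) ** (1 / 5) * sigma * m ** (-1 / 5)
    Assumption: unweighted data; not taken from the kdelearn source.
    """
    m = len(values)
    sigma = statistics.stdev(values)  # sample standard deviation
    return (4.0 / 3.0) ** 0.2 * sigma * m ** (-0.2)


sample = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]
h = normal_reference_bandwidth(sample)
# Rescaling the data rescales the bandwidth by the same factor.
h_scaled = normal_reference_bandwidth([10.0 * v for v in sample])
```

A value computed this way could be passed explicitly, one entry per variable, through bandwidth_x or bandwidth_y.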
- pdf(x_test: ndarray) → Tuple[ndarray, ndarray]
Compute conditional probability density function.
- Parameters:
x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
- Returns:
scores (ndarray of shape (m_test,)) – Values of kernel density estimator.
cond_weights_train (ndarray of shape (m_train,)) – TODO: complete !!!!!!!!
Examples
>>> import numpy as np
>>> from kdelearn.ckde import CKDE
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n_x))
>>> y_train = np.random.normal(0, 1, (m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, (m_test, n_x))
>>> # Fit the estimator
>>> ckde = CKDE().fit(x_train, y_train, y_star)
>>> # Compute pdf; per the signature, pdf returns a tuple
>>> scores, cond_weights_train = ckde.pdf(x_test)  # scores shape (10,)
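To make the idea of a conditional estimator with a product kernel concrete, here is a minimal pure-Python sketch for one describing and one conditioning variable with Gaussian kernels. It is an illustration under stated assumptions, not the kdelearn implementation: the names gaussian, conditional_kde_pdf and cond_weights are hypothetical, and the normalized weights computed here may or may not correspond to the cond_weights_train value returned by pdf.

```python
import math


def gaussian(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def conditional_kde_pdf(x_test, x_train, y_train, y_star, h_x, h_y):
    """Estimate the density of x given y = y_star (1-D x, 1-D y)."""
    # Each training point is weighted by how close its y value is to y_star.
    raw = [gaussian((y_star - y_i) / h_y) for y_i in y_train]
    total = sum(raw)
    cond_weights = [r / total for r in raw]  # non-negative, sum to 1
    # The conditional estimate is a weighted KDE over the x values.
    scores = [
        sum(d * gaussian((x - x_i) / h_x) / h_x
            for d, x_i in zip(cond_weights, x_train))
        for x in x_test
    ]
    return scores, cond_weights


x_train = [-1.0, -0.2, 0.1, 0.8, 2.5]
y_train = [0.3, -0.1, 0.0, 0.4, 3.0]
scores, cond_weights = conditional_kde_pdf(
    [0.0, 2.5], x_train, y_train, y_star=0.0, h_x=0.5, h_y=0.5
)
```

In this toy data the training point at x = 2.5 has y = 3.0, far from y_star = 0.0, so it receives almost no weight and the estimated density near x = 2.5 is much lower than near x = 0.0.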
CKDEClassification
- class kdelearn.ckde_tasks.CKDEClassification(kernel_name: str = 'gaussian')
Bayes’ classifier based on conditional kernel density estimation.
TODO: <MATH FORMULA and READ MORE and REFERENCES>
- Parameters:
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> classifier = CKDEClassification().fit(x_train, y_train, y_star, labels_train)
Methods
fit(x_train, y_train, y_star, labels_train) – Fit the classifier.
pdfs(x_test) – Compute pdf of each class.
predict(x_test) – Predict class labels.
- fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, labels_train: ndarray, weights_train: ndarray | None = None, share_bandwidth: bool = False, bandwidths_x: ndarray | None = None, bandwidths_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', prior_prob: ndarray | None = None, **kwargs)
Fit the classifier.
- Parameters:
x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioned value.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all points are equally weighted.
share_bandwidth (bool, default=False) – Determines whether all classes should have common bandwidth. If False, estimator of each class gets its own bandwidth.
bandwidths_x (ndarray of shape (n_classes, n_x), optional) – Smoothing parameter of describing variables for each class.
bandwidths_y (ndarray of shape (n_classes, n_y), optional) – Smoothing parameter of conditioning variables for each class.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when bandwidths_x or bandwidths_y is not given explicitly.
prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.
- Returns:
self – Fitted self instance of CKDEClassification.
- Return type:
object
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, y_train, y_star, labels_train, weights_train)
>>> classifier = CKDEClassification().fit(*params, prior_prob=prior_prob)
- predict(x_test: ndarray) → ndarray
Predict class labels.
- Parameters:
x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
- Returns:
labels_pred – Predicted labels as an array containing data with int type.
- Return type:
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # labels_pred shape (10,)
- pdfs(x_test: ndarray) → ndarray
Compute pdf of each class.
- Parameters:
x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
- Returns:
scores – Predicted scores as an array containing data with float type.
- Return type:
ndarray of shape (m_test, n_classes)
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClassification
>>> # Prepare data for two classes
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the classifier
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> params = (x_train, y_train, y_star, labels_train)
>>> classifier = CKDEClassification().fit(*params)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # scores shape (10, 2)
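A Bayes classifier of this kind typically assigns each test point to the class that maximizes the prior probability times the class density. The sketch below shows that decision rule on a hand-written score table shaped like the output of pdfs. The helper bayes_predict is hypothetical, and whether the scores returned by kdelearn are already scaled by prior_prob is not stated in this reference.

```python
def bayes_predict(scores, prior_prob, class_labels):
    """Pick, for each test point, the class with the largest prior * density."""
    labels_pred = []
    for row in scores:  # one row of per-class densities per test point
        posteriors = [p * s for p, s in zip(prior_prob, row)]
        best = max(range(len(posteriors)), key=posteriors.__getitem__)
        labels_pred.append(class_labels[best])
    return labels_pred


# Made-up densities for three test points and two classes (labels 1 and 2).
scores = [[0.30, 0.05], [0.10, 0.09], [0.02, 0.25]]
labels_equal = bayes_predict(scores, [0.5, 0.5], [1, 2])
labels_skewed = bayes_predict(scores, [0.3, 0.7], [1, 2])
```

With equal priors the second point goes to class 1. With priors (0.3, 0.7) it moves to class 2, which shows how prior_prob can change predictions near the decision boundary.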
CKDEOutliersDetection
- class kdelearn.ckde_tasks.CKDEOutliersDetection(kernel_name: str = 'gaussian')
Outliers detection based on conditional kernel density estimation.
TODO: <READ MORE>
- Parameters:
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection("gaussian").fit(*params)
Methods
fit(x_train, y_train, y_star[, ...]) – Fit the outliers detector.
predict(x_test) – Predict the labels.
- fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', r: float = 0.05, **kwargs)
Fit the outliers detector.
- Parameters:
x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioned value.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None is passed, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when bandwidth_x or bandwidth_y is not given explicitly.
r (float, default=0.05) – Threshold separating outliers and inliers.
- Returns:
self – Fitted self instance of CKDEOutliersDetection.
- Return type:
object
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star, weights_train)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
- predict(x_test: ndarray) → ndarray
Predict the labels.
- Parameters:
x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as a 2D array containing data with float type.
- Returns:
labels_pred – Predicted labels (0 - inlier, 1 - outlier) as an array containing data with int type.
- Return type:
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEOutliersDetection
>>> # Prepare data
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n_x))
>>> y_train = np.random.normal(0, 1, size=(m_train, n_y))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-3, 3, size=(m_test, n_x))
>>> # Fit the outliers detector
>>> params = (x_train, y_train, y_star)
>>> outliers_detector = CKDEOutliersDetection().fit(*params, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # labels_pred shape (10,)
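The reference describes r only as a threshold separating outliers and inliers. One common density-based reading, assumed here and not confirmed by the source, is that r is a quantile order: the density estimate is evaluated at the training points, its r-quantile becomes the cut-off, and test points whose density does not exceed the cut-off are labelled 1 (outlier). The helpers quantile and label_outliers are hypothetical, and the density values are made up for illustration.

```python
def quantile(values, r):
    """Empirical quantile of order r, linearly interpolated between order statistics."""
    ordered = sorted(values)
    pos = r * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def label_outliers(test_density, train_density, r=0.05):
    """Return 1 (outlier) where the density is at or below the r-quantile, else 0."""
    threshold = quantile(train_density, r)
    return [1 if d <= threshold else 0 for d in test_density]


# Made-up density values at eleven training points and three test points.
train_density = [0.02, 0.11, 0.18, 0.25, 0.31, 0.33, 0.36, 0.38, 0.39, 0.40, 0.40]
labels_pred = label_outliers([0.01, 0.05, 0.30], train_density, r=0.1)
```

Under this reading, a larger r raises the cut-off and flags a larger share of points as outliers.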
CKDEClustering
- class kdelearn.ckde_tasks.CKDEClustering
Clustering based on conditional kernel density estimation.
TODO: <READ MORE>
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
Methods
fit(x_train, y_train, y_star[, ...]) – Fit the model.
predict(x_test[, algorithm, epsilon, delta]) – Predict cluster labels.
- fit(x_train: ndarray, y_train: ndarray, y_star: ndarray, weights_train: ndarray | None = None, bandwidth_x: ndarray | None = None, bandwidth_y: ndarray | None = None, bandwidth_method: str = 'direct_plugin', **kwargs)
Fit the model.
- Parameters:
x_train (ndarray of shape (m_train, n_x)) – Data points (describing variables) as an array containing data with float type.
y_train (ndarray of shape (m_train, n_y)) – Data points (conditioning variables) as an array containing data with float type.
y_star (ndarray of shape (n_y,)) – Conditioned value.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth_x (ndarray of shape (n_x,), optional) – Smoothing parameter of describing variables.
bandwidth_y (ndarray of shape (n_y,), optional) – Smoothing parameter of conditioning variables.
bandwidth_method ({'normal_reference', 'direct_plugin', 'ste_plugin', 'ml_cv'}, default='direct_plugin') – Name of the bandwidth selection method used to compute the smoothing parameters when bandwidth_x or bandwidth_y is not given explicitly.
- Returns:
self – Fitted self instance of CKDEClustering.
- Return type:
object
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> weights_train = np.random.uniform(0, 1, size=(m_train,))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star, weights_train)
- predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)
Predict cluster labels.
- Parameters:
x_test (ndarray of shape (m_test, n_x)) – Grid data points (describing variables) as an array containing data with float type.
algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.
epsilon (float, default=1e-8) – Convergence threshold for shifting. A data point stops being shifted once the Euclidean distance between its consecutive positions falls below epsilon.
delta (float, default=1e-3) – Tolerance (Euclidean distance) between a shifted data point and a cluster representative. If the distance is less than delta, the data point is assigned to the cluster of that representative.
- Returns:
labels_pred – Predicted labels as an array containing data with int type.
- Return type:
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.ckde_tasks import CKDEClustering
>>> # Prepare data for two clusters
>>> m_train = 100
>>> n_x, n_y = 1, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n_x))
>>> y_train1 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n_x))
>>> y_train2 = np.random.normal(0, 1, size=(m_train // 2, n_y))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> y_train = np.concatenate((y_train1, y_train2))
>>> y_star = np.array([0.0] * n_y)
>>> x_test = np.random.uniform(-1, 4, size=(m_test, n_x))
>>> # Fit
>>> clustering = CKDEClustering().fit(x_train, y_train, y_star)
>>> # Predict cluster labels
>>> labels_pred = clustering.predict(x_test)  # labels_pred shape (10,)
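The roles of epsilon and delta can be illustrated with a small one-dimensional mean-shift sketch in plain Python. It uses an unweighted Gaussian kernel and is not the kdelearn implementation, which would additionally account for the conditioning variables. The function mean_shift_1d and its max_iter safeguard are hypothetical.

```python
import math


def mean_shift_1d(points, centers, h, epsilon=1e-8, delta=1e-3, max_iter=10_000):
    """Shift each point uphill on a Gaussian KDE built on `centers`, then group end positions."""
    labels, representatives = [], []
    for x in points:
        for _ in range(max_iter):
            weights = [math.exp(-0.5 * ((x - c) / h) ** 2) for c in centers]
            x_new = sum(w * c for w, c in zip(weights, centers)) / sum(weights)
            moved = abs(x_new - x)
            x = x_new
            if moved < epsilon:  # position no longer changes: stop shifting
                break
        for k, rep in enumerate(representatives):
            if abs(x - rep) < delta:  # close enough to an existing representative
                labels.append(k)
                break
        else:
            representatives.append(x)  # no representative nearby: start a new cluster
            labels.append(len(representatives) - 1)
    return labels, representatives


data = [-0.2, 0.0, 0.1, 2.9, 3.0, 3.2]
labels_pred, representatives = mean_shift_1d(data, data, h=0.4)
```

Here epsilon controls how long a point keeps moving, while delta controls how generously end positions are merged into one cluster. Choosing delta much larger than the typical end-position error but smaller than the distance between modes keeps the two roles separate.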