Unconditional case
KDE
- class kdelearn.kde.KDE(kernel_name: str = 'gaussian')
Kernel density estimator with product kernel:
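\[\hat{f}(x) = \sum_{i=1}^m w_{i} \prod_{j=1}^n \frac{1}{h_j} K \left( \frac{x_{j} - x_{i, j}}{h_j} \right), \quad x \in \mathbb{R}^n\]
where \(m\) is the number of training points, \(x_{i, j}\) is the \(j\)-th coordinate of the \(i\)-th training point, \(w_i\) is its weight, \(h_j\) is the bandwidth in dimension \(j\), and \(K\) is the kernel function.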
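The formula can be evaluated directly. The following is a minimal pure-Python sketch, not kdelearn's implementation, that evaluates \(\hat{f}\) at a single point with a Gaussian kernel. The names `gaussian_kernel` and `product_kde_pdf` are hypothetical, and rescaling the weights so that they sum to 1 is an assumption.

```python
import math
from typing import Callable, Optional, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the one-dimensional kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def product_kde_pdf(
    x: Sequence[float],
    x_train: Sequence[Sequence[float]],
    bandwidth: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    kernel: Callable[[float], float] = gaussian_kernel,
) -> float:
    """Evaluate f_hat(x) = sum_i w_i * prod_j K((x_j - x_ij) / h_j) / h_j."""
    m = len(x_train)
    if weights is None:
        weights = [1.0] * m  # equal weights when none are given
    total = sum(weights)  # assumption: weights are rescaled to sum to 1
    density = 0.0
    for w_i, x_i in zip(weights, x_train):
        # Product over the n coordinates of the scaled kernel values.
        prod = 1.0
        for x_j, x_ij, h_j in zip(x, x_i, bandwidth):
            prod *= kernel((x_j - x_ij) / h_j) / h_j
        density += (w_i / total) * prod
    return density
```

For example, `product_kde_pdf([0.0, 0.0], [[0.0, 0.0]], [1.0, 1.0])` equals \(1/(2\pi)\), the density of the two-dimensional standard normal distribution at the origin.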
- Parameters
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
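For reference, the four kernel names can be sketched with their textbook normalisations, each symmetric and integrating to 1. Whether kdelearn uses exactly these scalings is an assumption, and `KERNELS` is a hypothetical lookup table.

```python
import math
from typing import Callable, Dict


def gaussian(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def uniform(u: float) -> float:
    # Constant on [-1, 1], zero outside.
    return 0.5 if abs(u) <= 1.0 else 0.0


def epanechnikov(u: float) -> float:
    # Parabolic on [-1, 1], zero outside.
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def cauchy(u: float) -> float:
    # Heavy-tailed kernel with unbounded support.
    return 1.0 / (math.pi * (1.0 + u * u))


# Hypothetical lookup table keyed by the kernel_name values listed above.
KERNELS: Dict[str, Callable[[float], float]] = {
    "gaussian": gaussian,
    "uniform": uniform,
    "epanechnikov": epanechnikov,
    "cauchy": cauchy,
}
```

The Gaussian and Cauchy kernels have unbounded support, whereas with the uniform and Epanechnikov kernels a training point contributes nothing to \(\hat{f}(x)\) once \(x\) is more than \(h_j\) away from it in some coordinate \(j\).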
Examples
>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> # Fit
>>> kde = KDE("gaussian").fit(x_train)
References
[1] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
[2] Wand, M. P. and Jones, M. C. Kernel Smoothing. Chapman and Hall, 1995.
Methods
fit(x_train[, weights_train, bandwidth, ...]) – Fit the estimator.
pdf(x_test) – Compute probability density.
sample()
- fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)
Fit the estimator.
- Parameters
x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the estimator.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.
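As an illustration of 'normal_reference', the sketch below applies the classical normal reference rule for a Gaussian product kernel, \(h_j = \hat{\sigma}_j \left( \frac{4}{(n+2)m} \right)^{1/(n+4)}\), which for \(n = 1\) reduces to roughly \(1.06 \hat{\sigma} m^{-1/5}\). That kdelearn computes exactly this (in particular for weighted data) is an assumption, `normal_reference_bandwidth` is a hypothetical helper, and 'direct_plugin' is not sketched.

```python
import math
import statistics
from typing import List, Sequence


def normal_reference_bandwidth(x_train: Sequence[Sequence[float]]) -> List[float]:
    """Per-coordinate normal reference bandwidth for a Gaussian product kernel.

    h_j = sigma_j * (4 / ((n + 2) * m)) ** (1 / (n + 4)), where sigma_j is the
    sample standard deviation of coordinate j, m the number of points and n
    the dimension. Assumption: unweighted data and a Gaussian kernel.
    """
    m = len(x_train)
    n = len(x_train[0])
    factor = (4.0 / ((n + 2) * m)) ** (1.0 / (n + 4))
    return [statistics.stdev([row[j] for row in x_train]) * factor for j in range(n)]
```

The bandwidth in each coordinate scales with that coordinate's spread and shrinks slowly as the number of training points grows.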
- Returns
self – The fitted KDE instance.
- Return type
object
Examples
>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> bandwidth = np.full((n,), 1.0)
>>> # Fit the estimator
>>> kde = KDE().fit(x_train, weights_train, bandwidth)
- pdf(x_test: ndarray) → ndarray
Compute probability density.
- Parameters
x_test (ndarray of shape (m_test, n)) – Points at which the density is evaluated, as an array of floats.
- Returns
scores – Estimated probability density at each point of x_test.
- Return type
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.kde import KDE
>>> # Prepare data
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, (m_train, n))
>>> x_test = np.linspace(-3, 3, m_test).reshape(-1, 1)
>>> # Fit the estimator
>>> kde = KDE().fit(x_train)
>>> # Compute pdf
>>> scores = kde.pdf(x_test)  # shape of scores: (10,)
KDEClassification
- class kdelearn.kde_tasks.KDEClassification(kernel_name: str = 'gaussian')
Bayes’ classifier based on kernel density estimation.
Probability that \(x\) belongs to class \(c\):
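\[P(C=c|X=x) \propto \pi_c \hat{f}_c(x)\]
where \(\pi_c\) is the prior probability of class \(c\) and \(\hat{f}_c\) is the kernel density estimator of class \(c\). To predict the class label of \(x\), take the class \(c\) with the highest probability:
\[\underset{c}{\mathrm{argmax}} \quad P(C=c|X=x)\]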
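A minimal one-dimensional sketch of this decision rule, assuming equally weighted training points, a Gaussian kernel and one bandwidth `h` shared by all classes. The names `gaussian_kde_1d` and `kde_bayes_classify` are hypothetical. Dividing by the sum over classes turns the proportionality into actual posterior probabilities without changing the argmax.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def gaussian_kde_1d(x: float, points: Sequence[float], h: float) -> float:
    """Equally weighted one-dimensional Gaussian KDE evaluated at x."""
    return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / (
        len(points) * h * math.sqrt(2.0 * math.pi)
    )


def kde_bayes_classify(
    x: float,
    x_train: Sequence[float],
    labels_train: Sequence[int],
    h: float,
    prior: Optional[Dict[int, float]] = None,
) -> Tuple[int, Dict[int, float]]:
    """Return the argmax class and the normalised posteriors P(C=c | X=x)."""
    classes = sorted(set(labels_train))
    if prior is None:
        prior = {c: 1.0 / len(classes) for c in classes}  # equally probable classes
    scores = {}
    for c in classes:
        points_c = [p for p, label in zip(x_train, labels_train) if label == c]
        scores[c] = prior[c] * gaussian_kde_1d(x, points_c, h)  # pi_c * f_c(x)
    total = sum(scores.values())  # normalising constant of the posterior
    posterior = {c: s / total for c, s in scores.items()}
    return max(posterior, key=posterior.get), posterior
```

At a point equidistant from two symmetric classes the class densities are equal, so the posterior simply reproduces the priors there.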
- Parameters
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Fit
>>> classifier = KDEClassification("gaussian").fit(x_train, labels_train)
References
[1] Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.
Methods
fit(x_train, labels_train[, weights_train, ...]) – Fit the classifier.
pdfs(x_test) – Compute pdf of each class.
predict(x_test) – Predict class labels.
- fit(x_train: ndarray, labels_train: ndarray, weights_train: Optional[ndarray] = None, bandwidths: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', share_bandwidth: bool = False, prior_prob: Optional[ndarray] = None, **kwargs)
Fit the classifier.
- Parameters
x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the classifier.
labels_train (ndarray of shape (m_train,)) – Integer class labels of the points in x_train.
weights_train (ndarray of shape (m_train,), default=None) – Weights for data points. If None, all data points are equally weighted.
bandwidths (ndarray of shape (n_classes, n), optional) – Smoothing parameters for scaling the estimator of each class. If None, bandwidth_method is used to compute the bandwidths.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidths when they are not given explicitly.
share_bandwidth (bool, default=False) – Whether all classes share a common bandwidth. If False, the estimator of each class gets its own bandwidth.
prior_prob (ndarray of shape (n_classes,), default=None) – Prior probabilities of each class. If None, all classes are equally probable.
- Returns
self – The fitted KDEClassification instance.
- Return type
object
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full((m_train // 2,), 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full((m_train // 2,), 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit
>>> prior_prob = np.array([0.3, 0.7])
>>> params = (x_train, labels_train, weights_train)
>>> classifier = KDEClassification().fit(*params, prior_prob=prior_prob)
- predict(x_test: ndarray) → ndarray
Predict class labels.
- Parameters
x_test (ndarray of shape (m_test, n)) – Data points to classify, as an array of floats.
- Returns
labels_pred – Predicted integer class labels.
- Return type
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Fit the classifier
>>> x_test = np.linspace(-3, 6, m_test).reshape(-1, 1)
>>> classifier = KDEClassification().fit(x_train, labels_train)
>>> # Predict labels
>>> labels_pred = classifier.predict(x_test)  # shape: (10,)
- pdfs(x_test: ndarray) → ndarray
Compute pdf of each class.
- Parameters
x_test (ndarray of shape (m_test, n)) – Points at which each class estimator is evaluated, as an array of floats.
- Returns
scores – Density estimate of every class for each point of x_test, as an array of floats.
- Return type
ndarray of shape (m_test, n_classes)
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification
>>> # Prepare data for two classes
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> labels_train1 = np.full(m_train // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> labels_train2 = np.full(m_train // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Fit the classifier
>>> x_test = np.linspace(-3, 6, m_test).reshape(-1, 1)
>>> classifier = KDEClassification().fit(x_train, labels_train)
>>> # Compute pdf of each class
>>> scores = classifier.pdfs(x_test)  # shape: (10, 2)
KDEOutliersDetection
- class kdelearn.kde_tasks.KDEOutliersDetection(kernel_name: str = 'gaussian')
Outlier detection based on kernel density estimation.
- Parameters
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> # Fit the outliers detector
>>> outliers_detector = KDEOutliersDetection("gaussian").fit(x_train)
Methods
fit(x_train[, weights_train, bandwidth, ...]) – Fit the outliers detector.
predict(x_test) – Predict labels.
- fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', r: float = 0.1, **kwargs)
Fit the outliers detector.
- Parameters
x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the detector.
weights_train (ndarray of shape (m_train,), default=None) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.
r (float, default=0.1) – Threshold separating outliers and inliers.
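One plausible reading of r, stated here as an assumption rather than documented behaviour, is the fraction of training points treated as outliers: the threshold is the r-quantile of the estimated densities at the training points, and a test point whose density falls below it is labelled 1 (outlier). The one-dimensional Gaussian-kernel sketch below follows that assumption; the helper names are hypothetical, and each training density includes the point's own contribution (no leave-one-out).

```python
import math
from typing import List, Sequence


def gaussian_kde_1d(x: float, points: Sequence[float], h: float) -> float:
    """Equally weighted one-dimensional Gaussian KDE evaluated at x."""
    return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / (
        len(points) * h * math.sqrt(2.0 * math.pi)
    )


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of values (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def fit_threshold(x_train: Sequence[float], h: float, r: float = 0.1) -> float:
    # Assumption: the threshold is the r-quantile of the training densities.
    scores = [gaussian_kde_1d(x, x_train, h) for x in x_train]
    return quantile(scores, r)


def predict_outliers(
    x_test: Sequence[float], x_train: Sequence[float], h: float, threshold: float
) -> List[int]:
    # 1 marks an outlier (density below the threshold), 0 an inlier.
    return [1 if gaussian_kde_1d(x, x_train, h) < threshold else 0 for x in x_test]
```

Under this reading, a larger r raises the threshold and flags more points as outliers.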
- Returns
self – The fitted KDEOutliersDetection instance.
- Return type
object
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit the outliers detector
>>> params = (x_train, weights_train)
>>> outliers_detector = KDEOutliersDetection().fit(*params, r=0.1)
- predict(x_test: ndarray) → ndarray
Predict labels.
- Parameters
x_test (ndarray of shape (m_test, n)) – Data points to label as inliers or outliers, as an array of floats.
- Returns
labels_pred – Predicted integer labels: 0 for an inlier, 1 for an outlier.
- Return type
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEOutliersDetection
>>> # Prepare data
>>> m_train, n = 100, 1
>>> m_test = 10
>>> x_train = np.random.normal(0, 1, size=(m_train, n))
>>> x_test = np.linspace(-3, 3, m_test).reshape(-1, 1)
>>> # Fit the outliers detector
>>> outliers_detector = KDEOutliersDetection().fit(x_train, r=0.1)
>>> # Predict the labels
>>> labels_pred = outliers_detector.predict(x_test)  # shape: (10,)
KDEClustering
- class kdelearn.kde_tasks.KDEClustering
Clustering based on kernel density estimation.
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> # Fit
>>> clustering = KDEClustering().fit(x_train)
Methods
fit(x_train[, weights_train, bandwidth, ...]) – Fit the model.
predict(x_test[, algorithm, epsilon, delta]) – Predict cluster labels.
- fit(x_train: ndarray, weights_train: Optional[ndarray] = None, bandwidth: Optional[ndarray] = None, bandwidth_method: str = 'normal_reference', **kwargs)
Fit the model.
- Parameters
x_train (ndarray of shape (m_train, n)) – Array containing data points with float type for constructing the model.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all data points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator. If None, bandwidth_method is used to compute the bandwidth.
bandwidth_method ({'normal_reference', 'direct_plugin'}, default='normal_reference') – Name of bandwidth selection method used to compute bandwidth when it is not given explicitly.
- Returns
self – The fitted KDEClustering instance.
- Return type
object
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> weights_train = np.full((m_train,), 1 / m_train)
>>> # Fit
>>> clustering = KDEClustering().fit(x_train, weights_train)
- predict(x_test: ndarray, algorithm: str = 'mean_shift', epsilon: float = 1e-08, delta: float = 0.001)
Predict cluster labels.
- Parameters
x_test (ndarray of shape (m_test, n)) – Data points to cluster, as an array of floats.
algorithm ({'gradient_ascent', 'mean_shift'}, default='mean_shift') – Name of clustering algorithm.
epsilon (float, default=1e-8) – Convergence threshold for shifting. A data point stops being shifted once the Euclidean distance between its consecutive positions is less than epsilon.
delta (float, default=1e-3) – Tolerance for cluster assignment. A shifted data point is assigned to the cluster of a representative if the Euclidean distance between them is less than delta.
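The 'mean_shift' option and the roles of epsilon and delta can be sketched in one dimension with a Gaussian kernel: each point repeatedly moves to the kernel-weighted mean of the training points until a step is shorter than epsilon, then joins the first representative within delta or becomes a new representative. The helper `mean_shift_labels`, the `max_iter` safeguard and the labelling in order of discovery are assumptions, not kdelearn's documented behaviour.

```python
import math
from typing import List, Sequence


def mean_shift_labels(
    x_test: Sequence[float],
    x_train: Sequence[float],
    h: float,
    epsilon: float = 1e-8,
    delta: float = 1e-3,
    max_iter: int = 1000,
) -> List[int]:
    """Label each test point by the mode its mean-shift trajectory reaches."""

    def shift(x: float) -> float:
        # Gaussian kernel weights; the normalising constant cancels in the ratio.
        w = [math.exp(-0.5 * ((x - p) / h) ** 2) for p in x_train]
        return sum(wi * p for wi, p in zip(w, x_train)) / sum(w)

    representatives: List[float] = []
    labels: List[int] = []
    for x in x_test:
        for _ in range(max_iter):
            x_new = shift(x)
            step = abs(x_new - x)
            x = x_new
            if step < epsilon:  # converged: the point is no longer shifted
                break
        # Assign to the first representative closer than delta, else start a new cluster.
        for k, rep in enumerate(representatives):
            if abs(x - rep) < delta:
                labels.append(k)
                break
        else:
            representatives.append(x)
            labels.append(len(representatives) - 1)
    return labels
```

The kernel weights are not guarded against underflow, so a test point extremely far from all training data (relative to `h`) would make the denominator zero; a fuller version would need to handle that case.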
- Returns
labels_pred – Predicted integer cluster labels.
- Return type
ndarray of shape (m_test,)
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClustering
>>> # Prepare data for two clusters
>>> m_train, n = 100, 1
>>> x_train1 = np.random.normal(0, 1, size=(m_train // 2, n))
>>> x_train2 = np.random.normal(3, 1, size=(m_train // 2, n))
>>> x_train = np.concatenate((x_train1, x_train2))
>>> # Fit
>>> clustering = KDEClustering().fit(x_train)
>>> # Predict cluster labels
>>> labels_pred = clustering.predict(x_train)  # shape: (100,)