Metrics

kdelearn.metrics.accuracy_loo(x_train: ndarray, labels_train: ndarray, model, **kwargs) → float

Leave-one-out accuracy: the ratio of correctly classified data points, where each point is classified by a model fitted on all the remaining points.

Parameters:

x_train (ndarray of shape (m_train, n)) – Data points, as an array of floats.
labels_train (ndarray of shape (m_train,)) – Labels of the data points, as an array of ints.
model – Classifier that defines fit and predict methods.

Returns:

accuracy – Leave-one-out accuracy.

Return type:

float

Examples

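>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification  # module path is an assumption
>>> from kdelearn.metrics import accuracy_loo
>>> # Prepare data for two classes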
>>> x_train1 = np.random.normal(0, 1, size=(100 // 2, 1))
>>> labels_train1 = np.full(100 // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(100 // 2, 1))
>>> labels_train2 = np.full(100 // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Classify and compute accuracy
>>> model = KDEClassification()
>>> accuracy = accuracy_loo(x_train, labels_train, model)
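The leave-one-out procedure itself can be sketched in plain NumPy. This is an illustration of the general technique, not kdelearn's implementation: `NearestMeanClassifier` and `loo_accuracy_sketch` are hypothetical names introduced here, and the handling of `**kwargs` is omitted because the documentation above does not describe it.

```python
import numpy as np


class NearestMeanClassifier:
    """Hypothetical stand-in classifier exposing fit and predict."""

    def fit(self, x_train, labels_train):
        self.classes_ = np.unique(labels_train)
        # One mean vector per class, shape (n_classes, n)
        self.means_ = np.array(
            [x_train[labels_train == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, x_test):
        # Distance from every test point to every class mean
        diffs = x_test[:, None, :] - self.means_[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        return self.classes_[np.argmin(dists, axis=1)]


def loo_accuracy_sketch(x_train, labels_train, model):
    """Fit on all points but one, predict the held-out point, repeat."""
    m_train = x_train.shape[0]
    n_correct = 0
    for i in range(m_train):
        mask = np.arange(m_train) != i  # everything except point i
        model.fit(x_train[mask], labels_train[mask])
        pred = model.predict(x_train[i : i + 1])
        n_correct += int(pred[0] == labels_train[i])
    return n_correct / m_train


x_train = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels_train = np.array([1, 1, 1, 2, 2, 2])
accuracy = loo_accuracy_sketch(x_train, labels_train, NearestMeanClassifier())
```

The model is refitted m_train times, so the cost grows with both the data size and the cost of a single fit.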

kdelearn.metrics.pi_kf(x_train: ndarray, labels_pred: ndarray, weights_train: ndarray | None = None, bandwidth: ndarray | None = None) → float

Performance index for outlier detection.

Parameters:

x_train (ndarray of shape (m_train, n)) – Data points, as an array of floats.
labels_pred (ndarray of shape (m_train,)) – Predicted labels of the data points (0 - inlier, 1 - outlier), as an array of ints.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator.

Returns:

pi – Performance index.

Return type:

float

Examples

>>> import numpy as np
>>> from kdelearn.metrics import pi_kf
>>> x_train = np.array([[-0.1], [0.0], [0.1], [1.1]])
>>> labels_pred = np.array([0, 0, 0, 1])  # 0 - inlier, 1 - outlier
>>> pi = pi_kf(x_train, labels_pred)

kdelearn.metrics.density_silhouette(x_train: ndarray, labels_train: ndarray, weights_train: ndarray | None = None, kernel_name: str = 'gaussian', share_bandwidth: bool = False) → Tuple[ndarray, float]

Density-based silhouette [1].

Parameters:

x_train (ndarray of shape (m_train, n)) – Data points, as an array of floats.
labels_train (ndarray of shape (m_train,)) – Labels of the data points, as an array of ints.
weights_train (ndarray of shape (m_train,), default=None) – Weights of data points. If None, all points are equally weighted.
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of the kernel function.
share_bandwidth (bool, default=False) – Determines whether all clusters share a common bandwidth. If False, the estimator of each cluster gets its own bandwidth.

Returns:

dbs (ndarray of shape (m_train,)) – Density-based silhouette scores of all data points.
dbs_mean (float) – Mean density-based silhouette score.

Examples

>>> import numpy as np
>>> from kdelearn.metrics import density_silhouette
>>> x_train = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
>>> labels_train = np.array([0, 0, 0, 1, 1, 1])
>>> dbs, dbs_mean = density_silhouette(x_train, labels_train)
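To give a feel for what the score measures, here is a rough NumPy sketch of a density-based silhouette in the spirit of Menardi's definition: for each point, the log-ratio between the prior-weighted density of its own cluster and that of the best competing cluster, normalized by the largest absolute log-ratio. It is not kdelearn's implementation. The names `gaussian_kde_pdf` and `dbs_sketch`, the fixed scalar bandwidth of 0.5, the Gaussian kernel, and the use of cluster proportions as priors are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_pdf(x_eval, x_cluster, bandwidth):
    """Gaussian kernel density estimate with one scalar bandwidth (assumption)."""
    n = x_cluster.shape[1]
    diff = (x_eval[:, None, :] - x_cluster[None, :, :]) / bandwidth
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=2)) / (2 * np.pi) ** (n / 2)
    return kernel.mean(axis=1) / bandwidth**n


def dbs_sketch(x_train, labels_train, bandwidth=0.5):
    """Sketch of a density-based silhouette; requires at least two clusters."""
    clusters = np.unique(labels_train)
    rows = np.arange(x_train.shape[0])
    # Prior-weighted cluster densities at every point, one column per cluster
    weighted = np.column_stack(
        [
            np.mean(labels_train == c)
            * gaussian_kde_pdf(x_train, x_train[labels_train == c], bandwidth)
            for c in clusters
        ]
    )
    own = np.searchsorted(clusters, labels_train)  # column of each point's cluster
    own_density = weighted[rows, own]
    rest = weighted.copy()
    rest[rows, own] = -np.inf  # exclude the point's own cluster
    best_other = rest.max(axis=1)
    # The posterior normalizer cancels in the ratio, so weighted densities suffice
    log_ratio = np.log(own_density / best_other)
    dbs = log_ratio / np.max(np.abs(log_ratio))
    return dbs, dbs.mean()


x_train = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels_train = np.array([0, 0, 0, 1, 1, 1])
dbs, dbs_mean = dbs_sketch(x_train, labels_train)
```

Scores near 1 indicate points lying well inside their own cluster, while negative scores flag points whose assigned cluster is less plausible than a competing one. With widely separated clusters or a very small bandwidth, the competing density can underflow to zero, so a production version would work with log-densities instead.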

References

[1] Menardi, G. Density-based Silhouette diagnostics for clustering methods. Springer, 2010.