Metrics
- kdelearn.metrics.accuracy_loo(x_train: ndarray, labels_train: ndarray, model, **kwargs) → float
Leave-one-out accuracy: the ratio of correctly classified data points, where each point is classified by a model fitted on all the remaining points.
- Parameters:
x_train (ndarray of shape (m_train, n)) – Data points as an array containing data with float type.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
model – Classifier with defined fit and predict methods.
- Returns:
accuracy – Leave-one-out accuracy.
- Return type:
float
Examples
>>> # Prepare data for two classes
>>> x_train1 = np.random.normal(0, 1, size=(100 // 2, 1))
>>> labels_train1 = np.full(100 // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(100 // 2, 1))
>>> labels_train2 = np.full(100 // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Classify and compute accuracy
>>> model = KDEClassification()
>>> accuracy = accuracy_loo(x_train, labels_train, model)
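The leave-one-out procedure described for accuracy_loo can be illustrated with a minimal sketch. This is not the kdelearn implementation: `loo_accuracy_sketch` and `NearestMeanClassifier` are hypothetical names introduced here, the classifier is a simple stand-in for any model with fit and predict methods, and forwarding `**kwargs` to `fit` is an assumption.

```python
import numpy as np


class NearestMeanClassifier:
    """Hypothetical stand-in classifier exposing fit and predict."""

    def fit(self, x_train, labels_train):
        self.classes_ = np.unique(labels_train)
        # One mean vector per class, shape (n_classes, n)
        self.means_ = np.stack(
            [x_train[labels_train == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, x_test):
        # Squared Euclidean distance from every test point to every class mean
        dists = ((x_test[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


def loo_accuracy_sketch(x_train, labels_train, model, **kwargs):
    """Fraction of points classified correctly when each is held out in turn."""
    m_train = x_train.shape[0]
    correct = 0
    for i in range(m_train):
        mask = np.arange(m_train) != i  # every point except the i-th
        # Assumption: extra keyword arguments are passed on to fit
        model.fit(x_train[mask], labels_train[mask], **kwargs)
        pred = model.predict(x_train[i : i + 1])
        correct += int(pred[0] == labels_train[i])
    return correct / m_train


x_demo = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels_demo = np.array([1, 1, 1, 2, 2, 2])
acc_demo = loo_accuracy_sketch(x_demo, labels_demo, NearestMeanClassifier())
print(acc_demo)
```

The model is refitted m_train times, so the cost grows linearly with the number of data points times the cost of a single fit.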
- kdelearn.metrics.pi_kf(x_train: ndarray, labels_pred: ndarray, weights_train: ndarray | None = None, bandwidth: ndarray | None = None) → float
Performance index for outlier detection.
- Parameters:
x_train (ndarray of shape (m_train, n)) – Data points as an array containing data with float type.
labels_pred (ndarray of shape (m_train,)) – Predicted labels (0 - inlier, 1 - outlier) of the data points in x_train as an array containing data with int type.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
bandwidth (ndarray of shape (n,), optional) – Smoothing parameter for scaling the estimator.
- Returns:
pi – Performance index.
- Return type:
float
Examples
>>> x_train = np.array([[-0.1], [0.0], [0.1], [1.1]])
>>> labels_train = np.array([0, 0, 0, 1])
>>> pi = pi_kf(x_train, labels_train)
- kdelearn.metrics.density_silhouette(x_train: ndarray, labels_train: ndarray, weights_train: ndarray | None = None, kernel_name: str = 'gaussian', share_bandwidth: bool = False) → Tuple[ndarray, float]
Density-based silhouette, as proposed by Menardi [1].
- Parameters:
x_train (ndarray of shape (m_train, n)) – Data points as an array containing data with float type.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
weights_train (ndarray of shape (m_train,), default=None) – Weights of data points. If None, all points are equally weighted.
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
share_bandwidth (bool, default=False) – Determines whether all clusters should have common bandwidth. If False, estimator of each cluster gets its own bandwidth.
- Returns:
dbs (ndarray of shape (m_train,)) – Density-based silhouette scores of all data points.
dbs_mean (float) – Mean density-based silhouette score.
Examples
>>> x_train = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
>>> labels_train = np.array([0, 0, 0, 1, 1, 1])
>>> dbs, dbs_mean = density_silhouette(x_train, labels_train)
References
[1] Menardi, G. Density-based Silhouette diagnostics for clustering methods. Springer, 2010.
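To make the density-based silhouette more concrete, the sketch below follows the definition in [1] as commonly stated: each point gets the log-ratio of the posterior probability of its own cluster to that of the best competing cluster, normalized by the largest absolute log-ratio over all points. It is an illustration under stated assumptions, not the kdelearn implementation: `dbs_sketch` and `gaussian_kde_1bw` are hypothetical names, the single fixed bandwidth `h` is an arbitrary choice (the library selects bandwidths itself), points are equally weighted, and at least two clusters are required.

```python
import numpy as np


def gaussian_kde_1bw(x_eval, x_fit, h):
    """Gaussian product-kernel density estimate with one scalar bandwidth h."""
    n = x_fit.shape[1]
    diff = (x_eval[:, None, :] - x_fit[None, :, :]) / h
    kern = np.exp(-0.5 * (diff ** 2).sum(axis=2)) / ((2 * np.pi) ** (n / 2) * h ** n)
    return kern.mean(axis=1)


def dbs_sketch(x_train, labels_train, h=0.5):
    """Density-based silhouette scores and their mean (illustrative only)."""
    clusters = np.unique(labels_train)
    m_train = x_train.shape[0]
    rows = np.arange(m_train)

    # Prior-weighted cluster densities at every point, shape (m_train, n_clusters)
    scores = np.empty((m_train, clusters.size))
    for k, c in enumerate(clusters):
        members = x_train[labels_train == c]
        prior = members.shape[0] / m_train
        scores[:, k] = prior * gaussian_kde_1bw(x_train, members, h)

    # Posterior probability of each cluster given each point
    tau = scores / scores.sum(axis=1, keepdims=True)

    own = np.searchsorted(clusters, labels_train)  # column of the assigned cluster
    tau_own = tau[rows, own]
    tau_rest = tau.copy()
    tau_rest[rows, own] = -np.inf  # exclude the assigned cluster
    tau_other = tau_rest.max(axis=1)  # best competing cluster

    # Note: widely separated clusters can underflow tau_other to zero
    log_ratio = np.log(tau_own / tau_other)
    dbs = log_ratio / np.abs(log_ratio).max()
    return dbs, dbs.mean()


x_demo = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels_demo = np.array([0, 0, 0, 1, 1, 1])
dbs_demo, dbs_mean_demo = dbs_sketch(x_demo, labels_demo)
print(dbs_demo, dbs_mean_demo)
```

Scores lie in [-1, 1]: values near 1 indicate points that sit firmly in their assigned cluster, while negative values indicate points for which another cluster has higher posterior probability.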