Metrics
- kdelearn.metrics.accuracy_loo(x_train: ndarray, labels_train: ndarray, model, **kwargs) -> float
Leave-one-out accuracy: the ratio of correctly classified data points, where each point is classified by the model fitted on all remaining points.
- Parameters
x_train (ndarray of shape (m_train, n)) – Data points as an array containing data with float type.
labels_train (ndarray of shape (m_train,)) – Labels of data points as an array containing data with int type.
model – Classifier with defined fit and predict methods.
- Returns
accuracy – Leave-one-out accuracy.
- Return type
float
Examples
>>> import numpy as np
>>> from kdelearn.kde_tasks import KDEClassification  # module path assumed
>>> from kdelearn.metrics import accuracy_loo
>>> # Prepare data for two classes
>>> x_train1 = np.random.normal(0, 1, size=(100 // 2, 1))
>>> labels_train1 = np.full(100 // 2, 1)
>>> x_train2 = np.random.normal(3, 1, size=(100 // 2, 1))
>>> labels_train2 = np.full(100 // 2, 2)
>>> x_train = np.concatenate((x_train1, x_train2))
>>> labels_train = np.concatenate((labels_train1, labels_train2))
>>> # Classify and compute accuracy
>>> model = KDEClassification()
>>> accuracy = accuracy_loo(x_train, labels_train, model)
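The leave-one-out procedure itself can be sketched in plain NumPy. This is only an illustration of the general technique, not the kdelearn implementation: `NearestMeanClassifier` and `loo_accuracy_sketch` are hypothetical names introduced here, and the classifier is a stand-in for any model with fit and predict methods.

```python
import numpy as np


class NearestMeanClassifier:
    """Hypothetical stand-in classifier exposing fit and predict."""

    def fit(self, x_train, labels_train):
        self.classes_ = np.unique(labels_train)
        # One mean vector per class, shape (n_classes, n).
        self.means_ = np.array(
            [x_train[labels_train == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, x_test):
        # Distance of every test point to every class mean.
        diff = x_test[:, None, :] - self.means_[None, :, :]
        dists = np.linalg.norm(diff, axis=2)
        return self.classes_[np.argmin(dists, axis=1)]


def loo_accuracy_sketch(x_train, labels_train, model):
    """Refit the model m_train times, each time holding out one point."""
    m_train = x_train.shape[0]
    correct = 0
    for i in range(m_train):
        keep = np.arange(m_train) != i  # every point except the i-th
        model.fit(x_train[keep], labels_train[keep])
        predicted = model.predict(x_train[i : i + 1])[0]
        correct += int(predicted == labels_train[i])
    return correct / m_train


x = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels = np.array([1, 1, 1, 2, 2, 2])
accuracy = loo_accuracy_sketch(x, labels, NearestMeanClassifier())
```

Because the model is refitted once per data point, the cost grows linearly with m_train times the cost of a single fit.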
- kdelearn.metrics.pi_kf(x_train: ndarray, x_test: ndarray, labels_test: ndarray, weights_train: Optional[ndarray] = None) -> float
Performance index for outlier detection.
- Parameters
x_train (ndarray of shape (m_train, n_x)) – Data points as an array containing data with float type.
x_test (ndarray of shape (m_test, n_x)) – Data points as an array containing data with float type.
labels_test (ndarray of shape (m_test,)) – Labels (0 - inlier, 1 - outlier) of data points as an array containing data with int type.
weights_train (ndarray of shape (m_train,), optional) – Weights of data points. If None, all points are equally weighted.
- Returns
pi – Performance index.
- Return type
float
Examples
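>>> import numpy as np
>>> from kdelearn.metrics import pi_kf
>>> x_train = np.array([[-0.1], [0.0], [0.1], [1.1]])
>>> x_test = np.array([[-0.1], [0.0], [0.1], [1.1]])
>>> labels_test = np.array([0, 0, 0, 1])
>>> pi = pi_kf(x_train, x_test, labels_test)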
- kdelearn.metrics.density_silhouette(x_test: ndarray, labels_test: ndarray, weights_test: Optional[ndarray] = None, weights2: Optional[ndarray] = None, kernel_name: str = 'gaussian', share_bandwidth: bool = False) -> Tuple[ndarray, float]
Density-based silhouette.
- Parameters
x_test (ndarray of shape (m_test, n)) – Data points as an array containing data with float type.
labels_test (ndarray of shape (m_test,)) – Labels of data points as an array containing data with int type.
weights_test (ndarray of shape (m_test,), default=None) – Weights of data points. If None, all points are equally weighted.
weights2 (ndarray of shape (m_test,), default=None) – Weights of data points. If None, all points are equally weighted.
kernel_name ({'gaussian', 'uniform', 'epanechnikov', 'cauchy'}, default='gaussian') – Name of kernel function.
share_bandwidth (bool, default=False) – Determines whether all clusters share a common bandwidth. If False, the estimator of each cluster gets its own bandwidth.
- Returns
dbs (ndarray of shape (m_test,)) – Density-based silhouette scores of all data points.
dbs_mean (float) – Mean density-based silhouette score.
Examples
>>> import numpy as np
>>> from kdelearn.metrics import density_silhouette
>>> x_train = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
>>> labels_train = np.array([0, 0, 0, 1, 1, 1])
>>> dbs, dbs_mean = density_silhouette(x_train, labels_train)
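For intuition, the score can be sketched following the definition in reference [1]: each point gets the log-ratio of its posterior probability for its own cluster to the highest posterior among the other clusters, normalized by the largest absolute log-ratio. The sketch below is an assumption-laden illustration, not the kdelearn implementation: `gaussian_kde_fixed_bw` and `dbs_sketch` are hypothetical names, the bandwidth is a fixed constant rather than a selected one, and weights are ignored.

```python
import numpy as np


def gaussian_kde_fixed_bw(x_eval, x_data, bandwidth):
    """Gaussian KDE with one fixed bandwidth (a simplifying assumption)."""
    n = x_data.shape[1]
    diff = (x_eval[:, None, :] - x_data[None, :, :]) / bandwidth
    kernel = np.exp(-0.5 * np.sum(diff**2, axis=2)) / (2 * np.pi) ** (n / 2)
    return kernel.mean(axis=1) / bandwidth**n


def dbs_sketch(x, labels, bandwidth=0.5):
    classes = np.unique(labels)
    # Prior-weighted cluster densities at every point, shape (m, n_clusters).
    weighted = np.column_stack(
        [
            np.mean(labels == c) * gaussian_kde_fixed_bw(x, x[labels == c], bandwidth)
            for c in classes
        ]
    )
    posterior = weighted / weighted.sum(axis=1, keepdims=True)
    own = labels[:, None] == classes[None, :]  # one True per row
    tau_own = posterior[own]  # posterior of the assigned cluster
    tau_other = np.where(own, -np.inf, posterior).max(axis=1)  # best competitor
    log_ratio = np.log(tau_own / tau_other)
    dbs = log_ratio / np.max(np.abs(log_ratio))
    return dbs, float(dbs.mean())


x = np.array([[-0.1], [0.0], [0.1], [2.9], [3.0], [3.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
dbs, dbs_mean = dbs_sketch(x, labels)
```

Scores lie in [-1, 1]. A positive score means the point's own cluster has the highest posterior, and a negative score suggests the point fits another cluster better.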
References
[1] Menardi, G. Density-based Silhouette diagnostics for clustering methods. Springer, 2010.