Silhouette score (silhouette, silhouette_full)

Calculation

Gives the ratio between the cohesiveness of a cluster and its separation from other clusters. Values for silhouette score range from -1 to 1.

For the full method as proposed by [Rousseeuw], \(a(i)\) is the mean pairwise distance between point \(i\) and every other point in its own cluster \(C_I\). Then, for every other cluster \(C_J\), the mean distance between point \(i\) and the points in \(C_J\) is calculated, and the minimum of these means is taken to be \(b(i)\). The values of \(s(i)\) are averaged over the points of a cluster to give the final silhouette score for that cluster, using the following equations:

\[ \begin{align}\begin{aligned}a(i) = \frac{1}{|C_I|-1} \sum_{j \in C_I, j \neq i} d(i,j)\\b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)\\s(i) = \frac{b(i)-a(i)}{\max(a(i), b(i))}\\\text{Silhouette Score} = \frac{1}{N} \sum_{i=1}^{N} s(i)\end{aligned}\end{align} \]
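
The full calculation can be sketched in NumPy as follows. This is a minimal illustration, not the SpikeInterface implementation: the function name `full_silhouette_sketch` and its arguments are hypothetical, \(d(i, j)\) is assumed to be the Euclidean distance, and at least two clusters are assumed to be present.

```python
import numpy as np


def full_silhouette_sketch(points, labels, unit_id):
    """Mean s(i) over the points of one cluster, using pairwise distances.

    Hypothetical helper for illustration; assumes Euclidean distance
    and at least two distinct labels.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    in_unit = points[labels == unit_id]
    n = len(in_unit)
    if n < 2:
        # a(i) is undefined for a single point; returning 0 is a convention chosen here
        return 0.0

    # a(i): mean distance from each point to the other points of its own cluster
    within = np.linalg.norm(in_unit[:, None, :] - in_unit[None, :, :], axis=2)
    a = within.sum(axis=1) / (n - 1)

    # b(i): smallest mean distance from each point to the points of any other cluster
    b = np.full(n, np.inf)
    for other in np.unique(labels):
        if other == unit_id:
            continue
        other_pts = points[labels == other]
        between = np.linalg.norm(in_unit[:, None, :] - other_pts[None, :, :], axis=2)
        b = np.minimum(b, between.mean(axis=1))

    s = (b - a) / np.maximum(a, b)
    return float(s.mean())


# Two compact, well separated toy clusters
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good_score = full_silhouette_sketch(toy_points, [0, 0, 1, 1], unit_id=0)
# Same points, but each label mixes points from both groups
bad_score = full_silhouette_sketch(toy_points, [0, 1, 0, 1], unit_id=0)
```

On this toy data the well separated labelling gives a score of about 0.90, while the mixed labelling gives a negative score. The pairwise distance matrices are what make this approach expensive for large clusters.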

To reduce computational complexity, [Hruschka] proposed an alternative approach that uses cluster centroids rather than pairwise point calculations. Thus \(a(i)\) is the distance from point \(i\) to the centroid of its own cluster \(C_I\), denoted \(\mu_{C_I}\), and \(b(i)\) is the distance from point \(i\) to the nearest centroid of any other cluster, which simplifies the calculations as follows:

\[ \begin{align}\begin{aligned}a(i) = d(i, \mu_{C_I})\\b(i) = \min_{C_J \neq C_I} d(i, \mu_{C_J})\\s(i) = \frac{b(i)-a(i)}{\max(a(i), b(i))}\\\text{Silhouette Score} = \frac{1}{N} \sum_{i=1}^{N} s(i)\end{aligned}\end{align} \]
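
The centroid-based variant can be sketched the same way. Again this is only an illustration under stated assumptions: `simplified_silhouette_sketch` is a hypothetical name, the distance is assumed to be Euclidean, and at least two clusters are assumed to be present.

```python
import numpy as np


def simplified_silhouette_sketch(points, labels, unit_id):
    """Mean s(i) over the points of one cluster, using centroid distances.

    Hypothetical helper for illustration; assumes Euclidean distance
    and at least two distinct labels.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    in_unit = points[labels == unit_id]

    # a(i): distance from each point to the centroid of its own cluster
    own_centroid = in_unit.mean(axis=0)
    a = np.linalg.norm(in_unit - own_centroid, axis=1)

    # b(i): distance from each point to the nearest centroid of any other cluster
    b = np.full(len(in_unit), np.inf)
    for other in np.unique(labels):
        if other == unit_id:
            continue
        other_centroid = points[labels == other].mean(axis=0)
        b = np.minimum(b, np.linalg.norm(in_unit - other_centroid, axis=1))

    s = (b - a) / np.maximum(a, b)
    return float(s.mean())


# Two compact, well separated toy clusters
toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
simple_score = simplified_silhouette_sketch(toy_points, [0, 0, 1, 1], unit_id=0)
```

For this toy data the score is about 0.95. Each point is compared against one centroid per cluster rather than against every other point, which is where the saving in computation comes from.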

Expectation and use

A good clustering with well separated and compact clusters will have a silhouette score close to 1. A low silhouette score (close to -1) indicates a poorly isolated cluster (both type I and type II error). SpikeInterface provides access to both implementations of silhouette score.

To reduce complexity, the default implementation in SpikeInterface is the simplified silhouette score. This can be changed by setting the silhouette method to either ‘full’ (the Rousseeuw implementation) or (‘simplified’, ‘full’) for both methods when entering the qm_params parameter.
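
As a rough illustration, a qm_params entry requesting both methods might look like the dictionary below. The key names `"silhouette"` and `"method"` are assumptions inferred from the metric and parameter names used in this section, so check them against the installed SpikeInterface version before relying on them.

```python
# Hypothetical qm_params fragment; the "silhouette" and "method" keys are assumptions.
qm_params = {
    "silhouette": {
        # Request both the simplified (Hruschka) and full (Rousseeuw) methods
        "method": ("simplified", "full"),
    }
}
```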

Example code

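import spikeinterface.qualitymetrics as sqm

# all_pcs (2d array, [num_spikes, PCs]) and all_labels (1d array, one cluster label per spike)
# are assumed to have been computed already.
simple_sil_score = sqm.simplified_silhouette_score(all_pcs=all_pcs, all_labels=all_labels, this_unit_id=0)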

References

spikeinterface.qualitymetrics.pca_metrics.simplified_silhouette_score(all_pcs, all_labels, this_unit_id)

Calculates the simplified silhouette score for a given cluster (unit). The value ranges from -1 (bad clustering) to 1 (good clustering). The simplified silhouette score uses cluster centroids for distance calculations rather than pairwise calculations.

Parameters
all_pcs : 2d array

The PCs for all spikes, organized as [num_spikes, PCs].

all_labels : 1d array

The cluster labels for all spikes. Must have length of number of spikes.

this_unit_id : int

The ID for the unit to calculate this metric for.

Returns
unit_silhouette_score : float

Simplified Silhouette Score for this unit

References

Based on simplified silhouette score suggested by [Hruschka]

spikeinterface.qualitymetrics.pca_metrics.silhouette_score(all_pcs, all_labels, this_unit_id)

Calculates the silhouette score, a marker of cluster quality ranging from -1 (bad clustering) to 1 (good clustering). All distances are calculated as pairwise comparisons between data points.

Parameters
all_pcs : 2d array

The PCs for all spikes, organized as [num_spikes, PCs].

all_labels : 1d array

The cluster labels for all spikes. Must have length of number of spikes.

this_unit_id : int

The ID for the unit to calculate this metric for.

Returns
unit_silhouette_score : float

Silhouette Score for this unit

References

Based on [Rousseeuw]

Literature

Full method introduced by [Rousseeuw]. Simplified method introduced by [Hruschka].