Silhouette score (silhouette, silhouette_full)¶
Calculation¶
Gives the ratio between the cohesiveness of a cluster and its separation from other clusters. Values for silhouette score range from -1 to 1.
For the full method as proposed by [Rousseeuw], the mean pairwise distance \(a(i)\) between each point \(i\) and every other point in its own cluster \(C_I\) is calculated. Then, for every other cluster \(C_J\), the mean distance between \(i\) and the points in \(C_J\) is calculated, and the minimum of these means over all other clusters is taken to be \(b(i)\). The average value of \(s(i)\) over the points in a cluster gives the final silhouette score for that cluster, with the following equations:
\[a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I, j \neq i} d(i, j)\]
\[b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)\]
\[s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}\]
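As a rough illustration of the full method described above, the following NumPy sketch computes the mean \(s(i)\) for one cluster from pairwise Euclidean distances. The function name `full_silhouette_sketch`, the choice of Euclidean distance, and the handling of single-point clusters are assumptions made for this example; it is not the SpikeInterface implementation.

```python
import numpy as np


def full_silhouette_sketch(all_pcs, all_labels, this_unit_id):
    """Mean s(i) over the points of one cluster, using pairwise distances.

    Hypothetical helper for illustration only (Euclidean distance assumed).
    """
    all_pcs = np.asarray(all_pcs, dtype=float)
    all_labels = np.asarray(all_labels)
    own = all_pcs[all_labels == this_unit_id]
    n_own = len(own)
    if n_own < 2:
        # a(i) is undefined for a single-point cluster (choice made in this sketch)
        return float("nan")

    # a(i): mean distance from each point to the other points of its own cluster
    own_dists = np.linalg.norm(own[:, None, :] - own[None, :, :], axis=2)
    a = own_dists.sum(axis=1) / (n_own - 1)

    # b(i): smallest mean distance from each point to the points of another cluster
    b = np.full(n_own, np.inf)
    for label in np.unique(all_labels):
        if label == this_unit_id:
            continue
        other = all_pcs[all_labels == label]
        mean_dist = np.linalg.norm(own[:, None, :] - other[None, :, :], axis=2).mean(axis=1)
        b = np.minimum(b, mean_dist)

    s = (b - a) / np.maximum(a, b)
    return float(s.mean())


# Toy data: two compact clusters that are far apart
toy_pcs = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_labels = np.array([0, 0, 1, 1])
mixed_labels = np.array([0, 1, 0, 1])

print(full_silhouette_sketch(toy_pcs, good_labels, this_unit_id=0))
print(full_silhouette_sketch(toy_pcs, mixed_labels, this_unit_id=0))
```

With the well-separated labelling the score is close to 1, while the deliberately mixed labelling gives a negative score. The cost grows with the number of point pairs, which is the motivation for the centroid-based variant.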
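To reduce computational complexity, an alternative approach was proposed by [Hruschka] in which the centroids of clusters are used rather than pairwise point calculations. Thus \(a(i)\) is the distance from each point \(i\) to the centroid of its own cluster \(C_I\), given as \(\mu_{C_I}\), and \(b(i)\) is the distance from \(i\) to the nearest centroid of any other cluster, which means the calculations are simplified as follows:
\[a(i) = d(i, \mu_{C_I})\]
\[b(i) = \min_{C_J \neq C_I} d(i, \mu_{C_J})\]
\[s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}\]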
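The centroid-based variant can be sketched in the same way. The function name `simplified_silhouette_sketch` and the use of Euclidean distance are assumptions made for this example, and the code is an illustration of the idea rather than the SpikeInterface implementation.

```python
import numpy as np


def simplified_silhouette_sketch(all_pcs, all_labels, this_unit_id):
    """Mean s(i) over the points of one cluster, using distances to centroids.

    Hypothetical helper for illustration only (Euclidean distance assumed).
    """
    all_pcs = np.asarray(all_pcs, dtype=float)
    all_labels = np.asarray(all_labels)
    own = all_pcs[all_labels == this_unit_id]

    # a(i): distance from each point to the centroid of its own cluster
    own_centroid = own.mean(axis=0)
    a = np.linalg.norm(own - own_centroid, axis=1)

    # b(i): distance from each point to the nearest centroid of another cluster
    b = np.full(len(own), np.inf)
    for label in np.unique(all_labels):
        if label == this_unit_id:
            continue
        other_centroid = all_pcs[all_labels == label].mean(axis=0)
        b = np.minimum(b, np.linalg.norm(own - other_centroid, axis=1))

    s = (b - a) / np.maximum(a, b)
    return float(s.mean())


# Toy data: two compact clusters that are far apart
toy_pcs = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_labels = np.array([0, 0, 1, 1])
mixed_labels = np.array([0, 1, 0, 1])

print(simplified_silhouette_sketch(toy_pcs, good_labels, this_unit_id=0))
print(simplified_silhouette_sketch(toy_pcs, mixed_labels, this_unit_id=0))
```

Each point is compared against one centroid per cluster instead of against every other point, so the work scales with the number of points times the number of clusters rather than with the number of point pairs.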
Expectation and use¶
A good clustering with well-separated and compact clusters will have a silhouette score close to 1. A low silhouette score (close to -1) indicates a poorly isolated cluster (both type I and type II error). SpikeInterface provides access to both implementations of silhouette score.
To reduce complexity, the default implementation in SpikeInterface uses the simplified silhouette score. This can be changed by setting the silhouette method to either ‘full’ (the Rousseeuw implementation) or (‘simplified’, ‘full’) for both methods when entering the qm_params parameter.
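A minimal sketch of what such a qm_params entry might look like is given below. The dictionary layout, with a "silhouette" key holding a "method" entry, is an assumption based on the description above and should be checked against the installed SpikeInterface version before use.

```python
# Assumed layout (not confirmed by this page): metric name -> parameter dict.
# Only the full (Rousseeuw) method:
qm_params_full = {"silhouette": {"method": "full"}}

# Both the simplified and the full method:
qm_params_both = {"silhouette": {"method": ("simplified", "full")}}

print(qm_params_full)
print(qm_params_both)
```

Either dictionary would then be supplied as the qm_params parameter when the quality metrics are computed.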
Example code¶
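import spikeinterface.qualitymetrics as sqm
# all_pcs: 2d array of PCs, organized as [num_spikes, PCs]; all_labels: 1d array with one cluster label per spike
simple_sil_score = sqm.simplified_silhouette_score(all_pcs=all_pcs, all_labels=all_labels, this_unit_id=0)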
References¶
- spikeinterface.qualitymetrics.pca_metrics.simplified_silhouette_score(all_pcs, all_labels, this_unit_id)¶
Calculates the simplified silhouette score for each cluster. The value ranges from -1 (bad clustering) to 1 (good clustering). The simplified silhouette score uses the centroids for distance calculations rather than pairwise calculations.
- Parameters
- all_pcs : 2d array
The PCs for all spikes, organized as [num_spikes, PCs].
- all_labels : 1d array
The cluster labels for all spikes. Must have length of number of spikes.
- this_unit_id : int
The ID for the unit to calculate this metric for.
- Returns
- unit_silhouette_score : float
Simplified Silhouette Score for this unit
References
Based on simplified silhouette score suggested by [Hruschka]
- spikeinterface.qualitymetrics.pca_metrics.silhouette_score(all_pcs, all_labels, this_unit_id)¶
Calculates the silhouette score, which is a marker of cluster quality ranging from -1 (bad clustering) to 1 (good clustering). All distances are calculated as pairwise comparisons between data points.
- Parameters
- all_pcs : 2d array
The PCs for all spikes, organized as [num_spikes, PCs].
- all_labels : 1d array
The cluster labels for all spikes. Must have length of number of spikes.
- this_unit_id : int
The ID for the unit to calculate this metric for.
- Returns
- unit_silhouette_score : float
Silhouette Score for this unit
References
Based on [Rousseeuw]
Literature¶
Full method introduced by [Rousseeuw]. Simplified method introduced by [Hruschka].