Nearest Neighbor Metrics (`nn_hit_rate`, `nn_miss_rate`, `nn_isolation`, `nn_noise_overlap`)¶

Calculation¶

There are several implementations of nearest neighbor metrics which can be used to evaluate unit quality.

When calling the compute_quality_metrics() function, the following options are available to calculate NN metrics:

The nearest_neighbor option will return nn_hit_rate and nn_miss_rate (based on [Siegle] inspired by [Chung]).
The nn_isolation option will return the nearest neighbor isolation metric (adapted from [Chung]).
The nn_noise_overlap option will return the nearest neighbor isolation metric (adapted from [Chung]).

All options involve non-parametric calculations in PCA space.

`nearest_neighbor`¶

The membership function, \(\rho\) is defined such that for any spike \(g_i\) in some cluster \(G\), \(\rho(g_i) = G\). Additionally, the nearest neighbor function \(n_k(g_i)\) is defined such that the output of the function is the set of \(k\) spikes which are closest to \(g_i\).

For a unit associated with cluster \(C\), a subset of spikes are randomly drawn to form the cluster \(A\). A subset of spikes which are not in \(C\) are drawn to form the cluster \(B\). Note that \(|A| = |B|\). The NN-hit rate for \(C\) is then:

\[NN_{\textrm{hit}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A : \rho(n_i(x)) = A \} |}{ | A | }\]

Similarly, the NN-miss rate for \(C\) is:

\[NN_{\textrm{miss}}(C) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in B : \rho(n_i(x)) = A \} |}{ | B | }\]

NN-hit rate gives an estimate of contamination (an uncontaminated unit should have a high NN-hit rate). NN-miss rate gives an estimate of completeness. A more complete unit should have a low NN-miss rate.

`nn_isolation`¶

The overall logic of this approach is to choose a cluster for which the isolation is to be computed, and compute the pairwise isolation score between the chosen cluster and every other cluster. The isolation score is then the minimum of the pairwise scores (the worst case).

Let A and B be two clusters from sorting. We set \(|A| = |B|\) by subsampling as appropriate to match the size of the smaller cluster (or the max_spikes_for_nn parameter value, if using). We also restrict the waveforms to channels with significant signal.

The pairwise isolation between clusters A and B is then:

\[NN_{\textrm{isolation}}(A, B) = \frac{1}{k} \sum_{i=1}^{k} \frac{ | \{x \in A \cup B : \rho(n_i(x)) = \rho(x) \} |}{ | A \cup B | }\]

Note that nn_isolation is affected by the size of the clusters, so setting the max_spikes_for_nn may aid downstream comparison of scores.

`nn_noise_overlap`¶

A noise cluster is generated by randomly sampling voltage snippets from the recording. Following a similar procedure to that of the nn_isolation method, compute isolation between the cluster of interest and the generated noise cluster. noise overlap is then \(1 - NN_{\textrm{isolation}}\).

This metric gives an indication of the contamination present in the unit cluster.

References¶

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_metrics(all_pcs, all_labels, this_unit_id, max_spikes, n_neighbors)¶

Calculates unit contamination based on NearestNeighbors search in PCA space.

Parameters

all_pcs2d array: The PCs for all spikes, organized as [num_spikes, PCs].
all_labels1d array: The cluster labels for all spikes. Must have length of number of spikes.
this_unit_idint: The ID for the unit to calculate these metrics for.
max_spikesint: The number of spikes to use, per cluster. Note that the calculation can be very slow when this number is >20000.
n_neighborsint: The number of neighbors to use.

Returns

hit_ratefloat: Fraction of neighbors for target cluster that are also in target cluster.
miss_ratefloat: Fraction of neighbors outside target cluster that are in target cluster.

Notes

A is a (hopefully) representative subset of cluster X

\[NN_hit(X) = 1/k \sum_i=1^k |{{x in A such that ith closest neighbor is in X}}| / \|A\|\]

References

Based on metrics described in [Chung]

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_isolation(sorting_analyzer, this_unit_id: int | str, n_spikes_all_units: Optional[dict] = None, fr_all_units: Optional[dict] = None, max_spikes: int = 1000, min_spikes: int = 10, min_fr: float = 0.0, n_neighbors: int = 5, n_components: int = 10, radius_um: float = 100, peak_sign: str = 'neg', min_spatial_overlap: float = 0.5, seed=None)¶

Calculates unit isolation based on NearestNeighbors search in PCA space.

Parameters

sorting_analyzer: SortingAnalyzer: A SortingAnalyzer object
this_unit_idint | str: The ID for the unit to calculate these metrics for.
n_spikes_all_units: dict, default: None: Dictionary of the form {<unit_id>: <n_spikes>} for the waveform extractor. Recomputed if None.
fr_all_units: dict, default: None: Dictionary of the form {<unit_id>: <firing_rate>} for the waveform extractor. Recomputed if None.
max_spikesint, default: 1000: Max number of spikes to use per unit.
min_spikesint, default: 10: Min number of spikes a unit must have to go through with metric computation. Units with spikes < min_spikes gets numpy.NaN as the quality metric, and are ignored when selecting other units’ neighbors.
min_frfloat, default: 0.0: Min firing rate a unit must have to go through with metric computation. Units with firing rate < min_fr gets numpy.NaN as the quality metric, and are ignored when selecting other units’ neighbors.
n_neighborsint, default: 5: Number of neighbors to check membership of.
n_componentsint, default: 10: The number of PC components to use to project the snippets to.
radius_umfloat, default: 100: The radius, in um, that channels need to be within the peak channel to be included.
peak_sign: “neg” | “pos” | “both”, default: “neg”: The peak_sign used to compute sparsity and neighbor units. Used if sorting_analyzer is not sparse already.
min_spatial_overlapfloat, default: 100: In case sorting_analyzer is sparse, other units are selected if they share at least min_spatial_overlap times n_target_unit_channels with the target unit
seedint, default: None: Seed for random subsampling of spikes.

Returns

nn_isolationfloat: The calculation nearest neighbor isolation metric for this_unit_id. If the unit has fewer than min_spikes, returns numpy.NaN instead.
nn_unit_idnp.int16: Id of the “nearest neighbor” unit (unit with lowest isolation score from this_unit_id)

Notes

The overall logic of this approach is:

Choose a cluster
Compute the isolation score with every other cluster
Isolation score is defined as the min of 2. (i.e. ‘worst-case measure’)

The implementation of this approach is:

Let A and B be two clusters from sorting.

We set |A| = |B|:

If max_spikes < |A| and max_spikes < |B|:

Then randomly subsample max_spikes samples from A and B.

If max_spikes > min(|A|, |B|) (e.g. |A| > max_spikes > |B|):

Then randomly subsample min(|A|, |B|) samples from A and B.

This is because the metric is affected by the size of the clusters being compared independently of how well-isolated they are.

We also restrict the waveforms to channels with significant signal.

See docstring for _compute_isolation for the definition of isolation score.

References

Based on isolation metric described in [Chung]

spikeinterface.qualitymetrics.pca_metrics.nearest_neighbors_noise_overlap(sorting_analyzer, this_unit_id: int | str, n_spikes_all_units: Optional[dict] = None, fr_all_units: Optional[dict] = None, max_spikes: int = 1000, min_spikes: int = 10, min_fr: float = 0.0, n_neighbors: int = 5, n_components: int = 10, radius_um: float = 100, peak_sign: str = 'neg', seed=None)¶

Calculates unit noise overlap based on NearestNeighbors search in PCA space.

Parameters

sorting_analyzer: SortingAnalyzer: A SortingAnalyzer object
this_unit_idint | str: The ID of the unit to calculate this metric on.
n_spikes_all_units: dict, default: None: Dictionary of the form {<unit_id>: <n_spikes>} for the waveform extractor. Recomputed if None.
fr_all_units: dict, default: None: Dictionary of the form {<unit_id>: <firing_rate>} for the waveform extractor. Recomputed if None.
max_spikesint, default: 1000: The max number of spikes to use per cluster.
min_spikesint, default: 10: Min number of spikes a unit must have to go through with metric computation. Units with spikes < min_spikes gets numpy.NaN as the quality metric.
min_frfloat, default: 0.0: Min firing rate a unit must have to go through with metric computation. Units with firing rate < min_fr gets numpy.NaN as the quality metric.
n_neighborsint, default: 5: The number of neighbors to check membership.
n_componentsint, default: 10: The number of PC components to use to project the snippets to.
radius_umfloat, default: 100: The radius, in um, that channels need to be within the peak channel to be included.
peak_sign: “neg” | “pos” | “both”, default: “neg”: The peak_sign used to compute sparsity and neighbor units. Used if sorting_analyzer is not sparse already.
seedint, default: 0: Random seed for subsampling spikes.

Returns

nn_noise_overlapfloat: The computed nearest neighbor noise estimate. If the unit has fewer than min_spikes, returns numpy.NaN instead.

Notes

The general logic of this measure is:

Generate a noise cluster by randomly sampling voltage snippets from recording.
Subtract projection onto the weighted average of noise snippets of both the target and noise clusters to correct for bias in sampling.
Compute the isolation score between the noise cluster and the target cluster.

As with nn_isolation, the clusters that are compared (target and noise clusters) have the same number of spikes.

See docstring for _compute_isolation for the definition of isolation score.

References

Based on noise overlap metric described in [Chung]

Literature¶

Introduced by [Chung] and adapted by [Siegle] and Kyu Hyun Lee.

Nearest Neighbor Metrics (nn_hit_rate, nn_miss_rate, nn_isolation, nn_noise_overlap)¶

Calculation¶

nearest_neighbor¶

nn_isolation¶

nn_noise_overlap¶

References¶

Literature¶

Nearest Neighbor Metrics (`nn_hit_rate`, `nn_miss_rate`, `nn_isolation`, `nn_noise_overlap`)¶

`nearest_neighbor`¶

`nn_isolation`¶

`nn_noise_overlap`¶