Note
Go to the end to download the full example code.
Curation Tutorial¶
After spike sorting and computing quality metrics, you can automatically curate the spike sorting output using the quality metrics that you have calculated.
Import the modules and/or functions necessary from spikeinterface
import spikeinterface.core as si
import spikeinterface.extractors as se
from spikeinterface.postprocessing import compute_principal_components
from spikeinterface.qualitymetrics import compute_quality_metrics
Let’s download a simulated dataset from the repo ‘https://gin.g-node.org/NeuralEnsemble/ephy_testing_data’
Let’s imagine that the ground-truth sorting is in fact the output of a sorter.
local_path = si.download_dataset(remote_path='mearec/mearec_test_10s.h5')
recording, sorting = se.read_mearec(file_path=local_path)
print(recording)
print(sorting)
MEArecRecordingExtractor: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s
float32 dtype - 39.06 MiB
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
MEArecSortingExtractor: 10 units - 1 segments - 32.0kHz
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
First, we extract waveforms (to be saved in the folder ‘wfs_mearec’) and compute their PC (principal component) scores:
we = si.extract_waveforms(recording=recording,
sorting=sorting,
folder='wfs_mearec',
ms_before=1,
ms_after=2.,
max_spikes_per_unit=500,
n_jobs=1,
chunk_size=30000)
print(we)
pc = compute_principal_components(we, load_if_exists=True, n_components=3, mode='by_channel_local')
extract waveforms shared_memory multi buffer: 0%| | 0/11 [00:00<?, ?it/s]
extract waveforms shared_memory multi buffer: 100%|##########| 11/11 [00:00<00:00, 633.41it/s]
extract waveforms memmap multi buffer: 0%| | 0/11 [00:00<?, ?it/s]
extract waveforms memmap multi buffer: 100%|##########| 11/11 [00:00<00:00, 157.02it/s]
WaveformExtractor: 32 channels - 10 units - 1 segments
before:32 after:64 n_per_units:500 - sparse
Fitting PCA: 0%| | 0/10 [00:00<?, ?it/s]
Fitting PCA: 30%|███ | 3/10 [00:00<00:00, 29.16it/s]
Fitting PCA: 60%|██████ | 6/10 [00:00<00:00, 28.75it/s]
Fitting PCA: 90%|█████████ | 9/10 [00:00<00:00, 24.71it/s]
Fitting PCA: 100%|██████████| 10/10 [00:00<00:00, 23.01it/s]
Projecting waveforms: 0%| | 0/10 [00:00<?, ?it/s]
Projecting waveforms: 100%|██████████| 10/10 [00:00<00:00, 206.40it/s]
Then we compute some quality metrics:
metrics = compute_quality_metrics(waveform_extractor=we, metric_names=['snr', 'isi_violation', 'nearest_neighbor'])
print(metrics)
Computing PCA metrics: 0%| | 0/10 [00:00<?, ?it/s]
Computing PCA metrics: 60%|██████ | 6/10 [00:00<00:00, 53.07it/s]
Computing PCA metrics: 100%|██████████| 10/10 [00:00<00:00, 66.35it/s]
isi_violations_ratio isi_violations_count ... nn_hit_rate nn_miss_rate
#0 0.0 0.0 ... 1.000000 0.001289
#1 0.0 0.0 ... 0.990000 0.000746
#2 0.0 0.0 ... 0.976744 0.005117
#3 0.0 0.0 ... 1.000000 0.000000
#4 0.0 0.0 ... 0.989583 0.001012
#5 0.0 0.0 ... 0.993243 0.002660
#6 0.0 0.0 ... 0.995098 0.000000
#7 0.0 0.0 ... 0.986364 0.010753
#8 0.0 0.0 ... 0.994845 0.001506
#9 0.0 0.0 ... 0.996124 0.003348
[10 rows x 5 columns]
We can now threshold each quality metric and select units based on some rules.
The easiest and most intuitive way is to use boolean masking with a dataframe.
Then create a list of unit ids that we want to keep
keep_mask = (metrics['snr'] > 7.5) & (metrics['isi_violations_ratio'] < 0.2) & (metrics['nn_hit_rate'] > 0.90)
print(keep_mask)
keep_unit_ids = keep_mask[keep_mask].index.values
keep_unit_ids = [unit_id for unit_id in keep_unit_ids]
print(keep_unit_ids)
#0 True
#1 True
#2 True
#3 True
#4 False
#5 False
#6 True
#7 False
#8 True
#9 True
dtype: bool
['#0', '#1', '#2', '#3', '#6', '#8', '#9']
And now let’s create a sorting that contains only curated units and save it, for example to an NPZ file.
curated_sorting = sorting.select_units(keep_unit_ids)
print(curated_sorting)
se.NpzSortingExtractor.write_sorting(sorting=curated_sorting, save_path='curated_sorting.npz')
UnitsSelectionSorting: 7 units - 1 segments - 32.0kHz
Total running time of the script: (0 minutes 1.254 seconds)