Curation Tutorial
After spike sorting and computing quality metrics, you can curate the sorting output automatically by thresholding those metrics.
import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.toolkit as st
import spikeinterface.sorters as ss
First, let's download a simulated dataset from the repo 'https://gin.g-node.org/NeuralEnsemble/ephy_testing_data'.
For this tutorial, let's imagine that the ground-truth sorting is in fact the output of a sorter.
local_path = si.download_dataset(remote_path='mearec/mearec_test_10s.h5')
recording = se.MEArecRecordingExtractor(local_path)
sorting = se.MEArecSortingExtractor(local_path)
print(recording)
print(sorting)
Out:
MEArecRecordingExtractor: 32 channels - 1 segments - 32.0kHz - 10.000s
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
MEArecSortingExtractor: 10 units - 1 segments - 32.0kHz
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
Next, we extract waveforms and compute their PC scores:
folder = 'waveforms_mearec'
we = si.extract_waveforms(recording, sorting, folder,
                          load_if_exists=True,
                          ms_before=1, ms_after=2., max_spikes_per_unit=500,
                          n_jobs=1, chunk_size=30000)
print(we)
pc = st.compute_principal_components(we, load_if_exists=True,
                                     n_components=3, mode='by_channel_local')
print(pc)
Out:
WaveformExtractor: 32 channels - 10 units - 1 segments
before:32 after64 n_per_units: 500
WaveformPrincipalComponent: 32 channels - 1 segments
mode:by_channel_local n_components:3
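The `before:32 after64` values printed above are sample counts: the `ms_before=1` and `ms_after=2.` arguments converted at the recording's 32.0 kHz sampling rate. Here is a minimal sketch of that conversion, assuming a plain truncation to whole samples. The helper name `ms_to_samples` is hypothetical and is not part of SpikeInterface.

```python
def ms_to_samples(duration_ms, sampling_frequency_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(duration_ms * sampling_frequency_hz / 1000.0)


sampling_frequency = 32000.0  # 32.0 kHz, as printed for the recording above

nbefore = ms_to_samples(1, sampling_frequency)   # samples kept before each spike
nafter = ms_to_samples(2., sampling_frequency)   # samples kept after each spike

# Each extracted waveform snippet spans nbefore + nafter samples per channel.
print(nbefore, nafter, nbefore + nafter)
```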
Then we compute some quality metrics:
metrics = st.compute_quality_metrics(we, waveform_principal_component=pc,
                                     metric_names=['snr', 'isi_violation', 'nearest_neighbor'])
print(metrics)
Out:
snr isi_violations_rate ... nn_hit_rate nn_miss_rate
#0 23.323524 0.0 ... 0.993711 0.004335
#1 25.331213 0.0 ... 0.946667 0.001918
#2 12.077727 0.0 ... 0.906977 0.004274
#3 21.502911 0.0 ... 0.988889 0.000000
#4 6.599532 0.0 ... 0.979167 0.001435
#5 6.964448 0.0 ... 0.963964 0.001883
#6 20.011555 0.0 ... 0.980392 0.000480
#7 7.321117 0.0 ... 0.933333 0.009974
#8 6.627583 0.0 ... 0.965636 0.016334
#9 7.210963 0.0 ... 0.943152 0.010823
[10 rows x 5 columns]
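The printed table replaces its middle column with `...`. Assuming `metrics` is a pandas DataFrame (the printed summary suggests so), `DataFrame.to_string()` renders every column. The sketch below uses a small stand-in frame, `metrics_demo`, holding two rows and only the four visible columns copied from the output above, so it runs without the dataset.

```python
import pandas as pd

# Stand-in for `metrics`: two rows and the four visible columns from the output above.
metrics_demo = pd.DataFrame(
    {
        'snr': [23.323524, 25.331213],
        'isi_violations_rate': [0.0, 0.0],
        'nn_hit_rate': [0.993711, 0.946667],
        'nn_miss_rate': [0.004335, 0.001918],
    },
    index=['#0', '#1'],
)

# to_string() does not apply the display width limits used by print(df),
# so no column is replaced by "...".
full_text = metrics_demo.to_string()
print(full_text)
```

With the real table, `print(metrics.to_string())` would be the equivalent call.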
We can now threshold each quality metric and select the units that satisfy every rule.
A simple and intuitive way is to use boolean masking on the metrics dataframe:
keep_mask = (metrics['snr'] > 7.5) & (metrics['isi_violations_rate'] < 0.05) & (metrics['nn_hit_rate'] > 0.90)
print(keep_mask)
keep_unit_ids = keep_mask[keep_mask].index.values
print(keep_unit_ids)
Out:
#0 True
#1 True
#2 True
#3 True
#4 False
#5 False
#6 True
#7 False
#8 False
#9 False
dtype: bool
['#0' '#1' '#2' '#3' '#6']
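To see which rule rejects each unit, it can help to keep one boolean column per threshold before combining them. The sketch below rebuilds the three relevant columns from the values printed earlier (as a stand-in named `metrics_demo`, so it runs without the dataset) and applies the same thresholds. The column names `snr_ok`, `isi_ok` and `nn_ok` are made up for illustration.

```python
import pandas as pd

unit_ids = ['#0', '#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9']

# Values copied from the metrics table printed above.
metrics_demo = pd.DataFrame(
    {
        'snr': [23.323524, 25.331213, 12.077727, 21.502911, 6.599532,
                6.964448, 20.011555, 7.321117, 6.627583, 7.210963],
        'isi_violations_rate': [0.0] * 10,
        'nn_hit_rate': [0.993711, 0.946667, 0.906977, 0.988889, 0.979167,
                        0.963964, 0.980392, 0.933333, 0.965636, 0.943152],
    },
    index=unit_ids,
)

# One boolean column per rule, so the failing criterion of each unit stays visible.
rules = pd.DataFrame({
    'snr_ok': metrics_demo['snr'] > 7.5,
    'isi_ok': metrics_demo['isi_violations_rate'] < 0.05,
    'nn_ok': metrics_demo['nn_hit_rate'] > 0.90,
})

# A unit is kept only if it passes every rule.
keep_mask = rules.all(axis=1)
keep_unit_ids = keep_mask[keep_mask].index.values

# The same selection written as a single query string.
query_unit_ids = metrics_demo.query(
    'snr > 7.5 and isi_violations_rate < 0.05 and nn_hit_rate > 0.90'
).index.values

print(rules[~keep_mask])  # rejected units and the rules they fail
print(keep_unit_ids)
```

In this dataset every rejected unit fails only the SNR threshold; the other two rules pass for all ten units.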
Finally, let's create a sorting that contains only the curated units and save it, for example to an NPZ file.
curated_sorting = sorting.select_units(keep_unit_ids)
print(curated_sorting)
se.NpzSortingExtractor.write_sorting(curated_sorting, 'curated_sorting.npz')
Out:
UnitsSelectionSorting: 5 units - 1 segments - 32.0kHz
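Conceptually, selecting units keeps only the spike trains of the chosen unit ids, and an NPZ file is a NumPy archive of named arrays. The sketch below illustrates both ideas with plain NumPy. The spike trains are made up, and the one-array-per-unit layout with `unit_<n>` names is an assumption chosen for the example; it is not the file layout written by `NpzSortingExtractor`.

```python
import os
import tempfile

import numpy as np

# Made-up spike trains (sample indices) for three hypothetical units.
spike_trains = {
    '#0': np.array([120, 4500, 9100]),
    '#4': np.array([300, 7000]),
    '#6': np.array([50, 2500, 6400, 8800]),
}
keep_unit_ids = ['#0', '#6']

# Keep only the selected units, renaming '#0' to 'unit_0' for the archive.
curated = {f"unit_{uid[1:]}": spike_trains[uid] for uid in keep_unit_ids}

# Save one array per unit to an NPZ archive, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'curated_demo.npz')
    np.savez(path, **curated)
    with np.load(path) as archive:
        reloaded = {name: archive[name] for name in archive.files}

print(sorted(reloaded))
```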