Note
Go to the end to download the full example code.
Curation Tutorial¶
After spike sorting and computing quality metrics, you can automatically curate the spike sorting output using the quality metrics that you have calculated.
Import the modules and/or functions necessary from spikeinterface
import spikeinterface.core as si
import spikeinterface.extractors as se
from spikeinterface.postprocessing import compute_principal_components
from spikeinterface.qualitymetrics import compute_quality_metrics
Let’s download a simulated dataset from the repo ‘https://gin.g-node.org/NeuralEnsemble/ephy_testing_data’
Let’s imagine that the ground-truth sorting is in fact the output of a sorter.
local_path = si.download_dataset(remote_path="mearec/mearec_test_10s.h5")
recording, sorting = se.read_mearec(file_path=local_path)
print(recording)
print(sorting)
MEArecRecordingExtractor: 32 channels - 32.0kHz - 1 segments - 320,000 samples - 10.00s
float32 dtype - 39.06 MiB
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
MEArecSortingExtractor: 10 units - 1 segments - 32.0kHz
file_path: /home/docs/spikeinterface_datasets/ephy_testing_data/mearec/mearec_test_10s.h5
Create SortingAnalyzer¶
For this example, we will need a SortingAnalyzer and some extensions
to be computed first
analyzer = si.create_sorting_analyzer(sorting=sorting, recording=recording, format="memory")
analyzer.compute(["random_spikes", "waveforms", "templates", "noise_levels"])
analyzer.compute("principal_components", n_components=3, mode="by_channel_local")
print(analyzer)
/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/job_tools.py:103: UserWarning: `n_jobs` is not set so parallel processing is disabled! To speed up computations, it is recommended to set n_jobs either globally (with the `spikeinterface.set_global_job_kwargs()` function) or locally (with the `n_jobs` argument). Use `spikeinterface.set_global_job_kwargs?` for more information about job_kwargs.
warnings.warn(
estimate_sparsity: 0%| | 0/10 [00:00<?, ?it/s]
estimate_sparsity: 100%|##########| 10/10 [00:00<00:00, 1005.15it/s]
compute_waveforms: 0%| | 0/10 [00:00<?, ?it/s]
compute_waveforms: 100%|##########| 10/10 [00:00<00:00, 328.38it/s]
Fitting PCA: 0%| | 0/10 [00:00<?, ?it/s]
Fitting PCA: 30%|███ | 3/10 [00:00<00:00, 29.22it/s]
Fitting PCA: 60%|██████ | 6/10 [00:00<00:00, 28.98it/s]
Fitting PCA: 90%|█████████ | 9/10 [00:00<00:00, 24.78it/s]
Fitting PCA: 100%|██████████| 10/10 [00:00<00:00, 23.09it/s]
Projecting waveforms: 0%| | 0/10 [00:00<?, ?it/s]
Projecting waveforms: 100%|██████████| 10/10 [00:00<00:00, 204.09it/s]
SortingAnalyzer: 32 channels - 10 units - 1 segments - memory - sparse - has recording
Loaded 5 extensions: random_spikes, waveforms, templates, noise_levels, principal_components
Then we compute some quality metrics:
metrics = compute_quality_metrics(analyzer, metric_names=["snr", "isi_violation", "nearest_neighbor"])
print(metrics)
/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/job_tools.py:103: UserWarning: `n_jobs` is not set so parallel processing is disabled! To speed up computations, it is recommended to set n_jobs either globally (with the `spikeinterface.set_global_job_kwargs()` function) or locally (with the `n_jobs` argument). Use `spikeinterface.set_global_job_kwargs?` for more information about job_kwargs.
warnings.warn(
calculate pc_metrics: 0%| | 0/10 [00:00<?, ?it/s]
calculate pc_metrics: 50%|█████ | 5/10 [00:00<00:00, 49.74it/s]
calculate pc_metrics: 100%|██████████| 10/10 [00:00<00:00, 67.01it/s]
snr isi_violations_ratio ... nn_hit_rate nn_miss_rate
#0 24.06462 0 ... 1.0 0.001289
#1 25.655051 0 ... 0.99 0.000744
#2 13.9052 0 ... 0.976744 0.005831
#3 21.825417 0 ... 1.0 0.0
#4 7.465719 0 ... 0.989583 0.00101
#5 7.408548 0 ... 0.993243 0.002653
#6 20.917828 0 ... 0.995098 0.0
#7 7.396322 0 ... 0.986486 0.010753
#8 8.01203 0 ... 0.989744 0.001506
#9 9.053889 0 ... 0.996124 0.003348
[10 rows x 5 columns]
We can now threshold each quality metric and select units based on some rules.
The easiest and most intuitive way is to use boolean masking with a dataframe.
Then create a list of unit ids that we want to keep
keep_mask = (metrics["snr"] > 7.5) & (metrics["isi_violations_ratio"] < 0.2) & (metrics["nn_hit_rate"] > 0.90)
print(keep_mask)
keep_unit_ids = keep_mask[keep_mask].index.values
keep_unit_ids = [unit_id for unit_id in keep_unit_ids]
print(keep_unit_ids)
#0 True
#1 True
#2 True
#3 True
#4 False
#5 False
#6 True
#7 False
#8 True
#9 True
dtype: boolean
['#0', '#1', '#2', '#3', '#6', '#8', '#9']
And now let’s create a sorting that contains only curated units and save it.
curated_sorting = sorting.select_units(keep_unit_ids)
print(curated_sorting)
curated_sorting.save(folder="curated_sorting")
UnitsSelectionSorting: 7 units - 1 segments - 32.0kHz
We can also save the analyzer with only theses units
clean_analyzer = analyzer.select_units(unit_ids=keep_unit_ids, format="zarr", folder="clean_analyzer")
print(clean_analyzer)
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/examples/tutorials/qualitymetrics/plot_4_curation.py", line 78, in <module>
clean_analyzer = analyzer.select_units(unit_ids=keep_unit_ids, format="zarr", folder="clean_analyzer")
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/sortinganalyzer.py", line 894, in select_units
return self._save_or_select_or_merge(format=format, folder=folder, unit_ids=unit_ids)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/sortinganalyzer.py", line 827, in _save_or_select_or_merge
new_sorting_analyzer.extensions[extension_name] = extension.copy(
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/sortinganalyzer.py", line 2015, in copy
new_extension.save()
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/sortinganalyzer.py", line 2060, in save
self._save_data(**kwargs)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/checkouts/0.101.1/src/spikeinterface/core/sortinganalyzer.py", line 2127, in _save_data
df_group.create_dataset(name="index", data=ext_data.index.to_numpy())
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/hierarchy.py", line 1111, in create_dataset
return self._write_op(self._create_dataset_nosync, name, **kwargs)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/hierarchy.py", line 952, in _write_op
return f(*args, **kwargs)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/hierarchy.py", line 1126, in _create_dataset_nosync
a = array(data, store=self._store, path=path, chunk_store=self._chunk_store, **kwargs)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/creation.py", line 441, in array
z = create(**kwargs)
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/creation.py", line 209, in create
init_array(
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/storage.py", line 454, in init_array
_init_array_metadata(
File "/home/docs/checkouts/readthedocs.org/user_builds/spikeinterface/conda/0.101.1/lib/python3.10/site-packages/zarr/storage.py", line 583, in _init_array_metadata
raise ValueError("missing object_codec for object array")
ValueError: missing object_codec for object array
Total running time of the script: (0 minutes 1.528 seconds)