Curation module
Note
As of July 2024, this module is still under construction and quite experimental. The API of some of the functions could be changed/improved from time to time.
Curation with the SortingAnalyzer
The SortingAnalyzer
, as seen in previous modules,
is a powerful tool to posprocess the spike sorting output, as it can compute many
extensions and metrics to further characterize the spike sorting results.
To facilitate the spike sorting workflow, the SortingAnalyzer
can also be used to perform curation tasks, such as removing bad units or merging similar units.
Here’s an example of how to use the SortingAnalyzer
to remove
a subset of units from a spike sorting output and to perform some merges:
from spikeinterface import create_sorting_analyzer
sorting_analyzer = create_sorting_analyzer(sorting=sorting, recording=recording)
# compute some extensions
sorting_analyzer.compute(["random_spikes", "templates", "template_similarity", "correlograms"])
# remove some units
remove_unit_ids = [1, 2]
sorting_analyzer2 = sorting_analyzer.remove_units(remove_unit_ids=remove_unit_ids)
# merge some units
merge_unit_groups = [[4, 5], [7, 8, 12]]
sorting_analyzer3 = sorting_analyzer2.merge_units(
merge_unit_groups=merge_unit_groups,
censored_period_ms=0.5,
merging_mode="soft"
)
Importantly, all the extensions that were computed on the original SortingAnalyzer
(sorting_analyzer
) are automatically propagated to the returned new
SortingAnalyzer
objects (sorting_analyzer2
, sorting_analyzer3
).
In particular, the merging steps supports a few interesting and useful functions.
If censored_period_ms
is set, the function will remove spikes that are too close in time after the merge
(in the case above, closer than 0.5 ms).
The merging_mode
parameter can be set to "soft"
(default) or "hard"
. The "soft"
mode will
try to smartly combine the existing extension data (e.g. templates, template similarity, etc.)
to estimate the merged units’ data, when possible. This is the fastest mode, but it can be less accurate.
The "hard"
mode will simply merge the spike trains of the units and recompute the extensions on the
merged spike train. This is more accurate but slower, especially for the extensions that need to traverse the
raw data (e.g., spike amplitudes, spike locations, amplitude scalings, etc.).
Automatic curation tools
The spikeinterface.curation
module provides several automatic curation tools to clean spike sorting outputs.
Many of them are ported, adapted, or inspired by Lussac
([Llobet]).
Remove duplicated spikes and redundant units
There are some convenient functions of the curation module allows you to remove redundant units and duplicated spikes from the sorting output.
The remove_duplicated_spikes()
function removes
duplicated spikes from the sorting output. Duplicated spikes are spikes that are
occur within a certain time window for the same unit.
from spikeinterface.curation import remove_duplicated_spikes
# remove duplicated spikes from BaseSorting object
clean_sorting = remove_duplicated_spikes(sorting, censored_period_ms=0.1)
The censored_period_ms
parameter is the time window in milliseconds to consider two spikes as duplicated.
The remove_redundand_units()
function removes
redundant units from the sorting output. Redundant units are units that share over
a certain percentage of spikes, by default 80%.
The function can act both on a BaseSorting
or a SortingAnalyzer
object.
from spikeinterface.curation import remove_redundant_units
# remove redundant units from BaseSorting object
clean_sorting = remove_redundant_units(
sorting,
duplicate_threshold=0.9,
remove_strategy="max_spikes"
)
# remove redundant units from SortingAnalyzer object
# note this returns a cleaned sorting
clean_sorting = remove_redundant_units(
sorting_analyzer,
duplicate_threshold=0.9,
remove_strategy="min_shift"
)
# in order to have a SortingAnalyer with only the non-redundant units one must
# select the designed units remembering to give format and folder if one wants
# a persistent SortingAnalyzer.
clean_sorting_analyzer = sorting_analyzer.select_units(clean_sorting.unit_ids)
We recommend using the SortingAnalyzer
approach, since the min_shift
strategy keeps
the unit (among the redundant ones), with a better template alignment.
Auto-merging units
The compute_merge_unit_groups()
function returns a list of potential merges.
The list of potential merges can be then applied to the sorting output.
compute_merge_unit_groups()
has many internal tricks and steps to identify potential
merges. It offers multiple “presets” and the flexibility to apply individual steps, with different parameters.
Read the function documentation carefully and do not apply it blindly!
from spikeinterface import create_sorting_analyzer
from spikeinterface.curation import compute_merge_unit_groups
analyzer = create_sorting_analyzer(sorting=sorting, recording=recording)
# some extensions are required
analyzer.compute(["random_spikes", "templates", "template_similarity", "correlograms"])
# merges is a list of unit pairs, with unit_ids to be merged.
merge_unit_pairs = compute_merge_unit_groups(
analyzer=analyzer,
preset="similarity_correlograms",
)
# with resolve_graph=True, merges_resolved is a list of merge groups,
# which can contain more than two units
merge_unit_groups = compute_merge_unit_groups(
analyzer=analyzer,
preset="similarity_correlograms",
resolve_graph=True
)
# here we apply the merges
analyzer_merged = analyzer.merge_units(merge_unit_groups=merge_unit_groups)
There is also the convenient auto_merge_units()
function that combines the
compute_merge_unit_groups()
and merge_units()
functions.
This is a high level function that allows you to apply either one or several presets/lists of steps in one go. For example, let’s
assume you want to apply the “x_contamination” preset, but iteratively and with slightly different parameters: first,
you want to focus on the templates that are very similar, according to their template similarities, before
considering those that might be more distant. Such a greedy and iterative scheme has been proved to be less
prone to wrong merges. To do so, you’ll need to do the following:
from spikeinterface import create_sorting_analyzer
from spikeinterface.curation import auto_merge_units
analyzer = create_sorting_analyzer(sorting=sorting, recording=recording)
# some extensions are required
analyzer.compute(["random_spikes", "templates", "template_similarity", "correlograms"])
analyzer.compute("unit_locations", method="monopolar_triangulation")
template_diff_thresh = [0.05, 0.15, 0.25]
presets = ["x_contaminations"] * len(template_diff_thresh)
steps_params = [
{"template_similarity": {"template_diff_thresh": i}}
for i in template_diff_thresh
]
analyzer_merged = auto_merge_units(
analyzer,
presets=presets,
steps_params=steps_params,
recursive=True,
**job_kwargs,
)
The extra keyword recursive
specifies that for each presets/sequences of steps, merges are performed
until no further merges are possible. The job_kwargs
are the parameters for the parallelization.
Be careful that the merges can not be reverted, so be sure to not erase your analyzer and create a new variable
Manual curation
While automatic curation tools can be very useful, manual curation is still widely used to clean spike sorting outputs and it is sometoimes necessary to have a human in the loop.
Curation format
SpikeInterface internally supports a JSON-based manual curation format. When manual curation is necessary, modifying a dataset in place is a bad practice. Instead, to ensure the reproducibility of the spike sorting pipelines, we have introduced a simple and JSON-based manual curation format. This format defines at the moment : merges + deletions + manual tags. The simple file can be kept along side the output of a sorter and applied on the result to have a “clean” result.
This format has two part:
definition with the folowing keys:
“format_version” : format specification
“unit_ids” : the list of unit_ds
- “label_definitions”list of label categories and possible labels per category.
Every category can be exclusive=True onely one label or exclusive=False several labels possible
manual output curation with the folowing keys:
“manual_labels”
“merge_unit_groups”
“removed_units”
Here is the description of the format with a simple example (the first part of the format is the definition; the second part of the format is manual action):
{
"format_version": "1",
"unit_ids": [
"u1",
"u2",
"u3",
"u6",
"u10",
"u14",
"u20",
"u31",
"u42"
],
"label_definitions": {
"quality": {
"label_options": [
"good",
"noise",
"MUA",
"artifact"
],
"exclusive": "true"
},
"putative_type": {
"label_options": [
"excitatory",
"inhibitory",
"pyramidal",
"mitral"
],
"exclusive": "false"
}
},
"manual_labels": [
{
"unit_id": "u1",
"quality": [
"good"
]
},
{
"unit_id": "u2",
"quality": [
"noise"
],
"putative_type": [
"excitatory",
"pyramidal"
]
},
{
"unit_id": "u3",
"putative_type": [
"inhibitory"
]
}
],
"merge_unit_groups": [
[
"u3",
"u6"
],
[
"u10",
"u14",
"u20"
]
],
"removed_units": [
"u31",
"u42"
]
}
The curation format can be loaded into a dictionary and directly applied to
a BaseSorting
or SortingAnalyzer
object using the apply_curation()
function.
from spikeinterface.curation import apply_curation
# load the curation JSON file
curation_json = "path/to/curation.json"
with open(curation_json, 'r') as f:
curation_dict = json.load(f)
# apply the curation to the sorting output
clean_sorting = apply_curation(sorting, curation_dict=curation_dict)
# apply the curation to the sorting analyzer
clean_sorting_analyzer = apply_curation(sorting_analyzer, curation_dict=curation_dict)
Using the SpikeInterface GUI
We support several tools to perform manual curation of spike sorting outputs.
The first one is the SpikeInterface-GUI, a QT-based GUI that allows you to visualize and curate the spike sorting output.

To launch the GUI, you can use the plot_sorting_summary()
function
and select the backend='spikeinterface_gui'
.
from spikeinterface import create_sorting_analyzer
from spikeinterface.curation import apply_sortingview_curation
from spikeinterface.widgets import plot_sorting_summary
sorting_analyzer = create_sorting_analyzer(sorting=sorting, recording=recording)
# some extensions are required
sorting_analyzer.compute([
"random_spikes",
"noise_levels",
"templates",
"template_similarity",
"unit_locations",
"spike_amplitudes",
"principal_components",
"correlograms"
]
)
sorting_analyzer.compute("quality_metrics", metric_names=["snr"])
# this will open the GUI in a different window
plot_sorting_summary(sorting_analyzer=sorting_analyzer, curation=True, backend='spikeinterface_gui')
Using the sortingview
web-app
Within the sortingview
widgets backend (see Install sortingview), the
plot_sorting_summary()
produces a powerful web-based GUI that enables manual curation
of the spike sorting output.

The manual curation (including merges and labels) can be applied to a SpikeInterface
BaseSorting
object:
from spikeinterface import create_sorting_analyzer
from spikeinterface.curation import apply_sortingview_curation
from spikeinterface.widgets import plot_sorting_summary
sorting_analyzer = create_sorting_analyzer(sorting=sorting, recording=recording)
# some extensions are required
sorting_analyzer.compute([
"random_spikes",
"templates",
"template_similarity",
"unit_locations",
"spike_amplitudes",
"correlograms"]
)
# This loads the data to the cloud for web-based plotting and sharing
# curation=True required for allowing curation in the sortingview gui
plot_sorting_summary(sorting_analyzer=sorting_analyzer, curation=True, backend='sortingview')
# we open the printed link URL in a browser
# - make manual merges and labeling
# - from the curation box, click on "Save as snapshot (sha1://)"
# copy the uri
sha_uri = "sha1://59feb326204cf61356f1a2eb31f04d8e0177c4f1"
clean_sorting = apply_sortingview_curation(sorting=sorting_analyzer.sorting, uri_or_json=sha_uri)
Note that you can also “Export as JSON” and pass the json file as uri_or_json
parameter.
The curation JSON file can be also pushed to a user-defined GitHub repository (“Save to GitHub as…”)
Other curation tools
We have other tools for cleaning spike sorting outputs:
find_duplicated_spikes()
: find duplicated spikes in the spike trainsremove_excess_spikes()
: remove spikes whose times are greater than therecording’s number of samples (by segment)
The CurationSorting class (deprecated)
SpikeInterface offers machinery to manually curate a sorting output and keep track of the curation history. The curation has several “steps” that can be repeated and chained:
remove/select units
split units
merge units
This functionality is done with CurationSorting
class.
Internally, this class keeps the history of curation as a graph.
The merging and splitting operations are handled by the MergeUnitsSorting
and
SplitUnitSorting
. These two classes can also be used independently.
from spikeinterface.curation import CurationSorting
sorting = run_sorter(sorter_name='kilosort2', recording=recording)
cs = CurationSorting(parent_sorting=sorting)
# make a first merge
cs.merge(units_to_merge=['#1', '#5', '#15'])
# make a second merge
cs.merge(units_to_merge=['#11', '#21'])
# make a split
split_index = ... # some criteria on spikes
cs.split(split_unit_id='#20', indices_list=split_index)
# here is the final clean sorting
clean_sorting = cs.sorting