Core module

Overview

The spikeinterface.core module provides the basic classes and tools of the SpikeInterface ecosystem.

Several Base classes are implemented here and inherited throughout the SI code-base. The core classes are: BaseRecording (for raw data), BaseSorting (for spike-sorted data), and SortingAnalyzer (for postprocessing, quality metrics, and waveform extraction).

There are additional classes to retrieve events (BaseEvent) and to handle unsorted waveform cutouts, or snippets, which are recorded by some acquisition systems (BaseSnippets).

All classes support:
  • metadata handling

  • data on-demand (lazy loading)

  • multiple segments, where each segment is a contiguous piece of data (recording, sorting, events).

Recording

The BaseRecording class serves as the basis of all Recording classes. It interfaces with the raw traces and has the following features:

  • retrieve raw and scaled traces from each segment

  • keep info about channel_ids VS channel indices

  • handle probe information

  • store channel properties

  • store object annotations

  • enable grouping, splitting, and slicing

  • handle time information

Here we assume recording is a BaseRecording object with 16 channels:

import numpy as np

channel_ids = recording.channel_ids
num_channels = recording.get_num_channels()
sampling_frequency = recording.sampling_frequency

# get number of samples/duration
num_samples_segment = recording.get_num_samples(segment_index=0)
### NOTE ###
# 'segment_index' is required for multi-segment objects
num_total_samples = recording.get_total_samples()
total_duration = recording.get_total_duration()

# retrieve raw traces between frames 100 and 200
traces = recording.get_traces(start_frame=100, end_frame=200, segment_index=0)
# retrieve raw traces only for the first 4 channels
traces_slice = recording.get_traces(start_frame=100, end_frame=200, segment_index=0,
                                    channel_ids=channel_ids[:4])
# retrieve traces after scaling to uV
# (requires 'gain_to_uV' and 'offset_to_uV' properties)
traces_uV = recording.get_traces(start_frame=100, end_frame=200, segment_index=0,
                                 return_scaled=True)
# set/get a new channel property (e.g. "quality")
recording.set_property(key="quality", values=["good"] * num_channels)
quality_values = recording.get_property("quality")
# get all available properties
property_keys = recording.get_property_keys()

# set/get an annotation
recording.annotate(date="Recording acquired today")
recording.get_annotation(key="date")

# get new recording with the first 10s of the traces
recording_slice_frames = recording.frame_slice(start_frame=0,
                                               end_frame=int(10*sampling_frequency))
# get new recording with the first 4 channels
recording_slice_chans = recording.channel_slice(channel_ids=channel_ids[:4])
# remove last two channels
recording_rm_chans = recording.remove_channels(channel_ids=channel_ids[-2:])

# set channel grouping (assume we have 4 groups of 4 channels, e.g. tetrodes)
groups = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
recording.set_channel_groups(groups)
# split by property
recording_by_group = recording.split_by("group")
# 'recording_by_group' is a dict with group as keys (0,1,2,3) and channel
# sliced recordings as values

# set times (for synchronization) - assume our times start at 300 seconds
timestamps = np.arange(num_samples_segment) / sampling_frequency + 300
recording.set_times(times=timestamps, segment_index=0)

Note: Raw data formats often store data as integer values for memory efficiency. To convert these integers to meaningful physical units (uV), you can apply a gain and an offset. Many devices have their own gains and offsets necessary to convert their data, and these values are handled by SpikeInterface for its extractors. This conversion is triggered by the return_scaled parameter of get_traces() (see the example above), which returns the traces in uV.
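
If scaling information is missing from the file, gains and offsets can also be set manually with set_channel_gains() and set_channel_offsets() (a minimal sketch; the gain and offset values below are hypothetical):

# hypothetical gain/offset values for a 16-bit acquisition system
recording.set_channel_gains(gains=[0.195] * num_channels)
recording.set_channel_offsets(offsets=[0.] * num_channels)
# scaled traces (in uV) can now be retrieved
traces_uV = recording.get_traces(segment_index=0, return_scaled=True)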

Sorting

The BaseSorting class serves as the basis for all Sorting classes. It interfaces with a spike-sorted output and has the following features:

  • retrieve spike trains of different units

  • keep info about unit_ids VS unit indices

  • store unit properties

  • store object annotations

  • enable selection of sub-units

  • handle time information

Here we assume sorting is a BaseSorting object with 10 units:

unit_ids = sorting.unit_ids
num_units = sorting.get_num_units()
sampling_frequency = sorting.sampling_frequency

# retrieve spike trains for a unit (returned as sample indices)
unit0 = unit_ids[0]
spike_train = sorting.get_unit_spike_train(unit_id=unit0, segment_index=0)
# retrieve spikes between frames 100 and 200
spike_train_slice = sorting.get_unit_spike_train(unit_id=unit0,
                                                 start_frame=100, end_frame=200,
                                                 segment_index=0)
### NOTE ###
# 'segment_index' is required for multi-segment objects

# set/get a new unit property (e.g. "quality")
sorting.set_property(key="quality", values=["good"] * num_units)
quality_values = sorting.get_property("quality")
# get all available properties
property_keys = sorting.get_property_keys()

# set/get an annotation
sorting.annotate(date="Spike sorted today")
sorting.get_annotation(key="date")

# get new sorting within the first 10s of the spike trains
sorting_slice_frames = sorting.frame_slice(start_frame=0,
                                           end_frame=int(10*sampling_frequency))
# get new sorting with only the first 4 units
sorting_select_units = sorting.select_units(unit_ids=unit_ids[:4])

# register 'recording' from the previous example and get the spike trains in seconds
sorting.register_recording(recording=recording)
spike_train_s = sorting.get_unit_spike_train(unit_id=unit0, segment_index=0,
                                             return_times=True)
### NOTE ###
# When running spike sorting in SpikeInterface, the recording is automatically registered.
# If times are not set, the spike train samples are divided by the sampling frequency.

Internally, any sorting object can construct two internal caches:
  1. a list (per segment) of dicts (per unit) of numpy.array. This cache is useful when accessing spike trains on a unit-by-unit basis across segments.

  2. a unique numpy.array with structured dtype, aka the “spikes vector”. This is useful for processing by small chunks of time, like for extracting amplitudes from a recording (see the sketch below).
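
For example, the spikes vector can be retrieved with to_spike_vector() (a minimal sketch; the exact field names may vary across versions):

# one structured array for all spikes of all units and segments
spike_vector = sorting.to_spike_vector()
# fields typically include the sample index, unit index, and segment index
print(spike_vector.dtype.names)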

SortingAnalyzer

The SortingAnalyzer class is the core object to combine a BaseRecording and a BaseSorting object. This is the first step for additional analyses, and the basis of several postprocessing and quality metrics computations.

The SortingAnalyzer provides a convenient API to access the underlying BaseSorting and BaseRecording information, and it supports several extensions (derived from the AnalyzerExtension class) to perform further analysis.

Importantly, the SortingAnalyzer handles sparsity (see the Sparsity section), i.e., the channels on which waveforms and templates are defined, for example based on the physical distance from the channel with the largest peak amplitude.

Most of the extensions live in the postprocessing module (with the quality_metrics extension in the qualitymetrics module), but there are some core extensions too:

  • select random spikes for downstream analysis (e.g., extracting waveforms or fitting a PCA model)

  • estimate templates

  • extract waveforms for single spikes

  • compute channel-wise noise levels

Note

Some extensions depend on others, which must be pre-computed. For example, the waveforms extension depends on the random_spikes one. If the latter is not computed, computing waveforms will throw an error.

from spikeinterface import create_sorting_analyzer

# create in-memory sorting analyzer object
sorting_analyzer = create_sorting_analyzer(
    sorting=sorting,
    recording=recording,
    sparse=True, # default
    format="memory", # default
)

print(sorting_analyzer)
>>> SortingAnalyzer: 4 channels - 10 units - 1 segments - memory - sparse - has recording
>>> Loaded 0 extensions:

The sorting_analyzer object implements convenient functions to access the underlying recording and sorting objects’ information:

num_channels = sorting_analyzer.get_num_channels()
num_units = sorting_analyzer.get_num_units()
sampling_frequency = sorting_analyzer.get_sampling_frequency()
# or: sampling_frequency = sorting_analyzer.sampling_frequency
total_num_samples = sorting_analyzer.get_total_samples()
total_duration = sorting_analyzer.get_total_duration()
### NOTE ###
# 'segment_index' is required for multi-segment objects
num_samples = sorting_analyzer.get_num_samples(segment_index=0)

# channel_ids and unit_ids
channel_ids = sorting_analyzer.channel_ids
unit_ids = sorting_analyzer.unit_ids

Once the sorting_analyzer is instantiated, additional extensions can be computed.
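
A minimal sketch of computing some core extensions (the parameter values shown are illustrative and may vary across versions):

sorting_analyzer.compute("random_spikes")
sorting_analyzer.compute("waveforms", ms_before=1.0, ms_after=2.0)
sorting_analyzer.compute("templates", operators=["average", "std"])
sorting_analyzer.compute("noise_levels")

# extension data can then be retrieved with get_extension()
templates_ext = sorting_analyzer.get_extension("templates")
templates_data = templates_ext.get_data()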

The SortingAnalyzer by default is defined in memory, but it can be saved at any time (or upon instantiation) to one of the following backends:

  • zarr: the sorting analyzer is saved to a Zarr folder, and each extension is a Zarr group. This is the
    recommended backend, since Zarr files can be written to/read from the cloud and compression is applied.
  • binary_folder: the sorting analyzer is saved to a folder, and each extension creates a sub-folder. The extension
    data are saved to either npy (for arrays), csv (for dataframes), or pickle (for everything else).

The SortingAnalyzer.save_as function will save the object and all its extensions to disk.
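
For example (a short sketch, saving to the zarr backend):

sorting_analyzer_zarr = sorting_analyzer.save_as(folder="my-sorting-analyzer.zarr",
                                                 format="zarr")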

Once a SortingAnalyzer object is saved to disk, it can be easily reloaded with:

sorting_analyzer = si.load_sorting_analyzer(folder="my-sorting-analyzer.zarr")

Note

When saved to disk, the SortingAnalyzer will store a copy of the Sorting object, because it is relatively small and needed for most (if not all!) operations. The same is not true for the Recording object, for which only the main properties will be stored (e.g., sampling_frequency, channel_ids, channel_locations, etc.) along with the provenance needed to reload the Recording. When loading a SortingAnalyzer from disk, an attempt is made to re-instantiate the Recording object from the provenance. In case of failure, for example if the original file is not available, the SortingAnalyzer will automatically be instantiated in “recordingless” mode.

Event

The BaseEvent class serves as basis for all Event classes. It allows one to retrieve events and epochs (e.g. TTL pulses). Internally, events are represented as numpy arrays with a structured dtype. The structured dtype must contain the time field, which represents the event times in seconds. Other fields are optional.

Here we assume event is a BaseEvent object with events from two channels:

channel_ids = event.channel_ids
num_channels = event.get_num_channels()
# get structured dtype for the first channel
event_dtype = event.get_dtype(channel_ids[0])
print(event_dtype)
# >>> dtype([('time', '<f8'), ('duration', '<f8'), ('label', '<U100')])

# retrieve events (with structured dtype)
events = event.get_events(channel_id=channel_ids[0], segment_index=0)
# retrieve event times
event_times = event.get_event_times(channel_id=channel_ids[0], segment_index=0)
### NOTE ###
# 'segment_index' is required for multi-segment objects

Snippets

The BaseSnippets class serves as basis for all Snippets classes (currently only NumpySnippets and WaveClusSnippetsExtractor are implemented).

It represents unsorted waveform cutouts. Some acquisition systems, in fact, allow users to set a threshold and only record the times at which a peak was detected, together with the waveform cut out around the peak.

NOTE: while we support this class (mainly for legacy formats), this approach is a bad practice and is highly discouraged! Most modern spike sorters, in fact, require the raw traces to perform template matching to recover spikes!

Here we assume snippets is a BaseSnippets object with 16 channels:

channel_ids = snippets.channel_ids
num_channels = snippets.get_num_channels()
# retrieve number of snippets
num_snippets = snippets.get_num_snippets(segment_index=0)
### NOTE ###
# 'segment_index' is required for multi-segment objects
# retrieve total number of snippets across segments
total_snippets = snippets.get_total_snippets()

# retrieve snippet size
nbefore = snippets.nbefore # samples before peak
nsamples_per_snippet = snippets.snippet_len # total
nafter = nsamples_per_snippet - nbefore # samples after peak

# retrieve sample/frame indices
frames = snippets.get_frames(segment_index=0)
# retrieve snippet cutouts
snippet_cutouts = snippets.get_snippets(segment_index=0)
# retrieve snippet cutouts on first 4 channels
snippet_cutouts_slice = snippets.get_snippets(channel_ids=channel_ids[:4],
                                              segment_index=0)

Handling probes

In order to handle probe information, SpikeInterface relies on the probeinterface package. Either a Probe or a ProbeGroup object can be attached to a recording, carrying probe information (particularly channel locations and, optionally, channel groups). ProbeInterface also has a library of available probes, so that you can download and attach an existing probe to a recording with a few lines of code. When a probe is attached to a recording, the location property is automatically set. In addition, the contact_vector property will carry detailed information about the probe design.

Here we assume that recording has 64 channels recorded by an ASSY-156-P-1 probe from Cambridge Neurotech, wired via an Intan RHD2164 chip to the acquisition device. The probe has 4 shanks, which can be loaded as separate groups (and spike sorted separately):

import probeinterface as pi

# download probe
probe = pi.get_probe(manufacturer='cambridgeneurotech', probe_name='ASSY-156-P-1')
# add wiring
probe.wiring_to_device('ASSY-156>RHD2164')

# set probe
recording_w_probe = recording.set_probe(probe)
# set probe with group info and return a new recording object
recording_w_probe = recording.set_probe(probe, group_mode="by_shank")
# set probe in place, i.e., modify the current recording
recording.set_probe(probe, group_mode="by_shank", in_place=True)

# retrieve probe
probe_from_recording = recording.get_probe()
# retrieve channel locations
locations = recording.get_channel_locations()
# equivalent to recording.get_property("location")

Probe information is automatically propagated in SpikeInterface, for example when slicing a recording by channels or applying preprocessing.

Note that several read_*** functions in the extractors module automatically load the probe from the files (including SpikeGLX, Open Ephys - only the NPIX plugin, Maxwell, Biocam, and MEArec).

Sparsity

In several cases, it is not necessary to have waveforms on all channels. This is especially true for high-density probes, such as Neuropixels, because the waveforms of a unit will only appear on a small set of channels. Sparsity is defined as the subset of channels on which waveforms (and related information) are defined. Of course, sparsity is not global, but it is unit-specific.

NOTE: As of version 0.99.0, the default for extract_waveforms() is sparse=True, i.e. every WaveformExtractor will be sparse by default. Users who wish to have dense waveforms must set sparse=False. Keyword arguments can still be passed to extract_waveforms() to generate the desired sparsity, as explained below.

Sparsity can be computed from a WaveformExtractor object with the compute_sparsity() function:

# in this case 'we' should be a dense waveform_extractor
sparsity = compute_sparsity(we, method="radius", radius_um=40)

The returned sparsity is a ChannelSparsity object, which has convenient methods to access the sparsity information in several ways:

  • sparsity.unit_id_to_channel_ids returns a dictionary with unit ids as keys and the list of associated
    channel_ids as values
  • sparsity.unit_id_to_channel_indices returns a similar dictionary, but with channel indices as
    values (which can be used to slice arrays), as in the sketch below
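
For example (a short sketch; unit_ids is assumed from the earlier examples):

unit0 = unit_ids[0]
# channels on which waveforms of the first unit are defined
unit0_channel_ids = sparsity.unit_id_to_channel_ids[unit0]
unit0_channel_indices = sparsity.unit_id_to_channel_indices[unit0]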

There are several methods to compute sparsity, including:

  • method="radius": selects the channels based on the channel locations. For example, using a
    radius_um=40, will select, for each unit, the channels which are within 40um of the channel with the
    largest amplitude (the extremum channel). This is the recommended method for high-density probes
  • method="best_channels": selects the best num_channels channels based on their amplitudes. Note that
    in this case the selected channels might not be close to each other.
  • method="threshold": selects channels based on an SNR threshold (given by the threshold argument)
  • method="by_property": selects channels based on a property, such as group. This method is recommended
    when working with tetrodes.

The computed sparsity can be used in several postprocessing and visualization functions. In addition, a “dense” WaveformExtractor can be saved as “sparse” as follows:

we_sparse = we.save(sparsity=sparsity, folder="waveforms_sparse")

The we_sparse object will now have an associated sparsity (we.sparsity), which is automatically taken into consideration for downstream analysis (and can be checked with the is_sparse() method). Importantly, saving sparse waveforms, especially for high-density probes, dramatically reduces the size of the waveforms folder.

Saving, loading, and compression

The Base SpikeInterface objects (BaseRecording, BaseSorting, and BaseSnippets) hold full information about their history to maintain provenance. Each object is in fact internally represented as a dictionary (si_object.to_dict()) which can be used to re-instantiate the object from scratch (this is true for all objects except in-memory ones, see Object “in-memory”).
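
For example (a minimal sketch):

from spikeinterface import load_extractor

# the dictionary representation can be used to re-instantiate the object
recording_dict = recording.to_dict()
recording_reloaded = load_extractor(recording_dict)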

The save() function makes it easy to store SI objects to a folder on disk. BaseRecording objects are stored in binary (.raw) or Zarr (.zarr) format, and BaseSorting and BaseSnippets objects in numpy (.npz) format. Along with the actual data, the save() function also stores the provenance dictionary and all the properties and annotations associated with the object. The save() function also supports parallel processing to speed up the writing process.

From a SpikeInterface folder, the saved object can be reloaded with the load_extractor() function. These saving/loading features enable us to store SpikeInterface objects efficiently and to distribute processing.

# n_jobs is related to the number of processors you want to use
# n_jobs=-1 indicates to use all available
job_kwargs = dict(n_jobs=8, chunk_duration="1s")
# save recording to folder in binary (default) format
recording_bin = recording.save(folder="recording", **job_kwargs)
# save recording to folder in zarr format (.zarr is appended automatically)
recording_zarr = recording.save(folder="recording", format="zarr", **job_kwargs)
# save snippets to NPZ
snippets_saved = snippets.save(folder="snippets")
# save sorting to NPZ
sorting_saved = sorting.save(folder="sorting")

NOTE: the Zarr format by default applies data compression with the Blosc.Zstandard codec and BIT shuffling. Any other Zarr-compatible compressors and filters can be applied using the compressor and filters arguments. For example, in this case we apply LZMA and use a Delta filter:

from numcodecs import LZMA, Delta

compressor = LZMA()
filters = [Delta(dtype="int16")]

recording_custom_comp = recording.save(folder="recording", format="zarr",
                                       compressor=compressor, filters=filters,
                                       **job_kwargs)
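
The saved objects can then be reloaded from their folders (a short sketch):

from spikeinterface import load_extractor

# reload the recording saved in binary format above
recording_loaded = load_extractor("recording")
# reload the sorting saved in NPZ format above
sorting_loaded = load_extractor("sorting")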

Parallel processing and job_kwargs

The core module also contains the basic tools used throughout SpikeInterface for parallel processing of recordings. In general, parallelization is achieved by splitting the recording in many small time chunks and processing them in parallel (for more details, see the ChunkRecordingExecutor class).

Many functions support parallel processing (e.g., extract_waveforms(), save, and many more). All of these functions, in addition to other arguments, also accept the so-called job_kwargs. These are a set of keyword arguments which are common to all functions that support parallelization:

  • chunk_duration or chunk_size or chunk_memory or total_memory
    • chunk_size: int

      Number of samples per chunk

    • chunk_memory: str

      Memory usage for each job (e.g. ‘100M’, ‘1G’)

    • total_memory: str

      Total memory usage (e.g. ‘500M’, ‘2G’)

    • chunk_duration: str or float or None

      Chunk duration in s if float or with units if str (e.g. ‘1s’, ‘500ms’)

  • n_jobs: int

    Number of jobs to use. With n_jobs=-1, the number of jobs equals the number of cores. A float like 0.5 means half of the available cores.

  • progress_bar: bool

    If True, a progress bar is printed

  • mp_context: “fork” | “spawn” | None, default: None

    The multiprocessing context. If None, the context is taken from recording.get_preferred_mp_context(). “fork” is only safely available on Linux systems.

The default job_kwargs are n_jobs=1, chunk_duration="1s", progress_bar=True.

Any of these arguments can be overridden by manually passing the argument to a function (e.g., extract_waveforms(..., n_jobs=16)). Alternatively, job_kwargs can be set globally (for each SpikeInterface session) with the set_global_job_kwargs() function:

global_job_kwargs = dict(n_jobs=16, chunk_duration="5s", progress_bar=False)
set_global_job_kwargs(**global_job_kwargs)
print(get_global_job_kwargs())
# >>> {'n_jobs': 16, 'chunk_duration': '5s', 'progress_bar': False}

Object “in-memory”

While most of the time SpikeInterface objects will be loaded from a file, sometimes it is convenient to construct in-memory objects (for example, for testing a new method) or to “manually” add some information to the pipeline workflow.

In order to do this, one can use the Numpy* classes: NumpyRecording, NumpySorting, NumpyEvent, and NumpySnippets. These objects behave exactly like normal SpikeInterface objects, but they are not bound to a file.

Also note the SharedMemorySorting class, which is very similar to NumpySorting but uses an underlying SharedMemory buffer, which is useful for parallel computing.

In this example, we create a recording and a sorting object from numpy objects:

import numpy as np

from spikeinterface.core import NumpyRecording, NumpySorting

# in-memory recording
sampling_frequency = 30_000.
duration = 10.
num_samples = int(duration * sampling_frequency)
num_channels = 16
random_traces = np.random.randn(num_samples, num_channels)

recording_memory = NumpyRecording(traces_list=[random_traces], sampling_frequency=sampling_frequency)
# with more elements in `traces_list` we can make multi-segment objects

# in-memory sorting
num_units = 10
num_spikes_unit = 1000
spike_trains = []
labels = []
for i in range(num_units):
    spike_trains_i = np.random.randint(low=0, high=num_samples, size=num_spikes_unit)
    labels_i = [i] * num_spikes_unit
    spike_trains += spike_trains_i
    labels += labels_i

sorting_memory = NumpySorting.from_times_labels(times=spike_trains, labels=labels,
                                                sampling_frequency=sampling_frequency)

Any sorting object can be transformed into a NumpySorting or SharedMemorySorting easily like this:

# turn any sorting into a NumpySorting
sorting_np = sorting.to_numpy_sorting()

# or to SharedMemorySorting for parallel computing
sorting_shm = sorting.to_shared_memory_sorting()

Manipulating objects: slicing, aggregating

BaseRecording (and BaseSnippets) and BaseSorting objects can be sliced on the time or channel/unit axis.

These operations are completely lazy, as there is no data duplication. After slicing or aggregating, the new objects will be views of the original ones.

# here we load a very long recording and sorting
recording = read_spikeglx(folder_path='np_folder')
sorting = read_kilosort(folder_path='ks_folder')

# keep every tenth channel
keep_ids = recording.channel_ids[::10]
sub_recording = recording.channel_slice(channel_ids=keep_ids)

# keep between 5min and 12min
fs = recording.sampling_frequency
sub_recording = recording.frame_slice(start_frame=int(fs * 60 * 5), end_frame=int(fs * 60 * 12))
sub_sorting = sorting.frame_slice(start_frame=int(fs * 60 * 5), end_frame=int(fs * 60 * 12))

# keep only the first 4 units
sub_sorting = sorting.select_units(unit_ids=sorting.unit_ids[:4])

We can also aggregate (or stack) multiple recordings on the channel axis using the aggregate_channels() function. Note that for this operation the recordings need to have the same sampling frequency, number of segments, and number of samples:

# note: read_binary() also needs the binary format information
# (e.g. sampling frequency, dtype, and number of channels), omitted here for brevity
recA_4_chans = read_binary('fileA.raw')
recB_4_chans = read_binary('fileB.raw')
rec_8_chans = aggregate_channels([recA_4_chans, recB_4_chans])

We can also aggregate (or stack) multiple sortings on the unit axis using the aggregate_units() function:

sortingA = read_npz_sorting('sortingA.npz')
sortingB = read_npz_sorting('sortingB.npz')
sorting_20_units = aggregate_units([sortingA, sortingB])

Working with multiple segments

Multi-segment objects can result from running different recording phases (e.g., baseline, stimulation, post-stimulation) without moving the underlying probe (e.g., just clicking play/pause on the acquisition software). Therefore, multiple segments are assumed to record from the same set of neurons.

We have several functions to manipulate segments of SpikeInterface objects. All these manipulations are lazy.

# recording2: recording with 2 segments
# recording3: recording with 3 segments

# `append_recordings` will append all segments of multiple recordings
recording5 = append_recordings([recording2, recording3])
# `recording5` will have 5 segments

# `concatenate_recordings` will make a mono-segment recording by virtual concatenation
recording_mono = concatenate_recordings([recording2, recording5])

# `split_recording` will return a list of mono-segment recordings out of a multi-segment one
recording_mono_list = split_recording(recording5)
# `recording_mono_list` will have 5 elements with 1 segment

# `select_segment_recording` will return a user-defined subset of segments
recording_select1 = select_segment_recording(recording5, segment_indices=3)
# `recording_select1` will have 1 segment (the 4th one)
recording_select2 = select_segment_recording(recording5, segment_indices=[0, 4])
# `recording_select2` will have 2 segments (the 1st and last one)

The same functions are also available for BaseSorting objects (append_sortings(), concatenate_sortings(), split_sorting(), select_segment_sorting()).

Note: append_recordings() and concatenate_recordings() have the same goal (aggregating recording pieces on the time axis) but use two different strategies: one keeps the multi-segment concept, the other breaks it! See the "Append and/or concatenate segments" example for more details.

Recording tools

The spikeinterface.core.recording_tools submodule offers utility functions on top of the recording object, such as get_noise_levels() and get_channel_distances().
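
For example, channel-wise noise levels can be estimated directly from a recording (a short sketch):

from spikeinterface.core import get_noise_levels

# estimate the noise level of each channel (in uV) from random data chunks
noise_levels = get_noise_levels(recording, return_scaled=True)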

Template tools

The spikeinterface.core.template_tools submodule includes functionalities on top of the WaveformExtractor object to retrieve important information about the templates, such as get_template_extremum_channel() and get_template_extremum_amplitude().
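
For example (a short sketch; we is a WaveformExtractor, see the legacy section at the end of this page):

from spikeinterface.core import get_template_extremum_channel

# get, for each unit, the channel with the largest template amplitude
extremum_channels = get_template_extremum_channel(we, peak_sign="neg")
# >>> {unit_id: channel_id, ...}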

Generate toy objects

The core module also offers some functions to generate toy/simulated data. They are useful to make examples, tests, and small demos:

# recording with 2 segments and 4 channels
recording = generate_recording(num_channels=4, sampling_frequency=30000.,
                               durations=[10.325, 3.5], set_probe=True)

# sorting with 2 segments and 5 units
sorting = generate_sorting(num_units=5, sampling_frequency=30000., durations=[10.325, 3.5],
                           firing_rate=15, refractory_period=1.5)

# snippets of 60 samples on 2 channels from 5 units
snippets = generate_snippets(nbefore=20, nafter=40, num_channels=2,
                             sampling_frequency=30000., durations=[10.325, 3.5],
                             set_probe=True,  num_units=5)

There are also some more advanced functions to generate sorting objects with various “mistakes” (mainly for testing purposes):

  • synthesize_random_firings()

  • clean_refractory_period()

  • inject_some_duplicate_units()

  • inject_some_split_units()

  • synthetize_spike_train_bad_isi()

Downloading test datasets

The NEO package maintains a collection of test datasets in many electrophysiology file formats: https://gin.g-node.org/NeuralEnsemble/ephy_testing_data

The download_dataset() function can download and locally cache datasets from this repository. The function depends on the datalad python package, which internally depends on git and git-annex.

The download_dataset() function is very useful for performing local tests on small files in various formats:

# Spike" format
local_file_path = download_dataset(remote_path='spike2/130322-1LY.smr')
rec = read_spike2(local_file_path)

# MEArec format
local_file_path = download_dataset(remote_path='mearec/mearec_test_10s.h5')
rec, sorting = read_mearec(local_file_path)

# SpikeGLX format
local_folder_path = download_dataset(remote_path='/spikeglx/multi_trigger_multi_gate')
rec = read_spikeglx(local_folder_path)

LEGACY objects

WaveformExtractor

This is now a legacy object that can still be accessed through the MockWaveformExtractor. It is kept for backward compatibility.

The WaveformExtractor class is the core object to combine a BaseRecording and a BaseSorting object. Waveforms are very important for additional analyses, and the basis of several postprocessing and quality metrics computations.

The WaveformExtractor allows us to:

  • extract waveforms

  • sub-sample spikes for waveform extraction

  • compute templates (i.e. average extracellular waveforms) with different modes

  • save waveforms in a folder (in numpy / Zarr) for easy retrieval

  • save sparse waveforms or sparsify dense waveforms

  • select units and associated waveforms

In the default format (mode='folder'), waveforms are saved to a folder structure with waveforms as .npy files. In addition, waveforms can also be extracted in-memory for fast computations (mode='memory'). Note that this mode can quickly fill up your RAM… Use it wisely! Finally, an existing WaveformExtractor can also be saved in Zarr format.

# extract dense waveforms on 500 spikes per unit
we = extract_waveforms(recording=recording,
                       sorting=sorting,
                       sparse=False,
                       folder="waveforms",
                       max_spikes_per_unit=500,
                       overwrite=True)
# same, but with parallel processing! (1s chunks processed by 8 jobs)
job_kwargs = dict(n_jobs=8, chunk_duration="1s")
we = extract_waveforms(recording=recording,
                       sorting=sorting,
                       sparse=False,
                       folder="waveforms_parallel",
                       max_spikes_per_unit=500,
                       overwrite=True,
                       **job_kwargs)
# same, but in-memory
we_mem = extract_waveforms(recording=recording,
                           sorting=sorting,
                           sparse=False,
                           folder=None,
                           mode="memory",
                           max_spikes_per_unit=500,
                           **job_kwargs)

# load pre-computed waveforms
we_loaded = load_waveforms(folder="waveforms")

# retrieve waveforms and templates for a unit
waveforms0 = we.get_waveforms(unit_id=unit0)
template0 = we.get_template(unit_id=unit0)

# compute template standard deviations (average is computed by default)
# (this can also be done within the 'extract_waveforms')
we.precompute_templates(modes=("std",))

# retrieve all template means and standard deviations
template_means = we.get_all_templates(mode="average")
template_stds = we.get_all_templates(mode="std")

# save to Zarr
we_zarr = we.save(folder="waveforms_zarr", format="zarr")

# extract sparse waveforms (see Sparsity section)
# this will use 50 spikes per unit to estimate the sparsity within a 40um radius from that unit
we_sparse = extract_waveforms(recording=recording,
                              sorting=sorting,
                              folder="waveforms_sparse",
                              max_spikes_per_unit=500,
                              method="radius",
                              radius_um=40,
                              num_spikes_for_sparsity=50)

IMPORTANT: to load a waveform extractor object from disk, it needs to be able to reload the associated sorting object (the recording is optional and can be skipped with with_recording=False). In order to make a waveform folder portable (e.g. copied to another location or machine), one can do:

# create a "processed" folder
processed_folder = Path("processed")

# save the sorting object in the "processed" folder
sorting = sorting.save(folder=processed_folder / "sorting")
# extract waveforms using relative paths
we = extract_waveforms(recording=recording,
                       sorting=sorting,
                       folder=processed_folder / "waveforms",
                       use_relative_path=True)
# the "processed" folder is now portable, and the waveform extractor can be reloaded
# from a different location/machine (without loading the recording)
we_loaded = si.load_waveforms(folder=processed_folder / "waveforms",
                              with_recording=False)