Challenges in Utilizing the Neurodata Without Borders (NWB) Format for Human Single Unit Data
We released a precious dataset and processing pipeline of human single-neuron activity recorded from the medial temporal lobe. The components of this dataset have been standardized in the NWB format, permitting seamless reuse of this data for research and teaching.
Imagine you wake up one morning and are overcome with dense amnesia: you forget where you are, lose the ability to recognize your family, or even forget your own name. Unfortunately, such amnesia burdens many of those with dementia and cognitive impairment every day. Finding effective treatments for such neurological disorders requires a detailed understanding of the mechanisms and principles by which the brain operates.
Understanding the neural mechanisms underlying cognition (learning, memory, decision making) remains an ongoing challenge, particularly in humans. Within the past two decades, major technological advances have greatly increased our capabilities for observing and measuring the activity of individual neurons and their structure. These immense amounts of data are posing a new unresolved problem: the standardization of the data formats utilized to describe and store the resulting data. Neurodata Without Borders: Neurophysiology (NWB) is a new standardized data format to organize, describe, store, and share cellular-level data. This provides researchers with a common language to share and analyze electrophysiology datasets (Figure 1).
Figure 1: Overview of NWB workflow.
Our team records in-vivo single-neuron activity from the human brain in patients with medically refractory epilepsy implanted with hybrid depth electrodes with embedded microwires (Figure 2). One large dataset that we have acquired is of a declarative memory task involving new/old recognition memory judgments [1,2]. This dataset consists of 1,863 neurons in the medial temporal lobe from 59 patients. This data, however, is stored in a proprietary data format, making it difficult to share with other researchers who want to use the dataset for secondary analysis or for validating theoretical models. We have published several key insights derived from this dataset, including the properties of memory-selective neurons that report the familiarity or novelty of a stimulus, and visually selective neurons that signal the visual category of a stimulus [3].
Figure 2: Illustration of the Brain Areas in the Medial Temporal Lobe that we recorded from.
As a next step, we wanted to release this precious dataset in the NWB format so that we could share it. The result of this effort is documented in our new paper, A NWB-based Dataset and Processing Pipeline of Human Single-Neuron Activity During a Declarative Memory Task. In addition to describing the data we shared, a key goal of this paper is to provide a pipeline that others can use to share human single-neuron data in the NWB format, and to demonstrate how to analyze human single-neuron data stored in NWB (see our GitHub for the full pipeline/code).
As we began transforming our data into the NWB format, we quickly realized the immediate advantages of this new format. NWB permitted us to store all the components of this complex experiment in a single file, including all electrophysiological recordings, behavior, stimuli shown to the patient, and meta-data regarding electrode localization. In addition, the flexibility of NWB allowed us to include information in the file that the format did not anticipate would be needed, such as meta-data explaining specific deviations from the standard that were necessitated by PHI (protected health information) requirements.
We encountered several difficulties in transferring our internal data structures to NWB. These illustrate why standardizing data is a substantial challenge that requires a significant investment for each dataset. First, in our data, trial-by-trial behavioral markers (i.e., ‘TTLs’) are encoded as numeric values that each denote a different event of significance, such as the time of trial start, stimulus onset, or response. We considered encoding these markers in text form, e.g., ‘Trial Start’ or ‘Stimulus Onset’. However, this would be highly redundant, so we eventually decided to encode them in their original numeric form and to specify the meaning of each marker in the corresponding meta-data component of this field. As a result, understanding what a given marker means requires reading the information provided in the meta-data. The data is therefore not fully understandable by a machine without user intervention, which is a limitation of the format. Second, we encountered situations where multiple ways of storing a given piece of data were possible, with no clear guidelines on what the format intended. Close consultation with the NWB development team helped us decide on the best approach in these instances. Third, we were sometimes unable to import an NWB file exported from one programming language into another (e.g., exporting with Matlab and reading with Python, or vice versa). These cases revealed places where our use of NWB deviated from what was intended or specified. It thus turned out to be critical for us to perform such cross-language interoperability testing. The code provided as part of this paper offers a template for the community to utilize NWB in a straightforward manner without going through all these debugging steps.
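To make the first point concrete, the following is a minimal sketch of how a reader of the file might turn numeric markers back into readable event labels. It is not taken from our released pipeline: the marker codes, the "code=label" description string, and the function names are all hypothetical, chosen only to illustrate that the mapping has to be supplied from the meta-data text rather than from the numeric data itself. It uses only the Python standard library.

```python
from typing import Dict, List, Tuple

# Hypothetical meta-data text, standing in for the free-text description
# stored alongside the numeric marker field. The codes are made up.
MARKER_DESCRIPTION = "1=Trial Start; 2=Stimulus Onset; 3=Response Time"


def parse_marker_description(description: str) -> Dict[int, str]:
    """Parse a 'code=label; code=label' string into a lookup table."""
    mapping: Dict[int, str] = {}
    for entry in description.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        code, _, label = entry.partition("=")
        mapping[int(code.strip())] = label.strip()
    return mapping


def label_events(
    codes: List[int], timestamps: List[float], mapping: Dict[int, str]
) -> List[Tuple[float, str]]:
    """Pair each timestamp with the readable label of its numeric marker."""
    if len(codes) != len(timestamps):
        raise ValueError("codes and timestamps must have the same length")
    return [
        (t, mapping.get(c, f"Unknown marker {c}"))
        for c, t in zip(codes, timestamps)
    ]


marker_labels = parse_marker_description(MARKER_DESCRIPTION)
# Marker 9 is deliberately absent from the description to show the fallback.
events = label_events([1, 2, 3, 9], [0.0, 1.0, 2.5, 3.0], marker_labels)
for time_s, label in events:
    print(f"{time_s:6.2f} s  {label}")
```

A lookup of this kind only works if the description follows a convention that a parser can rely on. A free-text description still needs a human to read it, which is the limitation described above.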
Throughout this process, the NWB development team offered advice on best practices and helped us resolve software bugs that we encountered in this project. Additionally, we profited from attending the 2019 NWB:N Developer Hackathon and User Days hosted by Janelia/HHMI (see Figure 3), an event we highly recommend for potential users of NWB. Lastly, the data conversion project was inspired and made possible by a grant from the Kavli Foundation: we envisioned this project after reading a call for applications for projects to release datasets in NWB.
Figure 3: Participants of the NWB:N User Days, May 13-14, 2019. Photo by Matt Staley, HHMI Janelia. Used with permission of Matt Staley.
We hope that our data descriptor and corresponding processing pipeline are helpful to other researchers utilizing NWB for data standardization. We welcome feedback and questions from other researchers. You can find our data here, and our full code/pipeline here.
1. Rutishauser, U. et al. Representation of retrieval confidence by single neurons in the human medial temporal lobe. Nat. Neurosci. 18, 1041–1050 (2015).
2. Faraut, M. C. M. et al. Dataset of human medial temporal lobe single neuron activity during declarative memory encoding and recognition. Sci. Data 5, 180010 (2018).
3. Rutishauser, U. Testing Models of Human Declarative Memory at the Single-Neuron Level. Trends Cogn. Sci. 23, 510–524 (2019).