A Standard for Neuroscience Data
Thanks to standardized image file formats—like JPEG, PNG or TIFF—which store information every time you take a digital photo, you can easily share selfies and other pictures with anybody connected to a computer, mobile phone or the Internet. Nobody needs to download any special software to see your picture.
But in many science fields, including neuroscience, sharing data isn’t that simple because no standard data format exists. So in November 2014, the Neurodata Without Borders initiative, which is supported by the Kavli Foundation, GE, Janelia Farm, the Allen Institute for Brain Science and the International Neuroinformatics Coordinating Facility (INCF), hosted a hackathon to consolidate ideas for designing and implementing a standard neuroscience file format. BrainFormat, a neuroscience data standardization framework developed at the Lawrence Berkeley National Laboratory (Berkeley Lab), is among the candidates selected for further investigation. It is now a strong contender to contribute to the development of a community-wide data format and storage standard for the neuroscience research community. BrainFormat is free to use and can be downloaded here: https://bitbucket.org/oruebel/brainformat.
“This issue of standardizing data formats and sharing files isn’t unique to neuroscience. Many science areas, including the global climate community, have grappled with this,” says Oliver Ruebel, the Berkeley Lab computational scientist who developed BrainFormat. “Sharing data allows researchers to do larger, more comprehensive studies. This in turn increases confidence in scientific results and ultimately leads to breakthroughs.”
In conjunction with this work, Berkeley Lab’s National Energy Research Scientific Computing Center (NERSC) is also working with Jeff Teeters and Fritz Sommer of the Redwood Center for Theoretical Neuroscience at UC Berkeley on the Collaborative Research in Computational Neuroscience (CRCNS) data-sharing portal, which will allow neuroscience researchers worldwide to easily share files without having to download any special software.
Both BrainFormat and CRCNS are being developed as part of a tri-institutional partnership between Berkeley Lab, UC Berkeley and UC San Francisco (UCSF). The computational tools could also help facilitate the White House’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative.
Dealing With the Deluge of Brain Data
In 2013, President Barack Obama challenged the neuroscience community to gain fundamental insights into how the mind develops and functions, and discover new ways to address brain diseases and trauma. He called this the BRAIN Initiative.
This work is expected to generate a deluge of data for the neuroscience community. After all, measuring activity from a fraction of the neurons in the brain of a single mouse could generate almost as much data as the Large Hadron Collider, which is 17 miles in circumference. So before researchers can even begin taking measurements, they must first develop a standard format for labeling and organizing data, sharing files, and scaling up analytical and visualization methods and software to handle massive amounts of information.
“Neuroscience is currently a field of individual principal investigators, doing individual experiments, and analyzing that data on customized software. This means that data is stored in many different formats and described in different ways, which hinders community access to data,” says Kristofer Bouchard, a neuroscientist at Berkeley Lab. “As data volumes grow, we are going to need more people to look at the same data in different ways.”
Berkeley Lab is actively seeking ways to expand its contribution to the BRAIN Initiative, and as a scientist in the Computational Research Division (CRD), Ruebel is familiar with helping scientists from a variety of disciplines organize, store, access, analyze and share massive, complex datasets.
To come up with a convention for labeling, organizing, storing and accessing neuroscience data, Ruebel worked closely with Bouchard, drawing on applications from UCSF neurosurgeon Edward Chang and Berkeley Lab physicist Peter Denes, to design BrainFormat using open-source Hierarchical Data Format (HDF) technologies. Over the last 15 years, HDF has helped a variety of scientific disciplines organize and share their data. One prominent user of HDF is NASA’s Earth Observing System, the primary data repository for understanding global climate change.
Beyond supporting data format standardization, HDF is optimized to run on supercomputers. So by building BrainFormat on this technology, neuroscientists will be able to use supercomputers to process and analyze their massive datasets.
“This work really highlights the unique strength of a Berkeley Lab, UC Berkeley and UCSF partnership,” says Denes. “UCSF is renowned for its clinical and experimental neuroscience experience with in vivo cortical electrophysiology; UC Berkeley contributes world-class expertise in theoretical neuroscience, statistical learning and data analysis; and Berkeley Lab brings supercomputing and applied mathematics expertise together with electronics and micro- and nano-fabrication.”
Denes heads Berkeley Lab’s contingent of the tri-institutional partnership to develop instrumentation and computational methods for recording neuroscience data. In addition to tools for dealing with the data deluge, the BRAIN Initiative is going to require new hardware to collect more data at higher resolution and process it in real time. Researchers will also need novel algorithms for analyzing data. The tri-institutional partnership is leveraging tools and expertise from different areas of science to tackle these challenges as well.
“Berkeley Lab’s strength has always been in science of scale,” says Prabhat, a Berkeley Lab computational scientist. “Over the years, many science areas have struggled with issues of file format standardization, as well as managing and sharing massive datasets, and our staff built similar infrastructures for them. This isn’t a new problem; with BrainFormat and the CRCNS portal we’ve just extended these solutions to the field of neuroscience.”