Project Jupyter gets $6M to expand collaborative data science software

July 7, 2015
By: Sarah Yang

A powerful, interactive tool popular among academics and scientists who wrestle with large datasets in multiple formats is getting a big infusion of support to broaden its capabilities for collaborative data science and to reach ever wider audiences.

Fernando Pérez and Brian Granger discuss the architecture of Project Jupyter, as its scope expands to reach data science applications in more than 40 programming languages. Photo: Adriana Restrepo

The open-source platform, known as Project Jupyter, will receive $6 million in grants over three years from three foundations. The Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation are each providing $1.5 million to the University of California, Berkeley, and the Helmsley Charitable Trust is providing $3 million to California Polytechnic State University, San Luis Obispo.

Fernando Pérez, an associate researcher and founding co-investigator at the Berkeley Institute for Data Science (BIDS), is the principal investigator of the UC Berkeley grants, and Brian Granger, an associate professor of physics, is the principal investigator of the Cal Poly award.

This effort expands upon Jupyter Notebook, a Web-based platform developed by an open collaboration co-led by Pérez and Granger, which allows scientists, researchers and educators to combine data from multiple formats – live code, equations, narrative text and rich media – into a single, interactive document.

With the funding over the next three years, the researchers will expand and improve the capabilities of the Jupyter Notebook.

Granger and Pérez estimate that more than 1 million people in fields ranging from astronomy to finance currently use Jupyter. Applications include the analysis of massive gene-sequencing datasets, processing images from the Hubble Space Telescope and developing models of financial markets.

“Project Jupyter serves not only the academic and scientific communities, but also a much broader constituency of data scientists in research, education, industry and journalism,” said Pérez. “Given the importance of computing across modern society, we see uses of our tools that range from high school education in programming to the nation’s supercomputing facilities and the leaders of the tech industry.”

Educators can write instructions in the notebook, follow them with a coding exercise, and then ask students to interpret the results immediately afterward. Pérez noted that Jupyter is now being adapted for some data science courses in the fall at the UC Berkeley campus.

The Jupyter Notebook is itself an evolution of IPython, an interactive computing environment that Pérez began in 2001 while he was a graduate student in particle physics at the University of Colorado, Boulder. The Jupyter Notebook has been compared to a scientist’s lab notebook, allowing scientists to dive into another researcher’s work and, by poring over detailed annotations and explanatory text, understand the raw data.

For more information on Project Jupyter, see the project’s website.