"Tree of Life" to harness big data, visualize complex relationships

May 23, 2012
By: Karyn Houston, College of Natural Resources

Scientists today deal with daunting volumes of data, and one of the most basic challenges facing researchers is how to organize that information into a usable format that can inspire new scientific insights. Now a nationwide team of university and private industry collaborators, including University of California, Berkeley plant biologist Chelsea Specht, has come up with a way to visually portray data so scientists can see, at a glance, how organisms are related.

It’s called “tree-thinking,” and the team will create a software package that lets scientists and researchers analyze data across the tree of life, opening the way for new evolution-based research that spans a tremendous range of fields, including medicine, public health, agriculture, ecology, and genetics.

The software will be developed with a $2 million grant from the National Science Foundation, shared by Specht and seven other university and private researchers in a unique collaboration to be announced this week. Taking available tools to a new level, the open-source software, called Arbor: Comparative Analysis Workflows for the Tree of Life, will be an intuitive interface based on visual workflows.

“With this application a researcher can identify patterns of molecular, developmental and morphological evolution, and then further explore those patterns and the processes driving diversification using novel analytical and visualization tools,” Specht said. “The visualization capacity of Arbor will enable researchers to study previously unrecognized phylogenetic connections across space and time.”

Analyzing on a Grand Scale

In this era of “big data” in science, the tree of life is evolving rapidly and dramatically. A “phylogeny” is the evolutionary history of a particular group of organisms. Large-scale phylogenies and massive amounts of data are now available, including character traits, plant traits, mammalian traits and gene data, Specht said. Scientists and researchers need better software for viewing that complex data, with the lofty goal of better understanding evolutionary processes.

The software will enable developmental biologists, geneticists, ecologists, geographers, paleobiologists, educators and students to analyze diverse types of comparative data on a grand scale. They will be able to see, at a glance, how organisms are interrelated and how they interact in geographical space and geological time. 

“Arbor will enable any scientist to address comparative questions applicable to crop improvement, human disease profiling, or developmental genetics – it will enable scientists from various disciplines to place their research questions into a phylogenetic context and use comparative tools to uncover novel patterns leading to new ideas on how to understand relationships or solve,” Specht said.

The interface is a radical departure from existing software for comparative analyses. Building an Arbor workflow will have more in common with building a structure using Legos than with programming a computer. This will enable scientists trained in various fields of biology to harness the power of comparative evolutionary statistics to analyze their data in a sophisticated manner, generating easy-to-use workflows and publishable graphics.

Classroom Component

The new application will also have an outreach component.

The grant includes a module that will help K-12 teachers demonstrate “tree thinking” and take the next step with their students in envisioning and analyzing data. The team will work closely with Judy Scotchmoor, assistant director at the UC Museum of Paleontology, to develop education-based modules and to support training for high school teachers in using Arbor to teach comparative biology and evolution in their biology classes.

In addition, graduate students and advanced undergraduates will be invited to “hackathons” where they can contribute to the development of Arbor workflows, and to week-long Arbor training camps that teach students how to use Arbor to analyze phylogenetic patterns and the role of geographic distributions, species interactions, or community ecology and structure in phylogenetic diversity.

Of the $2 million total, the Specht Lab, in the Department of Plant & Microbial Biology in the College of Natural Resources at UC Berkeley, will receive $600,000 dedicated to training graduate students and postdocs in comparative evolutionary biology.

A Unique Collaboration

The project brings together a diverse group of biologists and private industry experts.
The Principal Investigators are:

Luke Harmon, associate professor of Biology at the University of Idaho, is the lead investigator on the project. Harmon primarily studies ecological and evolutionary aspects of adaptive radiations. The Harmon Lab is interested in the causes and effects of both speciation and trait change, and how species interactions shape macroevolution.

Chelsea Specht, associate professor and plant evolutionary biologist at UC Berkeley. The Specht lab researches the processes and patterns involved in the evolution and diversification of plants, particularly monocots. They focus on the use of systematics in comparative biology, and the evolution of development, comparative genomics and the genetics of interspecies interactions.

Robert Thacker, professor of marine and freshwater ecology at the University of Alabama. Thacker uses molecular systematics to place studies of organism interaction into a comparative phylogenetic context. He extensively studies sponges, and, yes – his name is Bob.

Jorge Soberón, professor in the Department of Ecology and Evolutionary Biology at The University of Kansas. Soberón documents and researches large-scale spatial patterns in the biodiversity of terrestrial species. He makes extensive use of databases of specimens from scientific collections or observations, along with Geographical Information Systems software, and he is currently a Vice Chair of the Executive Committee for the Global Biodiversity Information Facility (GBIF).

Curtis Lisle is the CEO of Knowledge Vis, a company that specializes in custom software solutions for scientific and medical visualization. The Florida-based small business uses interactive computer graphics visualization and novel software and data management strategies to assist clients in data analysis and decision-making processes.

Wes Turner, Technical Leader for Kitware, a software company that creates and develops open-source software that provides visualization, computer vision, medical imaging and data publishing to a variety of academic and government institutions as well as private businesses.

Charles Hughes, Pegasus Professor in the Computer Science Division of the Electrical Engineering and Computer Science department at the University of Central Florida. Hughes is also the Director of the Synthetic Reality Lab and Professor in the School of Visual Arts and Design. Hughes is interested in digital puppetry, human-technology interaction, mixed and virtual reality, computer graphics and the theory of computation.

The core members of the group met at an NSF-sponsored AVAToL workshop last year, where they developed the idea for the grant to create the Tree of Life analysis & visualization software. AVAToL stands for Assembling, Visualizing, and Analyzing the Tree of Life. Drs. Turner and Hughes were asked to join the group to provide support in the areas of large-scale computational analyses and visualization.

The first version of Arbor is scheduled to be released later this year, with updates and expanded operations to continue over the course of the three years of funding.