2013-14 Data Science Lectures

Dramatic increases in the scale of data collection, analysis, and dissemination are fostering a revolution across the natural, mathematical, and social sciences. Progress in all these fields has come to depend on individuals and collaborative teams that combine domain expertise, computational knowledge, and statistical skills. Please join us for a monthly data science lecture series featuring outstanding UC Berkeley faculty and researchers from a wide range of disciplines. Please write to datascience@berkeley.edu if you would like to receive e-mail notifications about these events and other periodic updates regarding data science at UC Berkeley.

Location: Banatao Auditorium, Sutardja Dai Hall (CITRIS building), UC Berkeley (unless otherwise listed)


Wednesday, April 16, 2014, 4 – 5 pm

The Hearts and Minds of Data Science

Speaker: Cecilia R. Aragon, Associate Professor, Department of Human Centered Design & Engineering, University of Washington

Location: 306 Soda Hall (HP Auditorium) 

Thanks in part to the recent popularity of the buzzword "big data," it is now generally understood that many important scientific breakthroughs are made by interdisciplinary collaborations of scientists working in geographically distributed locations, producing and analyzing vast and complex data sets. The extraordinary advances in our ability to acquire and generate data in the physical, biological, and social sciences are transforming the fundamental nature of scientific discovery across domains. Much of the research in this area, which has become known as data science, has focused on automated methods of analyzing data, such as machine learning and new database techniques. Less attention has been directed to the human aspects of data science, including how to build interactive tools that maximize scientific creativity and human insight, and how to train, support, motivate, and retain the individuals with the skills needed to produce the next generation of scientific discoveries.

In this talk, I will argue that a human-centered approach to data science is necessary for the success of 21st-century scientific discovery. Further, I contend that we need to go beyond well-designed user interfaces for data science software tools to consider the entire ecosystem of software development and use: we need to study scientific collaborations interacting with technology as socio-technical systems, in which computer science and sociological approaches are interwoven. I will discuss promising research in this area and speculate on future directions for data science.

Cecilia Aragon is an associate professor in the Department of Human Centered Design & Engineering at the University of Washington, where she directs the Scientific Collaboration and Creativity Lab. She holds a faculty position with the UW eScience Institute and courtesy appointments in Computer Science and Engineering, Electrical Engineering, and the Information School, and leads UW’s Ethnography and Evaluation Working Group for the Moore/Sloan Data Science Environment. Before arriving at UW in 2010, she held an appointment in the Computational Research Division at Lawrence Berkeley National Laboratory for six years after earning her Ph.D. in computer science from UC Berkeley in 2004. She received her B.S. in mathematics from the California Institute of Technology.

Her current research focuses on human-centered data science and computer-supported cooperative work (CSCW), visual analytics, emotion in informal text communication, and how social media and new methods of computer-mediated communication are changing data-intensive scientific practice.

She has authored or co-authored over 70 refereed and 100 non-refereed publications in HCI, CSCW, visual analytics, machine learning, and astrophysics. Her research has been recognized with six Best Paper awards since 2004. She won the Distinguished Alumni Award in Computer Science from UC Berkeley in 2013, the Faculty Innovator in Teaching Award from her department at UW that same year, and was named one of the Top 25 Women of 2009 by Hispanic Business Magazine. In 2008, she received the Presidential Early Career Award for Scientists and Engineers (PECASE) for her work in data-intensive science. Aragon has an interdisciplinary background, including over 15 years of software development experience in industry and NASA, and a three-year stint as the founder and CEO of a small company.


Friday, November 15, 2013, 12 – 1 pm

Place, space and time: Rescuing and integrating biological and environmental data in the face of global change

Speaker: Charles Marshall

Professor, Integrative Biology
Director, University of California Museum of Paleontology
Chair, Berkeley Natural History Museums

With the ever-growing footprint of human activity, a central challenge of 21st-century science is developing a predictive understanding of the processes that sustain Earth’s ecosystems and of our impact on them.  Unraveling this complexity requires great depth and breadth of data, ranging from specimens in natural history museums, field data, aerial and satellite imagery, and measurements from environmental sensor networks to the algorithms of predictive models of global change.  Despite the disparate nature of these data, all are bound by place, space, and time.  The Berkeley Institute in Global Change Biology (BigCB) is integrating these data as part of its mission to understand the complexity of the natural world and our impact on it.  Relevant to the Data Science Initiative, BigCB’s activities include coordinating and supporting the IT developers and scientists needed to meet the challenges of data integration.  Prime examples of our activities include the development of the Berkeley Ecoinformatics Engine (Holos), which will provide an open technical infrastructure for researchers and students to make sense of this wealth of information; the rescue and digitization of dark data sets; and the running of training workshops.

Charles Marshall's talk will be followed by a panel discussion with Kevin Koy, Geospatial Innovation Facility; Maggi Kelly, Environmental Science, Policy, and Management; Rosemary Gillespie, Essig Museum of Entomology; and Michelle Koo, Biodiversity Informatics, Museum of Vertebrate Zoology.

Charles Marshall is the Director of the University of California Museum of Paleontology, Chair of the Berkeley Natural History Museums, and a Professor of Integrative Biology. He is broadly interested in how paleontology can inform our understanding of the history of life and the processes that control it. Marshall's research often takes advantage of data from genomics, molecular phylogenies, developmental biology, and functional studies.




Thursday, October 17, 2013, 12:30 – 1:30 pm

Movie reconstruction from brain signals and statistical stability

Speaker: Bin Yu

Chancellor's Professor, Statistics; Electrical Engineering & Computer Science

One of the major scientific challenges of our time is to understand how the brain works. Recently, researchers have attempted to answer one of the important questions in computational neuroscience: Can the vast quantities of high-dimensional neuroscience data available today be used to decode brain activity?

I report on a thrilling breakthrough at the intersection of neuroscience and statistical machine learning that is based on joint work with the Gallant Lab on campus. We have used penalized least squares methods to construct a "mind-reading" algorithm that reconstructs movies from fMRI brain signals. The story of this algorithm is a fascinating tale of the interdisciplinary collaboration behind a predictive system that Time Magazine selected as one of its 50 Best Inventions of 2011.

Our interest in gaining "knowledge" from predictive models has led us to statistical stability considerations that are necessary for scientific reproducibility beyond our project. In fitting L1-penalized least squares (the Lasso), we combined an estimation stability measure with predictive cross-validation (CV). This approach, Lasso+ESCV, has yielded much simpler and more reliable models for interpretation without any loss of prediction performance.

Bin Yu's talk will be followed by a panel discussion with Michael Franklin, EECS; Alex Huth, Helen Wills Neuroscience Institute; and Jasjeet Sekhon, Statistics and Political Science.

Bin Yu is the Chancellor's Professor of Statistics and Electrical Engineering & Computer Science. She works on statistical machine learning theory, methodologies, and algorithms for solving high-dimensional data problems, such as those arising in neuroscience, remote sensing, and document summarization.




Friday, September 27, 2013, 12 – 1 pm

Warning California: Extracting earthquake signal from noise before the shaking starts

Speaker: Richard Allen

Professor, Earth and Planetary Science
Director, Berkeley Seismological Laboratory

When an earthquake occurs, you could get a warning.  The challenge is the rapid detection and classification of earthquake signals amid the continuous and variable noise that the Earth generates.  This month the California Legislature unanimously passed a bill that would bring earthquake alerts to the public.  As interest in providing public alerts grows, the challenges of delivering accurate information as fast as possible come into sharp focus.  At the same time, low-cost, low-quality accelerometers are becoming prevalent in consumer electronics, providing an opportunity for a massive expansion of our monitoring efforts.  In this talk we will review the status of earthquake early warning in California and around the world, the challenges that remain, and the opportunities for making use of massive new datasets.

Richard Allen is the Director of the Berkeley Seismological Laboratory and a Professor in Earth and Planetary Science.  His research interests range from 3D imaging of deep Earth processes (the processes responsible for the motion of tectonic plates) to rapid detection and classification of surface shaking for real-time information systems, including earthquake early warning.


Friday, June 21, 2013, 12 – 1 pm

Extracting actionable insight from dirty time-series data

Speaker: Joshua Bloom

Professor, Astronomy; Director, Center for Time-Domain Informatics

Earthquakes. Supernovae. Social unrest. Traffic accidents. Smart thermostats. They all generate noisy and incomplete streaming sensor data, and the inferences drawn from those data call for some form of action.  But how do researchers with domain expertise build the capability to generate actionable inferences when such tools often require cutting-edge frameworks from statistics and computer science?  If you're a domain scientist, who do you turn to for collaboration? If you are in statistics or EECS, what do you get out of such collaborations? Bloom's lecture addressed these questions and discussed the practice of collaboration around time-series data on campus and beyond.

Professor Bloom's presentation was followed by a panel discussion with Richard M. Allen, Professor, Earth and Planetary Science, Berkeley Seismological Lab; Michael Silver, Professor, Optometry and Vision Science and Neuroscience, School of Optometry and Helen Wills Neuroscience Institute; and Philip B. Stark, Professor and Chair, Statistics. The panel was moderated by Fernando Perez, Henry H. Wheeler Jr. Brain Imaging Center.

Joshua Bloom is a Professor of Astronomy and serves as the Director of the Center for Time-Domain Informatics. His research focuses on understanding the nature of explosive transient phenomena.