Data Science

Institutes and Programs

Areas of Focus

Data Science Methodologies
UC Berkeley researchers have made groundbreaking contributions in mathematics, statistics, and computer science, including pioneering success in integrating modern computational frameworks for inference algorithms with traditional statistics to tackle data-rich scientific problems. An array of data management, visualization and curation services support these research efforts.
Domain Applications
UC Berkeley's scientific impact across the natural and social science domains reflects revolutionary techniques to collect, mine, and analyze unprecedented volumes and velocities of data. UC Berkeley researchers are engaged in collaborative data science efforts across the disciplines.
The advancement of data-driven research at UC Berkeley is coupled with a commitment to bring data science into the educational program. The campus has recently established new courses and degree programs aimed at training a new generation of data scientists, including four new data science-focused graduate degrees.


UC Berkeley researchers have made groundbreaking contributions in mathematics, statistics, and computer science, including machine-learning and visualization techniques. These efforts are complemented by an array of data collection, management and curation services. The centers, institutes and programs highlighted below advance a wide range of data science methodologies:

Large-Scale Interdisciplinary Efforts


Working at the intersection of three massive trends: powerful machine learning, cloud computing, and crowdsourcing, the AMPLab is integrating Algorithms, Machines, and People to make sense of Big Data. They are creating a new generation of analytics tools to answer deep questions over dirty and heterogeneous data by extending and fusing machine learning, warehouse-scale computing and human computation. They validate these ideas on real-world problems including participatory sensing, urban planning, and personalized medicine with their application and industrial partners.

Center for Information Technology Research in the Interest of Society (CITRIS)

The Center for Information Technology Research in the Interest of Society (CITRIS) creates information technology solutions for many of our most pressing social, environmental, and health care problems. CITRIS was created to “shorten the pipeline” between world-class laboratory research and the creation of start-ups, larger companies, and whole industries. CITRIS facilitates partnerships and collaborations among more than 300 faculty members and thousands of students from numerous departments at four University of California campuses with industrial researchers from over 60 corporations.

SDAV - Scalable Data Management, Analysis, and Visualization

SDAV provides comprehensive expertise in scientific data management, analysis, and visualization aimed at transferring state-of-the-art techniques into operational use by application scientists on leadership-class computing facilities. It is a collaboration tapping the expertise of researchers at six laboratories and seven universities.

Simons Institute for the Theory of Computing

The Simons Institute for the Theory of Computing facilitates collaborative research in theoretical computer science. Established in July 2012 with support from the Simons Foundation, its goal is to bring together the world's leading researchers in theoretical computer science and related fields, as well as the next generation of outstanding young scholars, to explore deep unsolved problems about the nature and limits of computation.

Computer Science 


Computational Research at the Berkeley Lab

The Computational Research Division conducts research and development in mathematical modeling and simulation, algorithm design, data storage, management and analysis, computer system architecture and high-performance software implementation. They collaborate directly with scientists across the Berkeley Lab, the Department of Energy and industry to solve some of the world’s most challenging computational and data management and analysis problems in a broad range of scientific and engineering fields, including materials science, biology, climate modeling, astrophysics, fusion science, and many others.

Electrical Engineering and Computer Sciences (EECS)

EECS offers research and instructional programs in electrical engineering and computer science. Their key strengths lie in the integration of fundamental theoretical ideas with practical applications, leading to a wide range of cross-disciplinary, collaborative projects. The integration of electrical engineering and computer science forms the core, with strong interactions that extend into biological sciences, mechanical and civil engineering, physical sciences, chemistry, mathematics, operations research, and more.


Founded at UC Berkeley, IPython is an interactive shell for the Python programming language that offers enhanced introspection, additional shell syntax, tab completion and rich history.

National Energy Research Scientific Computing Center (NERSC)

The National Energy Research Scientific Computing Center (NERSC), a division of the Berkeley Lab, is the primary scientific computing facility for the Office of Science in the U.S. Department of Energy. As one of the largest facilities in the world devoted to providing computational resources and expertise for basic scientific research, NERSC is a world leader in accelerating scientific discovery through computation.


Department of Statistics

The Department of Statistics is engaged in research and education in probability and statistics. In addition to developing fundamental theory and methodology, they are actively involved in statistical problems that arise in such diverse fields as molecular biology, geophysics, astronomy, AIDS research, neurophysiology, sociology, political science, education, demography, and the U.S. Census.

Geospatial Data Analysis & Collection

Archaeological Research Facility (ARF)

The Archaeological Research Facility (ARF) encourages and carries out archaeological field and laboratory research conducted by UC Berkeley archaeologists and related specialists. As a field of research, archaeology is inherently interdisciplinary and collaborative; not only are there close research collaborations among natural scientists, social scientists, and humanities scholars, but archaeology is also practiced by scholars who hold faculty or research positions in a variety of departments, ranging from classics to earth and planetary science.

Electronic Cultural Atlas Initiative (ECAI)

The Electronic Cultural Atlas Initiative, established in 1997 by Emeritus Prof. Lewis Lancaster of UC Berkeley, is a digital humanities initiative involving numerous professors and institutions around the world with the stated goal of creating a networked digital atlas by creating tools and setting standards for dynamic, digital maps.

Geographic Information Systems (GIS)

Geographic Information Systems at UC Berkeley coordinates GIS activities across campus, such as classes, talks, workshops, jobs, and experience.

Geospatial Innovation Facility (GIF)

The Geospatial Innovation Facility at UC Berkeley's College of Natural Resources provides leadership and training across a broad array of integrated mapping technologies, such as Remote Sensing, Geographic Information Systems (GIS), Global Positioning Systems (GPS), and modeling. The GIF offers innovative geospatial approaches to environmental research projects and grant opportunities.


UrbanSim is an open source urban simulation system designed by Paul Waddell at UC Berkeley and developed with numerous collaborators to support metropolitan land use, transportation, and environmental planning.

Data Collection and Digitalization 

California Digital Library

In collaboration with the UC libraries and other partners, the California Digital Library has assembled one of the world’s largest digital research libraries and changed the way that faculty, students, and researchers discover and access information.

CollectionSpace

CollectionSpace is an open-source collections management application that meets the needs of museums, historical societies, and other collection-holding organizations. The Phoebe A. Hearst Museum of Anthropology uses CollectionSpace to manage and provide online access to its collection of more than 3.8 million cataloged objects of material culture from around the world. The University and Jepson Herbaria are working to expand CollectionSpace to support research-driven interoperability, including the aggregation of content for the Consortium of California Herbaria.


Berkeley Center for New Media (BCNM)

The Berkeley Center for New Media is a focal point for research and teaching about new media, led by a highly trans-disciplinary community of 120 affiliated faculty, advisors, and scholars, from 35 UC Berkeley departments. Their mission is to critically analyze and help shape developments in new media from cross-disciplinary and global perspectives that emphasize humanities and the public interest.

School of Information (ISchool)

The School of Information is a graduate research and education community committed to expanding access to information and to improving its usability, reliability, and credibility while preserving security and privacy. This requires the insights of scholars from diverse fields such as information and computer science, design, social sciences, management, law, and policy.


Visualization Group

The Visualization Group aims to assist researchers in achieving their scientific goals – solving some of the world's most challenging problems in scientific data understanding – through visualization and analytics while simultaneously advancing the state-of-the-art in visualization through their own research.



UC Berkeley researchers are engaged in collaborative data science efforts across many academic disciplines. Listed below are some of the centers, institutes and programs that help to facilitate this research in different domain areas:


Astronomy and Physics

ATLAS at the Berkeley Lab

ATLAS is a particle physics experiment at the Large Hadron Collider at the European Organization for Nuclear Research (CERN). The ATLAS detector is searching for new discoveries in the head-on collisions of protons of extraordinarily high energy. ATLAS will learn about the basic forces that have shaped our Universe since the beginning of time and that will determine its fate. Among the possible unknowns are the origin of mass, extra dimensions of space, unification of fundamental forces, and evidence for dark matter candidates in the Universe.

Berkeley Center for Cosmological Physics (BCCP)

The Berkeley Center for Cosmological Physics is focused on understanding the origin and evolution of the universe through a series of programs to define the observations, experiments, concepts, and simulations needed to answer the fundamental questions in cosmology. Combining experimentation, computation, and theory, BCCP continues to develop the foundation of an accurate, reliable model of the cosmos. They compare the implications of this evolving model against observations — thus opening new horizons and expanding our knowledge of the universe.

Berkeley KamLAND Group

The Berkeley KamLAND (Kamioka Liquid-scintillator Anti-Neutrino Detector) group consists of physicists from both the Berkeley Lab and the physics department at UC Berkeley. KamLAND has demonstrated convincingly that neutrinos are massive and undergo flavor oscillations, a profound discovery. Many questions of fundamental significance remain open, but with a new understanding of neutrino propagation, neutrino science is now poised to provide illuminating answers to some of society's most probing questions concerning the Earth, the Sun and fantastic astrophysical events such as supernovae.


The BigBOSS experiment is a proposed DOE-NSF Stage IV ground-based dark energy experiment to study baryon acoustic oscillations (BAO) and the growth of structure with an all-sky galaxy redshift survey. The project is designed to unlock the mystery of dark energy using existing ground-based facilities operated by the National Optical Astronomy Observatory (NOAO).

Center for Time Domain Informatics (CTDI)

The Center for Time Domain Informatics grew out of the newly emerging discipline of Time-Domain Astronomy and Informatics, which involves astronomers, statisticians, and computer scientists. At the most basic level, they are interested in extracting optimal (and novel) information from a finite set of time-series data in a computationally constrained environment. In other words, they aim to understand the huge landscape of variable stars and transient events in the Universe, using computers (and in particular, machine learning) to do this more efficiently.

Computational Cosmology Center (C3)

The Computational Cosmology Center is a focused collaboration of astrophysicists and computational scientists whose goals are to develop the tools, techniques and technologies to meet the analysis challenges posed by present and future cosmological data sets. Members of C³ conduct research in a number of areas where high performance computing is needed to support theoretical and observational cosmology, or where massively parallel cosmology codes can help to drive computational science research and development.

Radio Astronomy Laboratory (RAL)

The Radio Astronomy Laboratory was created in 1958 to foster research in radio astronomy, a discipline that naturally extends beyond the borders of traditional academic departments at Berkeley. Over the years, faculty and graduate students from the astronomy, physics, chemistry, electrical engineering and computer science, and geology and geophysics departments have made use of the RAL's facilities.

Space Sciences Laboratory (SSL)

The Space Sciences Lab was initiated in 1958 by a committee of faculty members who recognized that emerging rocket and satellite technology opened up new investigative realms for the physical, biological, and engineering sciences. As a campus-wide multidisciplinary organization, SSL serves to integrate the space sciences on campus and stimulate new faculty-student research programs. Among other projects, SSL developed and maintains the SETI@home project, which pioneered the application of distributed computing to the space sciences.


Theoretical Astrophysics Center (TAC)

The Theoretical Astrophysics Center includes faculty, research scientists, postdoctoral researchers, and students working on a wide variety of problems in theoretical astrophysics. Their specialties include cosmology, planetary dynamics, the interstellar medium, star and planet formation, and compact objects.


Chemistry and Materials Science

Berkeley Nanosciences and Nanoengineering Institute (BNNI)

The Berkeley Nanosciences and Nanoengineering Institute is the umbrella organization for expanding and coordinating Berkeley research and educational activities in nanoscale science and engineering. BNNI aims to serve as a catalyst for greater interdisciplinary collaboration between Berkeley faculty and students in disciplines such as physics, chemistry, biology and engineering as well as deepen and expand collaborations with industry, national labs, and government agencies.

Energy Frontier Research Center (EFRC)

The Energy Frontier Research Center for gas separations relevant to clean air technologies at UC Berkeley focuses on the energy costs associated with the separation of CO2 from gas mixtures. The long-term goal of this EFRC is to develop the science and materials that will contribute to the reduction of the parasitic energy costs of Carbon Capture and Sequestration (CCS).

Molecular Foundry

The Molecular Foundry is a nanoscience user facility located at the Berkeley Lab. It is a critical part of the DOE’s National Nanotechnology Initiative, a multi-agency framework designed to improve human health, economic well-being and national security through leadership in nanotechnology. The Foundry supports broad nanoscience research efforts in both "hard" nanomaterials (nanocrystals, tubes and lithographically patterned structures) and "soft" nanomaterials (polymers, dendrimers, DNA, proteins and whole cells), as well as in the design, fabrication and study of multi-component, complex, functional assemblies of such materials.

Climate Science 

Berkeley Atmospheric Sciences Center (BASC)

The Berkeley Atmospheric Sciences Center is a multi-college unit at UC Berkeley, with the goal to broaden the atmospheric sciences beyond its traditional boundaries to embrace the biogeochemical frontier and the human dimension. The Center facilitates communication and integration across these traditional boundaries. In doing so, they aim to define a new paradigm for investigating changes in the atmosphere by integrating the microscopic mechanisms of chemical, physical, and biological processes with large-scale ecological and geological interactions between the geosphere, biosphere, and oceans, and how these interactions alter atmospheric composition.

Computational Bioscience

Center for Computational Biology

The Center for Computational Biology (CCB) was established in 2003 through the Chancellor’s New Ideas Initiative, an outgrowth of the 2002 Strategic Academic Plan, to expand the research base at the University and produce the next generation of leaders in the fundamental and applied biological sciences. Administratively housed in the Berkeley component of the California Institute for Quantitative Biosciences (QB3), the mission of CCB is to support interdisciplinary research on the broad array of subjects that cover the interface between computation and biology, and to foster graduate and undergraduate education in the field.

Computational Genomics Resource Laboratory (CGRL)

The Computational Genomics Resource Laboratory (CGRL) at UC Berkeley’s QB3 aims to facilitate research programs employing computational biology with computational infrastructure for data analysis, training in analytical tools for next-generation sequence data, and project-specific consultation on experiment design and analysis.

Department of Energy Systems Biology Knowledgebase (KBase)

The Department of Energy Systems Biology Knowledgebase (KBase) is an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses on a scalable computing infrastructure, and model interactions in microbes, plants, and their communities. KBase provides an open, extensible framework for secure sharing of data, tools, and scientific conclusions in predictive and systems biology.

Joint Genome Institute (JGI)

The mission of the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) is to advance genomics in support of the DOE missions related to clean energy generation and environmental characterization and cleanup. Supported by the DOE Office of Science, the DOE JGI unites the expertise of five national laboratories (the Berkeley Lab, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with the HudsonAlpha Institute for Biotechnology. JGI is operated by the University of California for the U.S. Department of Energy, and the facility provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.

QB3: California Institutes for Science and Innovation

 The California Institute for Quantitative Biosciences (QB3) is one of four Governor Gray Davis Institutes for Science and Innovation established in 2000 to ensure the future of the California economy by promoting research and innovation. QB3 is a cooperative effort between the state of California, private industry, venture capital, and the University of California campuses at Berkeley, San Francisco, and Santa Cruz. QB3 harnesses the quantitative sciences of physics and engineering to unify our understanding of biological systems at all levels of complexity, from atoms and molecules to cells, tissues, and entire living organisms. QB3 scientists make discoveries that drive the development of technologies, products, and wholly new industries, ensuring that California remains competitive in the 21st century. 

Synthetic Biology Engineering Research Center (SYNBERC)

In 2006, the NSF funded the first synthetic biology engineering research center, Synberc, to develop engineered biological systems that will catalyze new technologies for processing information, producing energy, manufacturing chemicals and pharmaceuticals, and fabricating materials. Synberc is a consortium of UC Berkeley, UC San Francisco, Stanford, Harvard and MIT. These universities are located in the two “hubs” of synthetic biology: Boston and San Francisco’s Bay Area.

Digital Humanities 


Buddhist Translators Workbench - Mangalam Research

The vast scope of Buddhist literature makes collaboration in translation essential. Widely practiced in ancient times, collaboration has not characterized recent work. The Buddhist Translators Workbench (BTW) will model and enhance ancient collaborative approaches using cutting-edge tools. As an open, cumulative, and modular system, BTW will let users select levels of access and input and join working groups based on their interests and expertise. This flexibility also allows aspects of BTW to be decoupled for use in other projects.

Berkeley Prosopography Services

Berkeley Prosopography Services (BPS) is an open-source prosopographical toolkit that generates interactive visualizations of the biological and social connections that link documented individuals, providing a dynamic and heuristic tool for researching historical communities documented in legal and administrative archives. The project is currently exploring and developing a prototype application with a single target corpus, but will soon expand to support multiple corpora. The initial corpus is a set of Hellenistic Babylonian legal texts (cuneiform tablets).

Berkeley WordSeer

The increasing prevalence of digitized source material in the humanities has led to uncertainty about how this suddenly available information will change scholars’ research methods. What balance will scholars strike between in-depth examination of a few sources and a more distant reading of a large number of them? The WordSeer team, made up of computer scientists and literary scholars, sees this as an opportunity to tackle a challenge shared by human-computer interaction and the humanities. Their focus is specifically on text collections: comparing texts, getting a sense of style and theme similarities, and tracing patterns of language use.

Townsend Center for the Humanities

Established in 1987, the Doreen B. Townsend Center for the Humanities encourages an interdisciplinary approach to scholarship, fosters innovation in research, and promotes intellectual conversation among individuals from the humanities and related academic disciplines. The Center offers an array of fellowship and grant programs designed to support research and scholarship at all levels of the university community. They also support more than 60 interdisciplinary working groups on a wide range of topics, from Hip Hop Studies to Orientalism and from Latin American Colonial Studies to New Media, and co-sponsor a wide variety of lectures and conferences with other departments and units on campus.


Berkeley Initiative in Global Change Biology (BiGCB)

The Berkeley Initiative in Global Change Biology uses state-of-the-art tools and technologies to mobilize historic and modern biological data to understand how organisms and ecological systems have responded to past global change events. This work will improve the forecasting of biological system response to future global change.

Berkeley Mapper

Berkeley Mapper 2.0 is a mapping interface for Collections (or other) Databases built on top of Google Maps. Users can configure their mapping interface through a simple XML configuration script, mapping data from tab-delimited text files.

BioGeomancer (BG) Project

The BioGeomancer (BG) Project is a worldwide collaboration of natural history and geospatial data experts. The primary goal of the project is to maximize the quality and quantity of biodiversity data that can be mapped in support of scientific research, planning, conservation, and management. The project promotes discussion, manages geospatial data and data standards, and develops software tools in support of this mission.

Biological Science Collections (BiSciCol)

The Biological Science Collections Tracker is an NSF-funded collaborative project with the goal of building an infrastructure designed to tag and track scientific collections and all of their derivatives.


Cal-Adapt has been designed to provide access to the wealth of data and information that has been, and continues to be, produced by the State's scientific and research community. The data available on their website offers a view of how climate change might affect California at the local level. Their site allows the user to work with visualization tools, access data, and participate in community sharing. It has been developed by UC Berkeley's Geospatial Innovation Facility (GIF) with funding and advisory oversight by the California Energy Commission’s Public Interest Energy Research (PIER) Program, and advisory support from


Calbug is a collaborative project among nine California museums with a goal to digitize and geographically reference over one million specimens. These specimens' labels encode data denoting species, location, and date captured and are used to study biogeographic patterns, spread of invasive species, and responses to land use, climate, and other environmental changes.


CalPhotos has been on the web since 1995 and was one of the first online image databases specializing in natural history subjects. The database currently contains roughly 400,000 digital images of plants, animals, and other natural history subjects, along with descriptive information including scientific and common names, location and dates of photos, and other information provided by the person or organization that contributed the photos. As of early 2008, CalPhotos received more than 120,000 specific queries and served more than one million images per day.

Moorea Biocode Project

The Moorea Biocode Project aims to create the first comprehensive inventory of all non-microbial life in a complex tropical ecosystem. From 2008 to 2010 the project sent researchers climbing up jagged peaks, trekking through lush forests and diving down to coral reefs to sample the French Polynesian island's animal and plant life. A library of genetic markers and physical identifiers for every species of plant, animal and fungus on the island is being constructed. This database will be publicly available as a resource for ecologists and evolutionary biologists around the world.

Energy and Smart Grid 


i4Energy Center

i4Energy is a vibrant community of researchers and innovators, rooted in its three founding institutions: the University of California’s Center for Information Technology Research in the Interest of Society (CITRIS), the California Institute for Energy and Environment (CIEE), and Lawrence Berkeley National Laboratory (LBNL). With partners in industry and government, this powerful research collaboration is focused on creating an integrated information infrastructure that will transform our energy grid into a cooperative, “aware” energy network that is both efficient and able to use sustainable energy sources.


LoCal is a network architecture for localized electrical energy reduction, generation and sharing. It investigates Information Age approaches for managing energy, society's most limited resource. Taking guidance from the design principles of the dominant triumph of the cyber age, the Internet, LoCal investigates how to design an essentially more scalable, flexible and resilient electric power infrastructure: one that encourages efficient use, integrates local generation, and manages demand through omnipresent awareness of energy availability and use over time.

Simple Measurement and Actuation Profile (sMAP)

sMAP is a specification for a protocol that exposes and publishes time-series data from a wide variety of sensors simply and flexibly. An enormous amount of physical information (that is, information from and about the world) is available today as the cost of communication and instrumentation has fallen. However, making use of that information is still challenging. The information is frequently siloed into proprietary systems, available only in batch, fragmentary, and disorganized. The sMAP project aims to change this by making available and usable a specification for transmitting physical data and describing its contents, a large set of free and open drivers for communicating with devices using native protocols and transforming their data to the sMAP profile, and tools for building, organizing, and querying large repositories of physical data.


Helen Wills Neuroscience Institute (HWNI)

The Helen Wills Neuroscience Institute (HWNI) is an active, collaborative research community that investigates fundamental questions about how the brain functions. Using approaches from many disciplines (including biophysics, chemistry, cognitive science, computer science, genetics, mathematics, molecular and cell biology, physics, and physiology), they seek to understand how the brain generates behavior and cognition, and how to better understand, diagnose and treat neurological disorders.

Henry H. Wheeler, Jr. Brain Imaging Center (BIC)

The Henry H. Wheeler, Jr. Brain Imaging Center is one of four technology centers established under the auspices of the Helen Wills Neuroscience Institute. It is a campus-wide resource that supports advanced brain imaging technologies dedicated solely to basic brain research.


Seismological Research 

Berkeley Seismological Lab (BSL)

The Berkeley Seismological Lab supports fundamental research into all aspects of earthquakes, solid earth processes and their effects on society through the collection, archival and delivery of high-quality geophysical data and through fostering a dynamic research environment that connects researchers across disciplines and to geophysical observation systems.

Social Sciences 

California Census Research Data Centers (CCRDCs)

The California Census Research Data Centers (CCRDCs) at UCLA and UC Berkeley are two of nine Research Data Centers (RDCs) established by the Center for Economic Studies (CES) of the U.S. Bureau of the Census in order to provide secure physical locations for researchers to study non-public microdata collected by the Census Bureau.

Center for Causal Inference

The Center for Causal Inference and Program Evaluation seeks to further research on developing tools for making causal inferences in the social sciences. The study of causality has become increasingly interdisciplinary, and the Center seeks to foster greater dialogue between the various disciplines that are contributing to the growing literature, including political science, economics, statistics, biostatistics, and computer science.

Computational Cognitive Science Lab

The Computational Cognitive Science Lab's research goal is to understand the computational and statistical foundations of human inductive inference, and to use this understanding to develop better accounts of human behavior and better automated systems for solving the challenging computational problems that people solve effortlessly in everyday life. They pursue this goal by analyzing human cognition in terms of optimal or "rational" solutions to computational problems.

Data and Democracy Initiative (DDI)

Founded in 2011, the Data and Democracy Initiative brings creativity and innovation from computer science, electrical engineering, and social media to bear on issues of democracy building and civic participation. DDI collaborates with faculty members and research centers on UC campuses as well as with companies, government agencies, and nonprofit organizations in the United States and internationally. DDI seeks to enhance individual and collective awareness, understanding, and engagement for people of diverse backgrounds on critical social, political, and economic issues.

D-Lab (Social Sciences Data Laboratory)

D-Lab helps Berkeley faculty, staff, and graduate students move forward with world-class research in data-intensive social science. They offer a venue for methodological exchange from all corners of campus and across its bounds. D-Lab's signature focus is research design: intelligent, rigorous, and tuned to the transformative opportunities opened up by a data-rich world. They provide cross-disciplinary resources for in-depth consulting and advising, access to staff support, and training and provisioning for software and other infrastructure needs.

Experimental Social Science Laboratory (XLab)

Founded in 2004, the XLab is a laboratory for conducting experiment-based investigations on issues of interest to social scientists. XLab enables researchers to explore the well-springs of human decision-making, especially where it involves decisions with monetary consequences. The XLab is thus an "economics wind-tunnel" whereby social scientists can test out various theories that help us understand economics and other forms of human behavior.


UC Berkeley's data science research activities are coupled with a commitment to bring data science into the educational program. The campus has already established several new courses and degree programs aimed at training a new generation of data scientists. The campus is launching four new data science-inflected graduate degrees in 2012 and 2013: a master's degree in Data Science at the School of Information, a professional master's degree in Engineering with a concentration in Data Intensive Systems in Computer Science, a Ph.D. in Computational Biology, and a one-year master's degree program in Statistics emphasizing data science. In addition, UC Berkeley is embedded in the Bay Area innovation ecosystem and has strong connections to partners in higher education, industry, and public service. The university’s leadership is committed to UC Berkeley’s pivotal role in the data science field and has devoted significant resources to support these efforts.



Master of Engineering with a concentration in Data Science and Systems
Electrical Engineering and Computer Sciences

The Master of Engineering (M.Eng.) is a professional master's degree designed for students who plan to join the engineering profession immediately following graduation. This accelerated program is designed to develop professional engineering leaders of the future who understand the technical, economic, and social issues of technology. This one-academic-year interdisciplinary experience includes three major components: an area of technical concentration, courses in leadership skills, and a rigorous capstone project experience. The concentration in Data Science and Systems prepares students for engineering careers in data-centric industries requiring understanding of data management fundamentals as well as the latest technologies and techniques for the collection, storage, and analysis of information.

Master of Information and Data Science
School of Information (I School)

Berkeley’s School of Information (I School) will offer the first fully online Master of Information and Data Science degree program, beginning in January 2014. Students will participate in live, face-to-face classes with fellow students and professors via the Web. Classes are small, with no more than 15 to 20 students. Additional coursework will include lectures, interactive case studies, and collaborative assignments. Classes will use 2U, Inc.’s online platform, featuring high-quality, self-paced content developed by I School faculty and a state-of-the-art learning management system.

I School faculty will teach their curriculum alongside experienced data science professionals. Classes will range from an introduction to machine learning (the intersection of computer science and statistics that focuses on finding patterns in data) and data storage and retrieval to the privacy, security, and ethics of data.

Master of Statistics emphasizing Data Science
Department of Statistics

The MA program in Statistics is designed to prepare students for careers in industries that require statistical skills. The focus is on tackling statistical challenges encountered by industry rather than preparing for a PhD. The program is for full-time students and is designed to be completed in two semesters (fall and spring). To obtain the MA in Statistics, admitted students must complete a minimum of 24 units of courses and pass a comprehensive examination. In the first semester, all students will take intensive graduate courses in probability, theoretical statistics, and statistical computing. In the second semester, students will take an advanced course in modern applied statistics, an elective, and a capstone course.


Ph.D. in Computational Biology
Center for Computational Biology (CCB)

The Computational Biology Ph.D. at UC Berkeley will train the next generation of scientists who are interested in exploring the interface of computation and biology, and committed to functioning at a high level in both computational and biological fields. The program emphasizes multidisciplinary competency, interdisciplinary collaboration, and transdisciplinary research, and offers an integrated and customizable curriculum that consists of two semesters of didactic course work tailored to each student’s background and interests, research rotations with faculty mentors spanning computational biology’s core disciplines, and dissertation research jointly supervised by computational and biological faculty mentors.

Ph.D. in Computer Science
Electrical Engineering and Computer Sciences

The Electrical Engineering and Computer Sciences (EECS) Department offers a Ph.D. degree in Computer Science. The principal requirements for the Ph.D. are (I) coursework (a major field and two minor fields), (II) the departmental preliminary requirement (an oral exam and breadth courses), which differs for EE and CS, (III) the qualifying exam, and (IV) the dissertation. The EECS Department requires that a student establish a major subject area and two minor subject areas. The median time to completion for the Ph.D. is five and a half years.

Ph.D. in Statistics
Department of Statistics

The Statistics PhD program at UC Berkeley is rigorous, yet welcoming to students with interdisciplinary interests and different levels of preparation. The program requires four semesters of residence.

Designated Emphasis in Computational Science and Engineering
Various departments across the campus

This Designated Emphasis is intended for Ph.D. students who seek to focus on the mathematical, statistical, and computational techniques needed to solve Computational Science and Engineering (CSE) problems across a wide range of disciplines. The CSE program actively supports the training and multidisciplinary education of scientists, engineers, and technical specialists who are experts in relevant areas.


Bachelor of Arts in Computer Science
Electrical Engineering and Computer Sciences

UC Berkeley construes computer science broadly to include complexity theory, the design and analysis of algorithms, machine architecture and logic design, digital devices and circuits, programming systems and languages, operating systems, computer graphics, database systems, and artificial intelligence. The goal is to prepare students both for a possible research career and for long-term technical leadership in industry. The B.A. in Computer Science at UC Berkeley is for students enrolled in the College of Letters & Science (L&S). There is no difference in the computer science course content between the B.S. and B.A. programs. The difference is in what else you take: mainly engineering, or mainly humanities and social sciences. In particular, an interest in hardware suggests the EECS route; an interest in double majoring (for example, in math or cognitive science) suggests the L&S route.

Bachelor of Arts in Statistics
Department of Statistics

The undergraduate major in Statistics at UC Berkeley provides a systematic and thorough grounding in applied and theoretical statistics, and in probability. A major in Statistics from Berkeley is excellent preparation for a career in science or industry, or for further academic study in a wide variety of fields.

Bachelor of Science in Computer Science and Engineering
Electrical Engineering and Computer Sciences

UC Berkeley's B.S. degree in Computer Science and Engineering (CSE) is offered through the College of Engineering (COE). It combines fundamentals of computer science and electrical engineering in one major. Students working for the B.S. degree select an option within their program and are then assigned an appropriate advisor on the basis of their selection.