Kathy Yelick Testifies on 'Big Data Challenges and Advanced Computing Solutions'
Kathy Yelick, Associate Laboratory Director for Computing Sciences at Berkeley Lab, was one of four witnesses testifying before the U.S. House of Representatives’ Committee on Science, Space, and Technology at 7 a.m. PDT / 10 a.m. EDT on Thursday, July 12. The discussion focused on big-data challenges and advanced computing solutions.
Data-driven scientific discovery is poised to deliver breakthroughs across many disciplines, and the U.S. Department of Energy, through its national laboratories, is well-positioned to play a leadership role in this revolution. Driven by DOE innovations in instrumentation and computing, however, the scientific data sets being created are becoming increasingly challenging to sift through and manage.
Big data challenges are often characterized by four Vs: volume (the total size), velocity (the speed at which data is produced), variability (the diversity of data types) and veracity (noise, errors and other quality issues). Scientific data has all of these, and DOE’s user facilities are a major source of both the challenges and the opportunities to use large data sets for new discoveries, owing to increasing data rates and total data volumes and the reduced cost of collecting data.
Machine learning represents a promising approach for analytics in science, complementing but not replacing modeling and simulation. In her testimony, Yelick discussed the emerging role of machine-learning methods that have revolutionized the field of artificial intelligence and may similarly impact scientific discovery. She talked about how Berkeley Lab and other national laboratories are applying machine learning tools and techniques to better analyze these data sets and empower scientists to ask and answer increasingly complex questions.
“Machine learning has revolutionized the field of artificial intelligence and it requires three things: Large amounts of data, fast computers and good algorithms,” Yelick stated. “DOE has all of these.”
Other key points in her testimony included:
- Examples of large-scale scientific data challenges in the DOE Office of Science, such as analyzing billions of microbes in complicated communities or millions of supernovae millions of light years away
- The unique opportunities for machine learning in science, leveraging DOE’s national role as a leader in high-performance computing, applied mathematics, user facilities, and interdisciplinary team science
- A vision for the national laboratories that includes foundational research in data science and an interconnected network of experimental and computational facilities to address some of the most challenging data analytics problems in science