Matei Zaharia Awarded ACM Prize in Computing

April 8, 2026
By: Tiffany Lohwater
UC Berkeley EECS faculty member Matei Zaharia is the recipient of the 2025 ACM Prize in Computing. (Photo/ Drew Kelly)

Matei Zaharia, associate professor of electrical engineering and computer sciences (EECS) at UC Berkeley, has been awarded the ACM Prize in Computing for his visionary development of distributed data systems and computing infrastructure. In the prize announcement, the Association for Computing Machinery (ACM) noted that Zaharia’s open-source systems helped enable machine learning (ML), analytics and AI at a global scale.

ACM is the world’s largest educational and scientific computing society, and its ACM Prize in Computing recognizes early-to-mid-career computer scientists whose work has had broad and lasting impact. Recipients receive a $250,000 prize, with financial support provided by an endowment from Infosys Ltd.

Enabling open-source computing

Much of Zaharia’s work is in open-source computing and addresses a central challenge: how to work with and efficiently analyze rapidly growing volumes of data, at a scale previously accessible only to large tech companies. Early distributed data systems were limited in speed and poorly suited to emerging workloads for machine learning and interactive analysis. By developing open-source systems, Zaharia and colleagues changed what any organization could do with massive datasets. Zaharia is co-founder and chief technology officer at Databricks, which was founded by Berkeley researchers in 2013.

“Matei Zaharia’s work has had a lasting impact on how data is used at scale,” said ACM President Yannis Ioannidis. “By addressing key limitations in earlier systems, he developed technologies that quickly became standard tools for data analytics, machine learning and artificial intelligence. Matei’s open-source philosophy has been essential: he made these tools accessible to all. His contributions continue to influence both research and industry, and I look forward to seeing where his current work on AI systems takes us next.”

As a Ph.D. student at Berkeley in 2009, Zaharia started developing Apache Spark, an approach to distributed computing that reliably leverages memory to accelerate computations. This design made Spark dramatically faster than existing frameworks for the iterative computations essential to machine learning, while its unified architecture allowed batch processing, streaming, graph computation and interactive queries to run within a single system. Spark quickly moved from research into widespread use and is now one of the most widely used frameworks for large-scale data analytics, deployed across tens of thousands of organizations and integrated into major cloud platforms. 

For his dissertation on Spark, Zaharia received the 2014 ACM Doctoral Dissertation Award. He’s also received a National Science Foundation CAREER Award, the ACM Special Interest Group on Operating Systems’ Mark Weiser Award, and the Presidential Early Career Award for Scientists and Engineers.

Developing new data architectures

With the shift to the cloud, Zaharia turned to a different problem: the lack of reliability and consistency in sprawling cloud data lakes, the massive, centralized and often unmanaged repositories that store vast amounts of raw data. He co-developed Delta Lake to bring transactional guarantees and principled data management to cloud object stores, making data pipelines more dependable and enabling a new class of architecture – the data lakehouse – that combines the flexibility of data lakes with the reliability of traditional data warehouses. Delta Lake is now widely adopted across industries, handling exabytes of data daily.

The growing use of machine learning introduced additional complexity. Zaharia developed MLflow – another open-source platform – to address fragmentation in machine learning workflows, where teams struggled to track experiments, reproduce results and deploy models consistently. MLflow provided a structured framework for managing the ML lifecycle – from experiment tracking and model versioning to deployment across diverse tools and environments – and has become a leading platform for operationalizing AI at scale. Together, these systems reshaped how data is leveraged in practice.

By building tools that any organization could freely use and extend, Zaharia ensured that the benefits of scalable computing became accessible to researchers, nonprofits and enterprises across every industry. As investment in AI accelerates, the infrastructure he built remains key to how data is processed, managed and used to train and deploy AI applications and agents.

Zaharia’s current research is focused on AI development, specifically how to build and scale reliable agents. He is a co-author of recent open-source research, including DSPy and GEPA, which focus on auto-optimizing prompts and models to improve agent quality for specific tasks.

Zaharia will be formally presented with the ACM Prize in Computing on June 13 at the ACM Awards Banquet in San Francisco.
