AI Agents That Do What We Want

November 29, 2023
By: Jackie Brown
Anca Dragan describing her research during the CITRIS Research Exchange
Anca Dragan, UC Berkeley professor of Electrical Engineering and Computer Science, spoke at UC Berkeley on October 4, 2023, about her research in artificial intelligence. Her talk was the third in a series on the Future of AI hosted by the CITRIS Research Exchange, the Berkeley College of Computing, Data Science, and Society, and the Berkeley AI Research (BAIR) Lab. In the presentation, titled “AI Agents That Do What We Want: Progress and Open Challenges,” Dragan covered how lessons learned in robotics about the importance of training and of positive and negative reinforcement can apply to virtual agents like large language models. Watch a recording of her talk here.

Objectives in robotics

Dragan began her talk by highlighting the evolution of objectives in robotics. During her graduate program, the focus was primarily on building robots that could complete tasks, and little attention was paid to defining clear objectives for these robots. As a result, early robots had to operate with simplified, often dumbed-down objectives that could be optimized effectively.

Over time, the field of AI and robotics made significant advancements in optimization, models, and reinforcement learning, notably with breakthroughs like AlphaGo. However, Dragan pointed out that manipulation tasks, such as robots pouring coffee, remained challenging. She emphasized the importance of improving objectives to enable robots to understand not just what to do but also how to do it efficiently and safely.

The pitfalls of reward functions

A central theme in Dragan’s talk was the concept of reward functions. These functions, often referred to as the “lingo for robots,” define what the AI system should aim to achieve. However, Dragan highlighted how humans can misspecify rewards, leading to unintended and sometimes undesired behavior. She presented the example of “The Sorcerer’s Apprentice” in Disney’s Fantasia, where the broom assistant was instructed to pour water but ended up flooding the room because the spell did not tell it when to stop. This example underscores the importance of refining reward functions through trial and error.
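The broom example can be made concrete with a toy sketch. The code below is not from Dragan's talk; the function names, the basin capacity, and the penalty value are all hypothetical and chosen only for illustration. It compares a reward that pays for every bucket poured with one that also encodes when to stop.

```python
# Toy illustration of a misspecified reward (all names and numbers are hypothetical).

def misspecified_reward(water_level: float, poured: float) -> float:
    """Pays for every unit of water poured; says nothing about when to stop."""
    return poured


def intended_reward(water_level: float, poured: float, capacity: float = 10.0) -> float:
    """Pays for water that fits in the basin and penalizes any overflow."""
    overflow = max(0.0, water_level + poured - capacity)
    return (poured - overflow) - 5.0 * overflow


def run(reward_fn, steps: int = 20) -> float:
    """Greedy agent: pours one bucket per step whenever the reward for doing so is positive."""
    level = 0.0
    for _ in range(steps):
        if reward_fn(level, 1.0) > 0:
            level += 1.0
    return level


print(run(misspecified_reward))  # keeps pouring for all 20 steps
print(run(intended_reward))      # stops once the basin reaches capacity
```

With the first reward, the agent pours on every step because nothing in the objective ever tells it to stop. With the second, pouring stops at the assumed capacity of 10 buckets.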

This issue becomes more complex when robots attempt to optimize objectives that involve multiple parameters, such as safety, efficiency, and legality. Dragan discussed scenarios where robots may prioritize efficiency over courtesy, which can lead to unsafe or uncomfortable interactions, highlighting the importance of incorporating all relevant factors into reward functions.
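One common way to combine several factors, sketched here as an assumption and not as the method Dragan described, is a weighted sum over features of each candidate action. The maneuvers, feature scores, and weights below are invented for illustration. They show how leaving one factor out of the reward changes which action looks best.

```python
# Hypothetical feature scores (0 to 1) for two candidate maneuvers.
MANEUVERS = {
    "cut_in_front": {"efficiency": 0.9, "safety": 0.7, "courtesy": 0.1},
    "merge_behind": {"efficiency": 0.6, "safety": 0.9, "courtesy": 0.9},
}


def reward(features: dict, weights: dict) -> float:
    """Weighted sum of features; a factor missing from `weights` counts for nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_maneuver(weights: dict) -> str:
    """Return the maneuver with the highest reward under the given weights."""
    return max(MANEUVERS, key=lambda name: reward(MANEUVERS[name], weights))


# Courtesy left out of the objective: the more efficient maneuver wins.
print(best_maneuver({"efficiency": 1.0, "safety": 0.5}))

# Courtesy included: the more considerate maneuver wins.
print(best_maneuver({"efficiency": 1.0, "safety": 0.5, "courtesy": 0.5}))
```

The design choice to notice is that an omitted factor is not treated as "unknown" by the optimizer; it is treated as having zero value.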

The challenge of noisy irrationality

Dragan explained that humans are not entirely rational. We often exhibit suboptimal behaviors, which pose a significant challenge for aligning AI systems with human behavior. For instance, Dragan mentioned a study that found social media platforms like Twitter, now X, can amplify anger and polarization. The X algorithm was found to suggest tweets that were likely to prompt emotional or angry responses, especially on political topics.

“How do we encourage tech awareness from a systemic level in academia?”

Education can no longer focus on a single topic as it once did; the educational model must adapt to a complex environment. The UC Berkeley College of Computing, Data Science, and Society is a good example of a new institution that blends interdisciplinary research and learning. In the future, Dragan hopes to teach a graduate-level course about AI models meeting humans.

About Anca Dragan

Anca Dragan is an associate professor at UC Berkeley in the Electrical Engineering and Computer Science department. Dragan runs the InterACT Lab, which focuses on developing algorithms for human-robot interaction. The lab’s researchers work across numerous areas, including optimal control, game theory, reinforcement learning, Bayesian inference, and cognitive science. Dragan is also involved with the Berkeley AI Research (BAIR) Lab and is a co-principal investigator at the Center for Human-Compatible AI. She has received prestigious honors, including the Sloan Fellowship, MIT Innovators Under 35 (TR35), the Okawa Prize, an NSF CAREER award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

About CITRIS and the Banatao Institute

The CITRIS Research Exchange will continue in the spring with new guest speakers and a rescheduled appearance by Timnit Gebru, who was originally slated to present in the fall. See the Future of AI series recordings here.