The Art of Grasping
This article appeared in Berkeley Engineer magazine, Summer 2026
Dozens of impeccably styled art aficionados packed themselves into the di Rosa Museum in San Francisco to reflect on trees, time and technology. A crowd gathered around a lean, bespectacled artist who stood beside a large photograph depicting an industrial robot arm emerging from a large planter lush with greenery. The presenting artist was Ken Goldberg, a roboticist and UC Berkeley’s William S. Floyd Distinguished Professor of Engineering. He explained that the photo depicted the Telegarden, the first interactive robot on the internet.
The Telegarden expanded on a concept Goldberg pioneered in 1994, when he first trained a webcam on a robotic arm and streamed it online. He later tasked the robot with tending a small garden of living plants and allowed website visitors to control the robot by directing where it should plant and water seedlings. In 1995, converting commands from a two-dimensional web interface into something that could be parsed by a robotic arm operating in three dimensions was a major challenge. It was also one of the first marriages of art and engineering on the nascent internet.
“Everybody expects instant gratification on the internet,” Goldberg told his audience. “But nature doesn’t work that way.”
Though the Telegarden debuted more than 30 years ago, its invitation to question expectations may be more salient than ever. Artificial intelligence (AI) evangelists say we are at the threshold of an era of endless leisure as AI agents and robots render white-collar work obsolete. Perhaps they’re right, but Goldberg is skeptical. He doesn’t believe all-purpose robots will replace human labor anytime soon. His research does show that robotic competence in a variety of controlled tasks is well within reach, but only if we stop expecting robots to evolve just like text-based AI. Goldberg’s research has already proven itself via a commercial package-sorting robot that he helped design, which has successfully sorted over 100 million packages. He believes that the next leap in robotics capability will come from anchoring advances in AI training atop a foundation of what he calls “good old-fashioned engineering.”
Unorthodox ideas
Goldberg, a professor of industrial engineering and operations research and of electrical engineering and computer sciences, has been trying to teach robots how to hold things since his college days. He worked with tactile sensors as an undergraduate at the University of Pennsylvania and then pivoted to the mathematics of manipulation for his graduate studies at Carnegie Mellon University (CMU), where he developed an algorithm that would allow a robot to orient a polygonal object without any sensory input. That algorithm earned him an assistant professor position at the University of Southern California, where he developed a theoretical model that would allow a robotic manipulator to automatically identify and rigidly hold any polygonal object. Those early projects on industrial parts handling were cited by the National Academy of Engineering in February when it elected Goldberg as a member.
For Goldberg, unconventional ideas are the key to breakthroughs in engineering, and he’s found that one of the best ways to generate those ideas is by combining science and art. Bringing two very different ways of thinking together has been his mission since he came to UC Berkeley in 1995. Fresh from his success with the Telegarden, Goldberg founded Berkeley’s Art, Technology, and Culture Colloquium, which has gathered artists and technologists for monthly talks on unorthodox projects for almost 30 years. He brought that same spirit to the Berkeley Automation Laboratory (AUTOLab), where he and his students pursue research in robot manipulation and automation.
In addition to advising hundreds of undergrads and 60 Ph.D. students and postdoctoral researchers over the years, Goldberg recruits art-minded engineers to expand AUTOLab’s capabilities. One night in 2015, as Goldberg was listening to a couple of his students jam on guitars in his lab, some other students arrived with a pile of robot parts they’d scavenged from a dumpster and asked Goldberg if they could try to repair them. Getting the dumpster robots running again became a lab-wide project, which informed AUTOLab’s development of the Dexterity Network (Dex-Net) — a novel approach to robot grasping based on deep learning. Though inspired by salvage, their system outperformed the winners of the Amazon Robotics Challenge in 2017.
That unorthodox thinking paid off the next year when Goldberg and students from the AUTOLab were invited to demonstrate their latest robot for Jeff Bezos, the founder of Amazon, at a private robotics and AI event in Palm Springs, California. Their robot had been tested on hundreds of different items, but one of Bezos’ assistants challenged it by tossing a shoe into the sorting area. Goldberg held his breath — they had never tested it on shoes — but the robot gripped the shoe with ease and promptly dropped it in a bin. Goldberg recalls that Bezos told the AUTOLab team that the robot was just the sort of thing that he needed for Amazon.
“As we packed up the robot that night, we all agreed to form a company,” says Goldberg.
Robots for logistics
It felt like perfect timing. Those dumpster-diving students from the AUTOLab — Jeff Mahler (Ph.D.’18 EECS), Stephen McKinley (Ph.D.’16 ME), David Gealy (B.S.’15, M.S.’17 ME) and Matt Matl (M.S.’19 EECS) — were just about to graduate, and their complementary skills as computer scientists and mechanical engineers were ideal for a start-up robotics company. With Goldberg, they founded Ambi Robotics in 2019 and began designing their first package-sorting robot, though the key to making the whole thing work was the robot grasping software.
Ambi’s robot uses a simulation-to-reality (Sim2Real) AI model, which means that it learns to operate real-world robotic manipulators by repeating thousands of 3D simulated manipulations. Such simulation-based training requires huge datasets of object models. Goldberg and the Ambi Robotics team combined open-source datasets like Dex-Net with their own proprietary data to train the software that controlled their robotic sorting system, AmbiSort. Because it would be working with mostly flat and smooth packages, they fitted AmbiSort’s robotic arm with vacuum-based suction cups. They boosted reliability by training the robot on simulations that were more challenging than real-world objects.
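One common way to make simulated training harder than reality is domain randomization: sampling simulated object properties from ranges wider than those expected in the field. The sketch below illustrates that idea only. It is not Ambi Robotics’ method, and the parameter names, the numeric ranges and the widening factor are all hypothetical values chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PackageParams:
    """Hypothetical physical properties of one simulated package."""
    mass_kg: float
    friction: float
    surface_tilt_deg: float


# Assumed (made-up) ranges for packages seen in a real facility.
REAL_RANGES = {
    "mass_kg": (0.1, 5.0),
    "friction": (0.4, 0.8),
    "surface_tilt_deg": (0.0, 10.0),
}


def widen(low, high, factor):
    """Stretch a range about its midpoint by `factor`, clamping the lower end at zero."""
    mid = (low + high) / 2
    half = (high - low) / 2 * factor
    return max(0.0, mid - half), mid + half


def sample_sim_package(rng, factor=1.5):
    """Draw one simulated package from ranges wider than the assumed real ones."""
    values = {
        name: rng.uniform(*widen(low, high, factor))
        for name, (low, high) in REAL_RANGES.items()
    }
    return PackageParams(**values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for _ in range(3):
    print(sample_sim_package(rng))
```

A grasping policy trained on packages drawn this way sees heavier, slicker and more steeply tilted objects than the assumed real ranges contain, which is the sense in which the simulation is "more challenging" here.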
Today, Ambi’s robots have sorted over 100 million packages for logistics and e-commerce companies. They do excellent work in controlled spaces with a predictable array of packages, but they don’t quite achieve what roboticists refer to as universal picking — the ability to hold and manipulate an unpredictable range of objects with very different shapes, sizes and textures. This is a common challenge for Sim2Real robotic training. While simulation very effectively trains robots to control their own components, it’s not as effective in teaching them to interact with the real world. Models can’t simulate every aspect of physical objects — some qualities of texture, space and orientation are very difficult to replicate. Ambi Robotics addresses this problem by continuing to collect data from the robots as they sort, but the amount of data needed for true universal picking is staggering.
A new class of AI systems for controlling robots is known as vision-language-action models (VLAs). Like the large language models (LLMs) behind text-based AI systems, they largely teach themselves based on the data provided. Simulation is one type of data source, but where LLMs have the internet, which contains so much text that it would take a human 100,000 years to read it all, there is no comparable archive of physical data. There are certainly a lot of videos on the internet, and some researchers are trying to use them to train VLAs, but video contains limited and sometimes misleading information on space, weight and texture. Another way of procuring data is teleoperation, in which a human-operated interface guides a robotic manipulator through thousands of movements, logging success and failure to teach the VLA, a process known as behavior cloning.
“Even if you have a lot of people working eight hours a day, it’s still going to take a long time to collect 100,000 years’ worth of data,” says Goldberg.
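A back-of-the-envelope calculation gives a feel for the scale Goldberg describes. The sketch below assumes that one "year" of data means 8,760 hours of continuous robot experience and that every operator works eight hours a day, every day of the year. Both are simplifying assumptions, and the operator counts are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # one year of continuous experience, ignoring leap years


def calendar_years_to_collect(target_years, operators, hours_per_day=8.0, days_per_year=365):
    """Calendar years needed for `operators` people to log `target_years` of continuous data."""
    if operators <= 0:
        raise ValueError("operators must be positive")
    target_hours = target_years * HOURS_PER_YEAR
    hours_logged_per_calendar_year = operators * hours_per_day * days_per_year
    return target_hours / hours_logged_per_calendar_year


# Hypothetical team sizes, for illustration only.
for operators in (100, 1_000, 10_000):
    years = calendar_years_to_collect(100_000, operators)
    print(f"{operators:>6,} operators: about {years:,.0f} calendar years")
```

Under these assumptions an eight-hour shift captures one-third of a day, so 1,000 teleoperators would need about 300 calendar years to log 100,000 years' worth of data, and even 10,000 would need about 30.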
A hybrid approach
One afternoon in January, Goldberg checked in with several of his students at the Berkeley Artificial Intelligence Research (BAIR) Lab. They were huddled around a robotic arm, moving a tray around as the arm tried and failed to drop a towel into it. They weren’t taunting the struggling machine but trying to debug its camera. The arm clung to the towel and fruitlessly chased after the moving tray, then it rotated in the opposite direction, seemingly stalling.
Justin Kerr, a Ph.D. student advised by Goldberg, explained that the robot was being trained with a new technique. Rather than accepting visual information passively, they gave the robot’s camera the ability to pan and pivot, and they rewarded the AI system when it focused on key objects. Through reinforcement learning, the robotic eye learned to look at specific things. The arm, however, was trained with behavior cloning, which explained its unresponsive behavior. It hadn’t yet learned how to open its pincers and release the towel.
“We’re trying to emulate how human vision emerges,” said Kerr. “The eye behavior is an emergent phenomenon. It figures out how to look around just like humans do.”
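Behavior cloning, the method used for the arm, treats control as supervised learning: the system fits a policy to recorded pairs of states and human actions. The toy sketch below shows the idea in one dimension. It is not the AUTOLab’s code, and the "expert" rule, noise level and linear policy are hypothetical stand-ins for a teleoperator and a neural network.

```python
import random


def expert_action(offset):
    """Hypothetical teleoperator: move halfway toward the target each step."""
    return -0.5 * offset


def fit_linear_policy(demos):
    """Least-squares fit of action = w * state + b to (state, action) pairs."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov_sa / var_s
    b = mean_a - w * mean_s
    return w, b


# Record noisy demonstrations: state is the gripper's offset from the target.
rng = random.Random(0)
demos = []
for _ in range(200):
    state = rng.uniform(-1.0, 1.0)
    action = expert_action(state) + rng.gauss(0.0, 0.01)
    demos.append((state, action))

w, b = fit_linear_policy(demos)


def rollout(offset, steps=20):
    """Run the cloned policy and return the final offset from the target."""
    for _ in range(steps):
        offset += w * offset + b
    return offset


print(f"learned policy: action = {w:.3f} * offset + {b:.3f}")
print(f"final offset after rollout from 1.0: {rollout(1.0):.4f}")
```

The cloned policy can only reproduce what the demonstrations cover. If no demonstration shows the gripper opening, the policy never learns to open it, which matches the stalled arm described above.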
Goldberg walked over to his expansive lab packed with various robotic arms and teleoperation rigs. Here, he caught up with other Ph.D. students who were working on the AI behind a new data generation method that combines elements of video-based training with 3D simulation training. Real-to-render-to-real training involves scanning a rigid object, then recording a video of a human performing a task with it. The AI then develops its own 3D model of the object and runs simulations aimed at replicating the human’s movements and finding many other possible paths and trajectories to achieve the same result. In creating its own model, the AI discovers which simulated parameters are necessary for achieving the goal, thus eliminating the need for hand-built models, which may contain extraneous information. It also generates a large amount of data, which is analogous to the synthetic data currently being fed into LLMs.
One challenge the students have not yet overcome is unreliable outputs, similar to hallucinations in LLMs. Goldberg explained that the hybrid training regimes they are developing may be the key to overcoming such problems, which would allow more robots to leave the lab and make an impact in the wider world. The prevailing view among roboticists favors end-to-end AI: feeding robot AIs massive amounts of data and freeing them to seek out patterns and learn how to operate with little human intervention. Tech titans like Jensen Huang promote the idea that end-to-end AI will soon deliver a ChatGPT moment for robots, a leap in capability like that achieved by LLMs in 2022.
Goldberg believes that emergent behavior is far more likely to come from working robots bootstrapped with algorithmic engineering models developed over two centuries. He points to Ambi Robotics and the automated taxi company Waymo, which both develop their robots in this way and are making a visible impact in the real world. In April, Goldberg and his students, together with colleagues from NVIDIA, Stanford University, CMU and the University of Texas at Austin, demonstrated that agentic coding for robots has the potential to combine the best of traditional engineering principles with advances in LLMs, VLAs and vision-language models (VLMs).
As he closed the tour of his joint art exhibition with his wife and collaborator, Tiffany Shlain (B.S.’92 Interdisciplinary Studies), Goldberg led the audience to an AI-driven, crowdsourced project that allows anyone to submit pictures of their favorite tree. The photos go to an AI system that he and his students developed, which researches the history of the tree and generates a customized visual and textual tribute to it. While some speculate on the emergence of a superintelligent machine, Goldberg has tasked a machine with encouraging people to get outside and spend more time with trees.
“I love teaching and publishing papers,” says Goldberg. “But I also strive to get things out into the world that will make an impact.”
Learn more: Are we truly on the verge of the humanoid robot revolution? (Berkeley News); The entrepreneurial university (Scientific American)