VASC Seminar - Yutong Bai

— 4:30pm

Location:
In Person - Newell-Simon 3305

Speaker:
YUTONG BAI, Postdoctoral Researcher
Berkeley AI Research Lab
University of California, Berkeley

http://yutongbai.com/

Whole-Body Conditioned Egocentric Video Prediction

We train models to Predict Ego-centric Video from human Actions (PEVA), given the past video and an action represented by the relative 3D body pose. By conditioning on kinematic pose trajectories, structured by the joint hierarchy of the body, our model learns to simulate how physical human actions shape the environment from a first-person point of view. We train an auto-regressive conditional diffusion transformer on Nymeria, a large-scale dataset of real-world egocentric video and body pose capture. We further design a hierarchical evaluation protocol with increasingly challenging tasks, enabling a comprehensive analysis of the model's embodied prediction and control abilities. Our work represents an initial attempt to tackle the challenges of modeling complex real-world environments and embodied agent behaviors with video prediction from the perspective of a human. 



Yutong Bai is currently a Postdoctoral Researcher at UC Berkeley (Berkeley AI Research), advised by Prof. Alexei (Alyosha) Efros, Prof. Jitendra Malik, and Prof. Trevor Darrell. Prior to that, she obtained her PhD in Computer Science at Johns Hopkins University, advised by Prof. Alan Yuille. She has interned at Meta AI (FAIR Labs) and Google Brain, and was selected as a 2023 Apple Scholar and an MIT EECS Rising Star. Her work was nominated for the CVPR 2022 Best Paper Award.

Her research aims to build intelligent systems from first principles—systems that do not merely fit patterns or follow instructions, but that gradually develop structure, abstraction, and behavior through learning itself. She is interested in how intelligence emerges not from handcrafted pipelines or task-specific heuristics, but from exposure to behaviorally rich, understructured environments where models must learn what to attend to, how to reason, and how to improve. This involves designing learning systems that are not narrowly optimized for a single goal, but that can self-organize and grow increasingly competent through interaction, experience, and computation. While she sees scale as a powerful tool, she does not view it as the whole solution: larger models open up capacity, but what fills that capacity—and how it forms—is just as important. Her research explores how to use scale to amplify the right signals—not just data quantity, but the structural richness of behavior and the dynamics of learning itself. 
 

For More Information:
cdowney@andrew.cmu.edu

