Kristen Grauman, Professor, UT-Austin
Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at
Austin. Her research in computer vision and machine learning focuses on video, visual recognition, and
action for perception and embodied AI. Before joining UT-Austin in 2007, she received her Ph.D. from MIT.
She is an IEEE Fellow, AAAS Fellow, AAAI Fellow, Sloan Fellow, and Microsoft Research New Faculty Fellow,
and a recipient of the NSF CAREER and ONR Young Investigator awards, the 2013 PAMI Young Researcher Award,
the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence
(IJCAI), and the 2013 Presidential Early Career Award for Scientists and Engineers (PECASE). She was
inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators have been
recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017
Helmholtz Prize (test of time award). She served for six years as an Associate Editor-in-Chief for the
Transactions on Pattern Analysis and Machine Intelligence (PAMI) and for ten years as an Editorial Board
member for the International Journal of Computer Vision (IJCV). She also served as a Program Chair of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in 2015, Neural Information Processing
Systems (NeurIPS) in 2018, and the IEEE International Conference on Computer Vision (ICCV) in 2023.
Title:
4D Activity Understanding in Egocentric Video
Abstract:
The first-person or “egocentric” perspective offers a special window into an agent’s attention, goals, and
interactions, making it an exciting avenue for the future of both augmented reality and robot learning.
This talk will describe our recent explorations in 4D first-person perception, motivated by learning
about human skills from video. Key challenges are fine-grained activity understanding and relating the
first- and third-person (actor and observer) perspectives. Towards addressing these challenges, we introduce new ideas
for learning view-invariant video representations, dynamically selecting informative viewpoints, and
anticipating behavior in 4D. I'll also give an overview of how we are advancing the frontier of egocentric
perception for the broader community via the large-scale open-source datasets Ego4D and
Ego-Exo4D, multi-year, multi-institutional efforts to capture daily-life and skilled activity of people
around the world.