Conference Guests

- Keynote Speakers -

Marc Pollefeys

Professor, ETH Zurich

Title: Spatial AI to assist humans and enable robots

Bio: Marc Pollefeys is a Professor of Computer Science at ETH Zurich and the Director of the Microsoft Spatial AI Lab in Zurich, where he works with a team of scientists and engineers to develop advanced perception capabilities for AI assistants and robotic agents. He is a Fellow of IEEE, ACM, AAIA and ELLIS, as well as a member of the Academia Europaea. He obtained his PhD from KU Leuven in 1999 and was a professor at UNC Chapel Hill before joining ETH Zurich. He is best known for his work in 3D computer vision, having been the first to develop a software pipeline to automatically turn photographs into 3D models, but he also works on robotics, graphics and machine learning problems. Other noteworthy projects he has worked on are real-time 3D scanning with mobile devices (2013), a real-time pipeline for 3D reconstruction of cities from vehicle-mounted cameras (2007), camera-based self-driving cars and the first fully autonomous vision-based drone (2012). Most recently his academic research has focused on combining 3D reconstruction with semantic scene understanding.

张皓（Richard Zhang）

Professor, Simon Fraser University

Title: Discovering the Right Representations for 3D Vision

Bio: Hao (Richard) Zhang is a professor in the School of Computing Science at Simon Fraser University, Canada. He is a Fellow of the IEEE, holds a Distinguished University Professorship, and is an Amazon Scholar. Richard earned his Ph.D. from the University of Toronto, and MMath and BMath degrees from the University of Waterloo. His research is in computer graphics and visual computing, with special interests in geometric and generative modeling, shape analysis, 3D vision, geometric deep learning, and computational design and fabrication. Awards won by Richard include a Canadian Human-Computer Communications Society Achievement Award in Computer Graphics (2022), a Google Faculty Award (2019), an NSERC Discovery Accelerator Supplement Award (2014), and a Best Dataset Award from ChinaGraph (2020). He and his students have won the CVPR 2020 Best Student Paper Award and Best Paper Awards at the Symposium on Geometry Processing 2008 and CAD/Graphics 2017. Richard has served as an editor-in-chief of Computer Graphics Forum (2014-2018), the Technical Papers Assistant Chair for SIGGRAPH Asia 2024, a paper co-chair for SGP 2013, GI 2015, and CGI 2018, and a conference chair for International Geometry Summit 2019. Richard is the Technical Papers Chair for SIGGRAPH 2025.

Kristen Grauman

Professor, UT-Austin

Title: 4D Activity Understanding in Egocentric Video

Bio: Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin. Her research in computer vision and machine learning focuses on visual recognition, video, and embodied perception. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an IEEE Fellow, AAAS Fellow, AAAI Fellow, Sloan Fellow, and recipient of the 2013 Computers and Thought Award. She and her collaborators have been recognized with several Best Paper awards in computer vision, including a 2011 Marr Prize and a 2017 Helmholtz Prize (test of time award). She has served as Associate Editor-in-Chief for PAMI and Program Chair of CVPR 2015, NeurIPS 2018, and ICCV 2023.

David Forsyth

Professor, UIUC

Title: What do image generators know?

Bio: I hold the Fulton-Watson-Copp chair in Computer Science at the University of Illinois at Urbana-Champaign, which I have occupied since 2014. I moved to Illinois from U.C. Berkeley, where I was also a full professor. I have published over 170 papers on computer vision, computer graphics and machine learning. I have served as program co-chair for IEEE Computer Vision and Pattern Recognition in 2000, 2011, 2018 and 2021, general co-chair for CVPR 2006 and 2015 and ICCV 2019, and program co-chair for the European Conference on Computer Vision 2008, and I am a regular member of the program committee of all major international conferences on computer vision. I have served six years on the SIGGRAPH program committee, and am a regular reviewer for that conference. I have received best paper awards at the International Conference on Computer Vision and at the European Conference on Computer Vision. I received an IEEE technical achievement award for 2005 for my research. I became an IEEE Fellow in 2009, and an ACM Fellow in 2014. My textbook, "Computer Vision: A Modern Approach" (joint with J. Ponce and published by Prentice Hall), is now widely adopted as a course text (adoptions include MIT, U. Wisconsin-Madison, UIUC, Georgia Tech and U.C. Berkeley). A further textbook, "Probability and Statistics for Computer Science", is in print; yet another ("Applied Machine Learning") has just appeared. I have served two terms as Editor in Chief, IEEE TPAMI. I have served on a number of scientific advisory boards.

Yasuyuki Matsushita

Sr. Director, Microsoft Research Asia - Tokyo

Title: Making sense of the real-world via 3D Computer Vision

Bio: Yasuyuki Matsushita has been a Senior Director of Microsoft Research Asia - Tokyo since 2024. He received his B.S., M.S. and Ph.D. degrees in EECS from the University of Tokyo in 1998, 2000, and 2003, respectively. From April 2003 to March 2015, he was with the Visual Computing group at Microsoft Research Asia. From April 2015 to September 2024, he was a Professor at Osaka University. His research areas include computer vision, machine learning and optimization. He is an Editor-in-Chief of the International Journal of Computer Vision (IJCV) and is or has been on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), The Visual Computer journal, IPSJ Transactions on Computer Vision Applications (CVA), and the Encyclopedia of Computer Vision. He has served or is serving as a Program Co-Chair of PSIVT 2010, 3DIMPVT 2011, ACCV 2012, and ICCV 2017, and as a General Co-Chair for ACCV 2014 and ICCV 2021. He won the Osaka Science Prize in 2022. He is a Fellow of IEEE and a member of IPSJ.

Andrea Vedaldi

Professor, University of Oxford

Title: Towards a 3D foundation of AI

Bio: Andrea Vedaldi is Professor of Computer Vision and Machine Learning at the University of Oxford, where he has co-led the Visual Geometry Group since 2012. He is also a senior research scientist and technical lead at Meta. He researches generative AI in 3D computer vision, applied to the generation of 3D content from text and images and to image understanding. He is the author of more than 200 peer-reviewed publications in computer vision and machine learning. He is the recipient of the IEEE Thomas Huang Memorial Prize, the IEEE Mark Everingham Prize, the Test of Time Award by the ACM, and the best paper award from the Conference on Computer Vision and Pattern Recognition.

Marc Pollefeys, Professor, ETH Zurich

Title: Spatial AI to assist humans and enable robots
Abstract: In this talk we’ll discuss how to build rich 3D representations of the environment to help people and robots perform tasks. We’ll first discuss how to build visual 3D maps of environments and use those for visual (re)localization, spatial data access and navigation. We’ll cover recent methods based on geometry, on learning, and on combining both. One of the questions we will consider is what is best learned and where we should use explicit geometric concepts. We’ll also discuss how to build rich 3D semantic representations that enable queries and interactions with the scene. Our approach allows open-vocabulary queries by leveraging foundation models. While these models are very powerful in recognizing arbitrary objects, some aspects are still missing to enable robotic interactions. We’ll also briefly cover some of our work on action recognition, which is key in building AI assistants and could also be useful to enable robots to learn from examples.

张皓（Richard Zhang）, Professor, Simon Fraser University

Title: Discovering the Right Representations for 3D Vision
Abstract: One of the major advances in 3D vision in recent years, NeRF, has pushed the boundaries in many areas dominated by AI. Yet, its "key insight may actually simply be in the idea of a continuous volumetric representation," according to one of NeRF's authors. Despite their popularity, 3D Gaussian splatting models do not represent how our 3D worlds are built, nor would they offer the best support for robots in manipulation or collaborative tasks. CAD representations, on the other hand, are likely more suitable, considering that the robots themselves have been predominantly designed in CAD software. Unlike images or text, 3D objects are not confined to one standard representation. For many 3D vision tasks, discovering and learning the right representation is often the key ingredient for success. In this talk, I will highlight several such examples aimed at addressing some of the main challenges in 3D vision, including input sparsity, occlusion, geometric and structural variations, and mimicking human functions. Several of our works on learning multi-view, layered, structural, or motion-/interaction-aware functional representations will be covered, with applications spanning 3D vision, GenAI, and robotics. Most of these works are contributing to a concerted effort in building a foundational model for robotics, for which robotics itself can play a critical role through active 3D reconstruction. I conclude my talk with our latest work on real-time spatial reasoning by mobile robots for 3D reconstruction and navigation in dynamic scenes, trying to replicate how humans and most animals accomplish this biological feat using the internal GPS in their brains.