PVG Seminar Series

The Physical Vision Group seminar series brings together researchers in computer vision, machine learning, graphics, robotics, spatial AI, and physical AI.

Upcoming Seminars

Talks, guest lectures, and group-wide research discussions hosted by PVG.

Wednesday, 15/04/2026, 10:00

Xingjian Bai, PhD Student (supervised by Prof. Kaiming He), MIT

End-to-End Training for Unified Tokenization and Latent Denoising

Abstract

Training state-of-the-art latent diffusion models requires complex staging: a tokenizer must first be trained before a diffusion model can operate in its frozen latent space. We propose UNITE — an architecture for unified tokenization and latent diffusion. A single Generative Encoder serves as both image tokenizer and latent generator via weight sharing, trained in a single stage that jointly optimizes both tasks. UNITE learns a common latent language for tokenization and generation.

📍 MICL Meeting Room (N4-B1c-17) · 🗓️ Virtual seminar · 💻 Zoom Link

Past Seminars

Archived talks and research meetings, grouped by year for easy browsing.

2026

Wednesday, 01/04/2026, 13:00

Andrea Vedaldi, Professor, VGG, University of Oxford

The new 3D

Abstract

In this talk, I will discuss our progress in making 3D a first-class citizen in AI. I will argue that understanding 3D structures is a prerequisite for understanding, acting, and creating in the physical world. I will then explore how a new generation of 3D foundation models has changed the way we approach 3D reconstruction. I will show that these ideas extend to novel-view synthesis as well, benefiting from transformers trained for 3D reconstruction. Next, I will consider 4D reconstruction, where the goal is to recover 3D shape and motion from videos, and demonstrate how transformers can be further extended to this task via dynamic point maps. I will then examine the role of scale in learning these models, introducing VGGT-Omega, a new, more efficient architecture trained on 15× more data than our previous model. I will discuss the importance and challenges of 3D data engineering and demonstrate significant performance improvements on benchmarks. Finally, I will address the problem of 3D generation for content creation, simulation, and design. I will emphasize that generating 3D shape alone is insufficient, and discuss approaches to compositional and articulated generation as well.

📍 Tutorial Room 1 (TR1, NS4-05-79) · 🗓️ In-person seminar

Wednesday, 04/02/2026, 17:00

Edgar Sucar, Postdoctoral Fellow, VGG, University of Oxford

A Unified Model for 4D Reconstruction

Abstract

In this talk I will give an overview of our recent work V-DPM: 4D Video Reconstruction with Dynamic Point Maps. In this work we tackle the goal of feed-forward 4D reconstruction with an emphasis on versatility: the ability to work on diverse in-the-wild videos. I will trace the evolution of designs that led to this point, starting with the original version of Dynamic Point Maps, which built on top of DUSt3R and processed pairs of frames. I will motivate why we want an end-to-end model for 4D reconstruction, compared to the decoupled designs common in 3D trackers. I will then describe how we build on top of VGGT, a model pre-trained for static scenes, adapting it to 4D reconstruction with a training recipe that does not require massive data, and showing some of the early designs that failed. Finally, I will show results highlighting the versatility of the method, as well as its current limitations and directions for future work.

📍 MICL Meeting Room (N4-B1c-17) · 🗓️ Virtual seminar · 💻 Zoom Link · 📄 Slides (PDF)