Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
A method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes.
A feed-forward approach that increases the likelihood of a 3D generator directly outputting stable 3D objects.
A model for complete 3D reconstruction from partially visible inputs.
An interactive video generative model that can serve as a motion prior for part-level dynamics.
Semantic Style Transfer that simultaneously guides both style and appearance transfer.
A fast, highly efficient model for 3D scene reconstruction from a single image, trainable on a single GPU in one day.
A unified framework to recover better 2D appearance and 2.5D geometry.
A feed-forward model that reconstructs 3D structure and appearance from uncalibrated images.
Employing explicit correspondence matching as a geometry prior enables NeRF to generalize across scenes.
Probe large vision models to determine to what extent they 'understand' different physical properties in an image.
A feed-forward approach for 360-degree scene-level novel view synthesis using only sparse observations.
A feed-forward approach for efficiently predicting 3D Gaussians from sparse multi-view images in a single forward pass.
A model for physical interaction with objects in images via part-level dragging.
A self-organizing 3D segmentation and decomposition model via a neural implicit surface representation.
A method to synthesize consistent novel views from a single image of open-set categories, without the need for explicit 3D representations.
A Stable Diffusion-based network that solves the amodal completion problem for any category, without requiring an occluder mask.
A versatile plug-and-play module that fixes scheduler flaws in diffusion models.
A view-consistent indoor panorama outpainting model based on latent diffusion models.
PICFormer achieves pluralistic image completion with multiple and diverse solutions using a transformer-based architecture.
A generalized framework for multi-modality control built on text-to-image generation.
A simple approach to avoid codebook collapse and achieve 100% codebook utilisation.
Minimize the codebook-data distortion, measured as the Wasserstein distance.
A unified discrete diffusion model for simultaneous vision-language generation.
A spatially conditional normalization that addresses the repeated artifacts in vector-quantized methods.
Automatically decompose a scene into 3D instances, trained using only 2D semantic labels and images.
A model that transfers a 2D semantic map into a 3D NeRF and lets users edit the 3D model through 2D semantic input.
TFill fills in reasonable contents for both foreground object removal and content completion.
A high-level scene understanding system that simultaneously models the completed shape and appearance for all instances.
A GAN inversion model trained for stylizing portraits.
A novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired I2I translation.
Given a masked image, the proposed PIC model generates multiple, diverse, and plausible results.
Without using any real depth maps, the proposed model estimates depth on real scenes, trained using only synthetic datasets.