Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
A method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes.
I am currently a Marie Skłodowska-Curie Actions (MSCA) Fellow at VGG, University of Oxford, working with Andrea Vedaldi on feed-forward photorealistic 3D and 4D reconstruction. I was very fortunate to also collaborate with Andrew Zisserman, Christian Rupprecht, and Iro Laina at VGG. I received my Ph.D. in Computer Science from Nanyang Technological University (NTU), advised by Prof. Tat-Jen Cham and Prof. Jianfei Cai.
I am an incoming Nanyang Assistant Professor (starting Fall 2025) at the College of Computing and Data Science, Nanyang Technological University, where I lead the Physical Visual Group. My research focuses on Creative AI, aiming to develop systems that perceive, reconstruct, and interact with the physical world. The broader goal is to create realistic digital twins of the natural world with rich physical properties, capturing not only appearance, content, and geometry, but also occlusion, dynamics, gravity, interaction, sound, and more.
More Publications and Google Scholar
A feed-forward approach that increases the likelihood that a 3D generator directly outputs stable 3D objects.
A model for complete 3D reconstruction from partially visible inputs.
An interactive video generative model that can serve as a motion prior for part-level dynamics.
A fast and highly efficient model for 3D scene reconstruction from a single image, trainable on a single GPU in one day.
A study that probes large vision models to determine the extent to which they 'understand' different physical properties in an image.
A feed-forward approach for 360-degree scene-level novel view synthesis using only sparse observations.
A feed-forward approach for efficiently predicting 3D Gaussians from sparse multi-view images in a single forward pass.
A model that enables physical interaction with objects in images through part-level dragging.
A method to synthesize consistent novel views from a single image of open-set categories, without the need for explicit 3D representations.
A view-consistent indoor panorama outpainting model based on latent diffusion models.
A simple approach to avoid codebook collapse and achieve 100% codebook utilisation.
A unified discrete diffusion model for simultaneous vision-language generation.
A spatially conditional normalization introduced to address repeated artifacts in vector-quantized methods.
A model that lifts a 2D semantic map into a 3D NeRF and lets users edit the 3D model through 2D semantic input.
TFill fills in plausible content for both foreground object removal and content completion.
A high-level scene understanding system that simultaneously models the completed shape and appearance of all instances.
A novel spatially-correlative loss that is simple and efficient, yet effective at preserving scene structure consistency while supporting large appearance changes during unpaired I2I translation.
Given a masked image, the proposed PIC model is able to generate multiple diverse and plausible results.
Without using any real depth maps, the proposed model estimates depth for real scenes after training only on synthetic datasets.