Ziyang Song

I am a researcher at Tencent Video, working on multimodal video generation & world models for filmmaking.

I received my PhD (2021 to 2025) from The Hong Kong Polytechnic University, where I was fortunate to be supervised by Prof. Bo Yang. My PhD research focused on 3D reconstruction and scene understanding. Before that, I received my M.Eng and B.Eng degrees (Honors Youth Program) from Xi'an Jiaotong University.

During my PhD, I interned at TikTok (San Jose, CA) with Xinyu Gong. During my M.Eng, I interned at SenseTime with Dongliang Wang and at Tencent Robotics X with Wanchao Chi.

Email / CV / LinkedIn / Google Scholar / GitHub

Selected Publications

	OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos
	Ziyang Song, Jinxi Li, Bo Yang
	International Conference on Machine Learning (ICML), 2024
	arXiv / Video / Code
	The first framework to represent dynamic 3D scenes in infinitely many ways from a monocular RGB video.

	Unsupervised 3D Object Segmentation of Point Clouds by Geometry Consistency
	Ziyang Song, Bo Yang
	IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
	IEEE Xplore / Video / Code
	The journal version of our OGC (NeurIPS 2022). More experiments and analysis are included.

	NVFi: Neural Velocity Fields for 3D Physics Learning from Dynamic Videos
	Jinxi Li, Ziyang Song, Bo Yang
	Advances in Neural Information Processing Systems (NeurIPS), 2023
	arXiv / Code
	A novel representation of dynamic 3D scenes by disentangling physical velocities from geometry and appearance, enabling: 1) future frame extrapolation, 2) unsupervised semantic scene decomposition, and 3) velocity transfer.

	ActFormer: A GAN-based Transformer towards General Action-Conditioned 3D Human Motion Generation
	Liang Xu, Ziyang Song, Dongliang Wang, Jing Su, Zhicheng Fang, Chenjing Ding, Weihao Gan, Yichao Yan, Xin Jin, Xiaokang Yang, Wenjun Zeng, Wei Wu
	International Conference on Computer Vision (ICCV), 2023
	arXiv / Project Page / Code
	(* denotes equal contribution)
	A GAN-based Transformer for general action-conditioned 3D human motion generation, including single-person actions and multi-person interactive actions.

	OGC: Unsupervised 3D Object Segmentation from Rigid Dynamics of Point Clouds
	Ziyang Song, Bo Yang
	Advances in Neural Information Processing Systems (NeurIPS), 2022
	arXiv / Video / Code
	We propose the first unsupervised 3D object segmentation method, learning from dynamic motion patterns in point cloud sequences.

Last update: 2026.01. Thanks.