About Me
Hi! I’m Haiyang Wang (汪海洋), a fifth-year Ph.D. student at Peking University (PKU), advised by Prof. Liwei Wang. Before that, I received my bachelor’s degree from Zhiyuan College at Shanghai Jiao Tong University (SJTU) in 2020, supervised by Prof. Cewu Lu. I also work very closely with Shaoshuai Shi. I have been fortunate to work with Prof. Bernt Schiele at the Max Planck Institute for Informatics, Prof. Jifeng Dai at SenseTime, and Prof. Wei Wang at UCLA.
My research interests focus on machine learning and computer vision, particularly on exploring machine learning for AGI. This includes fundamental neural network design, large language modeling, unified visual modeling, multi-modal understanding, and other topics related to foundation models.
Here are several research areas I have worked on:
- Developing fundamental neural networks for foundation models.
- Creating a unified computational framework for general visual modeling.
- Advancing 3D scene understanding for autonomous driving and robotics.
- Exploring Embodied AI through reinforcement learning techniques.
If you are interested in collaborating with me or would like to have a chat, feel free to contact me by e-mail (wanghaiyang [at] stu.pku.edu [dot] cn).
📝 Selected Publications
* denotes equal contribution.
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters.
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele. In arXiv 2024. [Code]
GiT: Towards Generalist Vision Transformer through Universal Language Interface.
Haiyang Wang*, Hao Tang*, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang. In ECCV 2024. [Code] (Oral Presentation, 2.32% acceptance rate)
PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds.
Hao Yang, Haiyang Wang, Di Dai, Liwei Wang. In NeurIPS 2023.
UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation.
Haiyang Wang*, Hao Tang*, Shaoshuai Shi, Aoxue Li, Zhenguo Li, Bernt Schiele, Liwei Wang. In ICCV 2023. [Code]
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets.
Haiyang Wang*, Chen Shi*, Shaoshuai Shi, Meng Li, Sen Wang, Di He, Bernt Schiele, Liwei Wang. In CVPR 2023. [Code]
CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds.
Haiyang Wang*, Lihe Ding*, Shaocong Dong, Shaoshuai Shi, Aoxue Li, Jianan Li, Zhenguo Li, Liwei Wang. In NeurIPS 2022. [Code]
MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds.
Shaocong Dong*, Lihe Ding*, Haiyang Wang, Tingfa Xu, Xinli Xu, Jie Wang, Ziyang Bian, Ying Wang, Jianan Li. In NeurIPS 2022. [Code]
RBGNet: Ray-based Grouping for 3D Object Detection.
Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, Liwei Wang. In CVPR 2022. [Code]
Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis.
Jikai Jin*, Bohang Zhang*, Haiyang Wang, Liwei Wang. In NeurIPS 2021.
Explicit Shape Encoding for Real-Time Instance Segmentation.
Wenqiang Xu*, Haiyang Wang*, Fubo Qi, Cewu Lu. In ICCV 2019. [Code]
🎖 Selected Awards
- National Scholarship (博士国家奖学金), 2022-2023. Awarded annually to the top student across all grades of the Center for Data Science, Peking University.
- National Scholarship (博士国家奖学金), 2021-2022. Awarded annually to the top student across all grades of the Center for Data Science, Peking University.
- Excellent Student at Shanghai Jiao Tong University, 2020. Top 5% of graduating students.
📝 Experiences
- Visiting Student at MPI-INF D2. Advisor: Bernt Schiele (2023.12 - 2024.12)
- Internship at SenseTime. Advisors: Jifeng Dai and Wenguan Wang (2020.07 - 2021.07)
- Visiting Student at ScAi-UCLA. Advisor: Wei Wang (2019.09 - 2020.04)
- Undergraduate at MVIG-SJTU. Advisor: Cewu Lu (2018.02 - 2019.05)
💬 Invited Talks
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters.
  - 2024.10. Hosted by BAAI (智源). [Video]
  - 2024.11. Hosted by Huawei Noah’s Ark Lab.
  - 2024.10. Hosted by Google VIA Center.
- GiT: Towards Generalist Vision Transformer through Universal Language Interface.
  - 2024.4. Hosted by Huawei Technologies Co., Ltd.
  - 2024.4. Hosted by Google VIA Center.
  - 2024.1. Hosted by Prof. Bernt Schiele at MPI-INF.
- A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation.
🏫 Academic Services
- Reviewer for NeurIPS’21, CVPR’22, ECCV’22, ICML’22, NeurIPS’22, CVPR’23, ICML’23, ICCV’23, NeurIPS’23, IROS’23, CVPR’24, ICML’24, NeurIPS’24 (Top Reviewer), ICLR’25