Towards Fusing Point Cloud and Visual Representations for Imitation Learning
Published as an arXiv preprint, 2025
Point clouds capture 3D geometry while images provide rich semantic context, yet most imitation learning policies rely on only one of the two modalities. This work studies how to fuse point cloud and visual representations so that policies benefit from both, and evaluates the resulting architecture on a range of robotic manipulation benchmarks.
Recommended citation: Atalay Donat, Xiaogang Jia, Xi Huang, Aleksandar Taranovic, Denis Blessing, Ge Li, Hongyi Zhou, Hanyi Zhang, Rudolf Lioutikov, Gerhard Neumann. (2025). "Towards Fusing Point Cloud and Visual Representations for Imitation Learning." arXiv preprint arXiv:2502.12320.
