Towards Fusing Point Cloud and Visual Representations for Imitation Learning

Published as an arXiv preprint, 2025

Point clouds capture 3D geometry while images provide rich semantic context, yet most imitation learning policies rely on only one of the two modalities. This work studies how to fuse point cloud and visual representations so that policies benefit from both, and evaluates the resulting architecture on a range of robotic manipulation benchmarks.

Recommended citation: Atalay Donat, Xiaogang Jia, Xi Huang, Aleksandar Taranovic, Denis Blessing, Ge Li, Hongyi Zhou, Hanyi Zhang, Rudolf Lioutikov, Gerhard Neumann. (2025). "Towards Fusing Point Cloud and Visual Representations for Imitation Learning." arXiv preprint arXiv:2502.12320.
