PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning

Xiangyang Zhu, Renrui Zhang, 3 Authors, Peng Gao

2022 · DOI: 10.48550/arXiv.2211.11682
arXiv.org · 83 Citations

TLDR

This work proposes PointCLIP V2, a powerful 3D open-world learner, to fully unleash the potential of CLIP on 3D point cloud data, and introduces a realistic shape projection module to generate more realistic depth maps for CLIP’s visual encoder.

Abstract

Contrastive Language-Image Pre-training (CLIP) has shown promising open-world performance on 2D image tasks, while its transferred capacity on 3D point clouds, i.e., PointCLIP, is still far from satisfactory. In this work, we propose PointCLIP V2, a powerful 3D open-world learner, to fully unleash the potential of CLIP on 3D point cloud data. First, we introduce a realistic shape projection module to generate more realistic depth maps for CLIP’s visual encoder, which is quite efficient and narrows the domain gap between projected point clouds and natural images. Second, we leverage large-scale language models to automatically design a more descriptive 3D-semantic prompt for CLIP’s textual encoder, instead of the previous hand-crafted one. Without introducing any training in 3D domains, our approach significantly surpasses PointCLIP by +42.90%, +40.44%, and +28.75% accuracy on three datasets for zero-shot 3D classification. Furthermore, PointCLIP V2 can be extended to few-shot classification, zero-shot part segmentation, and zero-shot 3D object detection in a simple manner, demonstrating our superior generalization ability for 3D open-world learning. Code will be available at https://github.com/
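
The pipeline the abstract describes (project a point cloud to a depth map, encode it, and match it against text prompts) can be illustrated with a minimal sketch. This is not the paper's realistic shape projection module or its prompt design: the orthographic projection, the depth-to-brightness mapping, and the function names `project_to_depth_map` and `zero_shot_classify` are assumptions made for illustration, and the feature vectors stand in for the outputs of CLIP's visual and textual encoders.

```python
import math


def project_to_depth_map(points, size=8):
    """Orthographically project (x, y, z) points in [-1, 1]^3 onto a size x size grid.

    Hypothetical stand-in for a projection module: each pixel keeps the
    nearest point (smallest z); empty pixels stay 0.0.
    """
    depth = [[0.0] * size for _ in range(size)]
    nearest = [[math.inf] * size for _ in range(size)]
    for x, y, z in points:
        # Map x and y from [-1, 1] to pixel indices, clamped to the grid.
        col = min(size - 1, max(0, int((x + 1.0) / 2.0 * size)))
        row = min(size - 1, max(0, int((1.0 - y) / 2.0 * size)))
        if z < nearest[row][col]:
            nearest[row][col] = z
            # Assumed mapping: nearer points are brighter, values in [0.1, 1.0].
            depth[row][col] = 1.0 - (z + 1.0) / 2.0 * 0.9
    return depth


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_feature, text_features):
    """Return the class name whose text feature is most similar to the image feature."""
    return max(text_features, key=lambda name: cosine(image_feature, text_features[name]))
```

In a real setting, the depth map would be passed through CLIP's visual encoder and each class prompt through its textual encoder; the class is then chosen by similarity, with no training on 3D data.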