A Category-Specific Prompt Strategy for Semantic 3D Indoor Mapping Using RGB-D Camera
Keywords: 3D Semantic Mapping, RGB-D SLAM, Segment Anything Model, Prompt-Based Segmentation, Indoor Mapping
Abstract. Semantic 3D indoor mapping often depends on supervised learning and large annotated datasets, limiting scalability across diverse environments. This work introduces a category-specific prompt strategy for semantic 3D mapping using RGB-D cameras, integrating RGB-D SLAM with the Segment Anything Model 2 (SAM2) to enable annotation-efficient reconstruction. Keyframes and trajectories extracted from SLAM provide spatial references, while SAM2 performs zero-shot segmentation guided by a Category- Wise Prompt Segmentation Strategy (CPSS), which segments structural and functional elements (e.g., floors, doors, staircases) by category to reduce prompt interference and manual effort. The segmented keyframes are then fused with depth and pose data to produce instance-level semantic point clouds. Experiments on custom RGB-D sequences and selected ScanNet scenes demonstrate centimeter-scale geometric consistency and strong semantic consistency, with mIoU values up to 0.89 on the custom dataset and 0.98 on ScanNet. The resulting semantic point clouds are clean, structured, and require minimal post-processing, showing that the proposed strategy provides an efficient and scalable solution for semantic 3D indoor mapping without retraining or environment-specific supervision.
