A Category-Specific Prompt Strategy for Semantic 3D Indoor Mapping Using RGB-D Camera

Hou, Jiwei; Volland, Vivien; Karam, Samer; Iwaszczuk, Dorota

doi:10.5194/isprs-annals-XI-1-2026-255-2026

Articles | Volume XI-1-2026

https://doi.org/10.5194/isprs-annals-XI-1-2026-255-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

03 Jul 2026

Keywords: 3D Semantic Mapping, RGB-D SLAM, Segment Anything Model, Prompt-Based Segmentation, Indoor Mapping

Abstract. Semantic 3D indoor mapping often depends on supervised learning and large annotated datasets, which limits its scalability across diverse environments. This work introduces a category-specific prompt strategy for semantic 3D mapping with RGB-D cameras that integrates RGB-D SLAM with the Segment Anything Model 2 (SAM2) to enable annotation-efficient reconstruction. Keyframes and camera trajectories estimated by SLAM provide spatial references, while SAM2 performs zero-shot segmentation guided by a Category-Wise Prompt Segmentation Strategy (CPSS). CPSS segments structural and functional elements (e.g., floors, doors, staircases) one category at a time to reduce prompt interference and manual effort. The segmented keyframes are then fused with depth and pose data to produce instance-level semantic point clouds. Experiments on custom RGB-D sequences and selected ScanNet scenes demonstrate centimeter-scale geometric consistency and strong semantic consistency, with mean Intersection over Union (mIoU) values of up to 0.89 on the custom dataset and 0.98 on ScanNet. The resulting semantic point clouds are clean, structured, and require minimal post-processing, showing that the proposed strategy provides an efficient and scalable solution for semantic 3D indoor mapping without retraining or environment-specific supervision.
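The abstract describes the fusion of segmented keyframes with depth and pose only at a high level. The minimal Python sketch below illustrates one common way such a step can work; it is not the authors' implementation. It assumes a pinhole camera model with intrinsics fx, fy, cx, cy, metric depth, a 4x4 camera-to-world pose taken from the SLAM trajectory, and a per-keyframe integer mask map (0 = unlabeled, k > 0 = instance id), for example one rasterised from SAM2 masks. The function names `backproject_keyframe` and `mean_iou` are hypothetical. The second helper shows the standard definition of mIoU (per-class IoU averaged over classes), applied here to per-point labels.

```python
"""Hypothetical sketch: fuse one segmented RGB-D keyframe into a labeled point cloud.

Assumptions (not from the paper): pinhole intrinsics (fx, fy, cx, cy), depth in
metres, a 4x4 camera-to-world pose from SLAM, and a mask map where 0 means
"unlabeled" and k > 0 is an instance id.
"""
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, int]  # x, y, z in the world frame, instance id


def backproject_keyframe(
    depth: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
    pose: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    max_depth: float = 5.0,
) -> List[Point]:
    """Back-project labeled pixels with valid depth into world coordinates."""
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, masks)):
        for u, (z, label) in enumerate(zip(depth_row, mask_row)):
            # Skip unlabeled pixels and missing or out-of-range depth.
            if label == 0 or not (0.0 < z <= max_depth):
                continue
            # Pinhole back-projection into the camera frame (homogeneous).
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z, 1.0)
            # Apply the camera-to-world pose; only the first three rows are needed.
            xw, yw, zw = (sum(pose[r][c] * cam[c] for c in range(4)) for r in range(3))
            points.append((xw, yw, zw, label))
    return points


def mean_iou(pred: Sequence[int], gt: Sequence[int], classes: Sequence[int]) -> float:
    """Mean IoU over the given class ids; classes absent from both label sets are skipped."""
    ious = []
    for k in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == k and g == k)
        union = sum(1 for p, g in zip(pred, gt) if p == k or g == k)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with a 2x2 keyframe `depth = [[1.0, 2.0], [0.0, 1.0]]`, `masks = [[1, 0], [2, 2]]`, unit focal lengths, a zero principal point and an identity pose, only pixels (0, 0) and (1, 1) survive, giving points `(0.0, 0.0, 1.0, 1)` and `(1.0, 1.0, 1.0, 2)`. A practical pipeline would vectorise this step (e.g. with NumPy), accumulate points over all keyframes, and typically downsample or merge duplicate points; those steps are omitted here.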
