ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Download
Publications Copernicus
Download
Citation
Articles | Volume VIII-5/W1-2022
https://doi.org/10.5194/isprs-annals-VIII-5-W1-2022-1-2022
https://doi.org/10.5194/isprs-annals-VIII-5-W1-2022-1-2022
03 Feb 2022
 | 03 Feb 2022

AUTOMATIC ENRICHMENT OF INDOOR 3D MODELS USING A DEEP LEARNING APPROACH BASED ON SINGLE IMAGES WITH UNKNOWN CAMERA POSES

M. Jarząbek-Rychard and H-G. Maas

Keywords: Building Information Model (BIM), deep learning, object recognition, texture mapping, camera pose estimation

Abstract. 3D building modeling is a diverse field of research with a multitude of challenges, where data integration is an inherent component. The intensively growing market of BIM-related consumer applications requires methods and algorithms that enable efficient updates of existing 3D models without the need for cost-intensive data capturing and repetitive reconstruction processes. We propose a novel approach for semantic enrichment of existing indoor models by window objects, based on amateur camera RGB images with unknown exterior orientation parameters. The core idea of the approach is the parallel estimation of image camera poses with semantic recognition of target objects and their automatic mapping onto a 3D vector model. The presented solution goes beyond pure texture matching and links deep learning detection techniques with camera pose estimation and 3D reconstruction. To evaluate the performance of our procedure, we compare the estimated camera parameters with reference data, obtaining median values of 13.8 cm for the camera position and 1.1° for its orientation. Furthermore, a quality of 3D mapping is assessed based on the comparison to the reference 3D point cloud. All the windows presented in the data source were detected successfully, with a mean distance between both point sets equal to 3.6 cm. The experimental results prove that the presented approach achieves accurate integration of objects extracted from single images with an input 3D model, allowing for an effective increase of its semantic coverage.