AN IMPROVED MASK R-CNN: EXTRACTION OF DOOR AND WINDOW INSTANCES ON VILLAGE BUILDING FAÇADE IMAGES
Keywords: Object detection, Instance segmentation, Window and door extraction, Mask R-CNN, Attention mechanism
Abstract. Rapid access to the basic structure of village buildings supports the investigation of the load-bearing structures of village houses and provides data for disaster assessment and for post-disaster rescue and reconstruction. Advances in computer vision offer new ideas and tools for identifying and extracting the basic structures of residential buildings. Because the original Mask R-CNN ignores the spatial associations and relationships among door and window elements, this paper proposes an improved deep learning model based on Mask R-CNN to detect and segment door and window structures in façade images. The improved architecture integrates attention mechanisms into the original network through two components: an improved Coordinate Attention (CA) module combined with the backbone and a head network based on a relationship module. Experimental results show that, compared with the original Mask R-CNN, the backbone combined with the improved CA module increases the Average Precision (AP) by 0.7% on both the regression and segmentation tasks. In the head network, the calculation strategy proposed for the relationship module raises the AP of detection and segmentation from 76.7% and 77.7% to 80.6% and 80.0%, respectively.
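For readers unfamiliar with coordinate attention, the following is a minimal NumPy sketch of the baseline mechanism as it is commonly described, not the improved CA module proposed in this paper. It pools a feature map along each spatial direction, passes the pooled descriptors through a shared channel reduction, and rescales the input with one gate per row and one per column. The function name, the weight matrices `w_reduce`, `w_h`, and `w_w`, and the use of ReLU in place of batch normalization plus a learned non-linearity are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Apply a simplified coordinate attention to one feature map.

    x        : array of shape (C, H, W)
    w_reduce : (C_mid, C) weights of the shared 1x1 convolution (hypothetical)
    w_h, w_w : (C, C_mid) weights of the per-direction 1x1 convolutions (hypothetical)
    """
    _, h, _ = x.shape

    # Direction-aware pooling: average over width and over height separately.
    z_h = x.mean(axis=2)  # (C, H), one descriptor per row
    z_w = x.mean(axis=1)  # (C, W), one descriptor per column

    # Concatenate along the spatial axis and apply the shared 1x1 convolution.
    # Assumption: ReLU stands in for batch norm plus the usual non-linearity.
    z = np.concatenate([z_h, z_w], axis=1)   # (C, H + W)
    f = np.maximum(w_reduce @ z, 0.0)        # (C_mid, H + W)

    # Split back into the two directions and compute the gates.
    f_h, f_w = f[:, :h], f[:, h:]
    g_h = sigmoid(w_h @ f_h)  # (C, H)
    g_w = sigmoid(w_w @ f_w)  # (C, W)

    # Rescale every position by its row gate and its column gate.
    return x * g_h[:, :, None] * g_w[:, None, :]


# Small demonstration with random weights (values are arbitrary).
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 16, 16))
out = coordinate_attention(
    feature_map,
    w_reduce=rng.normal(size=(2, 8)),
    w_h=rng.normal(size=(8, 2)),
    w_w=rng.normal(size=(8, 2)),
)
print(out.shape)
```

Because the two gates are indexed by row and by column, the attention weights retain positional information along both axes, which is the property that makes this family of modules attractive for regularly arranged elements such as doors and windows.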