Multi-modal Land Cover Classification of Historical Aerial Images and Topographic Maps Exploiting Attention-based Feature Fusion
Keywords: Multi-modal Classification, Attention-based Fusion, Semantic Segmentation, Historical Geodata, Remote Sensing Imagery, Topographic Maps
Abstract. Knowledge about past and present land cover is of interest for the assessment of the current status of our environment and, thus, for proper planning of the future. Information on past land cover is contained only implicitly in historical remote sensing imagery and historical topographic maps. To make this information explicit, pixel-wise classification methods based on neural networks can be used. The method proposed in this paper aims to automatically predict land cover based on historical aerial imagery and scanned topographic maps. The proposed deep learning-based classifier extracts features at different scales from both modalities and fuses the topographic map features at the smallest scale, i.e. the most complex ones, into the features derived from the aerial images in order to enrich them. Both the multi-modal features and the aerial image features at larger scales are mapped to pixel-wise predictions by means of a decoder. Comprehensive experiments show that the results of the proposed multi-modal classifier are superior to those of a uni-modal aerial image classifier; the multi-modal mIoU of 82.3% is 1.4% higher than that of the uni-modal classifier. This demonstrates that aerial image classification can benefit from the additional information contained in topographic maps.
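To make the described architecture concrete, the following is a minimal PyTorch sketch of the fusion scheme: two encoders extract multi-scale features from the aerial image and the scanned map, the map features at the smallest scale are fused into the corresponding aerial features through an attention block, and a decoder with aerial skip connections produces pixel-wise predictions. The encoder widths, the SE-style channel attention, and the class count are illustrative assumptions; the abstract does not specify the authors' exact backbone or attention mechanism.

```python
# Illustrative sketch only; layer sizes and the channel-attention design
# are assumptions, not the paper's exact network.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """Three-scale encoder; returns feature maps from fine to coarse."""
    def __init__(self, in_ch, widths=(32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList()
        for w in widths:
            self.blocks.append(conv_block(in_ch, w))
            in_ch = w

    def forward(self, x):
        feats = []
        for i, block in enumerate(self.blocks):
            if i > 0:
                x = F.max_pool2d(x, 2)  # halve resolution between scales
            x = block(x)
            feats.append(x)
        return feats  # resolutions [1, 1/2, 1/4] of the input

class AttentionFusion(nn.Module):
    """SE-style channel attention (an assumed variant): the concatenated
    modalities produce per-channel weights that gate the map contribution
    used to enrich the aerial features at the deepest scale."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, aerial, topo):
        both = torch.cat([aerial, topo], dim=1)
        attn = self.gate(both)  # per-channel weights in [0, 1]
        return aerial + attn * self.project(both)

class MultiModalSegNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.aerial_enc = Encoder(in_ch=3)  # RGB aerial image
        self.map_enc = Encoder(in_ch=3)     # scanned topographic map
        self.fuse = AttentionFusion(128)
        self.up1 = conv_block(128 + 64, 64)
        self.up2 = conv_block(64 + 32, 32)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, aerial, topo):
        a1, a2, a3 = self.aerial_enc(aerial)
        _, _, m3 = self.map_enc(topo)
        x = self.fuse(a3, m3)  # fusion only at the smallest scale
        # Decoder: upsample and combine with aerial-only skip features.
        x = self.up1(torch.cat([F.interpolate(x, scale_factor=2), a2], dim=1))
        x = self.up2(torch.cat([F.interpolate(x, scale_factor=2), a1], dim=1))
        return self.head(x)  # per-pixel class logits

if __name__ == "__main__":
    net = MultiModalSegNet(n_classes=10)
    out = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 10, 128, 128])
```

Note that, as in the abstract, only the deepest map features enter the fusion; the larger-scale skip connections come from the aerial encoder alone, so the decoder output remains aligned with the aerial image.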