Multi-modal Land Cover Classification of Historical Aerial Images and Topographic Maps Exploiting Attention-based Feature Fusion
Keywords: Multi-modal Classification, Attention-based Fusion, Semantic Segmentation, Historical Geodata, Remote Sensing Imagery, Topographic Maps
Abstract. Knowledge about past and present land cover is of interest for the assessment of the current status of our environment and, thus, for proper planning of the future. Information on past land cover is contained only implicitly in historical remote sensing imagery and historical topographic maps. To make this information explicit, pixel-wise classification methods based on neural networks can be used. The method proposed in this paper aims to automatically predict land cover based on historical aerial imagery and scanned topographic maps. The proposed deep learning-based classifier extracts features at different scales from both modalities and fuses the topographic map features at the smallest scale, i.e. the most complex ones, into the features derived from the aerial images in order to enrich them. Both the multi-modal features and the aerial image features at larger scales are mapped to pixel-wise predictions by means of a decoder. Comprehensive experiments show that the results of the proposed multi-modal classifier are superior to those of a uni-modal aerial image classifier; the multi-modal mIoU of 82.3% is 1.4% higher than that of the uni-modal classifier. This demonstrates that aerial image classification can benefit from the additional information contained in topographic maps.
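To make the described architecture concrete, the following is a minimal PyTorch sketch of the fusion scheme: two encoders extract multi-scale features from the aerial image and the scanned map, the map features at the smallest scale are fused into the corresponding aerial features through an attention block, and a decoder with aerial skip connections produces pixel-wise predictions. The encoder widths, the SE-style channel attention, and the class count are illustrative assumptions; the abstract does not specify the authors' exact backbone or attention mechanism.

```python
# Illustrative sketch only; layer sizes and the channel-attention design
# are assumptions, not the paper's exact network.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    """Three-scale encoder; returns feature maps from fine to coarse."""
    def __init__(self, in_ch, widths=(32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList()
        for w in widths:
            self.blocks.append(conv_block(in_ch, w))
            in_ch = w

    def forward(self, x):
        feats = []
        for i, block in enumerate(self.blocks):
            if i > 0:
                x = F.max_pool2d(x, 2)  # halve resolution between scales
            x = block(x)
            feats.append(x)
        return feats  # resolutions [1, 1/2, 1/4] of the input

class AttentionFusion(nn.Module):
    """SE-style channel attention (an assumed variant): the concatenated
    modalities produce per-channel weights that gate the map contribution
    used to enrich the aerial features at the deepest scale."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, ch, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 1), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, aerial, topo):
        both = torch.cat([aerial, topo], dim=1)
        attn = self.gate(both)  # per-channel weights in [0, 1]
        return aerial + attn * self.project(both)

class MultiModalSegNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.aerial_enc = Encoder(in_ch=3)  # RGB aerial image
        self.map_enc = Encoder(in_ch=3)     # scanned topographic map
        self.fuse = AttentionFusion(128)
        self.up1 = conv_block(128 + 64, 64)
        self.up2 = conv_block(64 + 32, 32)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, aerial, topo):
        a1, a2, a3 = self.aerial_enc(aerial)
        _, _, m3 = self.map_enc(topo)
        x = self.fuse(a3, m3)  # fusion only at the smallest scale
        # Decoder: upsample and combine with aerial-only skip features.
        x = self.up1(torch.cat([F.interpolate(x, scale_factor=2), a2], dim=1))
        x = self.up2(torch.cat([F.interpolate(x, scale_factor=2), a1], dim=1))
        return self.head(x)  # per-pixel class logits

if __name__ == "__main__":
    net = MultiModalSegNet(n_classes=10)
    out = net(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 10, 128, 128])
```

Note that, as in the abstract, only the deepest map features enter the fusion; the larger-scale skip connections come from the aerial encoder alone, so the decoder output remains aligned with the aerial image.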