ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume X-G-2025
https://doi.org/10.5194/isprs-annals-X-G-2025-383-2025
10 Jul 2025

TextSCD: Leveraging Text-based Semantic Guidance for Remote Sensing Image Semantic Change Detection

Haiyan Huang, Qimin Cheng, Duowang Zhu, Xiao Huang, and Qunshan Zhao

Keywords: Semantic change detection, Vision-language representation learning, Multi-task learning, Remote sensing

Abstract. Semantic change detection (SCD) in remote sensing images aims to identify semantic changes between bi-temporal images captured at the same geographic location. SCD is widely applied in fields such as environmental monitoring and disaster assessment. Although advances in deep learning have produced many successful approaches, most existing methods rely primarily on visual representation learning and overlook the potential benefits of multimodal data. Recently, vision-language models have demonstrated strong performance across various downstream tasks. In this paper, we propose a novel framework, TextSCD, that leverages text-based semantic information to guide the generation of semantic change maps. Our approach uses Gemini to generate descriptions of the changes between bi-temporal images and employs a multi-level semantic extraction method to capture features from both the images and their corresponding captions. Furthermore, we introduce a semantic text-guided interaction module that integrates visual and textual features, enhancing multimodal knowledge transfer and the extraction of discriminative features. This design reduces false detections and omissions. We validate the model on the SECOND dataset, achieving notable improvements in overall accuracy for semantic change detection.
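To make the task definition concrete, the following is a minimal sketch of what a semantic change map encodes, assuming per-pixel class labels are already available for both acquisition dates. It is not the TextSCD model: the function name `semantic_change`, the class ids, and the toy label maps are hypothetical and serve only to illustrate how a binary change mask and the "from-to" transitions relate to two bi-temporal label maps.

```python
from collections import Counter

def semantic_change(labels_t1, labels_t2):
    """Compare two per-pixel label maps of the same scene.

    Returns a binary change mask (1 where the class differs) and a
    Counter of (class_at_t1, class_at_t2) transitions over changed pixels.
    """
    same_shape = len(labels_t1) == len(labels_t2) and all(
        len(r1) == len(r2) for r1, r2 in zip(labels_t1, labels_t2)
    )
    if not same_shape:
        raise ValueError("label maps must have the same shape")

    change_mask = [
        [int(a != b) for a, b in zip(r1, r2)]
        for r1, r2 in zip(labels_t1, labels_t2)
    ]
    transitions = Counter(
        (a, b)
        for r1, r2 in zip(labels_t1, labels_t2)
        for a, b in zip(r1, r2)
        if a != b
    )
    return change_mask, transitions

# Hypothetical class ids for illustration: 0 = vegetation, 1 = building, 2 = water
t1 = [[0, 0, 2],
      [0, 1, 2]]
t2 = [[1, 0, 2],
      [1, 1, 0]]

mask, transitions = semantic_change(t1, t2)
print(mask)               # binary change mask
print(dict(transitions))  # counts of from-to class transitions
```

In this toy case two pixels change from vegetation to building and one from water to vegetation; an SCD model has to predict both where change occurred and which classes are involved, which is where false detections and omissions arise.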
