TextSCD: Leveraging Text-based Semantic Guidance for Remote Sensing Image Semantic Change Detection
Keywords: Semantic change detection, Vision-language representation learning, Multi-task learning, Remote sensing
Abstract. Semantic change detection (SCD) in remote sensing images aims to identify semantic alterations between bi-temporal images captured at the same geographic location, and it is widely applied in fields such as environmental monitoring and disaster assessment. Although advances in deep learning have produced numerous successful approaches, most existing methods rely primarily on visual representation learning and thus overlook the potential benefits of multimodal data. Recently, vision-language models have demonstrated outstanding performance across various downstream tasks. In this paper, we propose a novel framework named TextSCD that leverages text-based semantic information to guide the generation of semantic change maps. Our approach employs Gemini to generate change descriptions between bi-temporal images and uses a multi-level semantic extraction method to capture features from both the images and their corresponding captions. Furthermore, we introduce a semantic text-guided interaction module that integrates visual and textual features, enhancing multimodal knowledge transfer and the extraction of discriminative features; this design effectively reduces false detections and omissions. We validate the effectiveness of our model on the SECOND dataset, achieving notable improvements in the overall accuracy of semantic change detection.
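The abstract does not specify how the semantic text-guided interaction module is implemented. As a minimal illustrative sketch only, one common way to realize such visual-textual fusion is cross-attention from flattened image tokens to caption embeddings; the module name, dimensions, and tensor shapes below are hypothetical assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class TextGuidedInteraction(nn.Module):
    """Hypothetical sketch of a text-guided interaction module:
    visual tokens attend to caption embeddings via cross-attention.
    Name and hyperparameters are illustrative, not from TextSCD."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (B, H*W, C) flattened image feature tokens
        # text_feats:   (B, T, C) caption token embeddings from a text encoder
        attended, _ = self.cross_attn(query=visual_feats, key=text_feats, value=text_feats)
        # residual connection preserves the original visual signal
        return self.norm(visual_feats + attended)

# usage sketch with random tensors standing in for real features
if __name__ == "__main__":
    fuse = TextGuidedInteraction(dim=256)
    vis = torch.randn(2, 64 * 64, 256)  # tokens from one temporal image branch
    txt = torch.randn(2, 32, 256)       # embeddings of a generated change caption
    out = fuse(vis, txt)                # (2, 4096, 256) text-guided visual features
```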