ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume X-1-2024
https://doi.org/10.5194/isprs-annals-X-1-2024-75-2024
09 May 2024

Cross-modal change detection flood extraction based on self-supervised contrastive pre-training

Wenqing Feng, Fangli Guan, Chenhao Sun, and Wei Xu

Keywords: Cross-modal, Change detection, Flood extraction, Self-supervised contrastive learning, Pre-training

Abstract. Flood extraction is a critical task in remote sensing analysis. Accurate flood extraction is hindered by complex scenes, image differences across modalities, and a shortage of labeled samples. Traditional supervised deep learning algorithms show promise for flood extraction, but they mostly rely on abundant labeled data. In practice, labeled samples of flood change regions are scarce and expensive to acquire, whereas unlabeled remote sensing imagery is plentiful. Self-supervised contrastive learning (SSCL) offers a solution by learning representations directly from unlabeled data. Inspired by SSCL, we used the open-source CAU-Flood dataset to develop a framework for cross-modal change detection flood extraction (CMCDFE). We employed the Barlow Twins (BT) SSCL algorithm to learn effective visual feature representations of flood change regions from unlabeled cross-modal bi-temporal remote sensing data. The resulting pre-trained weights were then transferred to the flood extraction task as initialization, where they gave the highest accuracy. To extract flood change regions from cross-modal bi-temporal remote sensing data, we introduced an improved CS-DeepLabV3+ network that incorporates the CBAM dual attention mechanism. Experiments on the CAU-Flood dataset show that fine-tuning with only a pre-trained encoder can surpass the widely used ImageNet pre-training without additional data. The approach is therefore effective for downstream cross-modal change detection flood extraction tasks.
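
For readers unfamiliar with the pre-training objective named in the abstract, the following is a minimal NumPy sketch of the standard Barlow Twins loss. It is an illustration only, not the authors' implementation: the function name `barlow_twins_loss`, the off-diagonal weight `lambda_offdiag`, and the stabilizer `eps` are assumptions, and the abstract does not state how the two views are formed from the cross-modal bi-temporal pairs. The loss pushes the cross-correlation matrix of two embedding batches towards the identity, so that matching dimensions agree across views while different dimensions stay decorrelated.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins objective for two embedding batches of shape (N, D).

    z_a and z_b are embeddings of two views of the same N samples.
    lambda_offdiag and eps are illustrative defaults (assumptions).
    """
    n = z_a.shape[0]
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag

# Hypothetical usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(256, 8))
view_b = view_a + 0.1 * rng.normal(size=(256, 8))  # a slightly perturbed second view
loss = barlow_twins_loss(view_a, view_b)
```

In a pre-training setup of this kind, the loss would be minimized over the encoder weights, and only the encoder would be carried over to the downstream segmentation network for fine-tuning.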