ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume X-1/W1-2023
https://doi.org/10.5194/isprs-annals-X-1-W1-2023-889-2023
05 Dec 2023

A NOVEL HYBRID MODEL BASED ON CNN AND MULTI-SCALE TRANSFORMER FOR EXTRACTING WATER BODIES FROM HIGH RESOLUTION REMOTE SENSING IMAGES

Q. Zhang, X. Hu, and Y. Xiao

Keywords: Water Body Extraction, Remote Sensing Images, Deep Learning, Convolutional Neural Networks, Transformer, Multi-scale Features

Abstract. Extracting water bodies from high-resolution remote sensing images remains a challenging and actively studied problem in remote sensing. Because the accuracy and reliability of water body extraction still leave room for improvement, this paper proposes a hybrid network that combines a CNN with a multi-scale transformer to extract water bodies from high-resolution remote sensing images. Specifically, the network first uses a CNN to extract a series of multi-scale features, from shallow to deep, from the input image. These features are then fed into a purpose-designed multi-scale transformer module that captures the global contextual associations of water bodies. Next, the water separability of the transformer's output features is evaluated separately at each scale, and the features at different scales are adaptively weighted and fused according to their separability. The network then refines the fused features with a hybrid attention module, producing features that effectively distinguish water bodies from non-water areas. Finally, the refined features are passed to a prediction head that generates the water body extraction result. By combining the CNN's ability to capture local detail with the transformer's ability to model long-range global contextual semantics, the network identifies water bodies in remote sensing images more accurately, and the extracted water body boundaries are accurate and continuous. Water body extraction experiments on a public dataset demonstrate the effectiveness of the proposed network, and comparative experiments show that it offers advantages in extraction accuracy over existing networks and methods such as U-Net, FCN8s, DeepLabv3+, and MSFA-Net.
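The abstract does not state how water separability is scored or how the fusion weights are derived. The Python sketch below illustrates one possible reading of the separability-weighted fusion step, and every element of it is an assumption. It assumes each scale comes with a per-pixel water-probability map, for example from a hypothetical auxiliary head. Separability is approximated by the mean confidence |2p - 1| of that map, and the weights are a temperature-scaled softmax over these scores. Feature maps are assumed to be already resampled to a common resolution and flattened to lists of floats. The names `separability_score`, `softmax`, and `fuse_multiscale` and the `temperature` parameter are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def separability_score(prob_map: Sequence[float]) -> float:
    """Hypothetical label-free separability proxy (an assumption, not the paper's metric).

    Returns the mean confidence |2p - 1| of a water-probability map:
    0.0 when every pixel is 0.5 (no water/non-water separation),
    1.0 when every pixel is exactly 0 or 1 (perfect separation).
    """
    if not prob_map:
        raise ValueError("probability map must not be empty")
    return sum(abs(2.0 * p - 1.0) for p in prob_map) / len(prob_map)


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(
    features: Sequence[Sequence[float]],
    prob_maps: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Weight each scale's (flattened, resampled) feature map by its separability.

    Returns (fused_feature_map, per_scale_weights).
    """
    if not features or len(features) != len(prob_maps):
        raise ValueError("need one probability map per feature map")
    n = len(features[0])
    if any(len(f) != n for f in features) or any(len(p) != n for p in prob_maps):
        raise ValueError("all maps must share the same (flattened) size")

    # Score every scale, then turn the scores into fusion weights that sum to 1.
    weights = softmax([separability_score(p) for p in prob_maps], temperature)

    # Per-pixel weighted sum across scales.
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(n)]
    return fused, weights
```

For example, with two scales whose probability maps are uniformly 0.5 and perfectly binary, the second scale receives a weight of e/(1 + e), roughly 0.73, at `temperature=1.0`. Lowering the temperature pushes the fusion toward the most separable scale, while raising it approaches a plain average of the scales.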