FL-DBENet: Double-branch encoder network based on the Segment Anything Model for farmland segmentation of large very-high-resolution optical remote sensing images
Keywords: Farmland extraction, remote sensing images, vision transformer, Segment Anything Model, SegFormer
Abstract. Extracting farmland from very-high-resolution optical remote sensing images is a challenging task. Although deep learning algorithms have been extensively applied to farmland extraction, their performance remains limited by the scarcity of labeled farmland samples and their restricted generalization capability. The recent introduction of the Segment Anything Model (SAM), built on the Vision Transformer (ViT) architecture, has opened new possibilities for remote sensing image analysis, including farmland extraction. This paper introduces FL-DBENet, a farmland extraction network that builds on SAM’s strengths. FL-DBENet features a general-specialized double-branch encoder: the general branch leverages SAM’s robust edge-detection capability to capture precise farmland boundaries, while the specialized branch incorporates the lightweight SegFormer encoder to supply SAM with targeted prompts on farmland features. To streamline the model, we integrate a Low-Rank Adaptation (LoRA) module into SAM’s image encoder, reducing the number of trainable parameters and the computational cost. Additionally, a prompt mixer module is developed to fuse the features from the two branches effectively. Extensive evaluations on the GID dataset and the ultra-high-resolution, ultra-rich-context (URUR) dataset demonstrate that FL-DBENet achieves superior qualitative and quantitative performance on farmland extraction tasks.
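To make the double-branch design concrete, the sketch below illustrates in PyTorch how a LoRA-adapted general branch, a lightweight specialized branch, and a prompt mixer could be wired together. It is a minimal illustration only: the module names, feature dimensions, stand-in encoders, and fusion rule are assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch: names, dimensions, and the fusion rule are assumptions,
# not the FL-DBENet implementation described in the paper.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Linear layer with a frozen base weight and a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as a zero (identity) update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))


class PromptMixer(nn.Module):
    """Fuses general (SAM-like) features with specialized (SegFormer-like) prompt features."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=1), nn.Sigmoid())

    def forward(self, general: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        fused = self.proj(torch.cat([general, prompt], dim=1))
        return general + self.gate(fused) * prompt  # prompt features modulate the general stream


class DoubleBranchEncoder(nn.Module):
    """Toy stand-in for a general-specialized double-branch segmentation encoder."""

    def __init__(self, dim: int = 64, num_classes: int = 2):
        super().__init__()
        # General branch: placeholder for SAM's ViT image encoder with LoRA-adapted projections.
        self.general = nn.Sequential(nn.Conv2d(3, dim, 4, stride=4), nn.GELU())
        self.general_lora = LoRALinear(nn.Linear(dim, dim), rank=4)
        # Specialized branch: placeholder for a lightweight SegFormer-style encoder.
        self.specialized = nn.Sequential(nn.Conv2d(3, dim, 4, stride=4), nn.GELU())
        self.mixer = PromptMixer(dim)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.general(x)
        # Apply the low-rank adaptation channel-wise: (B, C, H, W) -> (B, H, W, C) and back.
        g = self.general_lora(g.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        s = self.specialized(x)
        return self.head(self.mixer(g, s))


if __name__ == "__main__":
    model = DoubleBranchEncoder()
    logits = model(torch.randn(1, 3, 256, 256))
    print(logits.shape)  # torch.Size([1, 2, 64, 64])
```

In this sketch only the LoRA adapters, the specialized branch, the mixer, and the head would receive gradients, mirroring the abstract's point that adapting SAM with LoRA keeps the number of trainable parameters small.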