Fine-grained Vegetation Segmentation in Complex Urban Park Environments Using a Deeply Supervised Parallel SegFormer
Keywords: UAV imagery, RGB images, vegetation species, semantic segmentation, urban park, SegFormer
Abstract. Accurate vegetation mapping in complex urban environments is essential for ecological monitoring, biodiversity assessment, and sustainable park management. However, fine-grained vegetation segmentation remains challenging because of high plant species diversity, overlapping canopies, and interference from artificial objects. To address these challenges, this paper proposes a deeply supervised parallel architecture built on the SegFormer backbone. The model incorporates a SegFormer-ASPP-low-level (SAL) head, which fuses high-level semantic representations, multi-scale contextual information, and low-level spatial details through a parallel decoding mechanism. Two auxiliary heads, a PSP (pyramid pooling) head and a fully convolutional network (FCN) head, provide deep supervision during training and improve the recognition of blurred boundaries and rare categories. The model was applied to high-resolution UAV RGB imagery for fine-grained semantic segmentation of 17 vegetation categories, comprising multiple tree species as well as other classes such as Nelumbo sp. (lotus) and dead trees. In the experiments, the model achieved a mean intersection over union (mIoU) of 73.57%, outperforming architectures such as SegFormer-b1, DeepLab v3+, ConvNeXt, and SCTNet. Visual analysis further demonstrated the model’s robustness in complex urban park scenes: it delineated boundaries more precisely, recognized small and spectrally similar species more reliably, and was less affected by artificial objects such as plastic lawns and landscape lighting. The proposed approach offers useful insights for precision forestry, ecological monitoring, and intelligent UAV-based remote sensing applications.
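For readers unfamiliar with the headline metric, mIoU can be illustrated with a minimal sketch in plain Python. This is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou`, the three-class toy labels, and the choice to skip classes that never occur are all assumptions made for illustration only.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both labels and predictions are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 3 hypothetical classes, 6 flattened pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)  # per-class IoU: 1/3, 2/3, 1/2
```

Because every class contributes equally to the average regardless of its pixel count, rare categories weigh as much as dominant ones, which is why the metric is sensitive to the rare-class recognition that the auxiliary heads target.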
