ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9050

Copernicus Publications

Göttingen, Germany

10.5194/isprs-annals-XI-1-2026-313-2026

Fine-grained Vegetation Segmentation in Complex Urban Park Environments Using a Deeply Supervised Parallel SegFormer

Zhang, Haixin (ORCID: https://orcid.org/0009-0002-6163-1757)

Zhang, Qinying

Department of Landscape Architecture, Tianjin University, 300072 Tianjin, China

03 07 2026

XI-1-2026, 313–320

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-annals.copernicus.org/articles/XI-1-2026/313/2026/isprs-annals-XI-1-2026-313-2026.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-1-2026/313/2026/isprs-annals-XI-1-2026-313-2026.pdf

Accurate vegetation mapping in complex urban environments is essential for ecological monitoring, biodiversity assessment, and sustainable park management. Fine-grained vegetation segmentation nevertheless remains challenging because of the high diversity of plant species, overlapping canopies, and interference from artificial objects. To address these challenges, this paper proposes a deeply supervised parallel architecture built on the SegFormer backbone. The model incorporates a SegFormer-ASPP-low-level (SAL) head, which fuses high-level semantic representations, multi-scale contextual information, and low-level spatial details through a parallel decoding mechanism. Two auxiliary heads, a pyramid pooling module (PSP) head and a fully convolutional network (FCN) head, provide deep supervision and improve the recognition of blurred boundaries and rare categories. High-resolution UAV imagery was used for fine-grained semantic segmentation of 17 vegetation categories. The dataset includes multiple tree species as well as other classes such as Nelumbo sp. (lotus) and dead trees. In the experiments, the model achieved a mean intersection over union (mIoU) of 73.57%, outperforming SegFormer-b1, DeepLab v3+, ConvNeXt, and SCTNet. Visual analysis further indicated that the model is robust in complex urban park scenes: it delineates boundaries more sharply, recognizes small and spectrally similar species more reliably, and is less affected by artificial objects such as plastic lawns and landscape lighting. The proposed approach offers insights for precision forestry, ecological monitoring, and intelligent UAV-based remote sensing applications.
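For readers unfamiliar with the headline metric, the following is a minimal sketch of how mean intersection over union can be computed from per-pixel labels. It is not the authors' evaluation code. The function names `confusion_matrix` and `mean_iou` are hypothetical, and the choice to skip classes absent from both reference and prediction is an assumption, since the abstract does not describe the evaluation protocol.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows index the reference class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference pixels of c that were missed
        fp = sum(cm[r][c] for r in range(n)) - tp   # pixels wrongly predicted as c
        denom = tp + fp + fn
        if denom == 0:
            # Assumption: a class absent from both reference and prediction is skipped.
            continue
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six flattened pixels.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(reference, predicted, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2. Because every class carries equal weight in the average, rare categories affect mIoU as much as dominant ones.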