Elevation Guided Global and Local Smoothness for Unsupervised Semantic Segmentation in Remote Sensing Imagery
Keywords: Multimodal Training, Self-Supervision, nDSM, Energy Minimization, Conditional Random Fields
Abstract. Unsupervised and self-supervised deep learning networks for semantic segmentation have made impressive progress in recent years. They can be trained without any labelled data and still segment RGB images into meaningful semantic groups. In remote sensing, supplementary information such as elevation improves class separation by differentiating classes based on their height above ground. We take SmooSeg, a recently developed, state-of-the-art unsupervised network for semantic segmentation, and guide its training by infusing elevation information into its projector and smoothness prior. Since patches of the same semantic group often exhibit similar elevation characteristics, this ensures global label consistency across the entire dataset and improves segmentation performance. We also extend the Conditional Random Field (CRF) used to refine the low-resolution segmentation results in a post-processing step with elevation information: we introduce a second pairwise potential that encourages neighboring pixels with similar elevation to take the same label, ensuring local label consistency. Our multi-modal training strategy remains unsupervised and improves segmentation performance on the ISPRS Potsdam-3 dataset by +4.0% mIoU over the RGB-only SmooSeg baseline, and by +4.4% when the multi-modal CRF post-processing is also applied. Collectively, our approach surpasses all state-of-the-art unsupervised segmentation networks that rely solely on RGB data on the Potsdam-3 dataset, highlighting the important role of elevation data in label-free segmentation for remote sensing.
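As a concrete illustration (a sketch, not the paper's exact formulation), the elevation-aware post-processing can be written as a fully connected CRF in the style of Krähenbühl and Koltun, with an elevation-driven Gaussian kernel added alongside the usual appearance kernel; the weights $w_1, w_2$ and bandwidths $\theta_\alpha, \theta_\beta, \theta_\gamma, \theta_\varepsilon$ are placeholder names introduced here for illustration:

\[
E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\]
\[
\psi_p(x_i, x_j) = \mu(x_i, x_j)\left[\, w_1 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\alpha^2} - \frac{\lVert I_i - I_j\rVert^2}{2\theta_\beta^2}\right) + w_2 \exp\!\left(-\frac{\lVert p_i - p_j\rVert^2}{2\theta_\gamma^2} - \frac{(e_i - e_j)^2}{2\theta_\varepsilon^2}\right) \right],
\]

where $p_i$, $I_i$, and $e_i$ denote the position, RGB value, and elevation (e.g., from an nDSM) of pixel $i$, and $\mu$ is a label-compatibility function such as the Potts model $\mu(x_i, x_j) = [x_i \neq x_j]$. The second kernel is the added pairwise potential: it penalizes label disagreement between nearby pixels with similar elevation, which enforces the local label consistency described above.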