ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

ISSN: 2194-9050

Copernicus Publications

Göttingen, Germany

DOI: 10.5194/isprs-annals-XI-2-2026-225-2026

MambaPanoptic: A Vision Mamba-based Structured State Space Framework for Panoptic Segmentation

Qing Cheng ¹ ², Damiano Bertolini ¹ ³, Wei Zhang ⁴, Dong Wang ⁵, Niclas Zeller ⁶, Daniel Cremers ¹ ²

¹ Department of Computer Science, Technical University of Munich, Munich, Germany

² Munich Center for Machine Learning (MCML), Munich, Germany

³ Department of Electronics, Information and Bioengineering, Polytechnic University of Milan, Milan, Italy

⁴ Institute for Photogrammetry and Geoinformatics, University of Stuttgart, Stuttgart, Germany

⁵ Department of Photogrammetry and Remote Sensing, Wuhan University, Wuhan, China

⁶ Faculty of Electrical and Information Engineering, Karlsruhe University of Applied Sciences, Karlsruhe, Germany

03 07 2026

Volume XI-2-2026, pages 225-233

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/225/2026/isprs-annals-XI-2-2026-225-2026.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/225/2026/isprs-annals-XI-2-2026-225-2026.pdf

Panoptic segmentation requires the simultaneous recognition of countable thing instances and amorphous stuff regions, placing joint demands on long-range context modelling, multi-scale feature representation, and efficient dense prediction. Existing convolutional and transformer-based methods struggle to satisfy all three requirements concurrently: convolutional architectures are limited in their capacity to model long-range dependencies, while transformer-based methods incur quadratic computational cost that is prohibitive at high resolutions. In this paper, we propose MambaPanoptic, a fully Mamba-based panoptic segmentation framework that addresses these limitations through two principal contributions. First, we introduce MambaFPN, a top-down feature pyramid that leverages Mamba blocks to generate globally coherent, multi-scale feature representations with linear computational complexity. Second, we adopt a PanopticFCN-style kernel generator that produces unified thing and stuff kernels for proposal-free panoptic prediction, enhanced by a QuadMamba-based feature refinement module applied at multiple network stages. Experiments on the Cityscapes and COCO panoptic segmentation benchmarks demonstrate that MambaPanoptic consistently outperforms PanopticDeepLab and PanopticFCN under comparable model sizes, and matches or surpasses Mask2Former on Cityscapes in PQ and AP while requiring fewer parameters.
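The contrast between linear and quadratic cost drawn in the abstract can be illustrated with a minimal sketch. The code below is not the MambaFPN or the Mamba selective scan; it is a plain scalar linear state-space recurrence with fixed, hypothetical parameters `a`, `b`, and `c`, chosen only to show that such a recurrence visits each token once, whereas full self-attention scores every pair of tokens. The function names and the feature-map sizes are assumptions made for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence in one pass.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The parameters a, b, c are fixed placeholders here; a selective
    state-space model would instead make them depend on the input.
    """
    h = 0.0
    ys = []
    for x in xs:  # one update per token: cost grows linearly with len(xs)
        h = a * h + b * x
        ys.append(c * h)
    return ys


def scan_steps(num_tokens):
    """Number of state updates the recurrence performs for num_tokens tokens."""
    return num_tokens


def attention_pairs(num_tokens):
    """Number of token pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Hypothetical square feature maps, flattened to token sequences.
    for side in (64, 128):
        n = side * side
        print(side, scan_steps(n), attention_pairs(n))
```

Under these assumptions, doubling each side of the feature map quadruples the number of scan steps but multiplies the number of attention pairs by sixteen, which is the scaling gap that motivates state-space models for high-resolution dense prediction.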