ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XI-2-2026
https://doi.org/10.5194/isprs-annals-XI-2-2026-321-2026
03 Jul 2026

Geometry-aided Video Panoptic Segmentation

Tuan Nguyen, Max Mehltretter, and Franz Rottensteiner

Keywords: Video Panoptic Segmentation, Optical Flow, Bounding Box Estimation, Kernel-based Panoptic Segmentation

Abstract. Video panoptic segmentation (VPS) unifies panoptic segmentation and object tracking by assigning each pixel a semantic class label and, for thing classes, an instance identifier that is consistent across frames. Addressing this task, we propose a novel online VPS method for processing stereoscopic image sequences, which is based on depth-aware kernel-based panoptic segmentation. Specifically, we introduce a geometrical constraint based on predicted bounding boxes into the segmentation of thing instances. This overcomes a fundamental limitation of kernel-based panoptic segmentation, namely that only appearance information is considered in this step, which regularly leads to results in which distinct instances are erroneously merged into one mask. To link detected instances across frames, we propose to extend the commonly employed appearance-based association with a motion-related constraint based on optical flow; this resolves ambiguities between instances of similar appearance and thus reduces the number of incorrect associations. We experimentally evaluate our method on the publicly available Cityscapes-VPS dataset and compare our results to those of several related methods from the literature. The results demonstrate that our method improves the panoptic quality for a single frame and enhances the instance association across frames, leading to an overall improvement of 3.5% in Video Panoptic Quality on thing classes compared to the employed baseline.
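The idea of combining appearance-based association with a motion constraint can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy matching strategy, the use of a single mean flow vector per instance, and the `min_iou` threshold are all assumptions made for illustration. Each previous-frame box is shifted by its flow vector, and a match is accepted only if the shifted box overlaps the candidate box in the current frame.

```python
from math import sqrt

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0

def shift_box(box, flow):
    """Propagate a box into the next frame using a mean flow vector (dx, dy)."""
    dx, dy = flow
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)

def associate(prev, curr, min_iou=0.3):
    """Greedy association (an assumed strategy, chosen for brevity).

    prev: list of dicts with keys "id", "box", "emb", "flow"
    curr: list of dicts with keys "box", "emb"
    Returns {index in curr: id of matched previous instance}.
    """
    candidates = []
    for pi, p in enumerate(prev):
        moved = shift_box(p["box"], p["flow"])
        for ci, c in enumerate(curr):
            # Motion gate: discard pairs whose flow-propagated box
            # does not overlap the current box sufficiently.
            if box_iou(moved, c["box"]) < min_iou:
                continue
            candidates.append((cosine(p["emb"], c["emb"]), pi, ci))
    # Among the remaining pairs, prefer the most similar appearance.
    candidates.sort(key=lambda t: t[0], reverse=True)
    matches, used_prev = {}, set()
    for _, pi, ci in candidates:
        if pi in used_prev or ci in matches:
            continue
        matches[ci] = prev[pi]["id"]
        used_prev.add(pi)
    return matches
```

In this sketch, two instances with identical embeddings cannot be told apart by appearance alone; the motion gate removes the implausible pairing before the appearance scores are compared. Current-frame instances that receive no match would be given new identifiers in a full tracker, which is omitted here.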
