Exploring Faster Street-Level Semantic Segmentation with Learnable Resized and Pixel-Shuffled MobileNets
Keywords: Street-Level Imagery, Semantic Segmentation, Lightweight Deep Learning Model
Abstract. This initial exploratory study investigates enhancements to real-time urban scene segmentation by integrating two prominent scaling techniques, learnable resizing and pixel shuffling, into MobileNets. Using the Cityscapes dataset, we evaluate MobileNetV3 and MobileNetV4 with the aim of balancing computational efficiency and segmentation accuracy, particularly for high-resolution street-level imagery. Traditional resizing methods are commonly used to reduce image size and improve inference speed, but they often degrade important details, which lowers segmentation accuracy. To address this, we incorporate a learnable resizer that optimizes downsampling while preserving critical features, along with pixel shuffling to efficiently restore spatial detail during upsampling. Our results indicate that integrating learnable resizing and pixel shuffling improved segmentation accuracy by 9-14% compared to traditional resizing and increased speed by 36-50% relative to no resizing. We also observed that MobileNetV4 consistently surpassed MobileNetV3 in accuracy. Overall, the learnable resizer significantly mitigates the accuracy loss caused by downsampling, while pixel shuffling improves segmentation consistency with minimal impact on speed. These enhancements better preserve fine details, reducing misclassifications in complex urban scenes. By optimizing MobileNets with learnable resizing and upsampling techniques, we provide a practical solution for resource-constrained environments such as mobile and edge computing platforms. Combining efficiency-boosting techniques with lightweight architectures enables fast and accurate segmentation for urban and environmental monitoring, improving deep learning model performance in real-world applications without sacrificing speed or accuracy.
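To make the pixel-shuffling step concrete, the following is a minimal, dependency-free sketch of the rearrangement on nested lists. It assumes the channel ordering commonly used for sub-pixel convolution (the one followed by PyTorch's `PixelShuffle` layer); the abstract does not specify the exact implementation used in the study, and the function name `pixel_shuffle` and the toy input are illustrative only.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed convention (not taken from the paper):
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])

    # Allocate the upsampled output: fewer channels, r-times larger in H and W.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]

    for c in range(out_channels):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Toy example: four 1x1 channels become a single 2x2 channel.
low_res = [[[1]], [[2]], [[3]], [[4]]]
high_res = pixel_shuffle(low_res, 2)
print(high_res)  # [[[1, 2], [3, 4]]]
```

The operation has no learnable parameters and moves values without any arithmetic: it trades channel depth for spatial resolution. That is consistent with the abstract's observation that pixel shuffling has minimal impact on speed.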
