Automatic Detection, 3D Localization, and Semantic Enrichment of Commercial Signboards Using 360° Mobile Mapping Imagery: A Case Study in Temara, Morocco
Keywords: Signboard Detection, YOLO, 3D Geolocation, MMS Imagery, Line of Bearing, GPT-4o
Abstract. The regulation of urban advertising signage is critical for preserving visual harmony and ensuring regulatory compliance in modern cities. This study presents a novel pipeline for the automatic detection, tracking, geolocation, and textual identification of storefront signboards from 360° panoramic imagery acquired with Mobile Mapping Systems (MMS). We first fine-tune a YOLOv11 object detection model on a custom-labeled dataset of urban scenes, enabling robust identification of signboards across varied viewing angles. To associate detections across consecutive frames, we use the integrated YOLOv11 tracking mode, which assigns consistent object IDs based on motion and appearance features. Each tracked instance is then localized in 3D space using a photogrammetric Line of Bearing (LoB) method, which relies on known camera poses and pixel coordinates. In parallel, we extract the textual content of each detected sign using GPT-4o Vision, which has demonstrated improved performance in complex visual environments. The proposed pipeline offers a scalable alternative to manual inspection and provides precise spatial and semantic information about urban signage. The pixel-wise projection precision, quantified by an average RMSE of 7.75 px (median 7.17 px, standard deviation 2.80 px) derived from LoB intersection consistency, confirms the pipeline’s reliability for automated urban inventory systems and smart city applications.
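To make the Line of Bearing step concrete, the following is a minimal sketch of one common formulation, not necessarily the one used in this study: each detection is converted to a unit ray from the camera position, and the sign position is taken as the least-squares point closest to all rays. The equirectangular pixel-to-ray mapping, the axis convention (x right, y forward, z up), and the function names `pixel_to_bearing` and `intersect_bearings` are assumptions made for illustration.

```python
import numpy as np


def pixel_to_bearing(u, v, width, height, R_world_cam=None):
    """Unit ray in the world frame for pixel (u, v) of an equirectangular panorama.

    Assumed convention: image centre looks along +y, x is right, z is up.
    R_world_cam is the 3x3 camera-to-world rotation (identity if omitted).
    """
    if R_world_cam is None:
        R_world_cam = np.eye(3)
    lon = (u / width - 0.5) * 2.0 * np.pi  # azimuth, 0 at the image centre
    lat = (0.5 - v / height) * np.pi       # elevation, 0 at the horizon
    d_cam = np.array([
        np.cos(lat) * np.sin(lon),
        np.cos(lat) * np.cos(lon),
        np.sin(lat),
    ])
    d_world = R_world_cam @ d_cam
    return d_world / np.linalg.norm(d_world)


def intersect_bearings(origins, directions):
    """Least-squares 3D point closest to a set of rays.

    For each ray (o_i, d_i), P_i = I - d_i d_i^T projects onto the plane
    orthogonal to d_i; the point x minimising sum ||P_i (x - o_i)||^2
    solves (sum P_i) x = sum P_i o_i. Requires at least two non-parallel rays.
    """
    origins = np.asarray(origins, dtype=float)
    directions = np.asarray(directions, dtype=float)
    directions = directions / np.linalg.norm(directions, axis=1, keepdims=True)

    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, directions):
        P = np.eye(3) - np.outer(d, d)
        A += P
        b += P @ o
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    # Hypothetical example: three camera positions along a street observing one sign.
    sign = np.array([5.0, 10.0, 2.0])
    cams = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [8.0, 0.0, 0.0]])
    rays = sign - cams  # noise-free bearings toward the sign
    print(intersect_bearings(cams, rays))
```

With noisy bearings the rays no longer meet at a single point, and the residual distances between the estimate and each ray give a consistency measure similar in spirit to the intersection consistency reported above.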