YOLO12 Performance Analysis for Vehicle Detection in Aerial Imagery
Keywords: YOLO, Vehicle Detection, Real-time object detection, Edge Computing, Deep Learning, Aerial Imagery
Abstract. Real-time vehicle detection in aerial imagery is challenging for traffic monitoring and smart city surveillance systems, because computer vision models must be adapted to photogrammetric conditions. This study presents a photogrammetric evaluation framework for analyzing two lightweight object detection models from the YOLO12 family, YOLO12-m and YOLO12-n. Our approach systematically investigates how Ground Sampling Distance (GSD) variations (5–45 cm/pixel) and altitude-dependent scale changes affect detection performance, establishing quantitative relationships between imaging geometry and model accuracy. The models incorporate an R-ELAN backbone, 7×7 separable convolutions, and Flash Attention-based area attention mechanisms for feature extraction in aerial contexts. The models were trained on the EAGLE dataset and evaluated on consumer-grade hardware (Intel Core i5-4200M CPU, NVIDIA GeForce GT 740M GPU). YOLO12-m achieves 0.815 average precision (AP) and a 0.986 F1-score with an inference time of 1.782 seconds, while YOLO12-n is faster, at 0.535 seconds, with competitive performance of 0.798 AP and a 0.977 F1-score. The study identifies altitude-specific performance thresholds and GSD-aware optimization strategies, offering a practical framework for deploying lightweight models in resource-constrained aerial surveillance applications while maintaining photogrammetric rigor.
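To make the link between altitude, GSD, and object scale concrete, the following is a minimal sketch of the standard nadir-view relation GSD = H · p / f (flying height H, pixel pitch p, focal length f). It is an illustration only, not the evaluation code of this study: the function names, the camera parameters, and the 4.5 m vehicle length are all assumptions chosen for the example.

```python
def ground_sampling_distance_cm(altitude_m, focal_length_mm, pixel_pitch_um):
    """Return the GSD in cm/pixel for a nadir-pointing pinhole camera.

    Uses GSD = H * p / f with all quantities converted to metres.
    Parameter names are hypothetical and not taken from the study.
    """
    pixel_pitch_m = pixel_pitch_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    gsd_m = altitude_m * pixel_pitch_m / focal_length_m
    return gsd_m * 100.0  # metres -> centimetres


def object_length_px(object_length_m, gsd_cm):
    """Return how many pixels an object of the given length spans."""
    return object_length_m * 100.0 / gsd_cm


if __name__ == "__main__":
    # Assumed camera: 10 mm focal length, 5 micrometre pixel pitch.
    gsd = ground_sampling_distance_cm(
        altitude_m=100.0, focal_length_mm=10.0, pixel_pitch_um=5.0
    )
    print(f"GSD at 100 m: {gsd:.1f} cm/pixel")

    # Sweep the 5-45 cm/pixel range named in the abstract for an
    # assumed 4.5 m long vehicle.
    for gsd_cm in (5, 15, 25, 35, 45):
        length = object_length_px(4.5, gsd_cm)
        print(f"GSD {gsd_cm:2d} cm/pixel -> vehicle spans {length:.1f} px")
```

Under these assumptions a vehicle shrinks from roughly 90 pixels at 5 cm/pixel to about 10 pixels at 45 cm/pixel, which is the kind of scale change a GSD-aware evaluation has to account for.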
