Enhancing Urban UAV Photogrammetric Products Through Domain-Specific Training of the Real-ESRGAN Super-Resolution Model
Keywords: UAV photogrammetry, Super-resolution, Real-ESRGAN, Deep learning, 3D mesh, Orthophotomosaic
Abstract. The growing demand for high-resolution geospatial data in urban environments calls for methods that improve the quality of spatial products derived from UAV photogrammetry. This study presents a deep learning-based framework for enhancing both the radiometric and geometric quality of UAV imagery using a fine-tuned Real-ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) model. Training proceeds in two stages: a Real-ESRNet pretraining phase for stable pixel-level reconstruction (average pixel loss ≈ 0.03), followed by Real-ESRGAN fine-tuning to improve perceptual and structural fidelity (average perceptual and adversarial losses ≈ 8.5 and 0.25, respectively). In quantitative evaluation, the fine-tuned Real-ESRGAN improved PSNR by 3.5 dB and SSIM by 0.02 relative to bicubic interpolation, and outperformed the pretrained Real-ESRNet by approximately 1.8 dB in PSNR. Orthophotomosaics and 3D mesh models generated from the enhanced UAV images showed greater radiometric consistency and geometric precision. These findings indicate that domain-specific fine-tuning of Real-ESRGAN substantially improves visual detail and spatial accuracy, supporting its practical value for high-fidelity urban mapping based on UAV photogrammetry.
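To make the reported dB gains concrete, the following is a minimal sketch of the standard PSNR definition, together with a helper that converts a PSNR gain into the equivalent reduction in mean squared error. It is not the evaluation code used in the study. The function names `psnr` and `mse_ratio_from_db_gain` and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the peak pixel value; 255 assumes 8-bit imagery
    (an assumption, the bit depth is not stated in the abstract).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def mse_ratio_from_db_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Because PSNR is logarithmic in the mean squared error, a 3.5 dB gain over bicubic interpolation corresponds to an MSE lower by a factor of about 2.24, and a 1.8 dB gain over Real-ESRNet to a factor of about 1.51, assuming both methods are scored against the same reference images.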
