ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume V-2-2020
03 Aug 2020
 | 03 Aug 2020


L. Madhuanand, F. Nex, and M. Y. Yang

Keywords: Depth, Monocular, Aerial images, Disparity, Deep learning, Image reconstruction, Scene Understanding

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of the scene. Estimating depth from stereo images requires multiple views of the same scene to be captured which is often not possible when exploring new environments with a UAV. To overcome this monocular depth estimation has been a topic of interest with the recent advancements in computer vision and deep learning techniques. This research has been widely focused on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to additional complexities arising from increased camera distance, wider area coverage with lots of occlusions. A new aerial image dataset is prepared specifically for this purpose combining Unmanned Aerial Vehicles (UAV) images covering different regions, features and point of views. The single image depth estimation is based on image reconstruction techniques which uses stereo images for learning to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, 1) a Convolutional Neural Network (CNN) and 2) a Generative Adversarial model (GAN) are used to learn depth from aerial images from UAVs. These models generate pixel-wise disparity images which could be converted into depth information. The generated disparity maps from these models are evaluated for its internal quality using various error metrics. The results show higher disparity ranges with smoother images generated by CNN model and sharper images with lesser disparity range generated by GAN model. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. It is found that the CNN model performs better than GAN and produces depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.