ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume X-1/W1-2023
https://doi.org/10.5194/isprs-annals-X-1-W1-2023-649-2023
05 Dec 2023

HYBRID DEEP LEARNING APPROACH FOR VEHICLE’S RELATIVE ATTITUDE ESTIMATION USING MONOCULAR CAMERA

M. Haggag, A. Moussa, and N. El-Sheimy

Keywords: Relative pose estimation, Monocular camera, Ego-motion, Optical flow map, KLT tracker, Deep Learning

Abstract. Relative pose estimation using a monocular camera is one of the most common approaches for aiding a vehicle's navigation. It involves determining the position and orientation of a vehicle relative to its surroundings using only a single camera. This is typically achieved through four main steps: feature detection and matching, motion estimation, filtering and optimization, and scale estimation. Feature tracking detects and tracks distinctive visual features in the environment, such as corners or edges, and uses their relative motion to estimate the camera's movement. This approach is prone to errors caused by feature detection and tracking difficulties, as well as by moving objects, occlusions, and changes in lighting conditions. These conventional computer vision pipelines are also computationally intensive and may require significant processing power, which limits their real-time application. This paper proposes a hybrid deep neural network approach for estimating the relative attitude of a vehicle using a monocular camera to aid vehicle navigation. The proposed network adopts a relatively shallow architecture to minimize computational cost and to meet the real-time requirements of low-cost processing systems. The network is trained on the KITTI dataset and estimates the relative attitude of the vehicle with an RMSE of 0.017 degrees per frame in relative orientation. The processing time of the proposed approach is around 28 ms per frame, including both the tracking and network prediction steps, which is significantly faster than typical estimation pipelines. The results show that the proposed approach is a viable alternative to conventional computer vision methods: it significantly reduces computational cost and handles the confusing scenarios caused by moving objects while maintaining good accuracy in ego-motion estimation.
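
A minimal sketch of the kind of pipeline the abstract describes: KLT-tracked sparse optical flow between consecutive frames, flattened into a fixed-size vector and regressed to a per-frame relative attitude by a shallow network. The corner count, layer sizes, input representation, and the three-angle output are illustrative assumptions, not the authors' exact architecture.

# Sketch of a KLT-flow + shallow-network relative attitude estimator.
# MAX_CORNERS, the layer widths, and the roll/pitch/yaw output are assumptions.
import cv2
import numpy as np
import torch
import torch.nn as nn

MAX_CORNERS = 200  # assumed fixed number of KLT tracks fed to the network

def klt_flow(prev_gray, curr_gray):
    """Track Shi-Tomasi corners with the KLT tracker between two grayscale
    frames and return a fixed-size sparse optical-flow vector (dx, dy per corner)."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=MAX_CORNERS,
                                  qualityLevel=0.01, minDistance=10)
    flow = np.zeros((MAX_CORNERS, 2), dtype=np.float32)
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
        ok = status.ravel() == 1
        d = (nxt - pts).reshape(-1, 2)[ok]          # displacements of tracked corners
        flow[:len(d)] = d                            # pad missing tracks with zeros
    return flow.ravel()                              # shape (MAX_CORNERS * 2,)

class RelativeAttitudeNet(nn.Module):
    """Shallow regressor mapping a sparse flow vector to per-frame relative attitude."""
    def __init__(self, n_in=MAX_CORNERS * 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 3),   # assumed output: relative roll, pitch, yaw (degrees)
        )

    def forward(self, x):
        return self.net(x)

# Usage with two consecutive grayscale KITTI frames (NumPy arrays):
#   model = RelativeAttitudeNet()
#   flow = torch.from_numpy(klt_flow(prev_gray, curr_gray)).unsqueeze(0)
#   rel_attitude = model(flow)   # (1, 3) predicted relative angles for this frame pair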