ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Volume X-4-2024
https://doi.org/10.5194/isprs-annals-X-4-2024-373-2024
18 Oct 2024

Lightweight Indoor Positioning System Based on Multiple Self-Learning Features and Key Frame Classification

Chenzhe Wang, Kai Bi, Bianli Zhao, Ming Li, Yujia Chen, Shiliang Tao, and Juntao Yang

Keywords: Indoor Positioning System, Key Frame Classification, Convolutional Neural Network, Feature Point Recognition, SuperPoint, MobileNet V3-Small

Abstract. Traditional indoor positioning technologies typically require hardware to be installed in advance, resulting in high costs and long-term maintenance burdens. With advances in image recognition and deep learning, indoor visual positioning based on image recognition has matured considerably. This approach is low-cost and requires no additional hardware installation, but it still has inherent drawbacks, such as cumbersome data collection, complex algorithms, and limited universality. To minimize the cost of pre-collecting indoor information, improve versatility, and enable rapid deployment on low-performance mobile devices, this paper proposes a lightweight indoor positioning system based on multiple self-learning features and key frame classification. The system is divided into two stages: preprocessing and real-time positioning. In the preprocessing stage, image information is collected across the entire indoor environment and a key-frame recognizer is trained on it; simultaneously, an environmental feature information database is established. In the real-time positioning stage, the system first uses a mobile device such as a smartphone to obtain a real-time video stream. A key-frame recognizer based on a convolutional neural network then classifies each frame of the video stream against the trained key frames, yielding an approximate position (coarse positioning). Next, feature points are extracted from each frame and matched against location-tagged feature points in the environmental feature information database to compute a precise position (fine positioning). Compared with conventional visual positioning solutions, the proposed system offers significant improvements in preprocessing data collection, computational overhead, and versatility.
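
The coarse-to-fine pipeline described in the abstract can be sketched as a zone classifier followed by a descriptor-matching step. The following is a minimal illustrative sketch, assuming PyTorch/torchvision for the MobileNetV3-Small key-frame classifier and OpenCV for pose recovery; the zone label set, database array layout, matching thresholds, and the source of the 256-D descriptors (SuperPoint's output dimensionality) are assumptions for illustration, not the authors' implementation.

# Minimal sketch of the coarse-to-fine positioning pipeline.
# Assumptions (not from the paper): zone labels, database arrays,
# matching thresholds, and how descriptors are obtained.
import numpy as np
import cv2
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v3_small

def build_keyframe_classifier(num_zones: int) -> nn.Module:
    # MobileNetV3-Small backbone as named in the keywords; the classification
    # head is retargeted to the number of pre-surveyed indoor zones (assumed).
    model = mobilenet_v3_small(weights=None)
    model.classifier[3] = nn.Linear(model.classifier[3].in_features, num_zones)
    return model

@torch.no_grad()
def coarse_position(model: nn.Module, frame_bgr: np.ndarray) -> int:
    # Classify one video frame into an indoor zone (rough positioning).
    rgb = cv2.cvtColor(cv2.resize(frame_bgr, (224, 224)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0).unsqueeze(0)
    return int(model.eval()(x).argmax(dim=1))

def fine_position(query_kpts, query_desc, db_desc, db_points3d, K):
    # query_kpts:  (N, 2) pixel coordinates of detected feature points
    # query_desc:  (N, 256) L2-normalised descriptors (SuperPoint emits 256-D)
    # db_desc:     (M, 256) descriptors from the environmental feature database
    # db_points3d: (M, 3) 3-D map coordinates stored with each database entry
    # K:           (3, 3) camera intrinsic matrix
    sim = query_desc @ db_desc.T                 # cosine similarity
    order = np.argsort(-sim, axis=1)
    matches = []
    for i in range(sim.shape[0]):
        j1, j2 = order[i, 0], order[i, 1]
        # Keep confident, unambiguous matches (thresholds are assumptions).
        if sim[i, j1] > 0.9 and sim[i, j1] - sim[i, j2] > 0.05:
            matches.append((i, j1))
    if len(matches) < 6:
        return None                              # too few matches for a stable pose
    img_pts = np.float32([query_kpts[i] for i, _ in matches])
    obj_pts = np.float32([db_points3d[j] for _, j in matches])
    ok, rvec, tvec, _ = cv2.solvePnPRansac(obj_pts, img_pts, K, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)
    return (-R.T @ tvec).ravel()                 # camera position in map coordinates

In a deployment along the lines the abstract describes, the zone returned by coarse_position would first restrict db_desc and db_points3d to that zone's subset of the database, keeping the descriptor matching cheap enough for low-performance mobile devices.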