LSTM-MLP Based Uncertainty Modelling Approach for Complex Human Indoor Trajectory

Modelling the movement uncertainty of human indoor trajectories is an essential part of improving the performance of smart-city applications. Existing uncertainty modelling algorithms usually consider only a constant sampling error or measurement error and cannot adapt well to changeable human motion modes and complex handheld modes of smartphones. To fill this gap, this paper applies the Long Short-Term Memory (LSTM) network to the continuous prediction of the uncertainty error of human indoor trajectories with complex motion modes and detected indoor landmark points. Human motion information, including handheld modes, walking speed, and heading, is extracted and fused with detected landmark points to reconstruct human indoor trajectories in large-scale areas using the Gradient Descent (GD) algorithm. In addition, a hybrid LSTM and Multilayer Perceptron (MLP) network is adopted for uncertainty error prediction, considering both sampling error and measurement error over a specific time period; the reconstructed trajectory and human motion features form the input vector for model training, with the ground-truth uncertainty error as reference. Comprehensive experiments on a real-world dataset indicate that the proposed LSTM-assisted uncertainty modelling algorithm outperforms traditional uncertainty modelling approaches in uncertainty error prediction and uncertainty region definition.


INTRODUCTION
The movement path data of people is considered a crucial component in the field of human mobility analysis, as it effectively portrays an individual's spatial-temporal movement and social behaviour. Advancements in Micro-Electro-Mechanical Systems (MEMS) sensors have facilitated the collection of pedestrian movement information from diverse mobile devices for location-based services (LBS) such as tourism planning (Liu et al., 2022), user habit analysis (Liu et al., 2021), epidemic control and prevention (Yang et al., 2023), and community recommendations (Liu et al., 2022). However, the acquisition of pedestrian trajectories faces challenges caused by changing urban scenarios and the different kinds of motion data acquired from various mobile equipment, leading to movement uncertainty. This issue has been considered an inevitable problem in data acquisition that could reduce knowledge extraction efficiency and accuracy (Liu et al., 2022; Shi et al., 2022). Recently, researchers have focused on describing and eliminating uncertainty error in massive amounts of movement data in changing application scenarios, specifically in the areas of trajectory mining, representation, and navigation (Yu et al., 2021; Yu et al., 2022; Wu et al., 2021; Yu et al., 2021).
Typically, in a two-dimensional plane, the collected path data is denoted as a finite set of time-stamped location coordinates $\langle R_{t_1}, R_{t_2}, \ldots, R_{t_n} \rangle$, which are obtained from diverse measurement systems. Here, $R_t = \{X, Y, T\}$ signifies a 2D location coordinate along with its corresponding timestamp (Kuijpers et al., 2010). The inaccuracies present in the raw trajectories can be attributed to two primary factors: sampling deviation and measurement deviation. Sampling deviation results from unconnected data sets or sampled points with varying sampling rates, which makes it difficult to determine the motion information between these points (Zheng et al., 2012). Measurement deviation arises during data collection and is influenced by the positioning algorithm adopted, environmental changes, and hardware performance deviations (Zheng et al., 2014; Zheng, 2015).
The concept of a space-time prism (STP) is frequently used to analyze the degree of uncertainty in human movement and to establish the potential path area (PPA) based on the generated movement data (Miller, 1991; Kwan, 1998). In previous works, the PPA was generally estimated using the speed or distance of mobile entities. However, the assumption of uniform speed limits the accuracy of uncertainty estimation, resulting in an overestimation of the PPA in real-world scenarios (Xia et al., 2017; Downs et al., 2018). To address this issue, researchers have explored methods to control speed estimation errors and devised adaptive speed control criteria for more efficient PPA prediction (Zhou et al., 2018). The moving distance also significantly affects the PPA prediction. Furtado et al. (2018) introduced the approximate upper bound (AUB) model, which estimates the uncertainty region from the largest movement length in order to tighten the limit set by adaptive velocity. In this model, the Manhattan distance, rather than the Euclidean distance, is used to ascertain the maximum distance. Nevertheless, the intricate and stochastic nature of pedestrian movements and routes makes it difficult for a method based on Manhattan distance to capture the logical mapping relations between sampling points. Furthermore, the uncertainty associated with predicting the PPA region is exacerbated by the uncertain collection rate of position values and the related measurement errors.
Previous studies have primarily focused on uncertainty modelling of outdoor trajectories, whereas the most effective trajectories for pedestrian mobility analysis are collected indoors. However, analyzing the uncertainty of indoor trajectories poses several challenges compared with outdoor trajectories, including: 1) Limited reference points: The motion information collected in indoor environments often lacks absolute location references such as the Global Positioning System (GPS), so the indoor human trajectory normally cannot be acquired autonomously (Yu et al., 2021). 2) Variations in measurement errors of registered indoor positions: Given the intricate nature of indoor environments, pedestrian movement information is collected in real time, which results in variations in measurement errors that must be accurately predicted (Li et al., 2021). 3) Inconsistent gathered trajectories: In contrast to outdoor networks, indoor pedestrian trajectories are often disordered owing to the stochasticity of indoor pedestrian movements, making it challenging to estimate trajectory uncertainty (Wan et al., 2022).
In this investigation, we present a continuous and unified model based on deep learning methods to overcome the difficulties of uncertainty modelling in indoor trajectories. Unlike existing techniques that rely exclusively on two adjacent measurement points to establish the uncertainty region, our method uses a sequence of location points representing the pedestrian's previous moving period as context input for the training model. The training model produces a Euclidean distance as its output value, and its coefficient is adaptively selected based on the outcomes of the training phase. A range of characteristics is derived from the user's motion data to represent the evolving measurement and sampling errors. The predicted Euclidean distance is then used to generate the probable area, which forms a compact and effective uncertainty region. Experiments on self-collected real-world trajectory datasets demonstrate the effectiveness of the hybrid LSTM-MLP network, and comparative analyses with existing algorithms validate both its accuracy and its stability in generating the probable area of a movement trajectory.
The innovations of our research lie in four aspects:
(1) We propose an efficient user trajectory reconstruction algorithm suitable for large indoor spaces with limited reference points. This model enhances the raw trajectory's performance and improves the continuity of reference location points, rather than relying solely on reference points for uncertainty analysis.
(2) A novel deep learning structure is introduced for uncertainty modelling by combining LSTM and MLP networks. Unlike traditional models, this model treats position information over a set time period as an essential factor in describing the sampling and measurement errors and their spatiotemporal relationship within the pedestrian's trajectory uncertainty.
(3) The training dataset acquired in our previous work is enhanced with more intricate indoor routes to improve comprehensiveness. Additional pedestrian motion information is collected to enhance the final uncertainty prediction's performance while accounting for measurement and sampling errors.
(4) The Euclidean distance is used to represent the measurement error within the selected step period, and the Euclidean coefficient is adaptively calculated based on the training outcome. The final probable area accounts for both sampling and measurement errors across different user motion modes.
The remainder of this paper is organized as follows: Section 2 proposes the human indoor trajectory reconstruction and optimization algorithm. Section 3 presents the hybrid LSTM-MLP network and feature extraction. Section 4 demonstrates the effectiveness and robustness of our algorithm via experimental results. Finally, Section 5 concludes the paper and highlights potential applications of our method.

HUMAN INDOOR TRAJECTORY RECONSTRUCTION AND OPTIMIZATION
In this section, the human indoor trajectory, which contains rich motion information, is modelled and optimized in combination with landmark information detected using wireless signals or Quick Response (QR) codes. The optimized trajectory is then used for feature extraction for uncertainty prediction.

Human Indoor Trajectory Modelling
In contrast to outdoor trajectories, which rely on GPS-acquired outdoor location information with comparable sampling and measurement deviations, indoor trajectory data is characterized by varying sampling and measurement deviations. Additionally, raw acquired indoor trajectories often lack reference points and require reconstruction through human motion information to provide continuous indoor location points. To address these challenges, we represent pedestrian indoor trajectories as a graph comprising landmark points and human motion features, as depicted in Figure 1, which is based on our previous work (Liu et al., 2022):

Figure 1. Diagram of Human Trajectory Modelling
Figure 1 depicts the indoor trajectory reconstruction process, which uses pedestrian motion data to calculate step-length and heading values between two spatiotemporal points. Absolute location sources such as Wi-Fi stations, BLE nodes, and QR codes are used as reference points for location determination. By incorporating the motion data collected throughout the trajectory duration, it becomes feasible to reconstruct the complete indoor trajectory (Shi et al., 2021): where $R_{01}$ represents the location of the first detected landmark point in the walking period.
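The reconstruction idea above, accumulating step-length and heading from a landmark fix, can be sketched as follows. This is a minimal illustration, not the authors' formulation: the function name `reconstruct_trajectory` and the heading convention (clockwise from the +y axis) are assumptions made only for this example.

```python
import math

def reconstruct_trajectory(start_xy, step_lengths, headings_rad):
    """Dead-reckon 2D positions from a landmark fix.

    Assumption (not from the paper): heading is measured clockwise from the
    +y axis, so x advances with sin(heading) and y with cos(heading).
    """
    x, y = start_xy
    path = [(x, y)]
    for length, heading in zip(step_lengths, headings_rad):
        x += length * math.sin(heading)
        y += length * math.cos(heading)
        path.append((x, y))
    return path

# Three 0.7 m steps: two along +y, then one along +x.
path = reconstruct_trajectory((0.0, 0.0), [0.7, 0.7, 0.7], [0.0, 0.0, math.pi / 2])
```

Because each position depends on all previous steps, errors in step-length or heading accumulate along the path, which is why the landmark observations are needed as corrections.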
In this study, to enhance the precision of the raw collected movement data and related movement features, Equation (1) is treated as an optimization formula that includes the covariance matrix of the measured quantity.
The objective of graph-based trajectory optimization is to identify the state vector that minimizes the described expression:
$$(L_i, \theta_i)^{*} = \arg\min_{(L_i, \theta_i)} F(L_i, \theta_i) \tag{3}$$
where $(L_i, \theta_i)$ indicates the collection of calculated gait and direction vectors, and the loss function $F(L_i, \theta_i)$ reaches its minimum at the optimal gait and direction vectors.
Once the optimized indoor trajectory reconstruction is obtained, a ground-truth trajectory is also required. For this study, the original sensor data was sourced from the IPIN-2018 public dataset (Renaudin et al., 2019), while the reference points were gathered using a total station with centimetre-level precision. Each ground-truth trajectory was created using 5 to 8 reference points, and an accuracy of 0.1 to 0.3 m can be achieved based on our previous research (Li et al., 2021). Additional trajectories, acquired from larger indoor spaces with more complex routes, were included in the improved dataset. The constructed dataset comprises data vectors that include the following parameters:

Landmark Detection and Trajectory Reconstruction
In this section, daily-life facilities such as Wi-Fi, BLE, and QR codes are adopted for landmark recognition, and Dynamic Time Warping (DTW) matching is developed by comparing the real-time collected measurement distribution with the ideal distribution.
The Received Signal Strength Indication (RSSI) value of local Wi-Fi and BLE stations can be used to measure distances. The conversion formula that relates the distance to the acquired RSSI value is:
$$L_r(d) = L_0(d_0) - 10\,\eta\,\lg\!\left(\frac{d}{d_0}\right) + \varepsilon$$
where $L_r(d)$ indicates the acquired RSSI value at the distance $d$ between the user and the wireless station, $d_0$ is the corresponding ground-truth reference distance, $L_0(d_0)$ is the reference RSSI at the known distance $d_0$, $\eta$ indicates the path loss index, and $\varepsilon$ represents the random error of the measured RSSI.
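As an illustration of the RSSI-to-distance conversion described above, the sketch below uses the common log-distance path loss model and its inverse. The sign convention, the default reference distance of 1 m, and the path loss exponent of 2.0 are assumptions chosen for the example, not values reported in this paper.

```python
import math

def rssi_at_distance(d, rssi_ref, d_ref=1.0, path_loss_exp=2.0):
    """Noise-free RSSI (dBm) predicted at distance d by the log-distance model."""
    return rssi_ref - 10.0 * path_loss_exp * math.log10(d / d_ref)

def distance_from_rssi(rssi, rssi_ref, d_ref=1.0, path_loss_exp=2.0):
    """Invert the model: estimated distance (m) for a measured RSSI (dBm)."""
    return d_ref * 10.0 ** ((rssi_ref - rssi) / (10.0 * path_loss_exp))

# With -40 dBm at 1 m and exponent 2, a reading of -60 dBm maps to 10 m.
estimated = distance_from_rssi(-60.0, rssi_ref=-40.0)
```

In practice the random error term makes single readings unreliable, which motivates matching a whole sequence of readings against a reference shape rather than inverting each sample.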
As the user walks towards a Wi-Fi/BLE station and then leaves it, the measured distance between the smartphone and the landmark typically yields regular peaks. Ideally, there exists a distribution that accurately describes this process, constructed from the pedestrian's ideal walking speed and the distance between the ideal position and the landmark location. However, the collected distance set can be influenced by real-world environmental factors. To mitigate these effects, this study proposes a DTW-assisted landmark recognition algorithm, based on the similarity between real-time collected measurement vectors and self-generated reference vectors:
The identified landmark furnishes an observation for location reference in Equation (3), where the Gradient Descent (GD) technique is employed to obtain the optimal outcome. Since the observation model is nonlinear, a linearization stage is required, in which the Taylor series is used to expand around the present state estimate and the first-order term is extracted: where $\delta x$ indicates the state update error and $G$ indicates the Jacobian matrix. The difference between each iteration phase is presented as follows: (8) The difference in the updated state vector following each iteration phase is computed using the equation expressed as follows: (9)
To achieve a state estimation error below the threshold, nonlinear least squares requires multiple iterations of this process. Generally, the update for nonlinear least squares can be represented as:
$$x_{j+1} = x_j + \delta x_j \tag{10}$$
where $j$ represents the iteration number. Since the observation error is not affected by the state estimation, the observation error covariance matrix $R$ remains unchanged. The optimal solution is reached when $L(x)$ falls below the set threshold.
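The iterative procedure described above (linearize the observation model, solve for a state correction, update, and repeat until the correction is small) can be sketched for a simplified problem: estimating a 2D position from range observations to known landmarks. This is a Gauss-Newton style sketch under assumptions; the function name, the range-only observation model, and the unweighted normal equations are illustrative choices, not the authors' implementation.

```python
import math

def gauss_newton_position(landmarks, ranges, x0, max_iter=20, tol=1e-6):
    """Refine a 2D position so predicted landmark ranges match the observed ones."""
    x, y = x0
    for _ in range(max_iter):
        # Accumulate the normal equations (G^T G) dx = G^T r for the 2D state.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (lx, ly), z in zip(landmarks, ranges):
            dx, dy = x - lx, y - ly
            pred = math.hypot(dx, dy)
            if pred < 1e-12:
                continue  # Jacobian is undefined exactly at a landmark
            gx, gy = dx / pred, dy / pred  # one row of the Jacobian G
            r = z - pred                   # observation residual
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy
            b1 += gx * r
            b2 += gy * r
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break  # geometry too weak to solve for a correction
        step_x = (a22 * b1 - a12 * b2) / det
        step_y = (a11 * b2 - a12 * b1) / det
        x += step_x
        y += step_y
        if math.hypot(step_x, step_y) < tol:
            break  # state update below threshold
    return x, y

landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
ranges = [math.dist(true_pos, lm) for lm in landmarks]
estimate = gauss_newton_position(landmarks, ranges, x0=(1.0, 1.0))
```

A weighted variant would scale each residual by the inverse observation covariance; it is omitted here to keep the update step visible.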

HYBRID LSTM-MLP BASED UNCERTAINTY PREDICTION MODEL
This section details the autonomous extraction and learning of comprehensive features using the suggested LSTM-MLP network. The output vector is modelled to contain the uncertainty error, which encompasses varying sampling and measurement deviations within specific time windows.

LSTM-MLP Based Features Extraction
To achieve a comprehensive depiction of the relationship between the movement uncertainty index and the reconstructed trajectory, we extracted the following characteristics from the latter. This allowed the mapping between the reconstructed trajectory and the uncertainty index of each spatiotemporal point to be determined:
1) Approximated gait-length observation
7) Current ratio of completeness index under total trajectory distance: where $n$ indicates the recorded number of gaits of the overall trajectory and $k$ indicates the index of the current gait.
8) Current ratio of completeness index under total time length and current time used:
11) Calculated cumulative direction changes according to the initial heading value:
These characteristics illustrate the efficacy of the chosen indoor movement data and its expected uncertainty error value. The acquired features are subsequently structured as the input vector for the hybrid LSTM and MLP based framework used in predicting uncertainties.
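To make the progress-style features above more concrete, the following sketch computes a few of them at a given step index. The exact definitions are assumptions for illustration: the distance, time, and step ratios are taken as "completed so far divided by total", and the cumulative direction change is interpreted as the sum of absolute heading changes since the first step. The function and key names are hypothetical.

```python
import math

def wrap_angle(a):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def progress_features(step_lengths, step_times, headings_rad, k):
    """Illustrative per-step features at step k (1-based, 1 <= k <= n)."""
    n = len(step_lengths)
    return {
        "velocity": step_lengths[k - 1] / step_times[k - 1],
        "distance_ratio": sum(step_lengths[:k]) / sum(step_lengths),
        "time_ratio": sum(step_times[:k]) / sum(step_times),
        "step_ratio": k / n,
        "cumulative_turn": sum(
            abs(wrap_angle(headings_rad[i] - headings_rad[i - 1]))
            for i in range(1, k)
        ),
    }

lengths = [0.5, 0.5, 1.0, 1.0]
times = [0.5, 0.5, 0.5, 0.5]
headings = [0.0, 0.0, math.pi / 2, math.pi / 2]
features_at_2 = progress_features(lengths, times, headings, k=2)
features_at_4 = progress_features(lengths, times, headings, k=4)
```

Stacking such feature dictionaries over consecutive steps would give the kind of time-ordered input sequence a recurrent model expects.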
Furthermore, the suggested hybrid LSTM and MLP based uncertainty prediction model takes full account of both sampling and measurement errors. To address the sampling error, the estimated time interval for coordinate updates between two consecutive spatiotemporal points and the input vector associated with the continuous time period are used. Meanwhile, the constructed features of the selected trajectory are employed to handle the changing measurement deviation.

Model Design of Uncertainty Prediction
In this work, we utilize a hybrid LSTM-MLP network to construct an uncertainty model for indoor trajectories. Five distinct features acquired from the trajectories are structured into the input features of the hybrid LSTM-MLP network.
Furthermore, to provide a detailed depiction of the chosen trajectory, additional features are included.
To address the issue of feature correlation in uncertainty prediction for indoor trajectory estimation, 11 distinct features are extracted and employed. To improve the learning and extraction of the feature vector describing the movement data, the MLP model is adopted to enhance the performance of a single LSTM.
To leverage the characteristics of both the MLP and LSTM networks and to account for temporal and feature correlations within large-scale indoor trajectories, a novel hybrid LSTM and MLP network is introduced in this study. The proposed structure combines LSTM and MLP networks, as shown in Figure 3. In the LSTM layer, the update model of the LSTM parameters is described as:
Lastly, the output layer of the LSTM network serves as the input layer of the MLP network, from which the uncertainty error is calculated. To characterize the measurement error associated with different users and daily-life movement trajectories, the Euclidean distance is applied. The deviation between the reference movement data and the reconstructed movement data at the selected location point, Dis(GT.P_i, RT.P_i), is used as the reference output vector in the model training procedure.
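The data flow described above, an LSTM that consumes a window of per-step feature vectors and hands its final hidden state to an MLP that outputs one uncertainty error value, can be sketched as a forward pass in NumPy. This uses the standard LSTM cell update; the layer sizes, gate ordering, ReLU activation, and random untrained weights are assumptions for illustration and do not reproduce the authors' configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_mlp_forward(seq, params):
    """Run one feature sequence (T x n_features) through an LSTM, then an MLP head."""
    hidden = params["b"].shape[0] // 4
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    for x_t in seq:
        z = params["W"] @ np.concatenate([h, x_t]) + params["b"]
        f = sigmoid(z[:hidden])                # forget gate
        i = sigmoid(z[hidden:2 * hidden])      # input gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        c_tilde = np.tanh(z[3 * hidden:])      # candidate vector
        c = f * c + i * c_tilde
        h = o * np.tanh(c)
    # MLP head: one ReLU layer followed by a linear output (assumed sizes).
    a = np.maximum(0.0, params["W1"] @ h + params["b1"])
    return float((params["W2"] @ a + params["b2"])[0])

def init_params(n_features=11, hidden=16, mlp_hidden=8, seed=0):
    """Random, untrained weights; hidden sizes are hypothetical."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (4 * hidden, hidden + n_features)),
        "b": np.zeros(4 * hidden),
        "W1": rng.normal(0.0, 0.1, (mlp_hidden, hidden)),
        "b1": np.zeros(mlp_hidden),
        "W2": rng.normal(0.0, 0.1, (1, mlp_hidden)),
        "b2": np.zeros(1),
    }

params = init_params()
window = np.random.default_rng(1).normal(size=(10, 11))  # 10 steps, 11 features
predicted_error = lstm_mlp_forward(window, params)
```

During training, the scalar output would be compared with the reference Euclidean deviation at the same point; the loss and optimizer loop are left out of this sketch.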
To obtain the overall uncertainty area of the selected movement trajectory, the estimated uncertainty error in each step period is drawn as an uncertainty circle, and the merged region of all the uncertainty circles forms the overall uncertainty area.

EXPERIMENTAL RESULTS
In this section, a series of experiments is designed to evaluate the performance of the proposed LSTM-based human trajectory uncertainty prediction framework. The real-world trajectories collected from large-scale indoor spaces, provided by IPIN-2018 and an office building, are used as the training dataset.

Performance Evaluation of Trajectory Reconstruction
In this paper, the raw human motion data is collected by mobile terminals to calculate the initial motion features, including step-length, heading, and time period. In addition, Wi-Fi, BLE, and QR codes are used as landmarks, and the DTW algorithm is proposed to autonomously match the collected RSSI vector against the ideal vector for landmark detection. The comparison between the ideal distribution and the real-time collected distribution, and the calculated DTW results, are shown in Figure 4 and Figure 5 respectively. Figure 4 and Figure 5 show that the proposed DTW algorithm achieves the expected matching performance in landmark detection. The lowest point of the calculated DTW results indicates the timestamp at which the user is nearest to the detected landmark.
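The matching step described above can be illustrated with a textbook DTW distance and a sliding-window search for the lowest DTW score. This is a minimal sketch under assumptions: the absolute-difference local cost, the fixed window equal to the template length, and the function names are illustrative choices rather than the authors' implementation.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two 1D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def detect_landmark(stream, template):
    """Slide a template-length window over the stream; return the best start index and all scores."""
    width = len(template)
    scores = [
        dtw_distance(stream[s:s + width], template)
        for s in range(len(stream) - width + 1)
    ]
    best_start = min(range(len(scores)), key=scores.__getitem__)
    return best_start, scores

# Reference shape: distance shrinks while approaching a station, then grows again.
template = [5.0, 3.0, 1.0, 3.0, 5.0]
stream = [9.0, 9.0, 9.0, 5.0, 3.0, 1.0, 3.0, 5.0, 9.0, 9.0]
best_start, scores = detect_landmark(stream, template)
```

The window with the lowest score marks where the measured sequence best resembles the reference, and the sample at the centre of that window approximates the moment of closest approach.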
In addition, the location of each detected landmark point is applied for human trajectory reconstruction. The GD algorithm is used to optimize the collected motion features in each step period. Comparing the raw and reconstructed trajectories on the generated daily-life dataset, Figure 6 shows that the positioning error of the reconstructed trajectory is below 3.32 m in 75% of cases under complex human motion modes, compared with a raw trajectory error of 9.31 m in 75% of cases.

Model Training and Dataset Setting
In this study, the raw sensor data and reference points were obtained from the IPIN-2018 training dataset. The original data was gathered within a large-scale indoor shopping mall in Nantes, France. The ground-truth trajectory was determined from the optimized raw trajectory together with high-accuracy control points, as explained in the preceding section. Furthermore, real-world trajectories collected from another large-scale building were introduced as an enhanced daily-life dataset. The attributes of the improved real-world dataset are outlined in Table 1. The proposed deep learning architecture integrates the advantages of LSTM and MLP networks. The Adam optimizer is used because it handles large amounts of training data efficiently. The input vector dimension of the hybrid LSTM and MLP network is set to 11. A sensitivity experiment indicates that these settings are representative of the model's performance, which remains effective under different configurations.
To train the proposed deep learning framework, we randomly selected 70% of the 9772 spatiotemporal coordinates from 45 routes as the training dataset, leaving 30% as the test dataset. After training, the optimized model is evaluated on the test dataset to measure the final uncertainty prediction accuracy and to generate the uncertainty area of the selected trajectory.

Performance Evaluation of Uncertainty Prediction
In this section, we evaluate our proposed uncertainty prediction framework against four existing models. We use completeness and density indices as reference standards for comprehensive evaluation. The completeness index evaluates the degree to which the generated uncertainty region covers the ground-truth trajectory, thereby characterizing the overall completeness of the framework. The density index provides a more detailed assessment by relating the spatiotemporal points of the ground-truth trajectory to the total area covered by the generated uncertainty region (reported per square metre). We combine these two indices to evaluate the performance of our deep-learning-based uncertainty prediction algorithm.
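The two indices can be illustrated for an uncertainty region built from circles. In this sketch, completeness is the fraction of ground-truth points inside the union of circles, and density is interpreted as covered points per square metre of that union, with the union area approximated on a grid. Both the interpretation of the density index and the grid approximation are assumptions made for the example.

```python
import math

def in_union(point, circles):
    """True if the point lies inside at least one (cx, cy, r) circle."""
    px, py = point
    return any(math.hypot(px - cx, py - cy) <= r for cx, cy, r in circles)

def union_area(circles, cell=0.05):
    """Approximate the area of the union of circles by counting grid cells."""
    x_min = min(cx - r for cx, _, r in circles)
    x_max = max(cx + r for cx, _, r in circles)
    y_min = min(cy - r for _, cy, r in circles)
    y_max = max(cy + r for _, cy, r in circles)
    nx = math.ceil((x_max - x_min) / cell)
    ny = math.ceil((y_max - y_min) / cell)
    inside = 0
    for i in range(nx):
        for j in range(ny):
            center = (x_min + (i + 0.5) * cell, y_min + (j + 0.5) * cell)
            if in_union(center, circles):
                inside += 1
    return inside * cell * cell

def completeness_and_density(ground_truth, circles, cell=0.05):
    """Return (fraction of points covered, covered points per unit area)."""
    covered = sum(in_union(p, circles) for p in ground_truth)
    return covered / len(ground_truth), covered / union_area(circles, cell)

circles = [(0.0, 0.0, 1.0)]                      # one unit circle
ground_truth = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
completeness, density = completeness_and_density(ground_truth, circles)
```

The two indices pull in opposite directions: enlarging every circle raises completeness but lowers density, so a useful region must score well on both.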
In addition to the LSTM model proposed by Liu et al. (2022), we compare our proposed framework with three conventional uncertainty prediction models: the upper bound (UB) model, the approximate upper bound (AUB) model, and the broad adaptive error ellipse (BAEE) model. The UB model generates the uncertainty region from the trajectory's starting and ending points, defining an error ellipse based on the time length and maximum speed. The AUB model generates the uncertainty region using the "Approximate Upper Bound Distance" method and the constrained error ellipse. The BAEE model enhances the AUB model by incorporating the Minkowski distance metric. To enable a consistent comparison, we follow Shi et al. (2021), who used all three algorithms for uncertainty analysis and uncertainty region generation, by setting the maximum speed for UB at 4.94 m/s and the Minkowski coefficient p-value for error ellipse generation in the BAEE model at 1.5.
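For orientation, the UB-style error ellipse can be sketched with the classic space-time prism construction: a point is reachable if the detour through it, from start to end, does not exceed the maximum speed multiplied by the elapsed time. Treating the UB region as exactly this ellipse, with foci at the two sampled points, is an assumption of the sketch; the default of 4.94 m/s is the value quoted in the text.

```python
import math

def in_ub_ellipse(point, start, end, elapsed_s, v_max=4.94):
    """True if the point is reachable between start and end within the time budget."""
    reach = v_max * elapsed_s
    return math.dist(point, start) + math.dist(point, end) <= reach

def ub_ellipse_area(start, end, elapsed_s, v_max=4.94):
    """Area of the ellipse with foci at start/end and major axis v_max * elapsed_s."""
    a = v_max * elapsed_s / 2.0        # semi-major axis
    c = math.dist(start, end) / 2.0    # half the focal distance
    if a < c:
        return 0.0                     # end point unreachable in the given time
    return math.pi * a * math.sqrt(a * a - c * c)

start, end = (0.0, 0.0), (4.0, 0.0)
near = in_ub_ellipse((2.0, 1.0), start, end, elapsed_s=1.0)
far = in_ub_ellipse((2.0, 2.0), start, end, elapsed_s=1.0)
area = ub_ellipse_area(start, end, elapsed_s=1.0)
```

Because the maximum speed is rarely reached indoors, this ellipse tends to be large, which is consistent with a region that covers every point but has low density.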
To ensure a comprehensive evaluation of each method, we adopt the uncertainty region produced by the UB model as the standard benchmark. We then determine the proportion of actual trajectory points that fall within this region when aligned with the constructed uncertainty area. This provides an evaluation of the completeness of our proposed framework. Additionally, the density of covered ground-truth trajectories is an essential metric for assessing the accuracy of the generated uncertainty region.
Our hybrid LSTM and MLP network-based method is compared with four existing models on both completeness and density indices. Unlike traditional methods, which generate a single uncertainty region for the entire trajectory, our method generates a separate uncertainty region at each spatiotemporal point, using features extracted from pedestrian motion data. Each spatiotemporal point carries a circular uncertainty region, and the final uncertainty region for the entire trajectory is the union of the regions of all spatiotemporal points. Furthermore, the model uses a period of trajectory data to predict the uncertainty value at the current moment, which yields more accurate predicted uncertainty values at the beginning of the trajectory, similar to the location error of the deployed control point. This is an advantage over traditional methods, in which the predicted uncertainty values remain constant throughout the trajectory. Figure 6 provides a typical representation of the uncertainty region generated by the hybrid LSTM and MLP network. As presented in Figure 7, the hybrid network achieves an 87% completeness index on the chosen trajectory, compared with UB (100%), AUB (88%), and BAEE (69%). With down-sampled points, it reaches a density index of 0.14/m², higher than UB (0.03/m²), AUB (0.13/m²), and BAEE (0.09/m²), indicating better overall performance in predicting uncertainty regions when both completeness and density indices are taken into account.
To compare the five uncertainty prediction algorithms thoroughly on completeness and density indices, an improved test dataset is used. The evaluation includes 45 different daily-life trajectories of various users, with the final uncertainty areas calculated from each algorithm's predicted uncertainty errors. Table 2 presents the average completeness and density indices for the four existing methods (UB, AUB, BAEE, LSTM) and our hybrid LSTM-MLP framework. The hybrid LSTM and MLP network improves on the LSTM model by 3.16% in the completeness index and 9.21% in the density index.

CONCLUSION
To realize accurate uncertainty modelling of complex human trajectory data, this work proposes a hybrid LSTM-MLP network that takes the sampling error, measurement error, human motion features, and time-related effects into consideration and is trained on a real-world trajectory dataset. Comprehensive experiments indicate that the network effectively describes the uncertainty error of human indoor trajectories under complex motion modes and performs better than state-of-the-art algorithms under different accuracy indexes.


The data vectors contain the x-axis and y-axis coordinates of the acquired ground-truth trajectory; $L$ and $\theta$, the gait-length and direction values collected between two consecutive spatiotemporal points; and $r_x$ and $r_y$, the raw x-axis and y-axis coordinates of the acquired trajectory. The ground-truth user trajectories are shown in Figure 2 with indoor map information:

Figure 2. Walking Routes of Collected Dataset
movement velocity $v_i$ between two consecutive collected points: time of gaits of overall trajectory, $T(i)$ indicates the indexed time used by current gaits.
9) Current ratio of completeness index under total number of gaits and current number of gaits:

$X_t$ indicates the input vector of the LSTM model at the current moment, and $h_t$ represents the hidden state vector, regarded as the output of the LSTM model. $\sigma$ indicates the sigmoid function, and $\tilde{C}_t$ is the candidate vector.

Figure 4. Comparison Between Reference Vector and Real-time Vector

Figure 6. Trajectory Reconstruction Error
Figure 7. Comparison of Generated Uncertainty Areas

Table 1. Parameters of Generated Real-world Dataset
Table 1 shows that the augmented dataset formulated in this study comprises more than 45 routes and 9772 acquired location coordinates. The mean length of a route is 117.2 m, and the average time interval between points is 115.4 s. Additionally, the collected routes have varying sampling durations due to the inconsistent step intervals.

Table 2. Description of Completeness and Density Indexes
Table 2 presents the average completeness and density indices for the five algorithms under consideration.