GROUNDWATER LEVEL PREDICTION USING DEEP RECURRENT NEURAL NETWORKS AND UNCERTAINTY ASSESSMENT

: Groundwater is one of the most important sources of regional water supply for humans. In recent years, several factors have contributed to a significant decline in groundwater levels (GWL) in certain regions. As a result of climate change, such as temperature increase, rainfall decrease, and changes in relative humidity, it is necessary to investigate and model the effects of these factors on GWL. Although a number of researches have been conducted on GWL modeling with machine learning (ML) and deep learning (DL) algorithms, only a limited number of studies have reported model uncertainty. In this paper, GWL modeling of some piezometric wells has been conducted by considering the effects of the meteorological parameters with Long-Short Term Memory (LSTM) and Gated Recurrent Unit (GRU) algorithms. The models were trained on one piezometric well data and predictions were executed on six other wells. To perform an uncertainty assessment, the models were run 10 times and their means were calculated. Subsequently, their standard deviations were considered to evaluate the outcomes. In addition, the prediction power of the models was validated using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), and R-Squared ( R 2 ). Finally, for all the six wells that did not participate in the training phase, the prediction functions of the trained models were run 10 times and their accuracy was assessed. The results indicate that LSTM ( R 2 =95.6895, RMSE=0.4744 m, NRMSE=0.0558, MAE=0.3383 m) had a better performance compared to that of GRU ( R 2 =95.2433, RMSE=0.4984 m, NRMSE=0.0586, MAE=0.3658 m) on the GWL modeling.


INTRODUCTION
Groundwater plays an important role in providing humans with freshwater demand.Due to population growth, the water demand for domestic, industrial, and agricultural purposes has increased widely (Dalin et al., 2017;Wada et al., 2010).Faced with the growing demand for freshwater, the groundwater level (GWL) has been decreased in many parts of the world including Iran (Foroughnia et al., 2019;Mohammady et al., 2019).In addition, in recent decades, climate change has greatly affected all human lives (McMichael et al., Seyed Mousavi et al., 2022).Observed the climate changes such as temperature increase and changes in the trend of rainfall, relative humidity, soil moisture, as well as evaporation and transpiration have led to significant changes in groundwater sources, especially in arid and semi-arid areas (Afrifa et al., 2022;Gaffoor et al., 2022;Wunsch et al., 2022;Yifru et al., 2021).Hence, appropriate and effective models can be necessary tools to make informed decisions about the status of groundwater sources and their environmental effects.Recently, many researchers have used different methods in order to model GWL and investigate the effects of environmental parameters.Among the employed methods, machine learning (ML) (Basant Yadav et al., 2017;Kouziokas et al., 2018) and deep learning (DL) (Bowes et al., 2019;Wunsch et al., 2022) algorithms have been very popular due to their capacity to handle complex real-world problems and their high accuracy and efficiency.ML and DL are used in various fields of earth and environmental sciences (Eghrari et al., 2023;Izanlou et al., 2023;Mousavi and Akhoondzadeh, 2023).Various climate parameters have been used to model GWL, for example Wunsch et al. (2022) used precipitation, temperature, and relative humidity parameters to model the GWL using a convolutional neural network (CNN), nonlinear autoregressive network with exogenous inputs (NARX) and Long-Short Term Memory (LSTM) models at 17 groundwater wells within the Upper Rhine Graben (URG) area.Pham et al. (2022) used seven different ML models including random tree (RT), random forest (RF), decision stump, M5P, support vector machine (SVM), locally weighted linear regression (LWLR), and reduce error pruning tree (REP Tree) to model GWL using mean temperature, rainfall, and relative humidity dataset.Ghazi et al. (2021) used an artificial neural network (ANN), least square support vector machine (LSSVM), and NARX models for predicting GWL fluctuations under climate change scenarios for Tasuj plain, Iran.Recently, Recurrent Neural Networks (RNNs) have been widely used to model and predict GWL changes (Chu et al., 2022;Jeong and Park, 2019;Shin et al., 2020;Vu et al., 2021;Wunsch et al., 2022).
Although a number of studies have been conducted on GWL modeling with ML and DL algorithms, a few studies have reported model uncertainty.Moreover, given that changes in GWLs are time-dependent, it is essential to use models with longterm memory for accurate predictions.LSTM and Gated Recurrent Unit (GRU) are two popular types of RNNs that excel at addressing the vanishing gradient problem, enabling them to better handle long-term dependencies in time series data.In this research, the meteorological data including temperature, precipitation, relative humidity, soil moisture, as well as evaporation and transpiration were acquired from Iran Meteorological Organization (IMO), and the water level data of seven piezometric wells acquired through Isfahan Regional Water Company.By using this integrated dataset, we were able to develop the GWL model using LSTM and GRU.The models were trained on one piezometric well data and predictions were conducted on six other wells.The models were then assessed for uncertainty by calculating the mean and standard deviation.The prediction power of the models was validated through various metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Normalized Root Mean Square Error (NRMSE), and R-Squared (R 2 ).Additionally, to evaluate the accuracy of the trained models, the prediction functions were executed 10 times for all six wells that were not part of the training phase.The research includes five sections.Section 2 has concentrated on the description of study area.In the third section, at first data pre-processing has been performed.Then, GWL modeling has been undertaken with LSTM and GRU algorithms.In the fourth section, the prediction uncertainty of the employed models has been examined.Then, the error analysis of the test data, and the prediction of the models with geometric parameters have been done.Finally, in the fifth section, conclusions and directions for future research have been elaborated, respectively.

DESCRIPTION OF THE STUDY AREA
Kashan Plain as the study area covers an area of more than 1730.03Km 2 is located in the north of the Isfahan Province in the Kashan city, Iran between Longitudes of 51.10 and 52.01 degrees and Latitudes of 33.81 and 34.50 degrees (Figure ( 1)).Kashan Plain is one of the least rainfall regions of Iran which has two classes of arid and semi-arid climates.

METHODOLOGY
The research methodology consists of 4 steps as illustrated in

Data Pre-Processing
We have used the GWL dataset of 7 piezometric wells, which were reported monthly from Nov. 1, 2002 to Jun. 1, 2022 by Isfahan Regional Water Company.Data redundancy was checked and resolved.Five meteorological parameters including temperature, precipitation, relative humidity, soil moisture, as well as evaporation and transpiration have been used.The statistical report of the groundwater wells is presented in

GWL Modeling
This section provides an explanation of two deep learning algorithms, namely LSTM and GRU, which have been utilized for GWL modeling.Seven piezometric wells were considered and their spatial distribution is shown in Figure ( 1).The data from W0 was used to train the models, with 80% of the data allocated as training data and 20% as test data.Subsequently, the prediction function of the trained model was applied to six other wells.Figure 1.The study area and location of the piezometric wells.

LSTM
LSTM is a special type of RNNs that solves the long-term memory problem of RNNs (Hochreiter and Schmidhuber, 1997).
The LSTM network has internal mechanisms called gates.They control the flow of information.It also determines the information in the sequence which are important and should be kept and the information which should be eliminated.In this way, it passes the important information along the sequence networks to get the desired output ( Equations 1, 2, 3, 4, 5, and 6) (Hochreiter and Schmidhuber, 1997).Figure (3) illustrates a hidden layer block with an LSTM cell unit. where,

GRU
The neural network of the GRU is very similar to the LSTM network (Figure ( 4)), except that instead of three gates it has only two reset gates (Reset Gate) and an update gate (Update Gate).In addition, the GRU network does not have a cell state and uses a hidden state to transmit information (Guo et al., 2018) (Equations (7-10)).
where, x t = input h t = output

Employed Uncertainty Evaluation Method
To assess the uncertainty, both models were run 10 times, their means were calculated and their standard deviations (Equation ( 11)) were considered in assessing the results.
where, σ = standard deviation N = the size of the point data x i = point data µ = mean

Models Evaluation
To evaluate the prediction power of the models, MAE, RMSE, NRMSE and R 2 have been used (Equations (12-15)).
MAE is the difference between the actual value and the predicted value over all training samples, while for n test samples, and for each actual value of y and predicted value of  � , it is calculated using Equation ( 12) (Willmott and Matsuura, 2005).
n RMSE is calculated from the mean squared error using Equation ( 13) (Barnston, 1992): NRMSE criterion is the result of dividing the square root of the average error by the interval of the target feature (Equation ( 14)) (Hyndman and Koehler, 2006).NRMSE = RMSE y max − y min R 2 , coefficient of determination, is equal to or smaller than 1 (Equation ( 15)) (Cameron and Windmeijer, 1997).The coefficient of determination is used to compare models and report the results.Considering that the variance of the numerical data set is constant, with the increase of the mean squared error, the coefficient of determination decreases. (1) where, y � i = vector of predicted dependent variables with n data points y i = vector of observed values of the variable being predicted y � � = mean of the observed dependent variables

Models Results
The objective of this study is to model GWL changes in seven piezometric wells, taking into account the influence of various meteorological parameters including temperature, precipitation, relative humidity, soil moisture, as well as evaporation and transpiration, with LSTM and GRU algorithms.RMSE, NRMSE, R², and MAE were calculated for LSTM and GRU models.Table (2 5), and ( 6)) demonstrate the compatibility between the real data and the predicted data for the test, train, and all data while implementing LSTM and GRU for the GWL modeling.As a result, the LSTM model has a higher prediction accuracy than that of the GRU model as presented in

The Uncertainty and Predictions Results
To conduct a comprehensive uncertainty assessment, both the LSTM and GRU models were meticulously executed ten times.This ensured a robust evaluation of their performance.Following these iterations, the means of the model outputs were calculated, providing an aggregated measure of their predictive capabilities.Furthermore, to gain deeper insights into the reliability and consistency of the models predictions, their respective standard deviations were carefully considered.Serving as a visual representation, Figure ( 8) offers valuable insights into the predictive reliability of the models by showcasing their uncertainty.The distance observed between the two black lines within the figure holds significance as it acts as a meaningful indicator of the level of uncertainty present.It is important to note that the length of this distance directly correlates with the degree of uncertainty.Specifically, a greater distance indicates a higher level of uncertainty.Moreover, Figure ( 9) showcases the execution of prediction functions for the trained models, which were conducted 10 times for each of the six wells not included in the training phase.Subsequently, the accuracy of these predictions was evaluated and presented in Figure ( 10).Among the six wells analyzed, it was observed that well W3 obtained the highest R² value while using the LSTM model, whereas well W2 achieved the lowest R² value.On the other hand, for the GRU model, wells W5 and W2 demonstrated the highest and lowest R² values respectively.It is worth noting that well W2 exhibited the lowest RMSE across both models.The key highlight is that, despite not being included in the model training phase, these six wells exhibit remarkable predictive capabilities when it comes to capturing trends and seasonality.This underscores the impressive performance of the models and reinforces their reliability in forecasting patterns.
Over the past two decades, there has been a notable and unequivocal decline in GWL changes, which is evident.This downward trend can be attributed to a range of factors, with climate change playing a significant role.Empirical evidence from modeling results further substantiates the profound impact of this crucial parameter on GWL fluctuations.

CONCLUSIONS AND FUTURE DIRECTIONS
Groundwater is an essential source of regional water supply for communities.However, in recent years, some areas have experienced a significant reduction in GWLs due to various factors, including climate change.Consequently, GWL modeling has become a crucial issue in water resource management, serving as a valuable tool to provide insights into the status of groundwater sources and their environmental impact.This modeling can help decision-makers to make more informed decisions about how to manage and protect these essential resources.Despite the extensive research conducted on the utilization of ML and DL algorithms in GWL modeling, only a limited number of studies have prioritized addressing the aspect of reporting the uncertainty associated with these models.
In this study, we used LSTM and GRU algorithms to model the GWL of piezometric wells in Kashan Plain, Isfahan Province, Iran.
We have examined seven piezometric wells, with the findings indicating a significant downward trend in the GWL across all seven wells over the past two decades.This notable decline in GWL can lead to severe repercussions, including an increasing probability of land subsidence.
As the results confirmed, the use of meteorological parameters including temperature, precipitation, relative humidity, soil moisture, as well as evaporation and transpiration have had a significant impact on modeling the GWL changes of the piezometric wells.
To conduct uncertainty assessment, the study employed a process involving the execution of both LSTM and GRU models for 10 iterations.The resulting measurements were averaged, and their standard deviations were subsequently calculated to evaluate the outcomes.Notably, the LSTM and GRU models consistently accurately modeled GWL.These models possessed a remarkable capability to effectively address low uncertainty, which is critical for predicting GWL variations.By proficiently capturing intricate temporal dependencies and patterns within the data, these models made substantial contributions to our comprehension of GWL dynamics and facilitated more precise forecasting of future water levels.The accuracy of the predictions for the GWL in six wells, which were not part of the training phase, was assessed by running the prediction functions ten times.Surprisingly, despite not being included in the training phase, the trained models exhibited a remarkable ability to accurately predict the GWL in these piezometric wells.In this research, our case study had two classes of arid and semiarid climates.Future researchers may consider assessing and comparing the performance of the deep recurrent neural networks across various climate classes to create the best global model.

Figure 2 .
Figure 2. Flowchart of the research methodology.
are weight matrices and b f , b i , b o , b c are biases for the input.

Figure 3 .
Figure 3. LSTM architecture h t−1 = hidden state tanh = Activation function W z , W r , W h are weight matrices and b z , b r , b h are biases for the input.

Figure ( 7 Figure 6 .
Figure(7)  demonstrates the compatibility between the real data and the predicted data of time series GWL for All, Train, and Test data of the GRU and LSTM models.The performance of the models in predicting trend and seasonality has been exceptional, as evidenced by their high accuracy when evaluated against the test data.

Figure 5 .
Figure 5. Compatibility between the real data and the predicted data for All, Train, and Test data of the LSTM model.

Figure 7 .
Figure 7. (a), (b), and (c) demonstrate the compatibility between the real data and the predicted data of time series GWL for All, Train, and Test data of the LSTM model respectively, and (d), (e), as well as (f) demonstrate the compatibility between the real data and the predicted data of time series GWL for All, Train and Test data of GRU model, respectively.

Figure 9 .
Figure 9. (a), (b), (c), (d), (e), and (f) demonstrate the real data and 10 prediction results of the time series GWL for all the 6 wells that did not participate in the training phase with LSTM model.(g), (h), (i), (j), (k), and (l) illustrate the real data and 10 prediction results of the time series GWL for all the 6 wells that did not participate in the training phase with the GRU model.

Figure 10 .
Figure 10.(a), and (c) present R 2 of 10 runs of the GWL predictions for each of the six wells that did not participate in the training phase with LSTM and GRU models respectively.(b), and (d) demonstrate RMSE of 10 runs of the GWL predictions for each of the six wells that did not participate in the training phase with the LSTM and GRU models, respectively.

Table 1 .
Statistics of the observed GWLs for all the wells