A SPATIO-TEMPORAL FRAMEWORK FOR MODELING ACTIVE LAYER THICKNESS

The Arctic is experiencing an unprecedented rate of environmental and climate change. The active layer (the uppermost layer of soil between the atmosphere and permafrost that freezes in winter and thaws in summer) is sensitive to both climatic and environmental changes, and plays an important role in the functioning, planning, and economic activities of Arctic human and natural ecosystems. This study develops a methodology for modeling and estimating spatial-temporal variations in active layer thickness (ALT) using data from several sites of the Circumpolar Active Layer Monitoring network, and demonstrates its use in spatial-temporal interpolation. The simplest model’s stochastic component exhibits no spatial or spatio-temporal dependency and is referred to as the naı̈ve model, against which we evaluate the performance of the other models, which assume that the stochastic component exhibits either spatial or spatio-temporal dependency. The methods used to fit the models are then discussed, along with point forecasting. We compare the predicted fit of the various models at key study sites located in the North Slope of Alaska and demonstrate the advantages of space-time models through a series of error statistics such as mean squared error, mean absolute and percent deviance from observed data. We find the difference in performance between the spatio-temporal and remaining models is significant for all three error statistics. The best stochastic spatio-temporal model increases predictive accuracy, compared to the naı̈ve model, of 33.3%, 36.2% and 32.5% on average across the three error metrics at the key sites for a one-year hold out period.


INTRODUCTION AND BACKGROUND
The Arctic is experiencing an unprecedented rate of environmental and climate change (Hinzman et al., 2005).Vast areas of the Arctic are underlain by permafrost, defined as earth materials that remain at or below 0 • C continuously for two or more years.The permafrost regions occupy nearly a quarter of the Earth's terrestrial surface (Zhang et al., 1999).
The active layer is the uppermost layer of soil or other earth material above permafrost that experiences seasonal freezing and thawing.The thickness of the active layer (ALT) plays an important role in the ecology, hydrology and geomorphology of cold environments.Knowledge about the spatial-temporal variability of ALT is crucial for engineering applications and construction in northern regions (Streletskiy et al., 2012).Permafrost degradation reported in the Arctic has the potential to influence the balance of greenhouse gases and may pose significant hazards to local infrastructure, especially in permafrost ice-rich areas where extensive landscape changes result in subsidence and the development of irregular surfaces over extensive areas, known as thermokarst terrain (Streletskiy et al., 2015).Consequently, understanding the dynamics that contribute to shifts in ALT is of interest at both local and global scales.
The thickness of the active layer varies from centimeters to several meters along the latitudinal bioclimatic gradient, and is present everywhere where there is permafrost.Its calculation is often approximated analytically using the Stefan Solution e.g., (Jumikis, 1978), given in one of its most basic forms by: where X is the annual depth of thaw (m), λ is the thermal con-ductivity of the substrate (W/m • C), DDT is the degrees days of thaw, a time-temperature integral representing cumulative summer warmth at the surface ( • C days), s is a scale factor (s/day), ρ is the dry density of the substrate (kg/m 3 ), w is water content expressed as a proportion of dry weight, and L is the latent heat of fusion (J/kg).
Calculation of ALT over extensive areas is a challenging undertaking owing to the high variability and unavailability of detailed subsurface information.Accordingly, many studies have assumed that subsurface characteristics are constant over geographic space, and used even more simplified versions of the Stefan relation.Compounding this situation is the fact that few spatial time series of active-layer thickness (ALT) existed prior to about 1990.For this reason, the Circumpolar Active Layer Monitoring (CALM) program was implemented in the 1990s.The CALM program maintains a network of more than 250 permafrost observatories that monitor active layer and near-surface permafrost responses to climate change over multi-decadal time scales.CALM has produced a large number of publications, many of which address spatial time series using relatively simple statistical strategies (Shiklomanov et al., 2012).
Despite growing awareness of permafrost's potential impact on global temperature and that of active-layer thickness on environmental processes and economic activities (Schaefer et al., 2012), observational data are relatively sparse, a reflection primarily of logistical and economic constraints.High costs and difficulties of access severely limit the number of monitoring sites in the Arctic regions (Biskaborn et al., 2015).An additional problem is that many ALT records are of limited duration, as most of the activelayer monitoring sites were established prior to the early 1990s in support of short-term engineering and ecological investigations (Brown et al., 2000).Given these limitations, not all sites can be visited on an annual basis, making it necessary to use alternative methods for characterizing the spatial and temporal dynam-ics of ALT.Process-oriented analytical and numerical models, of varying complexity, have been developed to explain changes in ALT.Similarly, several statistical procedures have been used to map and analyze the spatial variability of ALT, including regression modeling (Nelson et al., 1997), kriging (Shiklomanov and Nelson, 2003), nested sampling and analysis (Nelson et al., 1999), and spatial autocorrelation analysis (Nelson et al., 1998).Some of this literature has been summarized in (Brown et al., 2000), (Nelson et al., 2008), and (Mishra et al., 2013) , and a comparative study of various models was published by (Shiklomanov et al., 2007).However, the literature concerned with modeling space-time variability using continuous parameters (Qian and Apanasovich, 2014) is small.This study is concerned with developing a methodological framework for modeling ALT using Gaussian random fields with non-separable space-time covariance structures.The primary points addressed are: (a) data availability at eight site locations in northern Alaska; (b) theory about spatial-temporal covariance structures; (c) specification of stochastic spatial and spatio-temporal models; (d) introduction of four geostatistical ALT models formulated as the sum of a deterministic trend and a stochastic component; (e) differentiation between the models, based on specification of their stochastic components.

DATA AVAILABILITY
The data used to model ALT were drawn from the CALM database.CALM data are available publicly, and can be accessed online at the CALM website (www.gwu.edu/∼calm).Eight sites located in the North Slope of Alaska were selected to develop the model.The sites were selected based on data availability and proximity to settlements in hope that the developed methodology can be used for societal benefits in these settlements.Basic information about the sites is shown in 1.These sites were selected for analysis because of the availability of high-quality data, their contrasting topography, and their proximity to indigenous populations (Barrow, Atkasuk), a major operating oil field (West Dock -Purdhoe Bay), and environmental organizations (University of Alaska's Toolik Lake Long-Term Ecological Research station).These characteristics impart social, commercial, and educational relevance.
Active layer data were obtained at different scales, according to one of two schemes, both employing a systematic spatial sampling protocol.Under the first scheme an 11 × 11 grid of 121 stations was sampled over 100 m increments to form a 1×1 km square array.In the second scheme data were sampled over a 100×100 m square array with stakes separated every 10 m.Data are recorded annually by manual probing during late August or early September, when ALT is near its maximum annual value.For computational purposes, the working assumption was made that measurements occur at precisely annual intervals, so that modeling procedures can be implemented using regular intervals.
Table 1 summarizes data availability and characteristics of the design.Although data are available at most sites between the years 1995-2014, there are missing values.This may be by design, for example, at point locations with rocky substrates that do not permit manual probing.Two sampling grids that contain a relatively large number of such points are Imnavait Creek and Toolik Lake.Accordingly, their sample design contains 49 and 74 stations, respectively.These are very sparse in comparison with the remaining site locations, which each have at least 99 stations at which ALT can be measured.For succinctness, we use site codes to reference sampling locations in the remainder of this paper.
During exploratory data analysis, ALT was compared spatially and cross-temporally.ALT points that vary substantially from others may be an artifact of terrain characteristics or the result of an unseasonably warm summer.For example, the U1 site is bisected by a gravel-rich beach-ridge, for which readings in that sampling locale are significantly higher than neighboring values.Using Anselin's local Moran I statistic (Anselin, 1995), ALT readings that are significantly different from their neighbors were identified and removed.

Geostatistical Modelling
Geophysical processes are variable over both space and time.Geostatistical approaches typically model continuous spatio-temporal observations by the sum of a systematic trend and stochastic components: where m(s, t) = E{X(s, t)}, the mean function or global trend, is smooth and deterministic, ξ(s, t) is a Gaussian field of spatially and temporally uncorrelated mean zero errors, and Z(s, t) is a mean-zero Gaussian random function.The error process ξ(s, t) has covariance function: ) is fully characterized by its covariance function and explains space-time variability not captured by the mean function or error process i.e. the microscale variation.The covariance function of X(s, t) is defined as: where s, s + h ∈ S; t, t + u ∈ T , θ ∈ Θ and Θ is the parameter space.In general, some assumptions were made with respect to the stochastic space-time process Z(s, t).Two common working assumptions are isotropy and separability.When the data support the two assumptions, model complexity and computational intensity are reduced.
In the context of covariance functions, when Z(s, t) is both translation and rotation invariant, it is said to be isotropic: where || • || is the Euclidean distance.
A process Z(s, t) is said to be separable if the covariance function can be factored into purely spatial and temporal components: where s + h ∈ S and t + u ∈ T .Separable models do not allow for space-time interaction and consequently do not adequately model physical processes when interaction is present.Accordingly, non-separable models generally have better predictive performance than separable models.When Z(s, t) is isotropic and separable its covariance function can be written as: where h ∈ R d and u ∈ R.

Stochastic Spatial and Spatio-Temporal Models
We consider two spatial covariance functions CS{||h||}.The first is the Matérn covariance: ) where φ > 0, ν > 0 and Kν is the modified Bessel function of the second kind.The Matérn is selected due to its flexibility modeling different degrees of smoothness.A second commonly used spatial covariance function is the powered-exponential covariance, which has a closed parametric form and does not rely on the Bessel function, easing subsequent parameter estimation: where φ > 0, 0 < p ≤ 2.

Modeling Active Layer Thickness
Four models are considered.The first, naïve, model assumes no stochastic spatio-temporal variability, Z(s, t), so that: where m(s, t) is the mean function and ξ(s, t) is the error process.The spatial and temporal trends do not interact so the mean function is defined as the sum of spatial and temporal trends: where υ = {υi}i∈I, λ = {λj}j∈J are unknown parameters with index sets I, J and the functions mS(s; υ) and mT (t; λ) are known up to the value of their parameters .
When there are a small number of parameters, as in polynomial regression, the microscale variation is not captured by the mean function hence it is not flexible enough to capture all space-time variation.Accordingly, we add a stochastic component to account for the microscale variation.The remaining models include a global trend and stochastic space-time component: where m(s, t), ξ(s, t) and Z(s, t) were previously defined.It is possible that microscale variation occurs only in space and here we consider the reduced model.For a fixed t ∈ T , Model 2's covariance does not depend on time and is given by: When Z(s, t)'s microscale variation is both in space and time, we refer to it as Model 3, whereas when it is non-separable it is referred to as Model 4.

Fitting Models to Data
Following the convention of (Zimmerman and Michael, 2010), we assume the naïve model's spatial trend is modeled by at most, a second order polynomial: where s = (s1, s2) ∈ S are coordinates in R 2 and {υi} 4 i=0 are their corresponding unknown parameters.We assume the temporal trend follows, at most, a second order polynomial equation with Fourier frequency cos(ωt): where t ∈ T ⊆ R and {λi} 4 i=0 are the corresponding unknown parameters.The parameters of the spatial and temporal trends are estimated using non-linear least squares.With respect to models 2-4, once the global trend is fit, the parameters of the covariance function of the residuals Z(s, t) (CZ (•, •; θ)) are estimated.When maximum likelihood criteria are used, exact schemes rely on repeated formation and inversion of the covariance matrix and evaluation of its determinant which can be computationally demanding.Rather, composite likelihood, an approximate likelihood method, is used to estimate the covariance parameters.Using composite likelihood, a pseudolikelihood function is constructed from the marginal densities of all pairwise differences among the observations and maximized.Using a spatio-temporal version of the scheme proposed by (Curriero and Lele, 1999), we find the parameters θ ∈ Θ that maximize the spatio-temporal composite likelihood function: where n is the number of observations, ẑ(si, ti) is the value of the residual at location si and time ti, θ ∈ ΘZ ⊆ Θ and CZ {si − sj, ti − tj; θ} := Cov{Z(si, ti), Z(sj, tj); θ}. ( 22) Models are fit in R using the CompRandFld package (Padoan and Bevilacqua, 2015).

Prediction
For separable and non-separable spatio-temporal models the simple point forecast at location s0 ∈ S and time t0 ∈ T is given by: where ci0 is the vector of covariances between the residuals, (ẑ(s, t) = {ẑ(si, tj)}i,j∈S,T ) and predicted variables (z(s0, t0) = {z(si, t0)}i∈S ), CZ is the variance-covariance matrix of Z(s, t) and ẑ(s, t) is the vector of residuals.
The predicted value for Model 1 at locations s0 ∈ S and time t0 ∈ T is similarly given by: which is the global trend predicted at values s0 and t0.

Model Comparison Test
In the absence of temporal correlation the spatio-temporal models, Model 3 and Model 4, reduce to the stochastic spatial model, Model 2, with Matérn and powered exponential covariances, respectively.Model 2 has four estimable parameters whereas Model 3 has seven and Model 4 has eight.We compare the fitted nested spatial models and saturated spatio-temporal models to one another at the eight sites using likelihood ratio tests.We test whether there is a significant difference between spatial Matérn and spatiotemporal Matérn-Cauchy models as well as the spatial powered exponential and spatio-temporal powered exponential-Cauchy models.

Time-Forward Prediction
To assess the predictive performance of increasingly complex models we use time-forward predictions of the ALT during the test period based on fitted models in the training period.In particular, one-year ahead spatio-temporal predictions for ALT measures are made at each of the stations for the four models.A one-year hold out set, the year 2014, was used to rank model fits based on several metrics.Predicted root mean squared error (PRMSE), predicted mean absolute error (PMAE) and predicted mean percentage difference (PMPD) were used to assess the quality of point forecasts.For year t0 ∈ T and sites s0 ∈ S the prediction metrics are defined as: where {x(s0, t0)}s 0 ∈S are the observed values for the one-year hold out set and |S| is the cardinality of S.

RESULTS AND DISCUSSION
The methodological framework elaborated above was applied to the ALT grids summarized in Table 1.We first applied the likelihood ratio test to spatial and spatio-temporal models to assess fit.Table 2 lists the log-likelihood for Models 2-4.Since all p-values are close to zero the likelihood ratio tests indicate a significant difference in model fit (p ≈ 0) between spatio-temporal models and their spatial counterparts for each of the eight sites.
We subsequently compared models based on the time-forward error measures at the various sites.Tables 3-5 list the PRMSE, PRMAE and PMPD of Models 1-4 for a one-year time forward period; Table 6 summarizes the relative difference in performance of the non-separable model to the other three.
The spatio-temporal models time-forward errors are lower than in deterministic and spatial stochastic models at seven of the eight sites.In particular, for all sites except at site U7A the non-separable model has the lowest PRMSE, PMAE and PMPD.
Using a paired Wilcoxon non-parametric permutation test with exact p-values we assessed whether systematic differences in predictive performance between sites existed for Models 3 and 4. The test indicated the PRMSE, PMAE and PMPD for the Model 3 and Model 4 were not statistically significant as the p-values for the PRMSE, PMAE, and PMPD were all p > 0.05.In contrast, the test indicated a significant difference between spatio-temporal models and Models 1 and 2. Their p-values are summarized in Tables 7-8.
Two implications follow.First, the inclusion of a microscale stochastic temporal component within the model explains variability not captured by deterministic models.The importance of the stochastic temporal component can be seen when comparing the point prediction errors of Models 1 and 2 to Models 3 and 4.
Prior to the inclusion of the temporal component, Models 1 and 2, have similar prediction errors.Subsequent to the inclusion of the stochastic temporal component the reduction in PRMSE, PMAE and PMPD of the spatio-temporal models to naïve and spatial stochastic models becomes significant.To compare, the separable and non-separable stochastic spatio-temporal model are on average within 14.2% and 13% of the one-year time forward true values at the predicted eight sites whereas the deterministic, spatial stochastic Matérn and spatial stochastic powered exponential models averages time-forward predictions across the eight are within 19.3%, 20.0% and 20.4% of the true values.The nonseparable spatio-temporal model's average PRMSE, PMAE and PMPD across the eight sites are identified as the smallest among the models.Relative to the naïve model, the increase in average predictive accuracy across the eight sites for the three error metrics is 33.3%, 36.2% and 32.5%.
Second, when spatio-temporal interaction is present, predictive performance may increase.Table 9 lists the values of the interaction parameter β for the eight sites.At sites U3, U11A and U12A, where spatio-temporal interaction is present, there are predictive performance gains for the non-separable spatio-temporal model relative to the separable spatio-temporal model.The most noticeable decrease in predictive errors occur at sites U11A and U12A, where the landscapes are homogeneous and the spatio-temporal interaction is detectable.The relative decrease of the PRMSE, PMAE and PMPD of the non-separable model to the separable model are by 20.6%, 27.9% and 28.7%, respectively.

CONCLUSIONS
This study introduces a methodological framework for modeling ALT, using four models that are the sum of a systematic trend and stochastic component.The naïve model, against which we compare the other models, assumes ALT is completely determined by the global trend, whereas the other models characterize microscale variability through either spatial or spatio-temporal dependency.We assessed the models' predictive accuracy at eight sites which possess different degrees of sparsity for a one-year forward period using root mean squared error, mean absolute error, and mean performance difference.
The spatio-temporal models significantly reduce the time-forward errors metrics when compared to both deterministic and spatial stochastic models.The temporal stochastic component subsequently plays a role characterizing active layer thickness dynamics and decreasing prediction point errors.
Despite grid sparsity the spatio-temporal model was able to capture the residual variability through its temporal component, and generate predictions superior to those of the deterministic and spatially stochastic models.The implication for researchers is that complete grids may not be necessary to characterize ALT dynamics, possibly resulting in decreased setup and operational costs.When landscapes are homogeneous and interaction is detectable measurement error is further reduced.
The largest limitation in the current approach is that a non-separable Gneiting Cauchy-Matérn could not be implemented in the analysis given the limitations of the package CompRandFld.Had the option for a non-separable Gneiting Cauchy-Matérn been available the analysis would have allowed comparison of nested spatial, separable and non-separable models.
A future route of research is to develop a full stochastic model that considers not only ALT, but other variables such as temperature, snow coverage, relative humidity and solar radiation across space simultaneously.Some of the multivariate spatial models currently of interest in the statistics community may also be useful to this end (Apanasovich andGenton, 2010, Apanasovich et al., 2012).
Figure 1.Site Locations

Table 6 .
Average Performance of Models 1-3 to Model 4

Table 7 .
Model 3 -p-values for paired Wilcoxon test

Table 9 .
Interaction Parameter of Model 4 at Different Sites