Spatiotemporal Mining of Time-Series Remote Sensing Images Based on Sequential Pattern Mining

With the continuous development of satellite techniques, it is now possible to acquire a regular series of images concerning a given geographical zone with both high accuracy and low cost. Research on how best to effectively process huge volumes of observational data obtained on different dates for a specific geographical zone, and to exploit the valuable information regarding land cover contained in these images has received increasing interest from the remote sensing community. In contrast to traditional land cover change measures using pair-wise comparisons that emphasize the compositional or configurational changes between dates, this research focuses on the analysis of the temporal sequence of land cover dynamics, which refers to the succession of land cover types for a given area over more than two observational periods. Using a time series of classified Landsat images, ranging from 2006 to 2011, a sequential pattern mining method was extended to this spatiotemporal context to extract sets of connected pixels sharing similar temporal evolutions. The resultant sequential patterns could be selected (or not) based on the range of support values. These selected patterns were used to explore the spatial compositions and temporal evolutions of land cover change within the study region. Experimental results showed that continuous patterns that represent consistent land cover over time appeared as quite homogeneous zones, which agreed with our domain knowledge. Discontinuous patterns that represent land cover change trajectories were dominated by the transition from vegetation to bare land, especially during 2009–2010. This approach quantified land cover changes in terms of the percentage area affected and mapped the spatial distribution of these changes. Sequential pattern mining has been used for string mining or itemset mining in transactions analysis. The expected novel significance of this study is the generalization of the application of the sequential pattern mining method for capturing the spatial variability of landscape patterns, and their trajectories of change, to reveal information regarding process regularities with satellite imagery. * Corresponding author


INTRODUCTION
Land use and land cover change are becoming increasingly recognized as important drivers of global environmental change (Turner et al., 2007).The characteristics of land cover can have important effects on the local climate, radiation balance, biogeochemistry, hydrology, and the diversity and abundance of terrestrial species (Randerson et al., 2006).Therefore, the study of land cover change is an important problem within the geoscience domain.Advances in earth observation technologies have led to the acquisition of vast volumes of accurate, timely, and reliable environmental data, which encompass wideranging information about the earth's land, ocean, and atmosphere (Karpatne et al., 2013).Remote sensing imagery consisting of satellite-based observations of the land surface, biosphere, solid earth, atmosphere, and oceans, combined with historical climate records and predictions from ecosystem models, offer new opportunities for understanding how the earth is changing, determining the factors causing these changes, and predicting future changes (Boriah et al., 2009).Therefore, remote sensing satellite imagery has emerged as the most useful data source for characterizing and quantitatively measuring landscape-scale land cover changes (Hudak and Wessman, 1998).
Detecting and characterizing change over time is the natural first step towards identifying the drivers of the change and understanding the change mechanism (Verbesselt et al., 2010).Satellite remote sensing has long been used as a means of detecting and classifying temporal changes in the condition of the land surface (Coppin et al., 2004;Lu et al., 2004).Satellite sensors are well suited to this task because they provide consistent and repeatable measurements on a spatial scale appropriate for capturing the processes of change (Jin and Sader, 2006).A given scene may be observed repeatedly from space, resulting in a times series of satellite images.The high spatial resolution of current sensors provides detailed information on spatial structures, which after a series of revisits, can be extended to spatiotemporal data structures.It follows that a time series of satellite images represents a highly complex data set that potentially contains valuable spatiotemporal information (Gueguen and Datcu, 2007).
Although the value of remotely sensed long-term data sets to change detection has been firmly established (De Beurs and Henebry, 2005), only a limited number of time-series changedetection methods have been developed.Most previous changedetection studies have relied primarily on examining the differences between two or more satellite images acquired on different dates (Boriah, 2010).These procedures can be categorized into three types.Procedures of the first category identify simple proportional differences of certain classes within a certain area between two points in time without being spatially explicit (Godoy and Contreras, 2001;Sierra, 2000).
The second category procedures involve the calculation of annual rates of change between end-member periods (Lambin and Ehrlich, 1997;Puyravaud, 2003).Methods in the third category quantify changes in the spatial configuration and composition of land cover on multiple dates using pair-wise comparisons (Mertens and Lambin, 2000;Schneider, 2012).One limitation of these methods is the inadequacy of detailed observation and discrimination of the process of land cover change.Actually, temporally, the change process of land cover types can be viewed as a trajectory, which highlights the dynamic character of change (Petit et al., 2001).By analyzing change trajectories, we can investigate the detailed dynamics of change processes (i.e., sequences of successive changes in land cover types).However, such methods used for analyzing change trajectories can usually be found for use with green cover; for example, vegetation (Cai and Wang, 2010), cropland (Liu et al., 2005), forest land (Chen et al., 2012;Gimmi et al., 2010;Kennedy et al., 2007;Lambert et al., 2011), and grassland (Dubinin et al., 2010;Dusseux et al., 2011) research.These methods are rarely applied in the change analysis of other land cover types.Therefore, there is an urgent need for a new universal technique for transforming remote sensing data into information on the patterns and processes of land cover change.The effective use of a satellite image time series to characterize and monitor land cover change trajectories requires the analysis of temporal variations in spatial patterns (Henebry and Goodin, 2002).When analyzing a sequence of items, one basic problem to be addressed is to find frequent episodes (Mannila and Toivonen, 1996.), or in other words, to extract regular patterns from temporal data.Therefore, there is a critical need for a method that can enable efficient and reliable characterization of spatiotemporal patterns contained in an image time series (Henebry and Goodin, 2002).Mining sequential and spatial patterns is an active area of research in artificial intelligence (Le Ber et al., 2006), and it has been used for string mining or itemset mining in transactions analysis.The objective of this study is the generalization of the application of the sequential pattern mining method for capturing the spatial variability of landscape and their temporal trajectories of change, to reveal information regarding process regularities with satellite imagery.In this study, a sequence of repeated satellite images of the same scene was used to establish the land cover change sequence set, and a typical sequence mining algorithm-the Continuous Association Rule Mining Algorithm (CARMA)used on this sequence set to analyze the land cover change in the form of change patterns.This article is organized as follows.Section 1 discussed the significance of land cover change and introduced the objectives of this study.Section 2 covers the sequential pattern mining method and Section 3 presents a case study and an analysis of the results.Finally, Section 4 discusses the principal findings and offers our conclusions.

METHODOLOGY
The methodology used in this study includes remote sensing image processing, spatial analysis, and image data mining.In addition to image pre-processing and remote sensing image classification, a sequential pattern mining method is applied in the analysis of land cover change, which forms the main part of the research.Therefore, in this section, the sequential pattern mining method based on the CARMA algorithm is introduced.First, some basic concepts of pattern mining are provided in Section 2.1, followed by an introduction to the CARMA algorithm in Section 2.2.

Basic concepts
Let I = {I 1 ,I 2 ,…,I p } be the set of all items; a set of items is referred to as an itemset and a sequence is an ordered set of one or more itemsets.For example, in a sequence s = <e 1 ,e 2 ,…,e j >, itemset e 1 shows before e 2 , and e 2 shows before e 3 , and so on.Itemset e j is also an element of the sequence s denoted as (x 1 ,x 2 ,…,x q ) in which x q ∈ I .A sequence that contains k itemsets is a k-sequence.If there exists 1 sequences is referred to as a sequence set.In a sequence set, if a sequence s is not contained in any other sequence, then s can be called the largest sequence.Additionally, in the sequence set, the number of sequences that contain s is known as the support of s (written as sup(s)).In the mining process, if a sequence satisfies the pre-determined minimum support, then it is a frequent sequence.Sequential pattern mining is the mining of the largest sequences from the frequent sequences in a sequence set, and the sequences found by sequential pattern mining can be called the sequential pattern.In a sequential pattern s = <e 1 ,e 2 ,…,e j >, sequence s' = <e 1 ,e 2 ,…,e j-1 > is called the antecedent and sequence s" = <e j > is called the consequent.
The confidence for a sequential pattern can be defined as the ratio of sup(s) to sup(s').The confidence for each generated sequential pattern should be calculated and those sequential patterns for which confidence is larger than the minimum confidence threshold are recognized as more interesting (or more useful for us) than the other sequential patterns.

First part of CARMA
For the first step, the main target is to form a set V of all potentially large itemsets in a lattice.There are three parameters for each itemset in the lattice: Count(v), firstTrans(v), and maxMissed(v).The meanings of these three parameters are introduced below and shown in Fig. 1: Count(v): the number of occurrences of itemset v since v was inserted in the lattice.firstTrans(v): the index of the sequence data at which v was inserted in the lattice.maxMissed(v): upper bound on the number of occurrences of v before v was inserted in the lattice.
The construction of the lattice is shown in Table 1: Table 1.Example of the lattice In Figure 1, t 1 ,t 2 ,…,t n are the sequences in the database.When t j was under the scan process, itemset v was inserted into V. Suppose we are reading sequence t i ; therefore, maxMissed(v) means the number of occurrences of v from t 1 to t j-1 , the value of firstTrans(v) is j, and Count(v) means the number of occurrences of itemset v from t j to t i .For any itemset v in the lattice, we get a deterministic lower bound Count(v)/i and upper bound [maxMissed(v) + Count(v)]/i.We denote these bounds by minSupport(v) and maxSupport(v), respectively.
Figure 1.Scan of the equence database First, we initialize V to {ø} and set Count(ø) = 0, firstTrans(ø) = 0, and maxmissed(ø) = 0. Thus, V is a support lattice for an empty sequence.Suppose V is a support lattice up to sequence i-1, we are reading the i-th sequence t i , and we want to transform V into a support lattice up to i; there are three steps to go through.
We insert a subset v of t i into V, if and only if all subsets w of v are already contained in V and satisfy maxSupport(w) ≥ σ i (where σ i is the current user defined support threshold).As v is contained in the current sequence t i , let Count(v) = 1, firstTrans(v) = i, and we compute the value of maxMissed(v).As w is a subset of v, we obtain maxSupport(w ]/i and Count(v) = 1; therefore, we obtain maxMissed(v) ≤ maxMissed(w) + Count(w) -1.When we are inserting a subset v into V, the set v is not yet contained in V. Hence, the support of v for the first (i-1) sequences satisfies , where v means the number of items in v.In addition, considering maxMissed(v) = support i-1 (v) × (i -1), we obtain And third, We compute maxSupport = (maxMissed + Count)/i for each itemset v of V when every k sequences (the value of k is defined by the user) are scanned.For any itemset v whose maxSupport < σ i , we delete v from V.

Second part of CARMA
For the second step, the main aim is to scan the sequences a second time and generate sequential patterns based on the frequent sequences found in the first part of CARMA.In the second step, we compute the precise support of all itemsets v in V and continually remove itemsets with maxSupport < σ n where σ n is the last threshold of minSupport.While performing the scanning, all itemsets v of V are checked and the parameters associated with v updated.Two situations may arise:first, if firstTrans(v) < i, then v is considered as a large itemset.If the current sequence index is past firstTrans for all itemsets in the lattice, the second part of the CARMA algorithm stops.And second, if the current sequence contains itemset v of V, we set Count(v) = Count(v) + 1 and maxMissed(v) = maxMissed(v) -1, and if firstTrans(v) = i, we set maxMissed(v) = 0.However, setting maxMissed(v) = 0 for an itemset v, might yield maxSupport(w) > maxSupport(v) for some superset w of v. Thus, we set maxMissed(w) = Count(v) -Count(w) for all supersets w of v with maxSupport(w) > maxSupport(v).We also remove the itemsets v from V with maxSupport < σ n .
From all the above, it is clear that CARMA only requires two scans of the sequences to obtain the sequential pattern.

Data
A time series of Landsat TM images, path 123/row 32, for 2006-2011 were selected as the data source ( As cloud and fog can seriously affect the results of the mining, we chose the northern part of Beijing as the study area, which was clear in all six images (Figure 2).

Experimental Procedure
For the rational and effective analysis of land cover changes, after the image pre-processing, we firstly classified the six timeseries images of the study area into land cover maps.Secondly, based on the land cover map, we constructed the image sequence set within which each sequence is a land cover class trajectory at pixel level that is described through the classified images assembled in the time series.Thirdly, we applied the sequential pattern mining algorithm to the image sequence set to search for sequential patterns.Finally, we analyzed some interesting sequential patterns to reveal the trajectory of land cover change and evaluated the degree of change.The flowchart of the experimental procedure is shown in Figure 3.

Pre-processing and classification
The pre-processing steps included atmospheric correction, geometric correction, and data format conversion.Landsat TM image from 2011, obtained from the United States Geological Survey, is orthorectified and therefore, it was selected as the master or reference data for geometric correction.Ground control points were collected from the reference image for rectifying the remaining images from 2006 to 2010 (i.e., relative registration) (Kennedy and Cohen, 2003).In this paper, the Gauss-Kruger projection was used in the geometric correction, and a minimum of 20 evenly distributed ground control points were selected to ensure geometric precision of 0.5 pixel (<15 m) for all images.Thus, all the data were arranged in the same coordinate system to form a data set with consistency and integrity, suitable for spatial and sequential comparative analyses.
For image classification, we used eCognition software, and for visualization, the band composite included bands 5, 4, and 3 in the TM data, shown as red, green, and blue, respectively.Because differences and disagreements may appear in the classification process when interpreting land cover types, the classification for all six remote sensing images was undertaken by a single expert in a manner combining software and manual techniques.The land cover types were classified into four categories: built-up area, vegetation, bare land, and water bodies, using a modified Anderson land cover classification scheme (Anderson, 1976).After all the pre-processing steps, the original digital number values for every pixel in the six images were transformed into the value of land cover type.Then, the land cover types or classes were converted into symbols.The experiment used four characters "1", "2", "3", and "4" as a representation for built-up area, bare land, vegetation, and water bodies, respectively.The classification results are shown in Figure 4 and the statistical results for the classification are shown in Table 3.Because of the lack of quality ground data, it is difficult to quantify the error in classifications at the pixel level (Foody, 2002), especially for multi-temporal analysis (Lunetta, 1999).
Although the classification accuracy of year 2011 and year 2010 nearly reached 87% and 85%, the degree of uncertainty is not possible to quantify.

Construction of land cover change sequence set
An image time series portraying the same scene can be transformed into a landscape trajectory by decomposing the sequence image-by-image and projecting it as a time-ordered series of coordinates in a pattern metric space (Henebry and Goodin, 2002).In our research, as land classes were converted to symbols, a categorical land cover change sequence set that contains the pixel history, or the land cover trajectory, at the pixel-level, was created by obtaining each sequence for every pixel transition.Suppose the classification results for the pixel located in position (1, 1) in the image is vegetation in year 2006, vegetation in 2007, vegetation in 2008, bare land in 2009, bare land in 2010, and built-up area in 2011; then, the land cover change sequence for this pixel can be denoted as "333221".Therefore, the land cover change sequence set can be generated by copying the land cover change sequence sequentially for each pixel in the study area from the beginning of the image to the end.

Sequential pattern mining
To identify the typical land cover changes within the six-year period, sequential pattern mining was performed on the constructed land cover change sequence set.In the mining process, the two most important parameters are the support and confidence of the sequence mode.We selected a number of different combinations to establish the most appropriate support and confidence values and tested the resultant sequential patterns, as shown below.
Figure 5. Performance comparison of different values for the CARMA parameters("support" and "confidence") In Figure 5, the numbers of generated patterns of different support tend to coincide as confidence increases.Therefore, in subsequent experiments, we selected a confidence level of 40% and a support rate of 0.02% as the parameters for the sequential pattern mining.To select the most representative land cover change patterns in the different periods, the resultant patterns were subdivided into two types.If the support rate of a pattern was more than 0.1%, it was considered as a selected pattern; if not, it was not selected.Furthermore, to characterize the direction of change, a distinction was made between continuous and discontinuous patterns within the selected patterns.Continuous patterns are characterized by pixels that belong to the same class (e.g.: 111111 and 222222), whereas discontinuous patterns are characterized by pixels that change class (e.g.: 111112, 121211) through time.

Analysis
For the study area, the mining process over the six images led to the identification of 118 sequential patterns, each with their own proportion.Table 4 presents the top ten land cover pixel trajectories, the support rates for which were more than 0.1%, and which were selected as land cover patterns.Among all the resultant sequential patterns, the selected top 10 patterns ac Count for 68.44% of the land cover change of the entire study area (Table 4).The remainder is spread among the other 108 patterns.Table 5 shows     Although the support and confidence in the mining process can help identify the sequential pattern, alone they are insufficient.Not all the resultant patterns can be considered as meaningful (i.e., to reflect the needs and interests of a particular user).
Therefore, an analysis of pattern interestingness is essential for the resultant patterns.Typically, only the user can make a judgment of the degree of pattern interestingness, and this judgment is subjective.As the image sequence mining is a datadriven model, it requires expert knowledge to analyze and interpret the semantic meaning of the generated sequential patterns.Therefore, we describe just a few interesting patterns for demonstration purposes.
Among the selected patterns, three are continuous from 2006 to 2011 and they jointly represent 59.71% of the study area.Therefore, the majority of the study area has remained in the cluster to which it belonged in 2006 for the duration of the entire time series.Considering the continuous pattern "111111" as an example, the spatial distribution of this pattern is shown as the pixels within the red area in Figure 6.The support of the pattern is 12.83%, and it shows that 86.9% of the built-up area of cluster 1 in 2006 still belonged to that same cluster in 2011, and that 38.59% of the built-up area in 2011 originated from that same cluster in 2006.Thus, the current land area share of the same cluster can have different histories.Thus, this method can provide new ideas and approaches for the study of urban development.
Figure 6.Spatial distribution of the continuous pattern "111111" (in red) Considering the discontinuous pattern "442444" as an example, the support of the pattern is 0.17% and its spatial distribution is shown in Figure 7.   Compared with other years, the water storage of the Miyun Reservoir at the end of 2008 was significantly higher, which is not consistent with the obtained pattern "442444".However, according to the Beijing Water Authority, the water level of the Miyun Reservoir was reduced to a minimum of 853.1 million cubic meters by the end of June 2008.The main flood season began in late July and by September 6, the water volume of the Miyun Reservoir exceeded one billion cubic meters.Therefore, because of the lower water level at the time of image acquisition, some additional bare land was exposed, which is reflected in "442444" pattern in the sequence mining results.

CONCLUSION
The method developed in this study has allowed the identification, classification, and spatial localization of land cover types and their trajectories of change for a temporal series.It quantified the land cover changes in terms of the percentage of area affected, as well as mapping the spatial distribution of these changes.It has also provided a different measure for the description of land cover change according to their current characteristics and history.The expected novel significance of this study is the generalization of the application of the sequential pattern mining method for capturing the spatial variability of landscape patterns and their trajectories of change, to reveal information regarding process regularities with satellite imagery.
Although the presented case study clearly demonstrates that the sequential pattern mining method is a promising analytical tool for spatiotemporal data analysis, a number of issues warrant further investigation.As with other studies using historical data for studying landscape changes, the availability and quality of the data, their classification, and analysis all influence the typology of the landscape patterns and of the changes detected (Antrop, 1998).Discovering interesting patterns is also an important requirement in this field and in future research; we intend to develop interestingness and mining methods that are more sophisticated, to improve the utility and efficiency of applying sequential pattern mining to remote sensing data.Moreover, we will try to set a threshold of minimal covered area used to clump the isolated results, to avoid the outliers and other minority patterns which would introduce errors and further problems to understand the results.And most important of all, as the more snapshots included in the time series, the more complex the pattern code will be.So, providing more efficient method to dissolve and understand the pattern code would be the major challenge.

Figure 2 .
Figure 2. Overview of study area, (a) location of Beijing within the People's Republic of China, (b) location of the study area within Beijing

Figure 3 .
Figure 3.The flowchart of the experimental procedure

Figure 4 .
Figure 4. (a) Original satellite image of the study area, (b)-(g) the classification results of the study area from 2006 to 2011 the composition of land cover change in the selected top 10 patterns for each year, which reveals that the land cover change trajectories are dominated by the transition from vegetation to bare land, especially during 2009 to 2010.It also shows that during 2008 to 2009, the highest percentage (7.83%) of the area was affected by land cover change.

Figure 7 .
Figure 7. (a)Spatial distribution of the discontinuous pattern "442444"(in red), (b) Partial enlarged image detail of (a)In Figure7, the pixels within the red area represent the pattern "442444" and they are mainly located on the waterside of Miyun Reservoir.The "442444" pattern means that since 2006, except for 2008, the area represented by this pattern belonged to cluster 4 (water bodies).However, in 2008, this area belonged to cluster 2 (bare land).In other words, these areas were not covered by water at the end of May 2008 (the acquisition time of the image was May 30, 2008).Table7shows the end-of-year water storage of the Miyun Reservoir from 2006 to 2011, according to the "Beijing Water Resource Bulletin."

Table 2 .
Table 2).The chosen acquisition time was June-July; this corresponds to the spring and summer for most areas in northern China, which are the best seasons for analyses of land cover change.Satellite data used in the experiment were all downloaded free of charge from two websites, http://ids.ceode.ac.cn/ and http://glovis.usgs.gov/.Acquisition time of the six Landsat TM images used in the experiment

Table 4 .
The resultant top 10 patterns

Table 5 .
Composition of land cover change in the selected top 10 patternsThe percentage area of a land cover type involved in land cover change is illustrated in Table6.From this table, we can conclude that the trajectories are dominated by transitions that contain vegetation, especially during 2009 to 2010, and that the most stable land cover type is built-up area.These conclusions are also interesting because they reflect our domain knowledge that vegetation is liable to change and built-up area are not.

Table 6 .
Different land cover types contained in the land cover change Table 7 shows the end-of-year water storage of the Miyun Reservoir from 2006 to 2011, according to the "Beijing Water Resource Bulletin."

Table 7 .
End-of-year water storage of the Miyun Reservoir