TOWARDS HIGH RESOLUTION FEATURE MAPPING WITH SENTINEL-2 IMAGES

High resolution feature mapping from medium resolution imagery has gained special attention in the remote sensing user community since the launch of Copernicus’ Sentinel-2 mission, owing to its capability to provide global coverage with a relatively high revisit frequency at no cost. In this paper, we examine and evaluate the potential of high resolution (2.5 m) feature mapping from Sentinel-2 imagery with the aid of artificial intelligence. A generative adversarial network (GAN) is used as the single image super resolution (SISR) technique, and SPOT satellite images are used as the corresponding high-resolution images. Qualitative and quantitative analysis of the experimental results shows that the spectral quality of the generated images is adequate for remote sensing applications. In conclusion, high resolution feature mapping from Sentinel-2 images is found to be largely feasible for remote sensing applications.


INTRODUCTION
The Sentinel-2 satellite mission, developed within the Copernicus programme, is a joint initiative of the European Space Agency, the European Environment Agency and the European Commission to provide operational information about the Earth for environmental applications (Romeo et al., 2020). With the launch of the Sentinel-2A satellite in 2015, it gained special attention from the remote sensing community owing to its open data distribution policy. Its popularity grew further with the launch of Sentinel-2B, owing to its high spectral resolution, global coverage and relatively high revisit frequency. Since then, Sentinel-2 has been an indispensable data source for large-scale studies. Sentinel-2 provides 13 spectral bands ranging from the visible to the shortwave infrared (Table 1).
Among them, the red (R), green (G), blue (B) and near infrared (NIR) bands have the highest ground resolution of 10 m. However, even the spatial detail in those bands (Figure 1) is not adequate for post-processing applications such as damage detection or feature extraction. Resolution improvement of Sentinel-2 (S2) images has therefore received special attention, considering the cost of high resolution (HR) images for these applications and for large-scale disaster mapping (e.g., flooding, landslides). Moreover, recent advances in deep neural networks have highlighted the potential of resolution improvement where pan-sharpening techniques are not applicable because panchromatic bands are unavailable.
Available single image super resolution (SISR) techniques for S2 imagery are limited to conventional interpolation methods and to network architectures ranging from standard convolutional neural networks to generative adversarial networks (GANs). Among the neural network-based methods, GAN-based methods have received the least attention because of concerns over spectral distortion in the generated images (Kapilaratne et al., 2022). However, several recent studies (Romeo et al., 2020; Galar et al., 2019; Mehmood, 2019) have examined the applicability of GAN-based methods for resolution improvement of remote sensing imagery, although adequate attention has not been given to the usability of the generated images in remote sensing applications. Therefore, this study investigates the potential of a GAN-based super resolution model as a resolution enhancement method (by a factor of four) for remote sensing applications, with special emphasis on disaster monitoring.

Single Image Super Resolution Using GAN Models
The main idea of GAN-based SISR is to train a generator network to produce HR images from low resolution (LR) inputs, while a discriminator network is trained to distinguish between real HR images and those produced by the generator (Figure 2).
In the first phase of this series of studies (Kapilaratne et al., 2022), only WV3 images were used for both the HR and LR images. The main objective was to examine and evaluate the capability of the model to restore the original resolution images from down sampled images with minimal impact on their remote sensing value. Owing to the limited number of data samples available for model training, the capability of the model to super resolve S2 images could not be examined with WV3 and S2 image pairs. Further, the qualitative and quantitative evaluation focused mainly on the overall performance of the generated image with respect to the corresponding original HR image, and a detailed spectral quality analysis was not carried out.
Therefore, the main focus of this second phase is to investigate the applicability of the ESRGAN model as a general method to super resolve S2 images for remote sensing applications. Two experiments were designed to evaluate model performance on true colour (TC) and false colour (FC) images. The TC images were prepared with the Red, Green and Blue bands assigned to the RGB layers, and the FC images with the Red, Green and NIR bands assigned to those layers. Through these experiments, the authors aim to serve a broader remote sensing user community with comprehensive analysis using the largest earth observation satellite datasets available at no cost, with special emphasis on disaster mapping.
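The band-to-layer assignment of the two experiments can be illustrated with a minimal, dependency-free sketch. The helper name `make_composite`, the band keys and the nested-list image representation are hypothetical, and the layer order of the FC composite is an assumption read from the order in which the bands are listed in the text.

```python
from typing import Dict, List, Sequence, Tuple

Band = List[List[float]]            # one spectral band as rows of pixel values
Pixel = Tuple[float, float, float]  # values placed in the R, G and B layers

# Band-to-layer assignments as read from the text; the exact layer order
# of the FC composite is an assumption of this sketch.
COMPOSITES: Dict[str, Sequence[str]] = {
    "TC": ("red", "green", "blue"),
    "FC": ("red", "green", "nir"),
}


def make_composite(bands: Dict[str, Band], kind: str) -> List[List[Pixel]]:
    """Stack three named bands into a grid of (R, G, B)-layer pixels."""
    layers = [bands[name] for name in COMPOSITES[kind]]
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all bands must have the same shape")
    return [
        [(layers[0][r][c], layers[1][r][c], layers[2][r][c]) for c in range(cols)]
        for r in range(rows)
    ]
```

In practice an array library would replace the nested lists; the sketch only shows that both experiments share the same tiles and differ in which band feeds the third layer.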

Network Architecture
As mentioned in the previous section, this study adopted the GAN-based (Goodfellow et al., 2016) ESRGAN model (Xintao et al., 2018) owing to its novelty and proven success. As the name indicates, ESRGAN is an enhanced version of the super resolution generative adversarial network (SRGAN) in terms of network architecture, adversarial loss and perceptual loss. According to the original paper, these enhancements allow the ESRGAN model to overcome the prominent blurring found in images super resolved with the SRGAN model.

Datasets and Study Area
SPOT satellite data are used as the HR counterpart to super resolve S2 images, considering the closeness of the central wavelengths of the two sensors (Table 2). Both the true colour experiment (TC EXP) and the false colour experiment (FC EXP) were carried out using the datasets listed in Table 3. Common image samples were used for both experiments in the training, validation and testing phases so that general conclusions could be drawn. All HR and LR datasets used to train the ESRGAN model were captured in Japan and cover an area of 589 km². Among them, 1463 image samples (480 × 480 pixels) were used for the training and validation phases in an 8:2 ratio. Moreover, a satellite image captured around Ichihara city, Chiba prefecture, Japan, was specifically selected for the test phase because of the zero-day lag between its HR and LR images.
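The tiling and the 8:2 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the non-overlapping tiling, the random shuffle, the fixed seed and the rounding rule are all assumptions, since the paper does not state how tiles were cut or assigned.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def tile_origins(height: int, width: int, tile: int = 480) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping tile x tile patches that fit fully."""
    return [
        (row, col)
        for row in range(0, height - tile + 1, tile)
        for col in range(0, width - tile + 1, tile)
    ]


def split_train_val(
    items: Iterable[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly and split them into train and validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

Under these assumptions, 1463 samples would be divided into 1170 training and 293 validation samples; the exact counts used in the study are not reported.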


Evaluation Methodology
Performance is evaluated qualitatively and quantitatively.
Qualitative evaluation focuses mainly on the correctness of image features such as building footprints, roads, vegetation and soil textures in the generated images with respect to the corresponding high resolution ground truth (GT) samples.
The following three spectral quality indices are used for quantitative evaluation: spectral angle mapper (SAM), spectral information divergence (SID) and Pearson correlation coefficient (CC).

$$\mathrm{SAM}(X, Y) = \arccos\left(\frac{X \cdot Y}{\lVert X \rVert_2 \, \lVert Y \rVert_2}\right)$$

where X and Y are the generated image and the corresponding ground truth, respectively. To preserve the conciseness of the manuscript, quantitative results of the validation phase are excluded. The authors note that model performance in the validation phase was better than in the test phase. Image classification, landslide extraction and flood mapping are carried out only on false colour images, as this was found to be the best band composition for those applications. Other evaluation results are summarised for both experiments where necessary.
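The three spectral indices named in this section can be sketched per pixel as below, treating X and Y as spectral vectors (one value per band) of the generated and ground truth images. This is an illustrative sketch rather than the authors' implementation: the function names, the symmetric relative-entropy form of SID, the small `eps` guard and the averaging helper are assumptions, and strictly positive reflectance sums are assumed for SID.

```python
import math
from typing import Callable, Sequence

Spectrum = Sequence[float]


def sam(x: Spectrum, y: Spectrum) -> float:
    """Spectral angle (radians) between two spectra."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        raise ValueError("SAM is undefined for an all-zero spectrum")
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def sid(x: Spectrum, y: Spectrum, eps: float = 1e-12) -> float:
    """Spectral information divergence of two band-normalised spectra."""
    sum_x, sum_y = sum(x), sum(y)
    p = [a / sum_x + eps for a in x]  # eps avoids log(0); an assumption
    q = [b / sum_y + eps for b in y]
    return sum((pi - qi) * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def pearson_cc(x: Spectrum, y: Spectrum) -> float:
    """Pearson correlation coefficient between two value sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("CC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_index(
    index: Callable[[Spectrum, Spectrum], float],
    generated: Sequence[Spectrum],
    truth: Sequence[Spectrum],
) -> float:
    """Mean of a per-pixel index over paired lists of pixel spectra."""
    values = [index(g, t) for g, t in zip(generated, truth)]
    return sum(values) / len(values)
```

Lower SAM and SID values and a CC close to 1 indicate closer spectral agreement; `mean_index` mirrors the reporting of mean index values per experiment.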

RESULTS AND DISCUSSION
This section presents the model performance in the validation and test phases. The results are analysed and summarised qualitatively and quantitatively, as described in the previous section. For the validation phase, only the qualitative results and analysis are included to preserve the conciseness of the manuscript. Quantitative spectral quality assessment results for both the FC and TC experiments are given as mean values of the evaluation indices presented in sub section 2.4.

Validation Results
This sub section discusses the qualitative results achieved in the validation phase of each experiment. Results are presented as figures. To demonstrate different perspectives, four different image tiles were selected for the qualitative analysis of the validation phase results of the FC and TC experiments. Figures 8 and 9 show that the models successfully restored HR information for the details available in the LR images. It is relatively hard to differentiate the model output from the real ground truth image at first sight. However, the restoration of very fine details, such as the solar panels in Figure 8 and the sports stadium in Figure 9, shows that the models need to be further improved depending on user requirements. Model performance in reproducing other details such as roads, buildings and vegetation is adequate.

Test Results
This study investigated the potential of ESRGAN models for super resolving images by 4x. The ground resolution of the S2 satellite image is 10.0 m. Therefore, pan-sharpened SPOT images were down sampled to 2.5 m during dataset creation. Test phase results are arranged so that qualitative results are followed by quantitative results. For a comprehensive analysis of the spectral quality of the generated images with respect to the ground truth images, histogram analysis is also included for the corresponding image samples. Overall (Figures 10 and 11), realistic images have been generated with few artefacts. However, some blurring of buildings and vegetation remains. Moreover, a noticeable difference in colour tone is observed for vegetation and soil in the generated images in the TC experiment. A similar observation is made for the corresponding FC image samples. Some linear features, such as building footprints and asphalt surfaces, which may be sub-pixel details in the original S2 images, are not generated correctly; thus, a one-to-one feature comparison with the corresponding ground truth samples is unlikely to be possible.
The scatter plots (Figure 12) of the generated images against the respective ground truth samples show that points tend to scatter slightly away from the diagonal when the majority of the image pixels cover impervious areas, whereas points tend to scatter along the diagonal in the absence of those land use pixels. For a comprehensive analysis of the spectral quality of the generated images with respect to the original S2 image and the corresponding HR SPOT image, spectral profiles were created for seven selected land use categories (including built-up, asphalt and forest cover). Figure 13 shows that the spectral discrepancy is comparatively higher for the built-up and asphalt classes and lowest for the water and vegetation classes. Widely used spectral quality assessment indices are then used to quantify the observed spectral discrepancies.
Figure 13. Spectral profiles of selected land use types.

Quantitative Analysis:
In this section, a comprehensive analysis with specific emphasis on the spectral quality of the generated images is carried out. The spectral index values for both the TC and FC experiments are given as means of the indices mentioned in the text.
For comparison, the quantitative evaluation is carried out for both the TC and FC experiments on image tiles categorized by their key land use types (bare lands, urban lands and forest covers), as shown in Table 4. Image tile categorization is based on thresholds (0.5) set on the normalized difference soil index (NDSI) and the normalized difference vegetation index (NDVI), along with manual confirmation of the presence or absence of the corresponding land use types. According to the spectral quality indices in Table 4, generated image tiles with vegetation and bare lands can be used for remote sensing analysis. However, model performance on built-up areas, especially in the true colour experiment, needs to be further improved in terms of spectral quality for safer use in remote sensing applications. As mentioned in the previous section, the validation phase results were well within the range for all three indices for safer use in remote sensing analysis. Consequently, it is reasonable to expect that further model training with a qualitatively and quantitatively improved dataset may lead to better performance.
From the qualitative analysis, the authors found that extraction omission increased through the super-resolving process: in particular, narrow debris flows and some large-scale landslides are not properly extracted from the super-resolved image in comparison to the original S2 image (Figure 17). In accordance with the qualitative analysis, the accuracy assessment shows a similar trend, with lower recall and F-score and slightly improved precision (Table 6).

CONCLUSIONS AND FUTURE WORK
This contribution has examined and evaluated a GAN-based algorithm for improving the resolution of S2 imagery for high resolution feature mapping. With special emphasis on the spectral quality of the generated images, SPOT imagery is used as the HR counterpart for S2 images, considering the similarity of the central wavelengths of the two sensors. To allow the trained model to generalize to various conditions, no strict data filtering criterion is applied other than the exclusion of image tiles with clouds. The TC and FC experiment results demonstrate that the spectral quality of the generated images is fairly adequate for widely used remote sensing applications. However, the spectral quality assessment and the tested remote sensing applications show that super-resolved image tiles covering urban landscapes need further improvement for accurate mapping purposes. Moreover, the authors found the SAM index to be more sensitive to spectral quality deviations over urban land uses than the other two indices tested in this study. Therefore, as future work, the authors intend to introduce this index into the existing perceptual loss function as a measure of spectral quality control for the generated images.
Through this adversarial training, the generator network learns to produce images that are visually similar to real HR images, resulting in high-quality image super resolution. In general, GAN-based SISR methods have been shown to outperform traditional interpolation-based methods in terms of both visual quality and quantitative performance measures such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Moreover, GAN-based SISR methods can generate diverse HR images for a given LR input, which makes them suitable for applications such as image restoration and satellite and medical image processing.

Figure 2. Overview of the generative adversarial networks.

This study is the second communication in a series of studies to develop a generalized SISR technique based on GAN models for resolution improvement without degrading the quality and value of satellite imagery. During the first phase (Kapilaratne et al., 2022), the authors examined the potential of the GAN-based super resolution model Enhanced Super Resolution Generative Adversarial Network (ESRGAN) for this purpose.

Figure 3. Selected image pair used during training and test phase: (a) SPOT image captured around Joban expressway, (b) corresponding S2 image, (c) SPOT image observed around Ichihara city, (d) corresponding S2 image.

Figure 4. Overview of the model training phases of TC and FC experiments.

Figure 5. Overview of dataset preparation for the pre-trained model with SPOT-6/7 images only.

Figure 6. Overview of the dataset preparation for actual model training.
$$\mathrm{SID}(x, y) = \sum_{i=1}^{l} \left[ \log\frac{I_X(x, y, g_i)}{\sum_{j=1}^{l} I_X(x, y, g_j)} - \log\frac{I_Y(x, y, g_i)}{\sum_{j=1}^{l} I_Y(x, y, g_j)} \right] \left[ \frac{I_X(x, y, g_i)}{\sum_{j=1}^{l} I_X(x, y, g_j)} - \frac{I_Y(x, y, g_i)}{\sum_{j=1}^{l} I_Y(x, y, g_j)} \right]$$

where
I_X: generated image
I_Y: ground truth image
I(x, y, g_i): spectral reflectance at pixel (x, y) of the image in band i
m × n × l: image dimensions, with x = 1, …, m; y = 1, …, n; g_i (i = 1, …, l)

To provide a comprehensive analysis of the usability of the generated images for remote sensing applications, the authors compared scatter plots of the generated images and the corresponding GT samples, along with spectral profiles of important land use types that are commonly found in satellite images. Finally, an object-based image classification, a landslide extraction (satellite image taken after a 2018 heavy rainfall event in Northern Kyushu) and an inundation extent extraction (image taken after the 2018 heavy rainfall in Kurashiki city, Okayama prefecture, Japan) are carried out to quantitatively evaluate the usability of super-resolved images for remote sensing applications. The support vector machine (SVM) algorithm is used as the classification method, and a UNet model pre-trained with FC pre- and post-disaster Pleiades (0.5 m) and WV images is used for landslide extraction. Flood area extraction is carried out with a UNet model originally trained with FC SPOT-6/7 images. Image classification is carried out for five classes including bare lands, as most of the bare lands correspond to landslides in the selected satellite image. User accuracy (UA) is used as the accuracy measure for classification, as it reflects the reliability of the classified maps, whereas recall (completeness), precision (correctness) and F-Score are evaluated for the landslides and inundation extents extracted from the original S2 images and the corresponding super-resolved images.
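The extraction accuracy measures mentioned above can be sketched from binary masks as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, masks are flattened sequences of 0/1 values, and the standard definitions of recall, precision and F-Score in terms of true positives (TP), false positives (FP) and false negatives (FN) are assumed.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Count TP, FP and FN for flattened binary masks (1 = extracted feature)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def extraction_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F-Score); empty denominators yield 0.0."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # completeness
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # correctness
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_score
```

A drop in recall with roughly stable precision, for example, would indicate that more true landslide or flood pixels are being missed rather than more false detections being produced.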

Figure 7. Description of accuracy metric notations.

$$\mathrm{Recall}\,(R) = \frac{TP}{TP + FN}, \qquad \mathrm{Precision}\,(P) = \frac{TP}{TP + FP}, \qquad F\text{-}\mathrm{Score} = \frac{2PR}{P + R}$$

Analysis: The performance of the trained models in each experiment was assessed by visually comparing the super-resolved output of the model with the HR ground truth images along with the LR input images.
Model performance in the test phase was assessed by visually comparing the inference results with their respective ground truths. Figures 10 and 11 present the TC and FC experimental results, respectively. Visual inspection reveals that both models were successful in resolving S2 images to SPOT level at small scale.

Figure 12. Scatter plots of TC and FC experiments (scatter plots correspond to the image tiles used in Figures 10 and 11, respectively).

The observed discrepancy in georectification of the S2 and SPOT images might also have contributed to the relatively larger spectral discrepancy in impervious areas.

Table 4. Spectral quality assessment with widely used spectral indices for TC and FC experiments.

Therefore, the authors recognized the importance of evaluating the usability of the generated images for remote sensing purposes through real-world applications. Consequently, three feature extraction tasks are incorporated into this study, and performance is evaluated against the corresponding feature extraction capability of the S2 images. The three tested applications consist of an image classification experiment and two disaster mapping tasks. Object-based classification and landslide extraction are carried out using a satellite image captured after the 2018 Northern Kyushu heavy rainfall event, covering Asakura city, Japan. The support vector machine algorithm is used to classify the original S2 image (Figure 14) and the corresponding super-resolved (SR) image (2.5 m) (Figure 15) into urban, water, forest, grassland and bare land classes. The accuracy assessment results demonstrate that the super-resolving process improved the classification accuracy for all classes except the forest category. A significant improvement is observed for the bare land category, which mainly consists of landslides in the tested image.

Figure 15. Classification results of super-resolved image.

Figure 17. Super-resolved post-disaster S2 image with overlaid landslides extracted from the super-resolved image.

The central wavelengths of the two satellite sensors (Table 2) are closer to each other than those of other HR options such as WV3 or Pleiades images.

Table 2. Comparison of central wavelengths of SPOT and WV3 sensors with Sentinel-2.

Table 3. Summary of used satellite image datasets.

The authors assume that the performance reduction for the super-resolved image (Table 6) might also be caused by insufficient robustness of the trained UNet model. Hence, flood area extraction is carried out with a UNet model trained with SPOT imagery. Owing to the 10-day lag between the disaster and image capture, flood water remained only in water features such as rivers and ponds.
Further, the authors expect to extend the usability of the model for multichannel usage and to examine other satellite image options whose original resolution is very close to that of the four-fold super-resolved S2 imagery, in order to alleviate the bias caused by the difference in level of detail (LoD) in the HR images used.