A DEEP LEARNING APPROACH USING VERY-HIGH SPATIAL RESOLUTION GAOFEN-2 IMAGES TO SUPPORT THE UNITED NATIONS SUSTAINABLE DEVELOPMENT GOAL INDICATOR 11.7.1 ASSESSMENT

ABSTRACT: Since the proposal of the "2030 Agenda", United Nations Sustainable Development Goal (SDG) indicator 11.7.1 has aimed to measure the accessibility and quality of urban open spaces (UOS). An accurate and rapid assessment framework for UOS is of great significance for sustainable urban development. Previous research on UOS has mainly focused on their evolution patterns, and little research has assessed their accessibility for different population groups (e.g., men vs. women, young vs. old). In this study, a U-Net deep learning network was trained on 3072 annotated samples of urban green spaces (UGS) created from Gaofen-2 remote sensing images. The trained model was used to identify UGS at sub-meter level within five districts of Beijing, and the results were combined with OpenStreetMap and area of interest (AOI) data. A spatial analysis of UOS accessibility found that most UOS in the central urban area of Beijing can be reached within 10 minutes, whereas accessibility at the eastern and western edges is poorer (more than 30 minutes). Finally, using WorldPop data, the accessibility of UOS was statistically analyzed by age and gender. The results show that the 10-minute UOS accessibility rate for the elderly and children exceeds 90%.


INTRODUCTION
In 2015, the United Nations 2030 Agenda for Sustainable Development (2030 Agenda) established 17 Sustainable Development Goals (SDGs) and 169 targets, with the aim of harmonizing the three dimensions of economic growth, social inclusion and environmental well-being (Colglazier, 2015). SDG indicator 11.7.1 is defined as the "Average share of the built-up area of cities that is open space for public use for all, by sex, age and persons with disabilities". One important way to make cities more inclusive, safe and sustainable is to provide urban open spaces (UOS), which deliver many material and non-material benefits to residents through their environmental and social functions and can improve the environmental quality of cities (Wai et al., 2018). In 2018, UN-Habitat published a technical paper describing the reference calculation steps and potential data sources for SDG 11.7.1. Previous research on UOS includes studies of the morphological changes of UOS under rapid urbanization (Zhu and Ling, 2022), the relationship between the dynamic growth of UOS and walkability (Liang et al., 2021), and the landscape characteristics of UOS at different resolutions (Toger et al., 2015).
The application of multiple methods has allowed for the identification of UOS in both temporal and spatial dimensions.
Urban green spaces (UGS), such as parks and gardens, and urban squares, a type of urban grey space, are both important elements of UOS. It is worth noting that UOS has various definitions suited to different research needs. In this paper, UOS is regarded as a place of recreation and entertainment for residents and therefore encompasses both urban green spaces and urban squares.
Earth observation, especially using high-resolution imagery, can help acquire more detailed urban information. Furthermore, U-Net concatenates each layer of the encoder to the corresponding layer of the decoder, significantly enhancing the accuracy of the segmented image information and, ultimately, of the results (Ali et al., 2017; Abascal et al., 2022). Thus, the utilization of high-quality

(2) Multi-temporal urban built-up area (UBA) datasets.
The study used the built-up areas of Beijing (Figure 1, (b)), delineated from the 30-m Global Artificial Impervious Surface data produced by Professor Gong Peng of Tsinghua University. These data were derived from satellite imagery used to identify impervious surfaces within urban areas (Shi et al., 2023). Urban boundary data are available for 7 years between 1990 and 2018, and the 2018 data are used in this study.
(3) Spatially-detailed population dataset. The study used WorldPop population data for 2020 (Figure 1, (c)) at a spatial resolution of 100 meters. A top-down, constrained approach was used to predict population counts, which depicts residential areas and buildings accurately and yields precise population distribution estimates. This approach also helps reduce the influence of uninhabited areas on the analysis.

The input to the left side of the network is a 512×512 image.
Through paired 3×3 convolutions, the number of feature channels is increased, and pooling operations then reduce the spatial size. At each downsampling step, the feature map is halved in size while the number of convolutional filters is doubled. On the right side of the network, four upsampling steps are performed, producing an output of the same size as the input image.
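The size and channel bookkeeping described above can be traced with a short sketch. It assumes "same"-padded 3×3 convolutions (so only pooling and upsampling change the size), 64 filters at the first level as in the original U-Net paper, and three output classes (background, park green space, affiliated green space); the filter count and class count are assumptions, and unet_shapes is a hypothetical helper, not code from this study.

```python
def unet_shapes(input_size=512, base_filters=64, depth=4, n_classes=3):
    """Trace (spatial size, channels) through a U-Net-style encoder/decoder.

    Assumes padded 3x3 convolutions, 2x2 pooling that halves the size and
    2x2 up-convolutions that double it. base_filters and n_classes are
    illustrative assumptions.
    """
    if input_size % (2 ** depth):
        raise ValueError("input_size must be divisible by 2**depth")
    encoder, size, channels = [], input_size, base_filters
    for level in range(depth + 1):        # depth pooling steps -> depth + 1 levels
        encoder.append((size, channels))
        if level < depth:
            size //= 2                     # pooling halves the feature map
            channels *= 2                  # filters double after each downsampling
    decoder = []
    for _ in range(depth):                 # one upsampling per pooling step
        size *= 2
        channels //= 2                     # skip concatenation, then convs reduce channels
        decoder.append((size, channels))
    output = (size, n_classes)             # final 1x1 conv maps channels to classes
    return encoder, decoder, output

enc, dec, out = unet_shapes()
print(enc)  # [(512, 64), (256, 128), (128, 256), (64, 512), (32, 1024)]
print(dec)  # [(64, 512), (128, 256), (256, 128), (512, 64)]
print(out)  # (512, 3)
```

Four poolings take a 512×512 input down to 32×32, so four upsamplings are exactly what is needed to return to the input size.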
However, because deconvolution can only enlarge feature maps rather than restore lost detail, the feature maps from the left-hand side are cropped to the same size and concatenated directly to the right-hand side. This increases the number of feature layers and reduces information loss. Convolution is then performed to extract features, and a 1×1 convolutional layer at the output adjusts the number of channels to the number of object categories to obtain the segmentation result.

Because residents' choice between walking and driving depends on travel distance, a designed driving probability is assigned to each road (Table 1). Meanwhile, the walking speed on roads is set at 6 km/h and the off-road walking speed at 3.5 km/h. Finally, the revised time cost for one kilometer of each road is calculated.
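The revised per-kilometer time cost can be illustrated as follows. This is a minimal sketch under a stated assumption: the expected time is taken as a probability-weighted mix of driving at the reduced operating speed (design speed × 0.7) and walking at 6 km/h. The exact combination rule and the Table 1 probabilities are not reproduced here, and minutes_per_km is a hypothetical name.

```python
REDUCTION = 0.7          # design-to-operating speed coefficient from the text
WALK_ON_ROAD_KMH = 6.0   # walking speed on roads from the text

def minutes_per_km(design_speed_kmh, p_drive):
    """Expected minutes to traverse 1 km of road (assumed weighting rule).

    p_drive is the road's driving probability; the weighting of driving
    and walking times by p_drive is an assumption for illustration.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be in [0, 1]")
    drive_min = 60.0 / (design_speed_kmh * REDUCTION)   # reduced operating speed
    walk_min = 60.0 / WALK_ON_ROAD_KMH                   # 10 minutes per km
    return p_drive * drive_min + (1.0 - p_drive) * walk_min

# A main road (60 km/h design speed) with a hypothetical 50% driving probability.
print(round(minutes_per_km(60, 0.5), 3))
```

Weighting times rather than speeds keeps the result an expected travel time, which is what the cost raster later accumulates.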

Service range analysis
Secondly, it is necessary to filter and clip other data in OSM.
In the AOI surface data of OSM, there is a land use type named "park", which includes most parks and squares in Beijing and can be used as auxiliary information to determine UOS.

Accuracy Assessment
The confusion matrix serves as a fundamental tool for evaluating the predicted values against Google image maps and UGS-1m (Shi et al., 2023).

Estimation of urban extent and the land allocated to streets (LAS)
After the urban built-up area (UBA) data were used to exclude green space samples from non-urban areas, more accurate UGS samples were obtained. Statistical analysis showed that the UBA covers 1,039.81 km² across the five districts of Beijing, accounting for 97% of their total area. Following the method in Section 2.2.2, the road classification results shown in Figure 4 were obtained. Each road class was spatially extended according to the road widths in Table 3 and then resampled to the same resolution. The total surface of urban streets was found to be 168.96 km². The officially recommended metric LAS was then estimated by Eq. (5):

$$\mathrm{LAS} = \frac{\text{Total surface of urban streets}}{\text{Total surface of urban built-up area}} \times 100\% \tag{5}$$

The NNDiffuse method tends to produce a blue-colored bias in the fused results when there are large water bodies in the image (Figure 5). Because the green space samples are created from the fused images, a blue bias can impede the identification of ground objects.
Therefore, the Gram-Schmidt method was chosen.
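Gram-Schmidt fusion can be written as a component-substitution scheme in which panchromatic detail is injected into each multispectral band with a band-specific gain. The sketch below is a simplified version under stated assumptions: the synthetic low-resolution intensity is the plain mean of the bands, the pan band is matched to that intensity by mean and standard deviation, and each gain is cov(band, I) / var(I). It is not the exact implementation in commercial software, and gs_like_pansharpen is a hypothetical name.

```python
from statistics import fmean, pstdev

def _cov(a, b):
    """Population covariance of two equally long sequences."""
    ma, mb = fmean(a), fmean(b)
    return fmean((x - ma) * (y - mb) for x, y in zip(a, b))

def gs_like_pansharpen(ms_bands, pan):
    """Simplified Gram-Schmidt-style fusion on flat pixel lists.

    ms_bands: list of bands already resampled to the pan grid.
    pan:      panchromatic values on the same grid.
    """
    n = len(pan)
    # Synthetic intensity: equal-weight mean of the MS bands (an assumption).
    intensity = [fmean(band[i] for band in ms_bands) for i in range(n)]
    var_i = _cov(intensity, intensity)
    # Match pan to the intensity so that detail injection keeps band means.
    mi, si = fmean(intensity), pstdev(intensity)
    mp, sp = fmean(pan), pstdev(pan)
    if sp > 0:
        pan_matched = [(p - mp) * (si / sp) + mi for p in pan]
    else:
        pan_matched = list(intensity)
    detail = [p - i for p, i in zip(pan_matched, intensity)]
    fused = []
    for band in ms_bands:
        gain = _cov(band, intensity) / var_i if var_i > 0 else 1.0
        fused.append([x + gain * d for x, d in zip(band, detail)])
    return fused

ms = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
print(gs_like_pansharpen(ms, [0.0, 10.0, 0.0, 10.0]))
```

Because the injected detail has zero mean after matching, each band's mean is preserved while spatial detail from the pan band is added.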

AOI data
The trained U-Net model achieves promising results in identifying UGS, which are part of UOS. However, using only the UGS identification results as UOS is far from sufficient. In this study, OSM data were incorporated to refine the locations of UGS based on the identification results. This includes segmenting objects using the road network and spatially merging objects with similar semantics.
By aggregating the UGS identification results and using urban roads to segment them, the study selected green spaces larger than 400 m² as urban green space areas. This information was then fused with OpenStreetMap's AOI data for parks and squares: a geospatial overlay analysis integrates the UGS recognition results with the AOI data and delineates the overlapping areas as UOS.
Furthermore, any unrecognized green open spaces are supplemented with OSM data.
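The area filter and overlay can be sketched on rasters instead of vector polygons. This is a minimal raster analogue under stated assumptions: road pixels cut green patches apart, patches are 4-connected, patch area is pixel count × pixel area, and the final UOS mask is taken as the union of large UGS patches and AOI parks/squares (one reading of "overlap plus OSM supplement"). Function names are hypothetical; a GIS overlay on polygons would be used in practice.

```python
from collections import deque

def patches(mask):
    """Yield 4-connected patches of truthy cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, patch = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield patch

def build_uos(ugs, aoi, pixel_area_m2, road=None, min_area_m2=400.0):
    """Return (overlap, uos) masks from UGS, AOI and optional road rasters."""
    rows, cols = len(ugs), len(ugs[0])
    # Roads segment green patches before their areas are measured.
    green = [[bool(ugs[r][c]) and not (road is not None and road[r][c])
              for c in range(cols)] for r in range(rows)]
    large = [[False] * cols for _ in range(rows)]
    for patch in patches(green):
        if len(patch) * pixel_area_m2 >= min_area_m2:   # keep patches >= 400 m2
            for y, x in patch:
                large[y][x] = True
    overlap = [[large[r][c] and bool(aoi[r][c]) for c in range(cols)]
               for r in range(rows)]
    uos = [[large[r][c] or bool(aoi[r][c]) for c in range(cols)]
           for r in range(rows)]
    return overlap, uos
```

With the fused Gaofen-2 imagery, pixel_area_m2 would be the squared ground sampling distance of the segmentation raster; the value is left as a parameter here.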

The overall characteristics of UOS service range considering gender and age
In accordance with the approach outlined in Section 2.2.2, road speed, traffic probability, and time cost were calculated. The roads were then rasterized at a resolution of 10 m, with the time cost as the value of each pixel.
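Rasterizing the time cost amounts to converting each road's minutes per kilometer into minutes per 10 m pixel, with off-road pixels costing the time to walk 10 m at 3.5 km/h. The sketch below assumes that water pixels are impassable (marked as infinity); the text does not state how water is encoded, and build_cost_raster is a hypothetical helper.

```python
import math

PIXEL_M = 10.0        # raster resolution stated in the text
OFF_ROAD_KMH = 3.5    # off-road walking speed stated in the text

def minutes_per_pixel(speed_kmh, pixel_m=PIXEL_M):
    """Minutes needed to cross one pixel at the given speed."""
    return pixel_m / 1000.0 / speed_kmh * 60.0

def build_cost_raster(road_min_per_km, water_mask=None, pixel_m=PIXEL_M):
    """Per-pixel traversal time in minutes.

    road_min_per_km: grid of a road's revised time cost (min/km), or None
                     where there is no road.
    water_mask:      optional grid; truthy cells become impassable
                     (an assumption about the use of water data).
    """
    off_road = minutes_per_pixel(OFF_ROAD_KMH, pixel_m)
    raster = []
    for r, row in enumerate(road_min_per_km):
        out_row = []
        for c, value in enumerate(row):
            if water_mask is not None and water_mask[r][c]:
                out_row.append(math.inf)            # barrier
            elif value is None:
                out_row.append(off_road)            # pedestrian area
            else:
                out_row.append(value * pixel_m / 1000.0)
        raster.append(out_row)
    return raster

print(build_cost_raster([[None, 10.0, 2.0]], water_mask=[[0, 0, 1]]))
```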
The areas outside the roads were designated as pedestrian areas with a speed of 3.5 km/h. OSM water data were also included as an input to the accessibility analysis.

The UN-Habitat technical paper on SDG 11.7.1 suggests three steps: (1) estimation of the land allocated to streets (LAS); (2) estimation of the share of land allocated to public open spaces; (3) computation of the core indicator: average share of the built-up area of cities that is open space for public use for all (UN-Habitat, 2018).

(4) Multiple sources of Volunteered Geographic Information (VGI). The study used OpenStreetMap (OSM) road, area of interest (AOI), and water body data (Figure 1, (d)). The OSM water data contain three types of water-related areas: reservoir, riverbank, and water. These data are part of VGI and are characterized by fast updates and diverse data types. However, data gaps and accuracy issues are also common. After data quality and consistency checks, OSM data can be used to supplement the extent of urban open spaces and support the calculation of the SDG 11.7.1 indicator. Additionally, the study used a sample set created from Google imagery to enhance the training of the U-Net model (Shi et al., 2023).

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume X-1/W1-2023 ISPRS Geospatial Week 2023, 2-7 September 2023, Cairo, Egypt

Figure 2. A framework for mapping UGS and supporting SDG 11.7.1 assessment based on multi-source geospatial data

Figure 3. U-Net architecture (Ronneberger, 2015)

During image segmentation, touching objects of the same class must be separated from each other. To better segment the boundary of each object, a weight is assigned to each pixel position during training and used to compute a weighted loss. Boundary regions receive higher weights, which strengthens the learning of boundary samples.
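The boundary weighting can be sketched in a simplified form. The original U-Net paper uses a distance-based weight map, w(x) = w_c(x) + w_0·exp(-(d_1(x) + d_2(x))² / (2σ²)), where d_1 and d_2 are distances to the nearest and second-nearest object. The sketch below is a cruder stand-in, not the authors' implementation: it up-weights every pixel whose 4-neighbour carries a different instance ID, so touching objects of the same class still get heavy boundary weights. The names and the weight value 5.0 are hypothetical.

```python
import math

def boundary_weights(instance_ids, w_boundary=5.0):
    """Per-pixel loss weights from an instance-ID raster (0 = background).

    Pixels adjacent (4-neighbourhood) to a different ID get w_boundary,
    all others get 1.0. w_boundary is a hypothetical value.
    """
    rows, cols = len(instance_ids), len(instance_ids[0])
    weights = [[1.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                y, x = r + dy, c + dx
                if (0 <= y < rows and 0 <= x < cols
                        and instance_ids[y][x] != instance_ids[r][c]):
                    weights[r][c] = w_boundary
                    break
    return weights

def weighted_bce(probs, targets, weights, eps=1e-7):
    """Binary cross-entropy weighted per pixel, normalised by total weight."""
    num = den = 0.0
    for prow, trow, wrow in zip(probs, targets, weights):
        for p, t, w in zip(prow, trow, wrow):
            p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
            num += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
            den += w
    return num / den

print(boundary_weights([[0, 0, 1, 1]]))  # [[1.0, 5.0, 5.0, 1.0]]
```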

Firstly, it is necessary to collect road data of different grades. The OSM road data have a class tag that includes the motorway, motorway link, trunk, trunk link, primary, primary link, secondary, secondary link, tertiary, and tertiary link types, corresponding to Chinese highways and first-, second-, and third-level roads. Based on the design speeds of Chinese roads and the actual situation of the study area, the speed is set to 80 km/h for highways and expressways, 60 km/h for main roads, 45 km/h for secondary roads, and 35 km/h for urban roads. Because of factors such as road conditions, weather, and traffic volume, the actual operating speed is usually lower than the design speed, so a reduction coefficient of 0.7 is applied. In addition, residents' travel modes are closely related to road length: different travel distances lead to different probabilities of residents choosing walking or driving.

Thirdly, the accessibility analysis takes UOS, OSM water data, and OSM roads as inputs and yields the travel time from any location in the city to the nearest urban open space. This involves analyzing the scale of UGS and squares and identifying service areas for developing green/grey open spaces (Pafi et al., 2016). Accessibility analysis typically uses one of two methods: network analysis based on an origin-destination (OD) cost matrix, or gridded cost distance. This study uses the gridded cost distance method to evaluate the accessibility of UOS for different population groups.
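The gridded cost-distance idea can be sketched as a multi-source Dijkstra search over the per-pixel time-cost raster, followed by the share of a population grid that lies within a time threshold. The sketch assumes 8-neighbour moves whose cost is the mean of the two cells' costs (×√2 for diagonals), which is a common cost-distance convention rather than a detail given in the text; infinite cells act as barriers. Function names are hypothetical. Running accessible_share once per WorldPop age/sex layer would give group-specific accessibility rates.

```python
import heapq
import math

def cost_distance(cost, sources):
    """Minutes from every cell to the nearest source cell.

    cost[r][c]: minutes to cross cell (r, c); math.inf marks barriers.
    sources:    iterable of (row, col) cells belonging to UOS.
    """
    rows, cols = len(cost), len(cost[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if dy or dx]
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue                                  # stale heap entry
        for dy, dx in steps:
            y, x = r + dy, c + dx
            if 0 <= y < rows and 0 <= x < cols and math.isfinite(cost[y][x]):
                step = (cost[r][c] + cost[y][x]) / 2.0
                if dy and dx:
                    step *= math.sqrt(2)              # diagonal move
                nd = d + step
                if nd < dist[y][x]:
                    dist[y][x] = nd
                    heapq.heappush(heap, (nd, y, x))
    return dist

def accessible_share(dist, population, threshold_min=10.0):
    """Fraction of the population whose cell is within threshold_min."""
    total = sum(sum(row) for row in population)
    reached = sum(p for drow, prow in zip(dist, population)
                  for d, p in zip(drow, prow) if d <= threshold_min)
    return reached / total if total else 0.0

d = cost_distance([[1.0, 1.0, 1.0, 1.0, 1.0]], [(0, 0)])
print(d, accessible_share(d, [[10, 10, 10, 10, 10]], 2.0))
```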
The confusion matrix is widely used to analyze and quantify the predictive accuracy and reliability of models in image recognition. In it, TP represents the number of positive instances correctly predicted as positive, while FN indicates positive instances incorrectly predicted as negative. Conversely, FP refers to negative instances incorrectly classified as positive, and TN represents instances correctly classified as negative. These counts are used to compute key evaluation measures, including accuracy, precision, recall, and F1 score, which are essential for assessing the performance of classification models in image recognition tasks.
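The measures listed above follow directly from the four confusion-matrix counts. The sketch below counts them pixel-wise from two binary masks and then derives accuracy, precision, recall, and F1; the function names and the example counts are hypothetical, not results from this study.

```python
def confusion_counts(pred, truth):
    """Pixel-wise (TP, FP, FN, TN) for two equally sized binary masks."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1          # green predicted and green in reference
            elif p:
                fp += 1          # predicted green, reference non-green
            elif t:
                fn += 1          # missed green pixel
            else:
                tn += 1
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / total if total else 0.0,
            "precision": precision, "recall": recall, "f1": f1}

print(classification_metrics(80, 20, 10, 90))
```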

A high accuracy indicates that the model classifies a high percentage of samples correctly. A higher precision means that most of the samples the model predicts as positive are actually positive, minimizing false detections, and a higher recall means that the model captures more of the true positive samples, reducing the risk of missing relevant positives (false negatives). A high F1 score means that the model is able to maintain a high level of precision without sacrificing its ability to recognize positive samples.

In Eq. (5), LAS refers to the land allocated to streets; the resulting LAS is 16.2%. Given that the study area is only part of the city, road buffer zones were set at the boundary to improve connectivity between adjacent regions and obtain more accurate accessibility results.
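The reported LAS can be reproduced from the street and built-up areas given for the five districts (168.96 km² and 1,039.81 km²). The helper street_surface_km2 is a hypothetical illustration of estimating street surface as length × width per road segment, with widths of the kind listed in Table 3; unlike a real buffer-and-dissolve operation, it ignores overlaps at intersections.

```python
def street_surface_km2(segments):
    """Approximate street surface from (length_km, width_m) pairs.

    Overlaps at intersections are not removed, so this slightly
    overestimates a dissolved buffer.
    """
    return sum(length_km * width_m / 1000.0 for length_km, width_m in segments)

def las_percent(street_km2, built_up_km2):
    """Eq. (5): share of the built-up area allocated to streets, in percent."""
    return street_km2 / built_up_km2 * 100.0

print(f"{las_percent(168.96, 1039.81):.1f}%")  # 16.2%
```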

Figure 4. Road classification results and road density map

3.2 Accurate sample sets of UGS using high-resolution images
The high-resolution Gaofen-2 satellite imagery underwent orthorectification and geometric correction before the multispectral and panchromatic images were fused. Two fusion methods are currently popular, Gram-Schmidt fusion and NNDiffuse fusion, both of which preserve color, texture, and spectral information well.

Figure 5. Comparison of the effects of different fusion methods before and after the preprocessing of the GF-2 image

The fused Gaofen-2 imagery is divided into 5120×5120 images, of which 48 were uniformly selected for manual visual interpretation based on the "Urban Green Space Classification Standards", with samples labeled as park green space (Figure 6, (a)) and affiliated green space (Figure 6, (b)). Both categories of urban green space in Beijing were accurately labeled to improve the effectiveness of the deep learning model. These 48 images yield 768 annotated images, which were augmented to 3072 training samples. The model was trained for 100 epochs, and the final loss reached 0.0024 (Figure 7). The recognition of park green spaces is better than that of affiliated green spaces, as shown by the clearer outer contours of park green spaces and the model's ability to distinguish roads and water bodies. However, missed segmentation occurs, leaving small holes in the middle of green spaces (Figure 8, (a)). Moreover, affiliated green spaces are prone to misidentification: residential buildings may be mistakenly identified as green spaces (Figure 8, (b)).
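The sample counts imply 768 / 48 = 16 annotated tiles per 5120×5120 image and 3072 / 768 = 4 variants per tile. The sketch below shows one way such counts could arise: cutting each image into a regular grid of tiles and augmenting each tile with 0°, 90°, 180°, and 270° rotations. Both the grid tiling and the rotation-based augmentation are assumptions, since the text does not state the tile size or augmentation method; function names are hypothetical. Label masks must receive exactly the same transforms as their images.

```python
def tile_grid(image, tile):
    """Split a 2D list into non-overlapping tile x tile blocks, row by row."""
    rows, cols = len(image), len(image[0])
    return [[row[c:c + tile] for row in image[r:r + tile]]
            for r in range(0, rows - tile + 1, tile)
            for c in range(0, cols - tile + 1, tile)]

def rot90(tile):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]

def augment(tile):
    """Return the tile and its 90, 180 and 270 degree rotations."""
    variants = [tile]
    for _ in range(3):
        variants.append(rot90(variants[-1]))
    return variants

# A toy 8x8 "image" cut into a 4x4 grid mirrors 16 tiles per image.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
tiles = tile_grid(image, 2)
print(len(tiles), sum(len(augment(t)) for t in tiles))  # 16 64
```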

Figure 9. Service coverage analysis map and cumulative service number chart

Earth observation data have opened up new applications of computer vision, including but not limited to change detection, long-term monitoring, and image segmentation. The U-Net model is widely used for land-cover recognition because of its small data requirements, fast training, and high segmentation accuracy. As a segmentation network evolved from the fully convolutional network, U-Net considers both global and detailed information of the image.

Table 1. Comparison of driving probability on roads

Table 3. Road widths used in this study

Table 4. Confusion matrix comparison: our method vs. UGS-