AUTOMATIC NON-RESIDENTIAL BUILT-UP MAPPING OVER NATIONAL EXTENTS WITH A SENTINEL-2 IMAGE SEGMENTATION MODEL TRAINED WITH ANCILLARY CENSUS DATA

Information regarding the residential status of built-up areas is used in several contexts, such as disaster management and urban and regional planning. Currently, such non-residential built-up information can be extracted for most of Europe from Land Use/Land Cover maps such as CORINE Land Cover (CLC) and Urban Atlas (UA), by harmonizing their class nomenclature into a residential/non-residential nomenclature. However, these maps have update cycles of several years because their production is usually costly and lengthy and also relies on visual interpretation of ancillary datasets. Given these limitations, many methods have been proposed to increase the thematic detail of the built-up environment. More recently, these methods often rely on ancillary datasets such as social media and mobile phone network metadata, which may not be readily available in many areas. In this paper we propose a framework to map non-residential built-up areas by training an image segmentation model with national census information and Sentinel-2 imagery. The non-residential map produced by the segmentation model was compared with public pan-European maps, and the quality of all of these maps was assessed against UA 2018. The results show that using census data to automatically generate training data for a Sentinel-2 image segmentation model of non-residential built-up areas improves the mapping of non-residential areas when compared with the existing datasets available for most of Europe.


INTRODUCTION
Knowledge of the location of non-residential buildings is used in several fields, such as disaster management (Freire, 2010), energy-related studies (D'Agostino et al., 2017a), sustainability (Liu et al., 2017) and urban planning (Nadal et al., 2017). Non-residential buildings are estimated to make up 25% of the European building stock (D'Agostino et al., 2017b).
At a pan-European scale, the location of non-residential buildings can be extracted from publicly available Land Use/Land Cover (LULC) maps, such as CORINE Land Cover (CLC). Urban Atlas (UA) has European coverage but is only produced for European cities with more than 100,000 inhabitants. Both are currently part of the European Copernicus programme. CLC maps 44 classes for most European countries, with a minimum mapping unit (MMU) of 25 ha and a minimum width of linear features of 100 m. UA, on the other hand, has much more thematic detail and focuses on land use, with an MMU of 0.25 ha. The two most recent main versions of both datasets correspond to the 2012 and 2018 reference years. Other LULC datasets exist, some even at a global scale, such as ESA WorldCover (Zanaga, Daniele et al., 2021) and ESRI LULC (Karra et al., 2021). However, these focus on land cover, and land use information in them is often limited to cropland classes. National mapping agencies have also been providing land use datasets. However, these lack standardization because each map producer has different interests and objectives. As a result, their LULC products differ in, e.g., update cycle, thematic detail and spatial resolution.

Several authors have proposed methods to derive land use information. They did so because of the lengthy update cycles of these products, their focus on large metropolitan areas (e.g., UA), their limited geographical extent (e.g., UA and CLC only cover Europe) or their coarse resolution (e.g., CLC). Such land use information can then be harmonized to identify the location of non-residential buildings.
Earlier land use mapping studies mainly focused on feature extraction, selection and classification using remote sensing imagery (Gong et al., 1992; Xu et al., 2003). Their results could then be harmonized into the residential status of buildings. Ancillary information such as municipal building permits and census data also began to be used. It served both to map land use (Mesev, 1998) and to assess the relationship between land use and remote sensing imagery (Radeloff et al., 2000). Advances in geographical information systems and computational capabilities enabled the use of further ancillary data, such as airborne laser scanning (Aubrecht et al., 2009). More recently, studies have used ancillary datasets related to points of interest (Zhang et al., 2017), taxi trajectories (Wang et al., 2018), mobile phone network metadata (Pei et al., 2014, p. 201), street view images (Li et al., 2017) and social media (Du et al., 2020; Fonte et al., 2018). Most studies use commercial very high resolution (VHR) satellite imagery. However, some studies have used the Sentinel constellation to map such land use information (Tu et al., 2021; Zong et al., 2020). For example, Tu et al. (2020) generated training data for several land use classes by visually inspecting street view imagery and other publicly available datasets. They then used these data to train a random forest classifier on Sentinel-1 and Sentinel-2 data. Chen et al. (2021) generated training data from several sets of ancillary data, such as OpenStreetMap (OSM) and population data, and manually labelled these data into the target land use classes. Huang et al. (2021) estimated the population of a given image patch by combining Sentinel-1 and Sentinel-2 imagery within a deep learning approach. Overall, these studies show that Sentinel-2 data at decametre resolution can still contain relevant feature information on land use, and specifically on built-up land use.
Current literature uses several sets of ancillary data to map urban land use, which can then be used to assess the use of built-up areas. These studies are usually restricted to metropolitan areas, where data such as those from social media platforms or mobile phone service providers are more abundant. Moreover, where such data exist at all, they may not be readily available for other regions because of contractual obligations or cost. Hence, these approaches are less likely to transfer to other cities or regions. Another limitation is the manual labelling of the data. It is both costly and lengthy, yet it remains a common way to gather training data for supervised classification methods.
Census data are often collected at a national scale through lengthy and costly ground campaigns. They have long been used as ancillary data alongside remote sensing data, mainly because of their quality and, often, their national extent. For example, census data were used within supervised image classification approaches to predict population (Chen, 2002; Radeloff et al., 2000) or land use (Chen et al., 2021; Gounaridis and Koukoulas, 2016; Rocha and Tenedorio, 2001) from satellite imagery. Census data have also been combined with remote sensing data to, for example, measure quality of life (Li and Weng, 2007).
In this study we automatically generate training data from national census data and use it, together with Sentinel-2 imagery, to train an image segmentation model. This model can map non-residential built-up areas over national extents. With UA 2018 (UA18) as ground truth, the proposed approach produced better results than existing land use datasets available for most European countries, such as CLC 2018 (CLC18). This shows that feature information about non-residential buildings can be harnessed from Sentinel-2 images, given quality training data automatically extracted from census data. It also avoids lengthy and costly manual labelling of training data. Because Sentinel-2 data have global coverage, the approach can be tested in other regions of the world with any type of population census data.

STUDY AREA AND DATA
The study area is located in Portugal, on the Iberian Peninsula, Europe. It corresponds to the area contained in the red square shown in Figure 1. The blue square indicates the area where the quality of the approach is assessed.

Datasets
Table 1 lists the datasets used, giving the type, producer and role in the experiments of each one. They comprise:
- the Sentinel-2 images covering the study area;
- the ancillary datasets, such as the census data, used to generate the training data;
- the publicly available datasets used as baselines for the proposed approach;
- the dataset considered as ground truth, UA18.

In this context, the baselines show how the proposed approach compares with currently available public land use information, especially for built-up areas.

Table 1 Datasets considered in this study, type of data, producer and role in the experiments.
The residential use of an area can be extracted from the census. However, the census gives no information on the location of buildings, since its data refer either to administrative boundaries or to a predefined grid. The built-up information was therefore extracted from 'Áreas Edificadas 2018' (AE18). This vector dataset is provided by "Direção Geral do Território" (DGT), the Portuguese national mapping agency, and aims to identify the areas that contain buildings. AE18 is depicted in pink in Figure 2. It was used as the built-up surface for generating both the residential and non-residential classes, as explained in section 3.
Sentinel-2 images were extracted for the entire study area, which matches the extent of four Sentinel-2 tiles (T29TNE, T29TNF, T29TPE and T29TPF) and covers a total of 12,056.04 km². The acquisition date chosen was 17 August 2021. It was chosen mainly because the sky was clear over all tiles and because it matches the reference year of the census data described below. All 10 m spatial resolution bands of the Sentinel-2 images (bands B2, B3, B4 and B8) for the four tiles were used in the experiments. The Portuguese boundaries were used to crop the Sentinel-2 tiles and the remaining datasets.

The census data from the national statistics institute, "Instituto Nacional de Estatística", were also extracted for the study area. These data are vector data in a GIS format (GeoPackage). They provide several types of demographic information, such as population counts, number of residential buildings, building classes and household sizes, among others. Two files are available:
- the "Base Geográfica de Referenciação da Informação" (BGRI), with information at the scale of the smallest Portuguese administrative unit;
- GRID1K, which contains the same information on a 1 km grid over continental Portugal.

BGRI is more detailed in areas with higher population density, i.e., urban areas. Figure 2 shows in yellow the areas considered non-residential (BGRI and GRID1K), since they contain no residential buildings.
The proposed method is compared with two baselines derived from CLC18 and WorldPop. For WorldPop, the 2020 version (WPOP20) is used (Bondarenko et al., 2020). WorldPop makes subnational demographic information on population distribution and characteristics publicly available at a global scale. For example, it provides population counts at up to 100 m spatial resolution. WorldPop is mostly based on the disaggregation of census population counts, using several sets of publicly available ancillary data within a machine learning approach (Stevens et al., 2015). Hence, it is used to generate a non-residential land use surface wherever the population count is zero.
For the CLC18 dataset, the objective was to extract land use information for artificial surfaces. Hence, only the level-1 artificial surfaces class was used. Within it, classes 111 and 112 (continuous and discontinuous urban fabric, respectively) were assigned to the residential surface. The remaining artificial surface classes were assigned to the non-residential surface.
UA18 is a land use focused map with 17 main LULC classes, produced for every European city with more than 100,000 inhabitants. UA is produced using both image classification and visual interpretation of very high-resolution satellite imagery. This is combined with functional information from ancillary data sources, either publicly available ones or, e.g., municipal plans. Given this land use focus, UA18 is used as ground truth in this study. Table 2 presents the harmonization of its nomenclature into residential status. Overall, the procedure assumes that built-up areas without residential buildings contain only non-residential buildings. This information is high-quality, even if scarce. We therefore expect the image segmentation model to learn from it the relevant features that distinguish residential from non-residential building blocks.
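The harmonization rules described above for the WorldPop, CLC18 and UA18 datasets can be sketched in Python as follows. This is a minimal sketch, not the authors' code. It assumes that CLC18 is given with its three-digit nomenclature codes (e.g. 111, 121) rather than raster grid codes. It also takes the UA18 residential codes from Table 2. The function names are hypothetical.

```python
from typing import Optional

NON_RESIDENTIAL = 1  # class 1 in Table 2
RESIDENTIAL = 2      # class 2 in Table 2

# UA18 codes harmonized as residential in Table 2; every other code is non-residential.
UA_RESIDENTIAL_CODES = {11100, 11210, 11220, 11230, 11240, 11300}

# CLC level-3 codes for continuous (111) and discontinuous (112) urban fabric.
CLC_RESIDENTIAL_CODES = {111, 112}


def harmonize_ua(code: int) -> int:
    """Map a UA18 class code to the residential/non-residential nomenclature."""
    return RESIDENTIAL if code in UA_RESIDENTIAL_CODES else NON_RESIDENTIAL


def harmonize_clc(code: int) -> Optional[int]:
    """Map a three-digit CLC18 code; only level-1 artificial surfaces (1xx) are kept.

    Returns None for non-artificial classes (agriculture, forest, wetlands, water).
    """
    if code // 100 != 1:
        return None
    return RESIDENTIAL if code in CLC_RESIDENTIAL_CODES else NON_RESIDENTIAL


def worldpop_is_non_residential(population_count: float) -> bool:
    """A WorldPop cell is treated as non-residential where its population count is zero."""
    return population_count == 0


if __name__ == "__main__":
    print(harmonize_ua(11210), harmonize_ua(12100))                    # 2 1
    print(harmonize_clc(112), harmonize_clc(121), harmonize_clc(211))  # 2 1 None
    print(worldpop_is_non_residential(0.0))                            # True
```

Mapping every unlisted UA18 code to non-residential follows the Table 2 note that all other classes were considered non-residential.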

Each cell of both census datasets, BGRI and GRID1K, gives its number of residential buildings. Combining this with the AE18 built-up data yields both the non-residential and the residential built-up areas. The main objective was to exploit the feature information inside cells that contain no residential buildings but are marked as built-up in AE18. The following subsection explains the generation of the training data in more detail. The subsections after it describe the image segmentation model, its hyperparameters and the training details.

Training data generation
As shown in Figure 3, the first step in generating the training data was to select the cells from both the BGRI and GRID1K datasets that indicated zero residential buildings. The selected BGRI and GRID1K cells were then merged through a union, yielding the non-residential census surface. This surface was then intersected with the AE18 built-up dataset, yielding the non-residential built-up surface (step 2 in Figure 3). This surface formed the training data for the non-residential class, since these built-up areas were known beforehand to contain no residential buildings (steps 1-2 in Figure 3). The residential surface was the remaining part of the built-up area not considered non-residential (step 3 in Figure 3). However, because the census data follow administrative boundaries (or a grid), a given building block could be split across two different cells (Figure 4 - Top, red circle). A 500 m buffer was therefore applied around the non-residential built-up areas and annotated as "no class" (step 4 in Figure 3; Figure 4 - Bottom). This minimized the inclusion of non-residential areas in the residential class (Figure 4, red circle). Such inclusions would otherwise impair the ability of the image segmentation model to detect non-residential land use.
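The four steps of the training mask generation can be sketched on rasters with NumPy. This is a hypothetical raster re-implementation of what the paper performs on vector data, and it makes several assumptions:
- the census selections and AE18 are already rasterized as boolean arrays on the 10 m Sentinel-2 grid;
- the 500 m buffer is approximated by a disk dilation;
- only residential pixels inside the buffer are relabelled as no class;
- the class codes are 0 no class, 1 non-residential, 2 residential and 3 other land cover.

```python
import numpy as np

NO_CLASS, NON_RESIDENTIAL, RESIDENTIAL, OTHER_LAND_COVER = 0, 1, 2, 3


def disk_dilate(mask: np.ndarray, radius_px: int) -> np.ndarray:
    """Binary dilation of `mask` with a disk of `radius_px` pixels (a raster buffer)."""
    h, w = mask.shape
    padded = np.pad(mask, radius_px, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(-radius_px, radius_px + 1):
        for dx in range(-radius_px, radius_px + 1):
            if dy * dy + dx * dx <= radius_px * radius_px:
                # out[i, j] |= mask[i + dy, j + dx], treating outside pixels as False
                out |= padded[radius_px + dy:radius_px + dy + h,
                              radius_px + dx:radius_px + dx + w]
    return out


def training_mask(built_up, bgri_zero_res, grid1k_zero_res,
                  pixel_size_m=10.0, buffer_m=500.0):
    """Build a 4-class training mask from boolean rasters of equal shape.

    built_up: AE18 built-up pixels.
    bgri_zero_res / grid1k_zero_res: pixels inside census cells with zero
    residential buildings (BGRI and GRID1K, respectively).
    """
    # Step 1: union of the census cells without residential buildings.
    census_non_res = bgri_zero_res | grid1k_zero_res
    # Step 2: intersection with the built-up surface -> non-residential built-up.
    non_res = census_non_res & built_up
    # Step 3: the remaining built-up is residential.
    res = built_up & ~non_res
    # Step 4: residential pixels within the buffer around the non-residential
    # built-up become "no class" (assumption: other land cover is left as is).
    radius_px = int(round(buffer_m / pixel_size_m))
    near_non_res = disk_dilate(non_res, radius_px)

    labels = np.full(built_up.shape, OTHER_LAND_COVER, dtype=np.uint8)
    labels[res] = RESIDENTIAL
    labels[res & near_non_res] = NO_CLASS
    labels[non_res] = NON_RESIDENTIAL
    return labels


if __name__ == "__main__":
    # Toy 1 x 8 strip: all built-up except the last pixel.
    built = np.ones((1, 8), dtype=bool)
    built[0, 7] = False
    bgri = np.zeros((1, 8), dtype=bool)
    bgri[0, 0] = True
    grid = np.zeros((1, 8), dtype=bool)
    grid[0, 1] = True
    print(training_mask(built, bgri, grid, buffer_m=20.0))  # [[1 1 0 0 2 2 2 3]]
```

The brute-force dilation loops over every offset inside the disk, which means several thousand array operations for a 50-pixel radius. This keeps the sketch free of dependencies beyond NumPy. On full Sentinel-2 tiles, a dedicated morphology or distance-transform routine would likely be preferable.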

Image segmentation
The image segmentation model considered four classes: no class, non-residential, residential and other land cover. The no class was included to tell the model to disregard these regions. This was achieved by assigning the class a weight of 0, applied by scaling the result of the loss function. Such class weighting is often used to address class imbalance problems (Buda et al., 2018). For each remaining class, the weight was the logarithm of the inverse frequency of that class, measured as its number of pixels in the whole dataset (i.e., the four Sentinel-2 tiles). Equation 1 shows how the weight w_a of class a is computed from a set of z classes, where N_i is the number of pixels of class i:

$$ w_a = \log\left( \frac{\sum_{i=1}^{z} N_i}{N_a} \right) \qquad (1) $$

Table 3 shows the total number of pixels and the final weights for each class of the final data from section 3.1 (minimum class weight of …).

The segmentation model was an adaptation of the U-Net-like network (Ronneberger et al., 2015). This network has been widely used in image recognition studies. In short, it is composed of two main components, a contracting part and an expanding part. The contracting part has an architecture similar to traditional convolutional neural networks, with a decreasing feature map size. The expanding part replaces the pooling layers with upsampling layers. The two components are combined to improve the localization capabilities of the network (Ronneberger et al., 2015). The original convolutions were replaced by densely connected convolution blocks (Huang et al., 2017). The final network was similar to the one presented in Guan et al. (2020). Such models have been used and tested at length in several applications and studies.

First, a cross-validation approach was used to set the hyperparameters and to assess overfitting effects. The hyperparameters were the batch size (8), the learning rate (10^-1), the optimizer (stochastic gradient descent) and the training stop criteria. With the hyperparameters set, the model was trained on the whole dataset until the training loss did not decrease for 10 epochs. The model was trained on the four Sentinel-2 tiles and tested on tile T29TNE, namely for the city of Coimbra. UA18 is available for Coimbra and was used to assess how well the model detects the UA18 non-residential polygons.
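Equation 1 and the zero-weighted no class can be illustrated with the following sketch. It is not the training code used in the paper, and it rests on several assumptions:
- the no class pixels are excluded from the total pixel count;
- the logarithm is natural;
- `min_weight` is a hypothetical stand-in for the minimum class weight, whose value is not given here;
- the loss is a pixel-wise weighted cross-entropy normalized by the sum of pixel weights, which is one common convention.

```python
import math
from typing import Dict, Iterable, Optional

import numpy as np


def class_weights(pixel_counts: Dict[int, int],
                  ignored: Iterable[int] = (0,),
                  min_weight: Optional[float] = None) -> Dict[int, float]:
    """Equation 1: w_a = log(sum_i N_i / N_a); ignored classes get weight 0."""
    ignored = set(ignored)
    total = sum(n for c, n in pixel_counts.items() if c not in ignored)
    weights = {}
    for c, n in pixel_counts.items():
        if c in ignored:
            weights[c] = 0.0  # e.g. the "no class" buffer pixels
            continue
        w = math.log(total / n)  # log of the inverse class frequency
        if min_weight is not None:  # hypothetical floor on the weight
            w = max(w, min_weight)
        weights[c] = w
    return weights


def weighted_cross_entropy(probs: np.ndarray, labels: np.ndarray,
                           weights: Dict[int, float]) -> float:
    """Pixel-wise cross-entropy scaled by class weights.

    probs: (H, W, C) softmax outputs; labels: (H, W) integer class ids.
    """
    class_w = np.array([weights[c] for c in range(probs.shape[-1])])
    rows, cols = np.indices(labels.shape)
    p_true = probs[rows, cols, labels]   # predicted probability of the true class
    pixel_w = class_w[labels]            # weight of each pixel's true class
    losses = -np.log(np.clip(p_true, 1e-12, 1.0)) * pixel_w
    return float(losses.sum() / max(pixel_w.sum(), 1e-12))


if __name__ == "__main__":
    counts = {0: 100, 1: 10, 2: 90, 3: 900}  # illustrative counts, not Table 3
    w = class_weights(counts)
    print({c: round(v, 3) for c, v in w.items()})  # {0: 0.0, 1: 4.605, 2: 2.408, 3: 0.105}
```

The no class weight is zero, so its pixels add nothing to either the loss numerator or the normalization. This is how they are effectively ignored during training. The rarest class (here class 1) receives the largest weight.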

Quality assessment
The quality assessment was mainly based on how well the results and baselines detect UA18 non-residential polygons in the Coimbra metropolitan area. Since this study focuses on identifying non-residential built-up areas (class 1 in Table 2), these areas were extracted from each dataset. Class 2 (residential) was taken as the remaining AE18 regions not present in class 1. In this way, we evaluate only the ability of the model or baselines to identify non-residential land use, and not their ability to identify built-up areas.
Several details of the map produced by the segmentation model are presented. Accuracy metrics (F1-score, recall and precision) are computed for the detection of the UA18 non-residential polygons in Coimbra. In addition, the percentage of both residential and non-residential UA18 polygons detected by each map is analysed. A polygon was considered non-residential if it contained any non-residential pixel.
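The polygon-level rule above can be sketched as follows. Each UA18 polygon is assumed to be given as its harmonized class (1 non-residential, 2 residential), together with the predicted labels of the pixels it covers. The sketch also assumes that a false positive is a residential polygon containing at least one non-residential pixel, which the text does not state explicitly. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NON_RESIDENTIAL = 1

# (harmonized UA18 class, predicted labels of the pixels inside the polygon)
Polygon = Tuple[int, Sequence[int]]


def is_flagged(pixel_labels: Sequence[int]) -> bool:
    """A polygon counts as non-residential if any covered pixel is non-residential."""
    return any(p == NON_RESIDENTIAL for p in pixel_labels)


def detection_metrics(polygons: List[Polygon]) -> Tuple[float, float, float]:
    """Polygon-level precision, recall and F1-score for the non-residential class."""
    tp = fp = fn = 0
    for ua_class, pixels in polygons:
        flagged = is_flagged(pixels)
        if ua_class == NON_RESIDENTIAL:
            if flagged:
                tp += 1
            else:
                fn += 1
        elif flagged:
            fp += 1  # assumption: a residential polygon wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def percent_flagged_by_class(polygons: List[Polygon]) -> Dict[int, float]:
    """Percentage of polygons flagged as non-residential per UA18 class (as in Table 5)."""
    total: Dict[int, int] = defaultdict(int)
    flagged: Dict[int, int] = defaultdict(int)
    for ua_class, pixels in polygons:
        total[ua_class] += 1
        flagged[ua_class] += int(is_flagged(pixels))
    return {c: 100.0 * flagged[c] / total[c] for c in total}


if __name__ == "__main__":
    polys = [(1, [2, 1, 2]), (1, [2, 2]), (2, [1]), (2, [2, 3]), (1, [1])]
    print(detection_metrics(polys))
    print(percent_flagged_by_class(polys))
```

The "any pixel" rule favours recall: a single stray non-residential pixel, such as salt-and-pepper noise, is enough to flag a polygon.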

RESULTS
Figure 5 visually compares three maps:
- NONRESM, the map obtained by applying the trained segmentation model to Sentinel-2 tile T29TNE;
- CENSUS, the original census surface used in the training data generation (without the no class regions);
- CLC18, the map obtained from the nomenclature harmonization of CLC18.

Visual assessment
Both the model results and the CLC18 map show a significantly larger non-residential area than the original census surface. The areas indicated by red ellipses are mostly industrial areas, which NONRESM detects but the other maps do not. Only CENSUS indicates a non-residential area composed solely of industrial buildings (region C in Figure 5). Even so, it locates only a small portion of the actual industrial area, which comprises the regions highlighted as details A, B, C, D and E. NONRESM is visibly more detailed than the other maps. Nonetheless, salt-and-pepper effects are also visible in NONRESM, especially at the boundaries of the non-residential areas.

Accuracy metrics
The accuracy metrics measure how well each map correctly identifies non-residential polygons within UA18. They were computed both for the map produced by the image segmentation model and for the two baselines (CLC18 and WPOP18).
Table 4 presents the F1-score, precision and recall of each map in identifying the UA18 polygons related to non-residential built-up. A true positive means that a non-residential UA18 polygon contained at least one non-residential pixel. Table 4 shows that the F1-score of the non-residential map from the image segmentation model is much higher than that of the baselines. Its precision exceeds its recall by 0.24 (0.57 compared to 0.33). The original census surface performed worst: despite a precision of 0.2, its recall is below 0.05. Hence, it detects a few non-residential UA18 polygons but misses most of them. CLC clearly overestimates the non-residential areas, with a high recall (0.58) but a precision of only 0.07. WorldPop has lower precision and recall overall, but the two values are more balanced, and it achieves an F1-score of only 0.13.
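As a consistency check, the F1-score is the harmonic mean of precision P and recall R, so the values reported above relate as follows. The CLC value is derived here from its reported precision and recall. The census bound uses the fact that, for a fixed P, F1 increases with R.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R}\\
F_1^{\text{model}} &= \frac{2(0.57)(0.33)}{0.57+0.33} = \frac{0.3762}{0.90} \approx 0.42\\
F_1^{\text{CLC}} &= \frac{2(0.07)(0.58)}{0.07+0.58} = \frac{0.0812}{0.65} \approx 0.12\\
F_1^{\text{census}} &< \frac{2(0.2)(0.05)}{0.2+0.05} = 0.08
\end{aligned}
```

The first line agrees with the F1-score of 0.42 quoted for the proposed approach in the discussion.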

DISCUSSION
The results clearly show the benefit of using census data and Sentinel-2 data to improve the detection of non-residential built-up areas over national extents. Both the detection metrics for the non-residential UA18 polygons and the visual assessment of the results support this. The experiments show that the proposed approach detects non-residential areas better than existing data, whether CLC, the original census or WorldPop.
Nonetheless, the proposed approach only achieved an F1-score of 0.42 in detecting non-residential built-up in UA18. Some building types, such as industrial areas, schools and hospitals, may have distinctive features in Sentinel-2 images. Others may have no features distinguishable in nadir-viewing imagery, or features too small to capture at the decametre resolution of Sentinel-2. The advantage over UA18 is that this approach can map non-residential areas over the whole national extent, whereas UA18 is only available for metropolitan areas with more than 100,000 inhabitants. Moreover, although the detection score within urban areas is low overall, the map can still serve as an extra set of ancillary data. It can guide operators in charge of producing detailed land use information.
The model included an additional class that was excluded from training by assigning it a weight of zero. This is not an optimal way to train an image segmentation model to map non-residential built-up. Excluding these pixels from training may remove valuable feature information. The gaps it leaves in the data may also interfere with the computation of other relevant features. However, this conservative strategy eliminated several inconsistencies in the training data, especially non-residential areas that would otherwise fall in the residential class. These inconsistencies arise from the administrative nature of the census data, in which built-up areas with the same use may fall in different census cells.
Only census data were used here, but the approach could incorporate other sets of ancillary data. For example, the training mask generated mainly from the census data could be enriched with land use information from OpenStreetMap. Built-up areas could then be extracted with a focus on their residential status and used to complement the training data produced with the census data (Fonte et al., 2020).
The target class of this study, non-residential built-up use, concerns the actual use of built-up areas rather than their detection. We could have built a binary image classification model to detect only the non-residential areas. However, it was found to perform worse than treating the problem as a multi-class one, most likely because of the very small amount of data for class 1 (non-residential). The model could only be trained successfully after introducing class weights. Other approaches to class imbalance may also be tested (Buda et al., 2018). Given the global coverage of the Sentinel-2 constellation, the method can be replicated in any region with census data. Elsewhere, other population data sources (such as WorldPop) may be explored. These could be combined with the built-up class of, for example, the ESA WorldCover (Zanaga, Daniele et al., 2021) or ESRI LULC (Karra et al., 2021) products. However, this needs further study, because these datasets differ from those used in this experiment. The WorldPop population counts are derived from ancillary datasets, so their quality is variable. However, like the LULC maps mentioned above, WorldPop is available worldwide. Whether it helps detect non-residential areas when combined with these maps could therefore be assessed.

CONCLUSIONS
In this paper we propose an automated method to map non-residential built-up areas using official national census data and Sentinel-2 images within a supervised image segmentation approach. The census data made it possible to identify several non-residential areas. These areas were then used to train the image segmentation model to learn the features relevant for mapping non-residential areas.
Compared against UA18 for the city of Coimbra, the proposed approach outperformed existing datasets that include built-up land use information, such as CLC18, WorldPop and the original census data used for training. However, overall scores remained low: the map from the proposed approach achieved an F1-score of only 42% in detecting non-residential built-up UA18 polygons. Land use focused products such as UA18 often rely on lengthy and costly visual inspection of ancillary datasets to determine the land use of a given neighbourhood. A method that uses already available data and improves on existing datasets can therefore be valuable. Methods such as the one proposed in this paper can also provide an additional set of ancillary data for visual interpreters who need to identify such areas manually.

agriculture. Methodology for assessing rooftop greenhouse potential of non-residential areas using airborne sensors. Science of The Total Environment 601-602, 493-507. https://doi.org/10.1016/j.scitotenv.2017.03.214

Figure 1
Figure 1 Study area location and overview of the testing area (blue) and the city of Coimbra (dashed).

Table 1
Dataset | Type | Producer | Role in experiments
CLC18 | LULC | EEA | Baseline
WPOP18 | Population counts | WorldPop | Baseline
BGRI | Census | DGT | Baseline / training
GRID1K | Census | DGT | Baseline / training
Sentinel-2 | Satellite imagery | | Imagery for training
UA18 | LULC | EEA | Ground truth
AE18 | Built-up areas | DGT | Training / baseline / ground truth

Figure 2
Figure 2 Example of the census data (BGRI and GRID1K), where only cells with zero residential buildings were selected (in blue, several buildings of the University of Coimbra), overlaid with the Áreas Edificadas (AE18, pink), the Portuguese dataset of areas with buildings. Orthophotos in the background from Direção Geral do Território, 2018.

Table 2
Code | Urban Atlas class | CC
11100 | Continuous urban fabric (S.L. > 80%) | 2
11210 | Discontinuous dense urban fabric (S.L. 50%-80%) | 2
11220 | Discontinuous medium density urban fabric (S.L. 30%-50%) | 2
11230 | Discontinuous low density urban fabric (S.L. 10%-30%) | 2
11240 | Discontinuous very low density urban fabric (S.L. < 10%) | 2
11300 | Isolated structures (should be residential) | 2
12100 | Industrial, commercial, public, military and private units | 1
12210 | Fast transit roads and associated land | 1
12220 | Other roads and associated land | 1
... | ... | 1
50000 | Railways and associated land | 1
Table 2 Urban Atlas (UA) nomenclature and how it was mapped into the classes considered (CC) in this study (1 - non-residential, 2 - residential). All other classes were considered non-residential. S.L.: average degree of soil sealing.

METHOD
The method comprises two parts: 1) the automated processing of the census data to derive the training data; and 2) the use of these data within an image segmentation model based on convolutional neural networks to map non-residential land use. The last subsection covers the quality assessment. It compares the results of the image segmentation model and of the baseline experiments with the UA18 dataset for the city of Coimbra.

Figure 3
Figure 3 Flowchart showing the four main steps for the generation of the training mask for the non-residential image segmentation model

Figure 4
Figure 4 Top: census cells with no residential buildings (BGRI, GRID1K) over built-up areas (purple). Middle: orthophotos, 2021. Bottom: census cells with no residential buildings (purple) with a 500 m buffer (grey) around built-up areas. The red circle indicates an industrial area that would be considered residential if the buffer were not applied.

Figure 5
Figure 5 NONRESM: Detail of the results obtained with the proposed approach.CENSUS: Population census (used as training data); CLC18: CLC18 harmonized nomenclature residential/non-residential.Bottom: Orthophoto of the area.A-D are markers to better compare the differences between maps.

Table 3
Pixel counts per class and the class weights considered to train the image segmentation model

Table 5 gives, for each harmonized UA18 class, the percentage of polygons that each source of non-residential information classified as non-residential. For the non-residential class, higher values mean that an experiment performs well. The opposite holds for the residential class, where higher values mean that the experiment wrongly classifies many residential polygons as non-residential. The table shows that the model identifies a third of the non-residential UA18 polygons while keeping a lower value for the residential polygons.

Table 4 -
Precision, recall and F1-score (0-1) when considering the detection of the non-residential UA18 polygons (normalized nomenclature into residential and non-residential areas).Higher values of each of the metrics are highlighted in bold.

Table 5 -
Percentage of polygons considered non-residential for each of the harmonized UA18 classes. For each class, the best value (i.e., a high percentage for the non-residential class and a low percentage for the other classes) is in bold.