A Shallow Neural Network Model for Urban Land Cover Classification Using VHR Satellite Image Features

Recently, image classification techniques using neural networks have received considerable attention in sustainable urban development, since their applications have a significant effect on building distribution, infrastructural networks, and water resource management. In this research, a back-propagation shallow neural network model is presented for very high resolution satellite image classification in urban environments. The workflow comprises selecting and collecting data, preparing the required study areas, extracting distinctive features, and applying the classification process. Visual interpretation is performed to identify observed land cover classes and detect distinctive features in the urban environment. Pre-processing techniques are implemented to present the images in a form better suited to the classification techniques. A shallow neural network model (supported by the MathWorks MATLAB environment) is successfully applied and the results are evaluated. The proposed model is tested on both WorldView-2 and WorldView-3 multispectral images, which have different spatial and spectral characteristics, to check the model's applicability to various kinds of satellite imagery and different study areas. Model outcomes are compared to two well-known classification methods, the Nearest Neighbour object-based method and the Maximum Likelihood pixel-based classifier, to validate the model and check its stability. The overall accuracy achieved by the proposed model is 86.25% and 83.25%, while the nearest neighbour approach obtained 84.50% and 82.75%, and the maximum likelihood classifier achieved 82.50% and 80.25% for study area 1 and study area 2 respectively. The results indicate that the developed shallow neural network model achieves a promising accuracy for urban land cover classification in comparison with the standard techniques.


INTRODUCTION
Remote sensing techniques offer cost-effective resources for Land Use and Land Cover (LULC) analysis using Very High Resolution (VHR) satellite images. Urban land cover interpretation is a challenging task due to scene complexity and the wide range of spatial and spectral resolutions of available imaging systems. With recent developments in the space industry, a variety of data are available for Earth observation and sustainable LULC studies. According to the Union of Concerned Scientists (UCS), there were about one thousand Earth Observation (EO) satellites in orbit as of 2022 (Geospatial World, 2023). Considerable interest and rapid progress in collecting data using EO satellites have been seen in recent years due to improved technology, relatively lower costs, and major enhancements in satellite image resolution. Image resolutions have greatly increased: centimetre-level spatial resolution, spectral resolution over tens or even hundreds of bands, radiometric resolution of 16 bits or higher, and temporal resolution of two images a day are now obtainable. The wide expansion of accessible satellite imagery raises a significant challenge for collecting, arranging, and analysing data with standard software and traditional manual methodologies. Manual methods are time-consuming and require extensive human effort to finish the classification process. An automatic technique is needed for image classification to maximize the benefits of such available data. The 4th Industrial Revolution offers more effective solutions for image analysis using Artificial Intelligence (AI), especially Neural Network (NN) tools. Neural networks are considered one of the most powerful techniques for satellite image studies owing to their ability to use all available features and to outperform other image analysis methods in urban applications (Mäyrä, J., 2018).
Several neural network algorithms have been adopted in remote sensing land cover classification, involving deep and shallow neural networks with different structures based on model flexibility and the required application. Classification approaches using neural networks are distinguished by the number of layers. NNs include various hidden layers using a network of interconnected units (or neurons) that learn nonlinear connections from input data (Goodfellow, et al., 2016). Deep neural networks consist of multiple convolution, pooling, and activation layers followed by fully connected layers that represent the results. Meanwhile, shallow neural network architecture is limited to a few layers only, which is typical for feed-forward neural nets with the backpropagation training algorithm (Ososkov, G., and Goncharov, P., 2017). Although deep learning approaches can efficiently analyse the rich spectral information contained in multispectral images to perform an accurate discrimination process, multidimensional data poses additional challenges, requiring powerful hardware and more time to train the deep network. Moreover, using plenty of features (i.e., spectral bands, indices, geometric, and spatial properties) adds complexity to the classifier and increases the number of parameters for defining urban classes. The question of "How to choose a suitable network structure?" requires more effort to be answered (Paoletti, M., et al., 2019). In comparison with deep networks, shallow neural networks have the advantage of a simple structure and a small number of layers, and consume less time and power for the training and classification process (Gorokhovatskyi, O., and Peredrii, O., 2018). Simplicity, flexibility, fast training, and lower demands on computational resources and memory make shallow neural networks an effective choice for VHR image classification (Lei, F., et al., 2019).

RESEARCH PROBLEM
Urban environments represent one of the most complex study areas for remote sensing studies due to the wide-ranging spatial and spectral heterogeneity of surface materials. Also, spectral similarities between different land cover classes cause considerable confusion for most classification techniques. Urban classes have various similarities in geometry (e.g., roads, bridges, and railway routes), similarities in colour (e.g., buildings, walk lanes, and bare soil), and similarities in texture (e.g., asphalt roads, parking lots, and some building surfaces). Class resemblances make scene analysis more complicated in such urban environments. The recent development of spectral and spatial resolution has led to a significant interest in using VHR satellite imagery for urban feature extraction. VHR satellite images provide up-to-date information regarding the Earth's resources, which is vital for sustainable urban development, agricultural investigation, water resource management, and disaster control (Fawzy, et al., 2020).
During the past two decades, hundreds of image classification methods have been developed considering pixel-based and object-based techniques. Despite the increased number of classification approaches for land-use analysis, the accuracy and efficiency of these methods are still insufficient to fulfil the requirements of real-world applications such as sustainable urban planning and land management (Huang, B., et al., 2018). Image classification meets more challenges due to spectral similarities of land cover classes that pixel-based techniques fail to deal with. Object-Based Image Analysis (OBIA) techniques are effective for land use and land cover analysis. OBIA overcomes the problem of spectral similarities in pixels by dividing the satellite image into more appropriate segments according to both spectral and spatial characteristics. However, the growing number of LULC classes with various features makes OBIA a time-consuming method that requires more human effort, in addition to a large amount of computer memory and extra computation power to finish the process (Fawzy, et al., 2022). Furthermore, OBIA requires rearranging parameters, functions, and/or algorithms for each different image and study area, which limits the generalizability and transferability of the strategy.
Essentially, an automated strategy is required to accurately identify the diversity of urban classes using VHR images. Neural network-based approaches offer multiple solutions for image classification drawbacks (Längkvist, M., et al., 2016). The key feature of NN-based algorithms is that NNs do not require prior feature extraction for each study area, thus increasing the generalization capabilities (Sertel, E., et al., 2022). Neural networks are capable of collecting, arranging, and analysing features to train, validate, and test models for classifying urban areas. In this article, shallow neural networks are investigated for VHR satellite image classification. A backpropagation shallow neural network model is developed, applied, and evaluated with the objective of maximizing image classification results in urban study areas.

OUTLINE OF OBJECTIVES
The main objective of this research is to study the effectiveness and competitiveness of shallow neural networks for satellite image classification in urban environments. A methodology using neural networks is proposed to overcome the limitations of traditional classification strategies and maximize the urban land cover information extracted from VHR satellite images. Also, the following procedures have been considered:
• Developing, applying, and evaluating a shallow neural network model for urban land cover classification using VHR multispectral satellite images.
• Testing the efficiency of neural networks for image classification in terms of image resolution, type and degree of planning for the selected study areas, and the required level of accuracy for each application.
• Applying the suggested model to different case studies for land cover classification in the urban environment.

STATE-OF-THE-ART
Remote sensing applications using VHR satellite images have received widespread attention in urban development due to their notable effects on building allocation, infrastructural systems, and water resource utilization. Several approaches have applied NNs for urban land cover analysis using VHR satellite images. Neural network-based techniques are considered a highly promising tool for remote sensing image analysis and have produced considerable results in the last few years (Martins, V. S., et al., 2020). Recently, remote sensing experts have paid great attention to neural networks, which have achieved significant success in many image classification tasks such as feature extraction and LULC analysis (Cheng, Gong, et al., 2018). Ma, Lei, et al., (2019) introduced the major neural network concepts pertinent to remote sensing, reviewing 200 papers published during the previous two years and covering nearly every application of remote sensing from pre-processing to mapping. Neupane, B., et al., (2021) focused on urban remote sensing and performed a meta-analysis of papers related to current research problems, data sources, data pre-processing, and training details for various architectures. Their analysis showed that neural networks outperform traditional methods in terms of accuracy and address several of the challenges faced. Nguyen et al., (2013)

METHODOLOGY
The presented methodology applies a back-propagation shallow neural network model that handles multiple features from VHR multispectral satellite images for land cover classification in urban environments. To meet the research objectives, the following procedures are presented (Figure 1), and the results are evaluated:
• Collecting the required data and the satellite images used for the selected study areas.

Data Used
The WorldView series provides high-resolution Earth observation imagery for environmental monitoring. WorldView-2 offers a panchromatic band of 0.50 m spatial resolution and eight multispectral bands of 2.00 m spatial resolution: blue, green, red, and a first near-infrared band, in addition to coastal blue, yellow, red-edge, and a second near-infrared band (Figure 2). WorldView-3 supplies a panchromatic band of 0.30 m spatial resolution and sixteen multispectral bands of 1.24 m spatial resolution. As the first multi-payload, super-spectral satellite, WorldView-3 comprises the same eight multispectral bands as WorldView-2, in addition to eight shortwave infrared bands (Figure 3).

Image Pre-Processing
Image pre-processing is essential to improve the analysis process and maximize the information extracted from VHR satellite images. Two main pre-processing tasks are applied to the images used: data fusion and shadow correction.

Data Fusion
A data fusion process is required to inject the high spatial resolution of the panchromatic band into the multispectral bands, producing a pan-sharpened image of high spatial and spectral quality. Panchromatic images have a high spatial resolution, which is advantageous for geometric studies and feature extraction, whereas multispectral images provide the spectral resolution required for spectral land use and land cover studies. A variety of image fusion techniques have been presented in the literature, including Principal Component Analysis (PCA), Brovey Transform (BT), and Intensity Hue Saturation (IHS) transformation (Pohl and Genderen, 2016; Ehlers et al., 2010). PCA is applied to the satellite images used here, so that all input bands are pan-sharpened and suited to the feature extraction process (Fawzy, 2020).
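To make the idea of injecting panchromatic detail into multispectral bands concrete, the following is a minimal per-pixel sketch of the Brovey Transform, one of the alternatives listed above. It is not the PCA fusion applied in this study; it is shown only because it fits in a few lines. The function name `brovey_pixel` is hypothetical, and the sketch assumes the multispectral values have already been resampled to the panchromatic grid.

```python
def brovey_pixel(ms_values, pan_value):
    """Pan-sharpen one pixel with the Brovey Transform.

    Each multispectral value is scaled by PAN / sum(MS), so the band ratios
    are preserved while the overall brightness follows the panchromatic band.
    Assumes ms_values were resampled to the panchromatic pixel grid.
    """
    total = sum(ms_values)
    if total == 0:
        # No signal in any band: nothing to scale.
        return [0.0 for _ in ms_values]
    return [value * pan_value / total for value in ms_values]
```

For example, `brovey_pixel([10, 20, 30], 120)` keeps the 1:2:3 ratio between bands while the fused values sum to the panchromatic value.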

Shadow Correction
Shadows pose a major challenge for VHR satellite images and affect information quality. Although shadow areas have weak reflectance, they still provide valuable information that makes shadow restoration possible. A shadow correction process is presented to compensate for the brightness difference between shadow and non-shadow areas in two main steps: shadow detection and shadow compensation. Shadow detection is applied using a set of indices based on various colour models to detect shadows in the presence of dark objects, while shadow compensation aims to compensate for brightness differences between shadow and non-shadow areas in VHR satellite images. The study areas of this paper involve different shadow and dark water regions. A distinct issue is misclassification between the shadow and water classes. The Optimized Shadow Index (OSI) proposed by Mostafa and Abdelhafiz (2017) is used to differentiate between shadow and water pixels. Additionally, the Linear Correlation Correction (LCC) method introduced by Sarabandi et al. (2004) is applied for shadow compensation to provide meaningful information despite the weak signals in shadow areas (Fawzy, et al., 2020).
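The compensation step can be illustrated with a small sketch. It assumes the common mean and standard deviation matching form of linear correlation correction, in which each shadow value is rescaled so that the shadow region takes on the statistics of a nearby non-shadow reference region. The function name `lcc_compensate` and the choice of reference region are assumptions for illustration, not details taken from the study.

```python
from statistics import mean, pstdev


def lcc_compensate(shadow_values, reference_values):
    """Rescale shadow pixel values to match non-shadow statistics.

    Assumed form: corrected = (sigma_ref / sigma_shadow) * (v - mu_shadow) + mu_ref,
    applied per band to the pixels of one shadow region.
    """
    mu_s, sigma_s = mean(shadow_values), pstdev(shadow_values)
    mu_r, sigma_r = mean(reference_values), pstdev(reference_values)
    if sigma_s == 0:
        # A flat shadow region carries no contrast; map it to the reference mean.
        return [float(mu_r) for _ in shadow_values]
    gain = sigma_r / sigma_s
    return [gain * (value - mu_s) + mu_r for value in shadow_values]
```

After correction, the shadow region has the same mean and spread as the reference region, while the relative ordering of its pixel values is preserved.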

Training Sample Testing

a) Intensity Distribution
The signature of each group of samples, for one class, should be examined through its intensity count histogram. It is important to determine whether the brightness histogram has a unimodal distribution, as for a single type of land cover (e.g., water, vegetation, and road samples), or a multimodal one, as for samples with heterogeneous materials (e.g., building and bare soil) (Figure 5). A distribution with more than one mode represents samples of more than one distinct class that need to be refined.

b) Separability of Classes
Separable and sufficient samples enable the classifier to distinguish the various class signatures. Class separability can be measured and evaluated using the separability distance measures available in many remote sensing applications (e.g., Jeffries-Matusita in Erdas software, with values from 0 to 1414). High separability indicates well-chosen representative samples with fewer spectral similarities. Meanwhile, low separability indicates more confusion between samples due to spectral similarities; it leads to a high number of misclassified areas, and the samples should be improved. Signature separability values for the land cover classes of study area 1 are shown in Table 1. The highest separability is seen between water or vegetation and the other classes, while slightly lower values exist for the building-road and building-bare soil pairs due to the widely heterogeneous materials of buildings and their spectral overlap with road and bare soil.
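As an illustration of how such a separability value arises, the sketch below computes the Jeffries-Matusita distance between two classes from the Bhattacharyya distance. It is a deliberate simplification: each class is modelled as a Gaussian in a single band, whereas signature tools work with multivariate statistics over all bands. The scaling by 1000, giving the 0 to 1414 range quoted above, is assumed to match the software convention.

```python
import math


def jeffries_matusita(mean1, std1, mean2, std2, scale=1000.0):
    """Single-band Jeffries-Matusita distance between two Gaussian classes.

    Returns scale * sqrt(2 * (1 - exp(-B))), where B is the Bhattacharyya
    distance; with scale=1000 the result lies between 0 and about 1414.
    Both standard deviations must be greater than zero.
    """
    var_sum = std1 ** 2 + std2 ** 2
    bhattacharyya = ((mean1 - mean2) ** 2) / (4.0 * var_sum) + 0.5 * math.log(
        var_sum / (2.0 * std1 * std2)
    )
    return scale * math.sqrt(2.0 * (1.0 - math.exp(-bhattacharyya)))
```

Two identical signatures give a distance of 0, and the value approaches 1414 as the class means move apart relative to their spread.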

c) Homogeneity of Class
Training samples should be tested according to the range of the mean and standard deviation to determine the homogeneity or heterogeneity of each class. For a set of samples, a narrow range of mean values indicates one class, while a wide range indicates overlapping classes. A low standard deviation indicates highly homogeneous samples (e.g., water, vegetation, and road), while a high standard deviation indicates more heterogeneous samples (e.g., building and bare soil classes).
Table 2. Sample statistical properties of study area 1.
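The homogeneity check described above can be sketched as follows. The class names and sample values used in the usage note are hypothetical placeholders, not values from Table 2, and the helper names are assumptions for illustration.

```python
from statistics import mean, pstdev


def class_statistics(samples_by_class):
    """Return {class name: (mean, standard deviation)} for one band."""
    return {
        name: (mean(values), pstdev(values))
        for name, values in samples_by_class.items()
    }


def most_heterogeneous(samples_by_class):
    """Name of the class whose samples have the largest standard deviation."""
    stats = class_statistics(samples_by_class)
    return max(stats, key=lambda name: stats[name][1])
```

With hypothetical samples such as `{"water": [20, 21, 19, 20], "building": [80, 150, 110, 200]}`, the building class is reported as the most heterogeneous, which is the kind of class whose samples would be inspected and refined first.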

Training Samples Refining
Image processing software offers many tools for testing samples. Non-representative samples require refining, and incorrect ones should be replaced. Sample refinement focuses on the purest samples of all classes. As a result, the classification process can achieve optimal results.

Feature Extraction
Neural networks have attracted a lot of interest due to their flexibility in processing all features and classifying various kinds of satellite images. Shallow learning requires manually chosen features and meaningful amounts of labelled data (Sainos-Vizuett, M., and Lopez-Nava, I. H., 2021). Image classification using shallow neural networks therefore depends on a prior feature extraction process that feeds the model. Selected samples are arranged to extract the required features for the shallow neural network. Each pixel represents one input with the values of all spectral bands. In addition to spectral band values, a set of spectral indices is calculated and included in the model inputs (Figure 6). Indices are important for detecting classes using the differences between spectral reflectance values. The indices used are calculated from WorldView-2 and WorldView-3 spectral bands, e.g., the Normalized Difference Water Index (NDWI) (Gao, 1996), Normalized Difference Vegetation Index (NDVI) (Knipling, 1970), WorldView Soil Index (WV-SI), WorldView Building Index (WV-BI) (Wolf, 2012), and Road Extraction Index (REI) (Shahi et al., 2015). Feature extraction is the first step in developing an effective shallow neural network model.
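The per-pixel feature vector described above, band values followed by spectral indices, can be sketched as below. Only the generic normalized-difference form is shown; NDVI is taken as (NIR - Red) / (NIR + Red). The band names and the band pairs used for any other index are assumptions, since the exact WorldView band combinations are not listed here.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b), 0.0 when a + b is zero."""
    denominator = band_a + band_b
    return 0.0 if denominator == 0 else (band_a - band_b) / denominator


def pixel_features(bands, index_pairs):
    """Build one input vector: all band values, then one index per band pair.

    bands: dict of band name -> reflectance for a single pixel.
    index_pairs: list of (band_a, band_b) names, e.g. ("nir", "red") for NDVI.
    """
    features = list(bands.values())
    for name_a, name_b in index_pairs:
        features.append(normalized_difference(bands[name_a], bands[name_b]))
    return features
```

For a pixel with `{"red": 0.1, "nir": 0.5}` and the pair `("nir", "red")`, the vector holds the two band values followed by an NDVI of about 0.67, a value typical of vegetation.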

Shallow Neural Network Model
A shallow neural network requires manually selected samples and meaningful labelled data. The proposed model is built on a set of features and indices extracted from multispectral satellite images. The presented procedures (Figure 7) start with selecting, testing, and refining samples from the satellite image. Selected samples are arranged as a matrix of the required features, focusing on the spectral values of all bands and the spectral indices. Finally, a shallow neural network is applied to the input features and input targets to classify multispectral satellite images. To improve classification accuracy, the parameters of the shallow neural network model should be chosen carefully (Lei, F., et al., 2019). The number of layers and neurons can be changed according to the classification level required for a particular application and the time available for the classification process. The model focuses on optimizing the input features and the number of layers and neurons in the network to maximize classification results. After several trials, a model consisting of two hidden layers with one hundred neurons each is chosen, which balances classification accuracy against the time required for training and classification. The suggested model is applied successfully in two different study areas and the results are evaluated.
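The following is a minimal sketch of a back-propagation network with the layout described above: two hidden layers of one hundred neurons by default, fed with per-pixel feature vectors and class targets. It is not the MATLAB model used in this study. The tanh hidden activation, softmax output, cross-entropy loss, learning rate, and plain stochastic gradient descent are all assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One fully connected layer: n_out rows of n_in weights plus a trailing bias."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, x):
    """Return the activations of every layer, starting with the input itself."""
    activations = [list(x)]
    for index, layer in enumerate(layers):
        inputs = activations[-1]
        z = [sum(w * v for w, v in zip(row[:-1], inputs)) + row[-1] for row in layer]
        if index == len(layers) - 1:
            # Output layer: softmax over the class scores.
            peak = max(z)
            exps = [math.exp(v - peak) for v in z]
            total = sum(exps)
            activations.append([e / total for e in exps])
        else:
            # Hidden layers: tanh activation (an assumption of this sketch).
            activations.append([math.tanh(v) for v in z])
    return activations


def train(samples, labels, n_classes, hidden=(100, 100), epochs=200, lr=0.05, seed=0):
    """Train with per-sample back-propagation (softmax + cross-entropy loss)."""
    rng = random.Random(seed)
    sizes = [len(samples[0]), *hidden, n_classes]
    layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            acts = forward(layers, x)
            # Error signal at the output: predicted probability minus one-hot target.
            delta = [p - (1.0 if k == y else 0.0) for k, p in enumerate(acts[-1])]
            for li in range(len(layers) - 1, -1, -1):
                inputs, layer = acts[li], layers[li]
                if li > 0:
                    # Propagate the error backwards before this layer's weights change.
                    prev_delta = [
                        sum(layer[j][i] * delta[j] for j in range(len(layer)))
                        * (1.0 - inputs[i] ** 2)
                        for i in range(len(inputs))
                    ]
                for j, row in enumerate(layer):
                    for i, value in enumerate(inputs):
                        row[i] -= lr * delta[j] * value
                    row[-1] -= lr * delta[j]
                if li > 0:
                    delta = prev_delta
    return layers


def predict(layers, x):
    """Index of the most probable class for one feature vector."""
    probabilities = forward(layers, x)[-1]
    return probabilities.index(max(probabilities))
```

A call such as `train(features, targets, n_classes=5)` would use the two-by-one-hundred layout, with `features` and `targets` standing for the sample matrix and class labels; smaller values such as `hidden=(8, 8)` are convenient for quick experiments, since pure Python is slow for real image sizes.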

Standard Classification Methods
Shallow neural network results are compared with traditional classification methods to check the stability and efficiency of the proposed approach. For this comparison, Nearest Neighbour and Maximum Likelihood are selected as the most promising classification methods for object-based and pixel-based strategies respectively (Fawzy, et al., 2020). Nearest Neighbour is applied at the object level using eCognition software, while Maximum Likelihood is performed at the pixel level using Erdas software. The samples selected for the proposed NN model are used as training areas for both the nearest neighbour and maximum likelihood methods. Finally, the accuracy assessment of the classified images is carried out using an error matrix based on the same reference points.

Accuracy Assessment
Shallow neural network performance is evaluated on both a qualitative and a quantitative level. Visual analysis is applied to evaluate classification outcomes qualitatively by comparing input and classified images to roughly identify the size and location of the errors (Figure 8), while quantitative evaluation is performed by considering the agreement between the classification results and reference points. A confusion matrix is derived for the classified images with 400 random points distributed over all classes. At least 50 reference points are defined per class. The number of points for each category is adjusted considering the area of the class in the entire scene and the relative importance of that class for a particular application. Error matrices are shown in Table 3 and Table 4. It can be seen that the proposed model has achieved an overall accuracy of 86.25% and 83.25%, and kappa statistics of 0.8193 and 0.7881 for study area 1 and study area 2 respectively. For individual classes of study area 1, the producer's accuracy ranges from 81.17% for building to 98.21% for the water class, and the user's accuracy ranges from 77.53% for road to 100.00% for the water class. For study area 2, the producer's accuracy ranges from 67.97% for bare soil to 92.65% for road, and the user's accuracy ranges from 67.02% for building to 100.00% for the water class. The proposed model also achieves satisfactory producer's and user's accuracy in both study areas (Figure 10).
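The quantities reported above (overall accuracy, kappa statistic, producer's and user's accuracy) all follow from the error matrix. The sketch below shows the standard calculations, assuming rows hold the classified labels and columns hold the reference labels; that orientation and the function name `accuracy_report` are assumptions, and every class is assumed to have at least one classified and one reference point.

```python
def accuracy_report(matrix):
    """Summarise an error matrix.

    matrix[i][j] is the number of reference points of class j that were
    classified as class i (rows: classified, columns: reference).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1.0 - expected)

    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        # Producer's accuracy: correct points / reference points of that class.
        "producers_accuracy": [matrix[j][j] / col_totals[j] for j in range(n)],
        # User's accuracy: correct points / points classified as that class.
        "users_accuracy": [matrix[i][i] / row_totals[i] for i in range(n)],
    }
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, the overall accuracy is 0.85 and the kappa statistic is 0.70, which shows how kappa discounts the agreement expected by chance.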

Discussion
The suggested methodology introduces pre-processing tasks, including pan-sharpening and shadow correction, to enhance the raw input data. Thanks to pre-processing, the results become more valuable. Also, the developed model depends on a designed neural network function. In addition, building-road and building-bare soil confusions affect the classification outcomes. In study area 1, 12% (18/154) of building points were assigned to road and 6% (9/154) to bare soil, whereas 14% (12/83) of road points and 5% (3/56) of bare soil points were assigned to the building class.
Confusion between building, road, and bare soil occurs due to the spectral similarities between their materials and the widely heterogeneous surfaces of buildings in urban environments. Added to that, a few inaccurate points are noticed in the water classification results because of the shallow water effect.
The proposed model has effectively performed the classification process for both well-planned and semi-planned study areas using WorldView-2 and WorldView-3 satellite images. Shallow neural networks respond quickly in training and testing, and are easy to integrate into automatic strategies. The results help fill knowledge gaps regarding the performance of neural network algorithms for urban land cover classification, built-up area maps, infrastructural networks, and water resource distribution. However, shallow neural network outcomes change after every re-training or re-application of the model, which makes the neural network technique behave like a black box. Also, the strategy has a major drawback in that it classifies each pixel separately, which occasionally adds a "Salt-and-Pepper" effect to classified images. The "Salt-and-Pepper" effect is a result of high heterogeneity between nearby pixels. In pixel-based techniques, each pixel is handled independently of its surrounding ones. Despite their proximity, neighbouring pixels frequently belong to different classes (Kelly, M., et al., 2011).
To overcome this pixel-based "Salt-and-Pepper" effect, it is recommended to apply an image segmentation process first and then apply neural network classification at the object level instead of the pixel level. Despite the acceptable performance of shallow neural networks for VHR image classification in urban environments, new strategies are needed to optimize the outcomes and maximize classification results. Segmenting the entire image into a set of objects is suggested to enhance the classification process at the object level. Integrating shallow neural networks with object-based features is expected to add valuable information to the designed model and improve shallow neural network performance.


• Applying pre-processing techniques to present the images in a form better suited to the classification process.
• Developing a shallow neural network model consisting of a few fully connected layers. Through the model, the required features are extracted for each pixel, focusing on basic characteristics (e.g., spectral bands and spectral indices) of the input images.
• Applying the developed model to classify urban case studies, to validate the model and check its stability in comparison with standard classification methods considering overall accuracy and kappa coefficient.

Figure 1. Flow chart of the proposed methodology.

Figure 5. Samples intensity distribution for study area 1.
Water, V: Vegetation, BS: Bare soil, R: Road, and B: Building.
Table 1. Signature separability for classes of study area 1.

Figure 6. Extracted features including spectral bands and indices values for each pixel.
The designed neural network function (code) considers not only the intensity value of each pixel, but also additional features like indices, neighbourhood, texture, and digital elevation models. Result analysis shows that shallow neural networks are an effective technique for classifying water and vegetation classes. They also offer appropriate results for road detection, while more intensive work is needed to enhance building and bare soil extraction. Error matrices illustrate some vegetation-bare soil misclassifications: 7% (4/56) of bare soil points in study area 1 and 12% (15/128) in study area 2 were classified as vegetation, whereas 6% (3/51) of vegetation points in study area 1 and 6% (3/48) in study area 2 were predicted as bare soil. Misclassifications between the vegetation and bare soil classes occur because some patches within vegetation areas are bare of plants and are classified as bare soil. Additionally, a weak vegetation effect appears in some bare soil areas, which are misclassified as vegetation (Figure 11).

Figure 11. Confusion between vegetation and bare soil classes.
Training sample selection is one of the most essential steps for urban feature extraction using VHR satellite images. Most image classification techniques require adequate and effective samples for training. Samples are usually identified manually and labelled by human visual inspection or field exploration. Training samples strongly affect classification results. Consequently, training samples should be meaningful, pure, representative, and cover the entire image. Selecting effective samples requires two main procedures to fit the classification process and enhance the final results: sample testing and sample refining. Table 2 illustrates the statistical properties for samples of study area 1.

Table 3. Confusion matrix for shallow neural network classification of study area 1.

Table 4. Confusion matrix for shallow neural network classification of study area 2.
The nearest neighbour object-based method has achieved an overall accuracy of 84.50% and 82.75%, and kappa statistics of 0.7941 and 0.7833, while the maximum likelihood pixel-based strategy has achieved an overall accuracy of 82.50% and 80.25%, and kappa statistics of 0.7729 and 0.7507 for study area 1 and study area 2 respectively. Detailed results obtained by the object-based and pixel-based methods are shown in the Appendix. It can be seen that the shallow neural network model slightly outperforms the traditional methods in terms of overall accuracy and kappa coefficient (Figure 9).
Neural networks have achieved significant progress in a variety of image analysis tasks, including image classification and feature extraction for sustainable urban management using VHR satellite images. In this article, a shallow neural network model consisting of two hidden layers with one hundred neurons each has been developed for image classification and applied to different study areas. Shallow neural networks are effective for images of different spatial and spectral resolutions. The proposed model is compared to two standard classification methods and achieves promising results, with an overall accuracy of 86.25% and 83.25% for study area 1 and study area 2 respectively. It achieves efficient outcomes for water, vegetation, and road, while the building and bare soil classes still require more enhancement. It is concluded that shallow neural networks can be effective enough to produce classification results that outperform the standard object-based and pixel-based classification techniques tested here, with some limitations that require new strategies to enhance classification results.

Table A-1. Confusion matrix for study area 1 using Nearest Neighbour object-based classification. Overall accuracy: 84.50%; overall kappa statistics: 0.7941.

Table A-2. Confusion matrix for study area 1 using Maximum Likelihood pixel-based classification.

Table A-3. Confusion matrix for study area 2 using Nearest Neighbour object-based classification. Overall accuracy: 82.75%; overall kappa statistics: 0.7833.

Table A-4. Confusion matrix for study area 2 using Maximum Likelihood pixel-based classification.