COMBINING PIXEL-BASED AND OBJECT-ORIENTED SUPPORT VECTOR MACHINES USING BAYESIAN PROBABILITY THEORY

This study employed a hybrid system that combines pixel-based (PB) and object-oriented (OO) Support Vector Machines (SVMs) using Bayesian Probability Theory (BPT) to improve land cover classification. A set of uncorrelated feature attributes was generated from a one-meter IKONOS satellite image. Four SVM kernels (linear, polynomial, radial basis function (RBF) and sigmoid) were tested and compared for classifying buildings, trees, roads and ground from the satellite image and the generated attributes. PB and OO SVMs were applied to classify the image, and BPT was then used to combine the class memberships from the PB and OO classifiers. Accuracy assessment was carried out using reference data sets derived from the one-meter IKONOS image. The results show that the OO method achieved an overall kappa coefficient of 0.8286, compared with 0.6327 for the conventional PB method. The combined system improved the overall kappa by 0.0608 over the OO SVMs.


INTRODUCTION
Research on land use/land cover classification from remotely sensed data has been fuelled in recent years by the increasing use of geographic information systems (GIS) and by the need to acquire and update data for GIS. Researchers have used a variety of methods to classify satellite data. In addition to the established PB classification methods, OO techniques also offer suitable analyses for classifying satellite data.
Advanced per-pixel classification algorithms include, but are not limited to, boosting- and/or bagging-based classification and regression trees (Lawrence et al., 2004; Baker et al., 2006) and the Random Forest (RF) algorithm (Lawrence et al., 2006). PB classification is the conventional approach because satellite data sets are acquired digitally as pixel units. PB classifiers rely exclusively on the digital numbers of the pixels (Shataee et al., 2004). Traditional PB classification is limited for the following reasons: image pixels are not true geographical objects; PB classification largely neglects the spatial information within an object, which is an important source of information for image classification; and the increased variability inherent in high spatial resolution imagery confuses traditional PB classifiers, resulting in lower classification accuracies (Hay and Castilla, 2006).
On the other hand, OO classification offers new possibilities through multi-resolution segmentation of images (Shataee et al., 2004). In OO image analysis, the basic processing units are not only individual pixels but also image objects, or segments. The classifiers in OO image analysis are soft classifiers, which use a 'membership' value to express an object's assignment to a defined class. The membership value lies between 0.0 and 1.0, where 0.0 expresses absolute improbability and 1.0 expresses complete assignment to a class. One advantage of these soft classifiers is their ability to express uncertainty about the class descriptions. Although results derived from these approaches are of considerable interest to researchers, segmentation is not an easy task. The object-based software eCognition is available, but it requires the segmentation parameters to be adjusted to each situation, which is hardly an automated solution. In this context, a traditional pixel-based approach with models derived from advanced artificial intelligence techniques can achieve good classification results (Garcia-Gutierreza et al., 2009).
Blaschke (2010) gave an overview of the development of object-based methods. Some studies have used advanced classification algorithms within OO analysis. A decision tree classification was used in an object-based analysis of IKONOS imagery for forest inventories (Chubey et al., 2006). Hay and Castilla (2006) applied Object-based Image Analysis to partition remotely sensed imagery into meaningful image objects and to assess their characteristics across spatial, spectral and temporal scales. Kamagata et al. (2006) applied an OO classifier to high-resolution (HR) multispectral (MS) imagery (QuickBird and IKONOS), with improved results over traditional techniques. Nearest neighbor (NN) classification using fuzzy logic, as well as membership-function-based classification, has also been applied (Navulur, 2007). Li Haitao et al. (2007) presented a new OO land cover classification method based on SVMs that fuses spectral and textural information from HR aerial imagery with a lidar-derived Digital Surface Model (DSM) in urban areas. Bruce (2008) summarized classification accuracies derived from six-band Landsat TM data, MS and panchromatic QuickBird satellite imagery, and 0.15 m MS aerial imagery to show that, at least for images with low spectral dimensionality, OO techniques are superior to traditional PB methods but still inferior to human interpretation. Myint et al. (2011) employed five classification procedures based on the OO paradigm, which groups spatially and spectrally similar pixels at different scales. The classifiers used to assign land cover types to the segmented objects included membership functions and the nearest neighbor classifier. The object-based classifier achieved a high overall accuracy (90.40%), whereas the most commonly used decision rule, the maximum likelihood classifier, produced a lower overall accuracy (67.60%).
Combining per-pixel and OO classification methods has proved useful in the analysis of HR satellite data, as it has resulted in higher per-class accuracies (Blaschke et al., 2010). Hirose et al. (2004) described a hybrid analysis method combining OO and PB image classifiers for vegetation mapping using IKONOS. Lei et al. (2008) presented a hybrid classifier combining an expert system with an OO approach, which provides additional information for classification and improves accuracy. However, setting up the rules for the expert system is complicated and time-consuming, and it requires experience in defining the land cover features and their spatial distribution. Bhaskaran et al. (2010) described an approach using both per-pixel and object-based classification methods to map urban features from HR satellite data over New York City. Whilst the per-pixel approach produced a reasonable overall accuracy, the per-class results showed low user's accuracies; the OO classification method improved the classification accuracies. Li et al. (2013) proposed a hybrid method combining PB and OO methods and applied it in Hungary using Chinese HJ-1 satellite images. The hybrid method outperformed the OO method, with an overall accuracy of 90.53%, compared with 77.53% for the maximum likelihood classifier at the object level.
Amongst other classification methods, SVM classification is a theoretically superior machine learning methodology for classifying high-dimensional data sets and has been found to be competitive with the best machine learning algorithms. In the past, SVMs were tested and evaluated only as PB image classifiers, with very good results (Gualtieri and Cromp, 1999; Brown et al., 2000; Huang et al., 2002; Foody and Mathur, 2004; Melgani and Bruzzone, 2004). SVMs have been compared with other classification methods for remote sensing imagery, such as Neural Networks, Nearest Neighbor, Maximum Likelihood and Decision Tree classifiers, and have surpassed all of them in robustness and accuracy (Huang et al., 2002; Foody and Mathur, 2004). SVMs have also been evaluated as OO image classifiers, as a modern computational intelligence method (Li Haitao et al., 2007). The major motivation of this work is to establish a framework for combining PB and OO SVMs based on BPT. The remainder of this paper is organized as follows: the next section describes the study area and data sources; Section 3 describes the classification methods used; Section 4 presents and evaluates the results, which are summarized in Section 5.

Multispectral satellite image
To demonstrate the capability of the hybrid system, a study area of approximately 1 km² covering Roxi Square in Cairo was used. One-meter spatial resolution, pan-sharpened IKONOS images of the study area were acquired on April 17, 2010, and supplied in TIFF format. The area is largely dense urban land that includes residential buildings, large buildings, a network of main and local roads, open and green areas, and trees, as shown in Figure 1.
Figure 1. A one-meter IKONOS image of the test area.

Reference data
To evaluate the accuracy of the classifications undertaken in this research, reference data were captured by digitizing buildings, trees, roads and ground in the image, as shown in Figure 2. The class "ground" mainly corresponds to parking lots and bare fields.
All recognizable features were digitized regardless of their size. Adjacent buildings that were joined but visibly separate were digitized as individual buildings; otherwise, they were merged into one polygon. Larger areas covered by trees were digitized as one polygon.
Figure 2. Reference data used for the research. Red: buildings, green: trees, black: roads and grey: ground.

Feature attributes
Feature attributes are needed to compensate for some common problems associated with high-resolution image data, such as shadows cast by tall buildings or trees and the spectral variability within the same land cover class (Lu and Weng, 2007). In conjunction with spectral information, the texture and shape of image objects provide useful information for detailed land cover classification (Hirose et al., 2004). In this research, a set of attributes was calculated for predefined segments or single pixels and presented as input data to the classifiers to help define the classes during classification. A detailed description of the formulas for calculating the attributes can be found in Russ (2002). Some attributes are more useful than others for differentiating objects, and classification results may be less accurate when all attributes are used, since irrelevant attributes can introduce noise into the results. The most useful attributes for the classification were therefore determined statistically, following the logic of Yang (2007). Table 1 shows the attributes and the classifiers for which they were selected. Although spatial information is remarkably useful for OO classifiers, how to use it effectively in PB classification remains a research topic, because PB classification is conducted on individual pixels rather than objects. In addition, spectral, color space and band ratio attributes were not used for the PB SVMs, in order to reduce data redundancy, which can greatly affect the performance of PB SVMs.
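The paper states that the most useful attributes were determined statistically, following Yang (2007), but does not give the procedure. As an illustration only, the Python sketch below shows one common way to obtain a set of mutually uncorrelated attributes: a greedy Pearson-correlation filter. The function name `select_uncorrelated`, the 0.8 threshold and the sample values are hypothetical, not taken from this study.

```python
import numpy as np


def select_uncorrelated(features, names, max_abs_corr=0.8):
    """Greedy correlation filter over attribute columns.

    features: (n_samples, n_attributes) array of attribute values.
    names: attribute names, in column order.
    An attribute is kept only if |Pearson r| with every already-kept
    attribute is <= max_abs_corr (a hypothetical threshold).
    """
    corr = np.corrcoef(features, rowvar=False)  # attribute-by-attribute correlations
    kept = []
    for j in range(features.shape[1]):
        if all(abs(corr[j, k]) <= max_abs_corr for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Hypothetical samples: "b" is a linear function of "a", "c" is nearly independent of both.
X = np.array([[1, 3, 2], [2, 5, 5], [3, 7, 1], [4, 9, 4], [5, 11, 3]], dtype=float)
print(select_uncorrelated(X, ["a", "b", "c"]))  # ['a', 'c']
```

Because the filter is greedy, column order matters: an attribute listed earlier is kept in preference to a later attribute that is highly correlated with it.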

METHODOLOGY
The fusion of the PB and OO classifications was implemented in several stages, as shown in Figure 3.
Figure 3. The Hybrid Classification Workflow.

Pixel-based Classification
The PB classification process was implemented in several stages, as follows:

Training Datasets
The overall objective of creating training data sets is to assemble a set of statistics that describe the spectral response patterns of each land cover type to be classified in the image (Lillesand and Kiefer, 2004). The minimum number of pixels required for a signature is the number of bands plus one (N + 1, where N is the number of bands), which is the necessary condition for the covariance matrix to be positive definite (Schowengerdt, 2001). The training data used are sets of manually classified samples.
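The N + 1 rule follows from the fact that a sample covariance matrix estimated from n pixels has rank at most n - 1, so at least N + 1 pixels are needed before an N-band covariance matrix can be full rank (positive definite). N + 1 is necessary but not sufficient: the pixels must also not lie on a lower-dimensional hyperplane. The sketch below checks this numerically with random, hypothetical pixel values; the helper name `covariance_rank` and its relative tolerance are assumptions.

```python
import numpy as np


def covariance_rank(pixels, rel_tol=1e-10):
    """Numerical rank of the band covariance matrix of (n_pixels, n_bands) samples."""
    cov = np.cov(pixels, rowvar=False)  # (n_bands, n_bands) sample covariance
    return int(np.linalg.matrix_rank(cov, tol=rel_tol * np.abs(cov).max()))


rng = np.random.default_rng(0)
n_bands = 4  # e.g. the four IKONOS multispectral bands
for n_pixels in (n_bands, n_bands + 1):
    samples = rng.normal(size=(n_pixels, n_bands))  # hypothetical training pixels
    print(n_pixels, "pixels -> covariance rank", covariance_rank(samples))
```

With only N pixels the rank stays at N - 1 (singular covariance); adding one more pixel in general position makes it full rank.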
For each land cover class (buildings, trees, roads and ground), polygons of approximately equal area were digitized from the image to generate the training data. The polygons were positioned carefully near the class centers so that they would be representative and capture the spectral variability of each class. It was also necessary to avoid the effect of between-class local texture variability on pixels near class boundaries, which causes many of these pixels to be assigned to an incorrect category (Ferro and Warner, 2002). Figure 4 shows the locations of the training data sets used for the experiments.

Evaluation of signatures:
The created signatures are compared in a box plot of the minimum and maximum reflectance values of the training features, as shown in Figure 5. The box plot shows that the minimum/maximum boxes are completely separable.
Image classification
SVM is a classification system derived from statistical learning theory (Vapnik, 1979). It separates the classes with a decision surface that maximizes the margin between them. This surface is often called the optimal hyperplane, and the data points closest to it are called support vectors. Because misclassification errors are minimized by maximizing the margin between the data points and the decision boundary, SVMs tend to generalize well and often outperform other algorithms in terms of classification accuracy.
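To make the margin and support-vector definitions concrete, the sketch below assumes a linear hyperplane w·x + b = 0 that is already in canonical form (the closest points satisfy y_i(w·x_i + b) = 1). It does not train an SVM; it only measures the margin width 2/||w|| and flags the points lying on the margin. The toy data and the function name are hypothetical; for these four points, w = (1, 1) and b = -3 is the maximum-margin solution.

```python
import numpy as np


def margin_and_support_vectors(w, b, X, y, tol=1e-9):
    """Margin width and support vectors of a hyperplane w.x + b = 0 in canonical form.

    Canonical form means min_i y_i (w.x_i + b) = 1, so the margin width is 2 / ||w||
    and the support vectors are the samples with y_i (w.x_i + b) = 1.
    """
    functional = y * (X @ w + b)  # y_i (w.x_i + b) for every sample
    if np.any(functional < 1 - tol):
        raise ValueError("hyperplane does not separate the data in canonical form")
    support = np.flatnonzero(np.abs(functional - 1) <= tol)
    return 2.0 / np.linalg.norm(w), support


X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])
width, sv = margin_and_support_vectors(np.array([1.0, 1.0]), -3.0, X, y)
print(round(width, 4), sv)  # 1.4142 [1 2]
```

Only the two inner points determine the hyperplane; moving the outer points further away would not change w, b or the margin.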
Because SVMs are binary classifiers, a multi-class strategy is required. The One-Against-One (1A1) technique, which trains one binary SVM for every pair of classes, usually results in a larger number of binary SVMs (k(k - 1)/2 for k classes, i.e., 6 for the four classes here) and hence in more intensive computation. The One-Against-All (1AA) technique, which trains one binary SVM per class (4 here), was therefore used. The SVM classifier provides four types of kernels: linear, polynomial, radial basis function (RBF) and sigmoid.
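The difference in classifier count, and the way 1AA recasts a multi-class problem as k binary ones, can be sketched briefly. The function names below are illustrative; the second function builds the k binary +1/-1 target vectors that 1AA uses, each labeling a single class against all others.

```python
import numpy as np


def n_binary_svms(k, strategy):
    """Number of binary SVMs needed to separate k classes."""
    if strategy == "1AA":
        return k  # one SVM per class: that class vs. all others
    if strategy == "1A1":
        return k * (k - 1) // 2  # one SVM per unordered pair of classes
    raise ValueError(f"unknown strategy: {strategy}")


def one_against_all_targets(labels, classes):
    """(k, n) array of +1/-1 targets; row c labels class c as +1 and every other class as -1."""
    labels = np.asarray(labels)
    return np.where(labels[None, :] == np.asarray(classes)[:, None], 1, -1)


classes = ["buildings", "trees", "roads", "ground"]
print(n_binary_svms(len(classes), "1AA"), n_binary_svms(len(classes), "1A1"))  # 4 6
print(one_against_all_targets(["trees", "roads", "trees"], classes)[1])  # [ 1 -1  1]
```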
In remote sensing applications, the RBF kernel has proved to be effective with reasonable processing times (Van der Linden et al., 2009). The RBF kernel nonlinearly maps samples into a higher-dimensional space. Unlike the linear kernel, which is a special case of the Gaussian kernel, a Gaussian RBF kernel can handle more complex, nonlinear class distributions. In addition, the sigmoid kernel behaves like the RBF kernel for certain parameters (Lin and Lin, 2003). On the other hand, the polynomial kernel has more parameters and more numerical difficulties than the RBF kernel (Hsu et al., 2009).
Nevertheless, all four kernels were tested and compared in this research in order to reach a robust conclusion about the behavior of SVMs on HR satellite imagery. Table 2 gives the mathematical form of each kernel.
Table 2. The SVM kernels.
Kernel | Function
Linear | $K(x_i, x_j) = x_i \cdot x_j$
Polynomial | $K(x_i, x_j) = (\gamma\, x_i \cdot x_j + r)^d,\ \gamma > 0$
RBF | $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2),\ \gamma > 0$
Sigmoid | $K(x_i, x_j) = \tanh(\gamma\, x_i \cdot x_j + r)$
where:
$\gamma$: the gamma term in the kernel function for all kernel types except linear.
$d$: the polynomial degree term in the kernel function for the polynomial kernel.
$r$: the bias term in the kernel function for the polynomial and sigmoid kernels.
$\gamma$, $d$ and $r$ are user-controlled parameters; choosing them correctly significantly increases the accuracy of the SVM solution.
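The four kernels of Table 2 translate directly into code. The NumPy sketch below is a literal transcription of those formulas; the parameter values in the usage lines are hypothetical and are not the values of Table 3.

```python
import numpy as np


def linear(xi, xj):
    """K(x_i, x_j) = x_i . x_j"""
    return float(np.dot(xi, xj))


def polynomial(xi, xj, gamma, r, d):
    """K(x_i, x_j) = (gamma x_i . x_j + r)^d, gamma > 0"""
    return float((gamma * np.dot(xi, xj) + r) ** d)


def rbf(xi, xj, gamma):
    """K(x_i, x_j) = exp(-gamma ||x_i - x_j||^2), gamma > 0"""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def sigmoid(xi, xj, gamma, r):
    """K(x_i, x_j) = tanh(gamma x_i . x_j + r)"""
    return float(np.tanh(gamma * np.dot(xi, xj) + r))


x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])  # hypothetical feature vectors
print(linear(x, z))                                # 11.0
print(polynomial(x, z, gamma=1.0, r=0.0, d=2))     # 121.0
print(rbf(x, x, gamma=0.5))                        # 1.0
```

Note that the RBF kernel of a vector with itself is always 1, and it decays towards 0 as the two feature vectors move apart, at a rate set by gamma.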
A 10-fold cross-validation was applied to choose near-optimal parameter values. Cross-validation helps prevent overfitting and results in better accuracy (Hsu et al., 2009). A second-order polynomial kernel (d = 2) was applied to the current nonlinear problem. Table 3 shows the kernel parameters used for the experiments. The SVMs were trained with the sequential minimal optimization (SMO) algorithm (Platt, 1999), which is faster and requires much less memory than standard quadratic programming (QP) solvers. SMO breaks the large QP optimization problem into a series of the smallest possible QP sub-problems, each involving only two Lagrange multipliers and solvable analytically, thereby avoiding time-consuming numerical QP optimization.
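The SMO idea can be illustrated with the widely taught simplified SMO variant: each update optimizes one pair of Lagrange multipliers analytically, but the second multiplier is chosen at random rather than with Platt's selection heuristics. The Python/NumPy sketch below trains a binary RBF SVM this way; C, gamma, the tolerances and the toy data are hypothetical, and in a 1AA setting it would be called once per class. A 10-fold cross-validation would wrap `smo_train` in a loop over candidate (C, gamma) pairs and folds.

```python
import numpy as np


def rbf_gram(A, B, gamma):
    """Gram matrix with K[a, b] = exp(-gamma * ||A[a] - B[b]||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors


def smo_train(X, y, C=1.0, gamma=0.5, tol=1e-3, max_passes=5, max_iter=1000, seed=0):
    """Simplified SMO for a binary RBF-kernel SVM; y must contain -1/+1 labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    K = rbf_gram(X, X, gamma)
    alpha, b = np.zeros(n), 0.0
    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]  # error of the current model on sample i
            # only optimise alpha_i if it violates the KKT conditions
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1))
                if j >= i:
                    j += 1  # random partner j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                # feasible interval [L, H] for alpha_j given 0 <= alpha <= C and fixed sum(alpha * y)
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                eta = 2.0 * K[i, j] - K[i, i] - K[j, j]  # curvature along the pair direction
                if L == H or eta >= 0:
                    continue
                aj_new = float(np.clip(aj - y[j] * (Ei - Ej) / eta, L, H))
                if abs(aj_new - aj) < 1e-5:
                    continue
                ai_new = ai + y[i] * y[j] * (aj - aj_new)  # keeps sum(alpha * y) unchanged
                b1 = b - Ei - y[i] * (ai_new - ai) * K[i, i] - y[j] * (aj_new - aj) * K[i, j]
                b2 = b - Ej - y[i] * (ai_new - ai) * K[i, j] - y[j] * (aj_new - aj) * K[j, j]
                alpha[i], alpha[j] = ai_new, aj_new
                if 0 < ai_new < C:
                    b = b1
                elif 0 < aj_new < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                changed += 1
        passes = passes + 1 if changed == 0 else 0
        iters += 1
    return alpha, b


def smo_decision(X_train, y, alpha, b, X_new, gamma=0.5):
    """Decision values f(x) = sum_i alpha_i y_i K(x_i, x) + b for the rows of X_new."""
    return (alpha * np.asarray(y, dtype=float)) @ rbf_gram(X_train, X_new, gamma) + b


# Hypothetical, well-separated two-class toy data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)), rng.normal(2.0, 0.5, size=(20, 2))])
y = np.array([-1.0] * 20 + [1.0] * 20)
alpha, b = smo_train(X, y)
pred = np.sign(smo_decision(X, y, alpha, b, X))
print("training accuracy:", (pred == y).mean())
```

Samples that end up with alpha > 0 are the support vectors; all other training samples drop out of the decision function.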

Object-based Classification
The OO SVM classification process was implemented in several stages, as follows.
Image Segmentation
Segmentation is the process of partitioning an image into segments by grouping neighboring pixels with similar feature values (brightness, texture, color, etc.). These segments ideally correspond to real-world objects. Each segment is assigned the mean spectral values of all the pixels that belong to it. An edge-based segmentation algorithm was employed; it is very fast and requires only one input parameter (Scale Level). By suppressing weak edges to different levels, the algorithm can yield multi-scale segmentation results, from finer to coarser. The optimum segmentation scale, i.e., the one that best delineates feature boundaries, was chosen iteratively as 57; the result is shown in Figure 6(a).
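Assigning each segment the mean spectral values of its pixels is a simple aggregation over a label image. The sketch below assumes the segmentation result is available as an integer label array with contiguous ids 0..n-1 (a hypothetical input format; the edge-based segmentation itself is not reproduced), and the tiny image is hypothetical.

```python
import numpy as np


def segment_means(image, labels):
    """Mean spectral vector of every segment.

    image:  (rows, cols, bands) array of pixel values.
    labels: (rows, cols) integer segment ids 0..n_segments-1 (every id assumed present).
    Returns an (n_segments, bands) array whose row s is the mean of segment s.
    """
    flat_labels = labels.ravel()
    n_segments = flat_labels.max() + 1
    counts = np.bincount(flat_labels, minlength=n_segments)  # pixels per segment
    pixels = image.reshape(-1, image.shape[-1])
    sums = np.stack([np.bincount(flat_labels, weights=pixels[:, k], minlength=n_segments)
                     for k in range(pixels.shape[1])], axis=1)  # per-band sums per segment
    return sums / counts[:, None]


def paint_means(image, labels):
    """Segmented image: every pixel replaced by the mean of its segment."""
    return segment_means(image, labels)[labels]


image = np.array([[[1.0], [3.0]], [[10.0], [20.0]]])  # hypothetical 2x2 single-band image
labels = np.array([[0, 0], [1, 1]])                    # two segments: top row, bottom row
print(segment_means(image, labels))
```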

Merging Segments
Merging is a step used to aggregate small segments into larger ones where over-segmentation may be a problem. The Merge Level that best delineates feature boundaries was chosen iteratively as 27. The Full Lambda-Schedule algorithm created by Robinson et al. (2002) was employed. The algorithm iteratively merges adjacent segments based on a combination of spectral and spatial information.
Merging proceeds if the algorithm finds a pair of adjacent regions, i and j, such that the merging cost $t_{i,j}$ is less than a defined threshold value $\lambda$:
$$t_{i,j} = \frac{\dfrac{|O_i| \cdot |O_j|}{|O_i| + |O_j|}\,\lVert u_i - u_j \rVert^2}{\operatorname{length}(\partial(O_i, O_j))}$$
where
$O_i$: region i of the image
$|O_i|$: the area of region i
$u_i$: the average value in region i
$u_j$: the average value in region j
$\lVert u_i - u_j \rVert$: the Euclidean distance between the spectral values of regions i and j
$\operatorname{length}(\partial(O_i, O_j))$: the length of the common boundary of $O_i$ and $O_j$.
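The merging test can be written directly from the cost formula. In the sketch below, regions are summarized by their area, mean spectral vector and shared boundary length; the threshold `lam` is hypothetical, since how the software's Merge Level of 27 maps to λ is not stated here. The merged region's mean is the area-weighted mean of the two regions, which is simply the mean of their combined pixels.

```python
import numpy as np


def lambda_merge_cost(area_i, area_j, mean_i, mean_j, boundary_length):
    """Merging cost t_ij of two adjacent regions (Full Lambda-Schedule form)."""
    diff = np.asarray(mean_i, dtype=float) - np.asarray(mean_j, dtype=float)
    size_weight = area_i * area_j / (area_i + area_j)  # |O_i||O_j| / (|O_i| + |O_j|)
    return size_weight * float(diff @ diff) / boundary_length


def merge(area_i, mean_i, area_j, mean_j):
    """Area and mean spectral vector of the region formed by merging O_i and O_j."""
    area = area_i + area_j
    mean = (area_i * np.asarray(mean_i, dtype=float) + area_j * np.asarray(mean_j, dtype=float)) / area
    return area, mean


lam = 20.0  # hypothetical lambda threshold
t = lambda_merge_cost(4, 4, [0.0, 0.0], [3.0, 4.0], 5.0)
print(t)  # 10.0
if t < lam:
    area, mean = merge(4, [0.0, 0.0], 4, [3.0, 4.0])
```

Small regions with similar means and long shared boundaries get the lowest cost, so they are merged first as λ increases.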
The segmentation results were then refined using another merging method called thresholding. Thresholding is a raster operation that works on the first band of the image to group adjacent segments based on their brightness values. The lower and upper limits of the threshold were set to 100 and 905, respectively. Pixel values below the lower threshold or above the upper threshold are assigned a value of 0, and values between the thresholds are assigned a value of 255, producing a new masked image. The black area in the masked image represents one large region, while the white areas represent other distinct regions. The masked image was then segmented, and each distinct region was assigned a unique identifier, as shown in Figure 6(b). These identifiers are then used in computing the attributes.
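The thresholding step and the subsequent region labeling can be sketched as follows. The limits 100 and 905 come from the text; treating them as inclusive and using 4-connectivity to define a distinct white region are assumptions, and the tiny input band is hypothetical.

```python
from collections import deque

import numpy as np


def threshold_mask(band, low=100, high=905):
    """255 where low <= value <= high (inclusive limits assumed), 0 elsewhere."""
    return np.where((band >= low) & (band <= high), 255, 0).astype(np.uint8)


def label_regions(mask):
    """Black (0) area -> single id 0; each 4-connected white (255) region -> its own id 1, 2, ..."""
    ids = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0, c0] == 255 and ids[r0, c0] == 0:
                next_id += 1
                ids[r0, c0] = next_id
                queue = deque([(r0, c0)])
                while queue:  # breadth-first flood fill of one white region
                    r, c = queue.popleft()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] == 255 and ids[rr, cc] == 0:
                            ids[rr, cc] = next_id
                            queue.append((rr, cc))
    return ids


band = np.array([[50, 200, 300], [60, 950, 400], [500, 70, 80]])  # hypothetical first band
print(label_regions(threshold_mask(band)))
```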

Supervised Classification
Supervised classification is the process of using training data to assign objects of unknown identity to one or more known features. Objects of a variety of sizes and colors that represent the features of interest were selected. The more features and training samples are selected, the better the results of supervised classification; however, selecting an overwhelming number of training samples degrades performance during classification and when previewing classification results. SVMs were applied to classify the image with the same set of kernel parameters given in Table 3. The SVM classification output consists of the decision values of each pixel for each class, which are used for probability estimates.
The probability values represent true probabilities in the range 0 to 1, and the sum of these values for each pixel equals 1. The probability images of the PB and OO SVMs were used as input data for the BPT-based hybrid system.
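The text does not specify how the decision values were converted to probabilities. One simple illustration, assumed here rather than taken from the study, is to squash each class's 1AA decision value with a Platt-style sigmoid and then renormalize so that each pixel's class probabilities sum to 1. The sigmoid parameters A and B are hypothetical; in practice they would be fitted on held-out data.

```python
import numpy as np


def decision_to_probabilities(decision_values, A=-1.0, B=0.0):
    """Map per-class 1AA decision values to class probabilities that sum to 1.

    decision_values: (..., n_classes) array; the last axis indexes the classes.
    Each value is squashed with a Platt-style sigmoid 1 / (1 + exp(A*f + B))
    (A and B hypothetical), then the scores are renormalised per pixel.
    """
    f = np.asarray(decision_values, dtype=float)
    scores = 1.0 / (1.0 + np.exp(A * f + B))
    return scores / scores.sum(axis=-1, keepdims=True)


p = decision_to_probabilities([2.0, -1.0, -1.0, -1.0])  # hypothetical decision values for one pixel
print(p.argmax(), p.sum())
```

With A < 0 the mapping is monotonic, so the class with the largest decision value also receives the largest probability.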

Bayesian Probability Theory based fusion
BPT is concerned with establishing the probability that an entity belongs to each of a number of different sets (classes or states), called hypotheses in the typical language of Bayes. BPT evaluates the probability that each hypothesis is true given the information contained in the prior probability and evidence images. When complete information is available or assumed, BPT is the primary tool for evaluating the relationship between the indirect evidence and the decision set. BPT is an extension of classical probability theory that allows new evidence about a hypothesis to be combined with prior knowledge to estimate the likelihood that the hypothesis is true. The basis for this is Bayes' Theorem (Lee et al., 1987):
$$p(h \mid e) = \frac{p(e \mid h)\, p(h)}{\sum_i p(e \mid h_i)\, p(h_i)}$$
where the sum in the denominator runs over all hypotheses $h_i$, and
$p(h \mid e)$: the probability of the hypothesis being true given the evidence (posterior probability);
$p(e \mid h)$: the probability of finding that evidence given that the hypothesis is true;
$p(h)$: the probability of the hypothesis being true regardless of the evidence (prior probability).
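Applied per pixel, Bayes' theorem turns two class-probability images into a posterior image. The sketch below assumes, as an illustration, that the PB SVM probabilities play the role of the prior p(h) and the OO SVM probabilities the role of the evidence term p(e|h); the paper does not state this assignment. Because the numerator is a product, swapping the two roles gives the same posterior in this simple form. The probability values are hypothetical.

```python
import numpy as np


def bayes_fuse(prior, likelihood):
    """Per-pixel posterior p(h|e) = p(e|h) p(h) / sum_i p(e|h_i) p(h_i).

    prior, likelihood: (..., n_classes) arrays whose last axis indexes the hypotheses.
    Pixels where every product is zero would divide by zero and are not handled here.
    """
    joint = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return joint / joint.sum(axis=-1, keepdims=True)  # denominator: sum over hypotheses


pb = np.array([0.5, 0.3, 0.1, 0.1])  # hypothetical PB probabilities (buildings, trees, roads, ground)
oo = np.array([0.4, 0.4, 0.1, 0.1])  # hypothetical OO probabilities for the same pixel
posterior = bayes_fuse(pb, oo)
print(np.round(posterior, 3))  # [0.588 0.353 0.029 0.029]
```

The same function works unchanged on whole (rows, cols, n_classes) probability stacks, since it only operates along the last axis.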

Accuracy Assessment
The accuracy of the proposed system was assessed using confusion matrices and Kappa statistics. The Kappa Index of Agreement (KIA) is a statistical measure adapted for accuracy assessment in remote sensing by Congalton and Read (1983). KIA tests whether the differences between two images are due to 'chance' or 'real disagreement', and it is often used to check the accuracy of classified satellite images against 'real' ground-truth data. The overall Kappa is computed as
$$\hat{K} = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}{N^2 - \sum_{i=1}^{r} x_{i+}\, x_{+i}}$$
where
$r$: number of rows (classes) in the confusion matrix;
$x_{ii}$: number of combinations along the diagonal;
$x_{i+}$: total observations in row i;
$x_{+i}$: total observations in column i;
$N$: total number of cells.
For the per-category Kappa, the following conditional Kappa was introduced by Rosenfield and Fitzpatrick-Lins (1986):
$$\hat{K}_i = \frac{N x_{ii} - x_{i+}\, x_{+i}}{N x_{i+} - x_{i+}\, x_{+i}}$$
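Both Kappa formulas can be computed directly from a confusion matrix. The sketch below assumes rows hold the classified map and columns the reference data, so the conditional Kappa is computed per classified category; the matrix in the usage lines is hypothetical.

```python
import numpy as np


def overall_kappa(cm):
    """Overall Kappa of a confusion matrix cm (rows: classified, columns: reference)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()                                      # N
    chance = np.sum(cm.sum(axis=1) * cm.sum(axis=0))  # sum_i x_{i+} x_{+i}
    return (n * np.trace(cm) - chance) / (n ** 2 - chance)


def conditional_kappa(cm):
    """Per-category (conditional) Kappa, one value per row category."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    row, col = cm.sum(axis=1), cm.sum(axis=0)  # x_{i+}, x_{+i}
    return (n * np.diag(cm) - row * col) / (n * row - row * col)


cm = [[50, 10], [0, 40]]  # hypothetical two-class confusion matrix
print(overall_kappa(cm))       # 0.8
print(conditional_kappa(cm))   # approximately [0.667, 1.0]
```

The mean and standard deviation of the per-category values are the kind of summary used below to compare the PB, OO and fused classifications.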

Overall Kappa
Before incorporating the PB and OO SVMs into the hybrid system, the four SVM kernels were tested and compared in terms of overall Kappa, in order to select the best-performing kernel as the representative SVM. For the PB SVMs, the overall Kappa values of the individual kernels, computed against the reference data, are given in Table 4. The RBF kernel performed best with an overall Kappa of 0.6327, followed by the sigmoid and polynomial kernels with 0.6160 and 0.6031, respectively. The linear kernel performed worst, with an overall Kappa of 0.5985. A closer examination of the PB results reveals that the Kappa coefficients are relatively low, indicating that the PB method is unsatisfactory for classifying remotely sensed images if non-spectral data, such as lidar data, are not incorporated into the classification procedure. For the OO SVMs, the RBF kernel also performed best with an overall Kappa of 0.8286, followed by the sigmoid and polynomial kernels with 0.7470 and 0.7149, respectively. The linear kernel performed worst, with an overall Kappa of 0.7020. Table 4 also shows the improvement in overall Kappa achieved by combining the PB and OO classifications, compared with the individual classifiers: the overall Kappa improved by 0.0608 over that obtained by the OO SVMs. These results support those of Li et al. (2013), who concluded that the hybrid method outperformed the OO method, with an overall accuracy of 90.53%, compared with 77.53% for the maximum likelihood classifier at the object level.

Class-Specific Accuracies
An assessment of the KIA confirms that the hybrid system performed best in most cases, as shown in Table 5. Most of the class accuracies were improved by the Bayes fusion: whereas the PB and OO SVMs resulted in average KIA values of 0.6150 and 0.7755, respectively, the Bayes fusion resulted in an average KIA of 0.9153.
Another advantage of the Bayes fusion is that its errors are less variable. Whereas the PB and OO SVMs resulted in KIA standard deviations (SD) of 0.1281 and 0.0867, respectively, the Bayes fusion resulted in an SD of 0.0769. It thus meets the requirement of Anderson et al. (1976) that the accuracy of interpretation for the different categories should be about equal. Finally, it is worth noting that the classification accuracies for buildings and trees obtained with the RBF kernel are lower than those obtained with the linear and sigmoid kernels.
Given this observation, if a particular class is very important, the kernels should first be tested to select the best kernel for that class before applying the Bayesian probability-based fusion.

CONCLUSION
In this paper, a powerful hybrid system combining PB and OO SVM classifiers based on BPT has been applied. A set of uncorrelated feature attributes was generated from a one-meter IKONOS satellite image. Four SVM kernels were tested and compared for classifying buildings, trees, roads and ground from the satellite image and the generated attributes. The results show that the OO method achieved an overall kappa coefficient of 0.8286, compared with 0.6327 for the conventional PB method. The combined system improved the overall kappa by 0.0608 over the OO SVMs. The fused system also performed best in terms of per-class accuracies.

Figure 4. Training data for Buildings, Ground, Roads and Trees classes.

Figure 5. Minimum and maximum reflectance values for signatures of training features of buildings, ground, roads and trees.

Figure 6. (a) The optimum segmented image at the scale of 57. (b) Merging adjacent segments based on their brightness values.
Figure 7 shows a typical example of the hybrid system output, i.e., the membership values of each pixel for each class. These probability values were later used to create a new classification image without having to recalculate the entire classification: the membership values of all land cover classes were compared, and each pixel was labeled with the class having the highest membership value. Figure 8 shows a typical example of the classification results. Red: buildings, green: trees, black: roads and grey: ground.
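Assigning each pixel to the class with the highest membership value is an argmax over the class axis of the membership stack. The sketch below also colors the result with the legend used in Figures 2 and 8; the exact RGB triplets, the class order and the small membership stack are assumptions.

```python
import numpy as np

CLASSES = ("buildings", "trees", "roads", "ground")  # assumed band order of the membership stack
COLORS = {  # assumed RGB triplets for the legend colors
    "buildings": (255, 0, 0),      # red
    "trees": (0, 128, 0),          # green
    "roads": (0, 0, 0),            # black
    "ground": (128, 128, 128),     # grey
}


def label_from_memberships(memberships):
    """(rows, cols, n_classes) membership images -> (rows, cols) index of the winning class."""
    return np.argmax(memberships, axis=-1)


def colorize(labels):
    """Map class indices to an RGB image using the legend colors."""
    palette = np.array([COLORS[c] for c in CLASSES], dtype=np.uint8)
    return palette[labels]


memberships = np.array([[[0.7, 0.1, 0.1, 0.1], [0.2, 0.1, 0.3, 0.4]]])  # hypothetical 1x2 image
labels = label_from_memberships(memberships)
print(labels)  # [[0 3]]
```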

Figure 7. A typical example showing the membership values of the hybrid system: (a) buildings, (b) trees, (c) roads, (d) ground classes.

Figure 8. A typical example showing the classification results: (a) the MS satellite image, (b) the PB/RBF classified image, (c) the OO/RBF classified image, (d) the BPT/RBF classified image.
the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-7, 2014. ISPRS Technical Commission VII Symposium, 29 September - 2 October 2014, Istanbul, Turkey. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprsannals-II-7-67-2014

Table 1. The full set of possible attributes. √ and x indicate, respectively, whether or not the attribute was selected for the classification process.

Table 3. The kernel parameters used for the classification process.

Table 4. Performance evaluation of single classifiers and the hybrid system.
Most of the class accuracies were improved by the Bayes fusion. Whereas the PB and OO SVMs resulted in average KIA values of 0.6150 and 0.7755, respectively, the Bayes fusion resulted in an average KIA of 0.9153. Another advantage of the Bayes fusion is that its errors are less variable. The results in this paper demonstrate the overall advantages of the proposed fusion system for combining pixel-based and object-oriented classifiers.
REFERENCES
gradient boosting as a refinement of classification tree analysis. Remote Sensing of Environment, 90: 331-336.
Lawrence, R.L., S.D. Wood and R.L. Sheley, 2006. Mapping invasive plants using hyperspectral imagery and Breiman Cutler classifications (RandomForest).