SEMI-SUPERVISED MARGINAL FISHER ANALYSIS FOR HYPERSPECTRAL IMAGE CLASSIFICATION

The problem of learning with both labeled and unlabeled examples arises frequently in hyperspectral image (HSI) classification. Marginal Fisher analysis, however, is a supervised method and cannot be directly applied to semi-supervised classification. In this paper, we propose a novel method, called semi-supervised marginal Fisher analysis (SSMFA), to process HSI of natural scenes; it combines semi-supervised learning and manifold learning. In SSMFA, a new difference-based optimization objective function involving unlabeled samples is designed. SSMFA preserves the manifold structure of labeled and unlabeled samples while separating labeled samples of different classes from each other. The method has an analytic, globally optimal solution that can be computed by eigendecomposition. Classification experiments on a challenging HSI task demonstrate that the method outperforms current state-of-the-art HSI classification methods.


INTRODUCTION
With the development of remote-sensing imaging technology and hyperspectral sensors, the use of hyperspectral imagery (HSI) is becoming more and more widespread, for example in target detection and land-cover investigation. Due to the dense sampling of the spectral signatures of land covers, HSI offers better potential discrimination among similar ground-cover classes than traditional multispectral scanners (Li et al., 2011, Guyon and Elisseeff, 2003, Tuia et al., 2009). However, HSI classification still faces some challenges. One major challenge is that the number of training samples is typically small compared with the dimensionality of the data (Huang et al., 2009). This usually results in a loss of accuracy as the data dimensionality increases, a phenomenon known as the curse of dimensionality (Chen and Zhang, 2011, Kaya et al., 2011). Therefore, an important and urgent issue is how to greatly reduce the number of bands without loss of information (Yang et al., 2011, Paskaleva et al., 2008).
The goal of dimensionality reduction is to reduce the complexity of the input data while preserving some desired intrinsic information. Techniques for representing data in a low-dimensional space can be subdivided into two groups (Huang et al., 2011): (1) linear subspace algorithms and (2) manifold-learning-based nonlinear algorithms. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the most popular subspace methods among all dimensionality reduction algorithms; they seek linear subspaces that preserve the desired structure of the data in the original Euclidean space. In recent years, many researchers have observed that real-world data may lie on or close to a lower-dimensional manifold. Representative methods of this kind include locally linear embedding (LLE) (Roweis and Saul, 2000), isometric mapping (Isomap) (Tenenbaum et al., 2000), and Laplacian eigenmaps (LE) (Belkin and Niyogi, 2003). However, these methods are defined only on the training data, and how to map new test data remains a difficult issue. Therefore, they cannot be applied directly to classification problems.
Recently, some algorithms have resolved this difficulty by finding a mapping on the whole data space rather than only on the training data. Locality preserving projection (LPP) (He et al., 2005) and neighborhood preserving embedding (NPE) (He et al., 2005) are defined everywhere in the ambient space rather than just on the training set. As a result, LPP and NPE outperform LLE and LE in locating and explaining new test data in the feature subspace. However, these two algorithms, like Isomap, LLE, and LE, do not make use of class label information, which is valuable for classification tasks. Cai et al. (2007) presented a locality sensitive discriminant analysis (LSDA) algorithm to find a projection that maximizes the margin between data points from different classes in each local area. Another discriminant analysis method worth mentioning here is marginal Fisher analysis (MFA) (Yan et al., 2007). MFA provides a proper way to overcome the limitations of LDA: a new criterion is designed to obtain an optimal transformation by characterizing intra-class compactness and inter-class separability.
Most existing discriminant methods are fully supervised and perform well when label information is sufficient. In the real world, however, labeled examples are often difficult and expensive to obtain, and traditional supervised methods cannot work well when training examples are scarce; in contrast, unlabeled examples can be obtained easily (Camps-Valls et al., 2007, Velasco-Forero et al., 2009, Tuia et al., 2009). In such situations, it can be beneficial to incorporate the information contained in unlabeled examples into the learning problem, i.e., semi-supervised learning (SSL) should be applied instead of supervised learning (Song et al., 2008, Zhu and Goldberg, 2009, Zha and Zhang, 2009).
In this paper, we propose a semi-supervised dimensionality reduction method, called semi-supervised marginal Fisher analysis (SSMFA), for hyperspectral image classification. SSMFA utilizes both labeled and unlabeled data points to discover the discriminant and geometrical structure of the data manifold. In SSMFA, the labeled points are used to maximize the inter-manifold separability between data points from different classes, while the unlabeled points are used to minimize the intra-manifold compactness between data points belonging to the same class or between neighbors.
The remainder of the paper is organized as follows. In Section 2, we present a brief review of MFA and discuss its limitations. The SSMFA method is introduced in Section 3. Section 4 presents experimental results on both an illustrative toy example and several hyperspectral image databases to demonstrate the effectiveness of SSMFA. Finally, we provide concluding remarks and suggestions for future work in Section 5.

RELATED WORKS
In this section, we provide a brief review of marginal Fisher analysis (MFA) (Yan et al., 2007), which is closely related to our proposed method. We begin with a description of the dimensionality reduction problem.

The Dimensionality Reduction Problem
The generic dimensionality reduction problem is as follows. Given $n$ data points $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ sampled from an underlying manifold $\mathcal{M}$, let $y_i \in \{1, \ldots, c\}$ denote the class label of $x_i$. The goal of dimensionality reduction is to map each $x_i \in \mathbb{R}^m$ to a low-dimensional representation $z_i \in \mathbb{R}^d$, where $d \ll m$, using the information of the examples.

Marginal Fisher Analysis (MFA)
To overcome the limitations of linear discriminant analysis (LDA), marginal Fisher analysis (MFA) was proposed; it develops a new criterion that characterizes intra-class compactness and inter-class separability to obtain an optimal transformation.
To discover both the geometrical and the discriminant structure of the data points, MFA designs two graphs that characterize intra-class compactness and inter-class separability, respectively. The intrinsic graph $G_c$ encodes the intra-class adjacency relationship: each sample is connected to its nearest neighbors that belong to the same class. The penalty graph $G_p$ encodes the inter-class marginal adjacency relationship: marginal point pairs from different classes are connected.
For each data point $x_i$, let $N_{k_1}(x_i) = \{x_i^1, \ldots, x_i^{k_1}\}$ be the set of its $k_1$ nearest neighbors within the same class. The weight matrix $W_c$ of $G_c$ is defined as

$W_{c,ij} = 1$ if $x_i \in N_{k_1}(x_j)$ or $x_j \in N_{k_1}(x_i)$, and $W_{c,ij} = 0$ otherwise.

The intra-class compactness is then defined as the sum of distances between each node and its $k_1$ nearest same-class neighbors:

$S_c = \sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{c,ij} = 2\, v^T X L_c X^T v,$

where $D_c$ is a diagonal matrix with $D_{c,ii} = \sum_j W_{c,ij}$ and $L_c = D_c - W_c$ is the Laplacian matrix.
We consider each pair of points $(x_i, x_j)$ from different classes and add an edge between $x_i$ and $x_j$ if $x_j$ is one of the $k_2$ nearest neighbors of $x_i$ whose class labels differ from that of $x_i$. Let $N_{k_2}(x_i) = \{x_i^1, \ldots, x_i^{k_2}\}$ be the set of these $k_2$ nearest neighbors. The weight matrix $W_p$ of $G_p$ is defined as

$W_{p,ij} = 1$ if $x_i \in N_{k_2}(x_j)$ or $x_j \in N_{k_2}(x_i)$, and $W_{p,ij} = 0$ otherwise.

The inter-class separability is characterized by the penalty graph through the term

$S_p = \sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{p,ij} = 2\, v^T X L_p X^T v,$

where $D_p$ is a diagonal matrix with $D_{p,ii} = \sum_j W_{p,ij}$ and $L_p = D_p - W_p$ is the Laplacian matrix.
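As a concrete illustration, the two MFA graphs and their Laplacians can be sketched in a few lines of NumPy. This is not the authors' implementation: the function name, the brute-force neighbor search, and the symmetric edge assignment are illustrative choices.

```python
import numpy as np

def mfa_graphs(X, y, k1=5, k2=5):
    """Build the MFA intrinsic (W_c) and penalty (W_p) graph Laplacians
    for data X (n x m, rows are samples) with integer labels y.
    Illustrative sketch of the construction described in the text."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances (O(n^2) brute force).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    Wc = np.zeros((n, n))
    Wp = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])                       # nearest first
        same = [j for j in order if j != i and y[j] == y[i]][:k1]
        diff = [j for j in order if y[j] != y[i]][:k2]
        Wc[i, same] = 1.0                               # intra-class k1-NN edges
        Wc[same, i] = 1.0                               # keep the graph symmetric
        Wp[i, diff] = 1.0                               # inter-class marginal edges
        Wp[diff, i] = 1.0
    Lc = np.diag(Wc.sum(axis=1)) - Wc                   # Laplacian of intrinsic graph
    Lp = np.diag(Wp.sum(axis=1)) - Wp                   # Laplacian of penalty graph
    return Lc, Lp
```

For large HSI datasets a k-d tree or ball tree would replace the O(n²) distance matrix.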
The marginal Fisher criterion is then defined as

$v^* = \arg\min_v \frac{v^T X L_c X^T v}{v^T X L_p X^T v}. \qquad (5)$

The projection direction $v$ that minimizes the objective function (5) can be obtained by solving the generalized eigenvalue problem

$X L_c X^T v = \lambda X L_p X^T v. \qquad (6)$


SEMI-SUPERVISED MARGINAL FISHER ANALYSIS

In this section, we introduce the SSMFA method, which respects both the discriminant and the manifold structure of the data. We begin with a description of the semi-supervised dimensionality reduction problem.

The Semi-supervised Dimensionality Reduction Problem
The generic semi-supervised dimensionality reduction problem is as follows. Given $n$ data points $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$ sampled from an underlying manifold $\mathcal{M}$, suppose that the first $l$ points are labeled and the remaining $n - l$ points are unlabeled. Let $y_i \in \{1, \ldots, c\}$ denote the class label of $x_i$. The goal of semi-supervised dimensionality reduction is to map each $x_i \in \mathbb{R}^m$ to $z_i \in \mathbb{R}^d$, where $d \ll m$, using the information of both the labeled and the unlabeled examples.

SSMFA
To improve MFA, we exploit optimal discriminant features from both labeled and unlabeled examples; the method treats labeled and unlabeled data differently when constructing the objective function, maximizing the inter-class margin while minimizing the intra-class disparity.
We assume that naturally occurring data may be generated by structured systems with possibly far fewer degrees of freedom than the ambient dimension suggests. Thus, we consider the case in which the data lives on or close to a manifold $\mathcal{M}$ of the ambient space. To model the local geometrical structure of $\mathcal{M}$, we first construct a nearest-neighbor graph $G$. For each data point $x_i$, we find its $k$ nearest neighbors and put an edge between $x_i$ and each of them. Let $N(x_i) = \{x_i^1, x_i^2, \ldots, x_i^k\}$ be the set of its $k$ nearest neighbors. The weight matrix $A$ of $G$ is then defined by the heat kernel

$A_{ij} = \exp(-\|x_i - x_j\|^2 / \sigma^2)$ if $x_i \in N(x_j)$ or $x_j \in N(x_i)$, and $A_{ij} = 0$ otherwise,

where $\sigma$ is the local scaling parameter. Once the graphs $G_b$ and $G_w$ are constructed, their affinity weight matrices, denoted $W_b$ and $W_w$ respectively, can be defined as

$W_{b,ij} = A_{ij}$ if $x_i$ and $x_j$ have different class labels, and $W_{b,ij} = 0$ otherwise;

$W_{w,ij} = \beta A_{ij}$ if $x_i$ and $x_j$ share the same class label, $W_{w,ij} = A_{ij}$ if at least one of them is unlabeled and they are neighbors, and $W_{w,ij} = 0$ otherwise,

where $\beta > 1$ is a trade-off parameter that adjusts the contribution of labeled and unlabeled data. When two data points share the same class label, it is highly likely that they live on the same manifold; therefore, their weight value should be relatively large.
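This weighting scheme can be illustrated with a short NumPy sketch. The exact weight values in the original equations did not survive extraction, so the branch structure below (β-boosted same-label pairs, plain heat-kernel weights for unlabeled neighbor pairs) is an assumption consistent with the surrounding description; using `y[i] = -1` to mark an unlabeled sample is likewise an illustrative convention.

```python
import numpy as np

def ssmfa_weights(X, y, k=7, beta=100.0, sigma=1.0):
    """Illustrative construction of the between-manifold (W_b) and
    within-manifold (W_w) affinity matrices.  X is n x m (rows are
    samples); y[i] = -1 marks an unlabeled sample."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-d2 / sigma**2)                  # heat-kernel affinities
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]    # k nearest neighbors (skip self)
    Wb = np.zeros((n, n))
    Ww = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            both_labeled = y[i] >= 0 and y[j] >= 0
            neighbors = j in knn[i] or i in knn[j]
            if both_labeled and y[i] != y[j]:
                Wb[i, j] = A[i, j]              # between-manifold edge
            elif both_labeled and y[i] == y[j]:
                Ww[i, j] = beta * A[i, j]       # confident same-manifold edge
            elif neighbors:
                Ww[i, j] = A[i, j]              # unlabeled local-neighbor edge
    return Wb, Ww
```

Both matrices come out symmetric, as the graph construction requires, because every branch condition is symmetric in i and j.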
The objective of SSMFA is to maximize the sum of distances between marginal points and their neighboring points from different classes, while minimizing the sum of distances between data pairs of the same class and between each sample in $N^l$ and its $k_3$ nearest neighbors. A reasonable criterion for choosing an optimal projection vector is therefore formulated as

$\max_v \sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{b,ij}, \qquad (10)$

$\min_v \sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{w,ij}. \qquad (11)$

The objective function (10) on the between-manifold graph $G_b$ is simple and natural: if $W_{b,ij} \neq 0$, a good projection vector should map those two samples far away from each other. With some algebraic manipulation, (10) can be simplified as

$\sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{b,ij} = 2\, v^T X L_b X^T v,$

where $D_b$ is a diagonal matrix with $D_{b,ii} = \sum_j W_{b,ij}$ and $L_b = D_b - W_b$ is the Laplacian matrix. The objective function (11) on the within-manifold graph $G_w$ seeks an optimal projection vector with stronger intra-class and local compactness. Following a similar derivation, (11) can be simplified as

$\sum_{i,j} \|v^T x_i - v^T x_j\|^2 W_{w,ij} = 2\, v^T X L_w X^T v,$

where $D_w$ is a diagonal matrix with $D_{w,ii} = \sum_j W_{w,ij}$ and $L_w = D_w - W_w$ is the Laplacian matrix.
The objective functions (10) and (11) then reduce to

$\max_v v^T X L_b X^T v \quad \text{and} \quad \min_v v^T X L_w X^T v.$

From the linearization of the graph embedding framework, we obtain the semi-supervised marginal Fisher criterion

$v^* = \arg\max_v \frac{v^T X L_b X^T v}{v^T X L_w X^T v}. \qquad (15)$

The optimal projection vectors $v$ that maximize (15) are the generalized eigenvectors corresponding to the $d$ largest eigenvalues of

$X L_b X^T v = \lambda X L_w X^T v. \qquad (16)$

Let the column vectors $v_1, v_2, \ldots, v_d$ be the solutions of (16), ordered according to their eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d$. The projection matrix is then given by $V = [v_1, v_2, \ldots, v_d]$, and a sample $x$ is embedded as $z = V^T x$.
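The generalized eigenproblem (16) can be sketched in plain NumPy by converting it to a standard eigenproblem on $S_w^{-1} S_b$. This is an illustrative sketch, not the authors' code: the small ridge term `reg` is an assumption added for numerical stability (the within-manifold scatter can be singular when the number of samples is small relative to the number of bands, as is common in HSI), and rows of X are samples here whereas the paper stores samples as columns.

```python
import numpy as np

def ssmfa_projection(X, Wb, Ww, d=2, reg=1e-6):
    """Solve X^T L_b X v = lambda X^T L_w X v for the d largest
    eigenvalues and return the projection matrix V (m x d)."""
    Lb = np.diag(Wb.sum(axis=1)) - Wb           # between-manifold Laplacian
    Lw = np.diag(Ww.sum(axis=1)) - Ww           # within-manifold Laplacian
    Sb = X.T @ Lb @ X                           # m x m scatter matrices
    Sw = X.T @ Lw @ X + reg * np.eye(X.shape[1])
    # Equivalent standard eigenproblem: eigenpairs of Sw^{-1} Sb.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    idx = np.argsort(vals.real)[::-1][:d]       # indices of d largest eigenvalues
    return vecs[:, idx].real                    # embed samples with Z = X @ V
```

In practice a symmetric generalized eigensolver (e.g. `scipy.linalg.eigh` with two matrix arguments) would be preferred for numerical robustness.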

EXPERIMENTS AND DISCUSSION
In this section, we compare SSMFA with several representative dimensionality reduction algorithms on synthetic data and on two well-known hyperspectral image databases.

Experiments on synthetic data
To compare the discriminating power of the different methods, including LSDA (Cai et al., 2007) and SSMFA, two-class synthetic data samples are embedded into a one-dimensional space. In this experiment, we select 20 samples as the seen set and 200 samples as the unseen set for each class.
Figure 1 shows the synthetic dataset, where the first class is generated from a single Gaussian distribution, while the second class is generated from two separated Gaussians. The results of this example indicate that both MFA and SSMFA work well, whereas almost all other methods mix samples of different classes into one cluster. The poor performance of LDA and MMC is caused by overfitting, and the failure of the other semi-supervised methods, i.e., SSMMC, S3MPE, and SDA, may be due to their assumption that the data of each class follows a Gaussian distribution.
The above example illustrates that both MFA and SSMFA have more discriminating power than the other methods, and that SSMFA performs best. In the experiments, the parameter β in SSMFA is set to 100. For all graph-based dimensionality reduction methods, the number of neighbors is set to 7 in all cases.
To test these algorithms, we select one part of the data as the seen set and another part as the unseen set. Then, we randomly split the seen set into a labeled and an unlabeled set. The details of the experimental settings are shown in Table 1. For the unsupervised methods, all data points, including the labeled and unlabeled data, are used to learn the optimal lower-dimensional embedding without using the class labels of the training set. For the supervised and semi-supervised methods, we augment the training set with labeled images taken randomly from the seen set.
In our experiments, the training set is used to learn a lower-dimensional embedding space with the different methods. Then, all testing samples are projected onto this embedding space. After that, the nearest neighbor (1-NN) classifier with Euclidean distance is employed for classification in all experiments.
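The evaluation protocol above (project onto the learned subspace, then classify with 1-NN under the Euclidean distance) can be sketched as follows; the function name and arguments are illustrative, not part of the paper.

```python
import numpy as np

def classify_1nn(V, X_train, y_train, X_test):
    """Project samples with a learned projection matrix V (m x d) and
    label each test sample by its nearest training neighbor in the
    embedded space, mirroring the protocol described in the text."""
    Z_train = X_train @ V                 # embed training samples (rows)
    Z_test = X_test @ V                   # embed test samples
    preds = []
    for z in Z_test:
        dists = np.linalg.norm(Z_train - z, axis=1)   # Euclidean distances
        preds.append(y_train[np.argmin(dists)])       # 1-NN label
    return np.array(preds)
```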

Experimental Results on Washington DC Mall Dataset
In this experiment, we repeat the classification process 10 times using different splits and calculate the average classification rates at different dimensions. Table 2 reports the best performance of each method.
As can be seen from Table 2, for the unsupervised and semi-supervised methods, the recognition accuracy increases as the size of the unlabeled set grows. Clearly, the semi-supervised methods outperform their supervised counterparts, which indicates that combining the information provided by labeled and unlabeled data benefits hyperspectral image classification. In most cases, SSMFA outperforms all other methods.
To investigate the influence of the number of labeled samples on the performance of the semi-supervised algorithms, we replace the 2 labeled samples per class with 4, 6, and 8 labeled samples, respectively. As expected, the classification accuracy rises as the labeled sample size increases. Note that SSMMC and SSMFA do not improve much on MMC and MFA when 8 labeled samples per class are used for training, because 8 × 7 = 56 labeled examples are already sufficient to discriminate the data, so the effect of the unlabeled data is limited.

Experimental Results on AVIRIS Indian Pines Data Set
In each experiment, we randomly select the seen and unseen sets according to Table 1, and the top 50 features are selected from the AVIRIS Indian Pines data set by the different methods.
The results are shown in Table 3.
As can be seen from Table 3, all semi-supervised methods are superior to the supervised and unsupervised methods, which indicates that combining the information provided by labeled and unlabeled data benefits feature selection for hyperspectral image classification. Our SSMFA method also outperforms its semi-supervised counterparts in most cases. Moreover, for the unsupervised and semi-supervised methods, the classification accuracy increases as the size of the unlabeled set grows.
It is obvious that the semi-supervised methods greatly outperform the other methods and that SSMFA performs better than the other semi-supervised methods. Nevertheless, most methods do not achieve good performance on this data set. One possible reason for the relatively poor performance is the strong mixture of class signatures in the AVIRIS Indian Pines data set. Hence, it would be interesting to develop more effective methods to further improve hyperspectral image classification performance under such complex conditions.

Discussion
The experiments on synthetic data and two hyperspectral image databases have revealed some interesting points.
(1) As more labeled data are used for training, the classification accuracy increases for all supervised and semi-supervised algorithms. This shows that a large number of labeled samples helps identify the relevant features for classification.
(2) The semi-supervised methods, i.e., S3MPE, SDA, and SSMFA, consistently outperform the supervised and unsupervised methods, which indicates that semi-supervised methods can take advantage of both labeled and unlabeled data points to find more discriminant information for hyperspectral image classification.
(3) An effective learning method should improve its performance as the number of available unlabeled data increases. As expected, the recognition rates of the semi-supervised and unsupervised methods improve significantly as the number of available unlabeled data increases in Tables 2-3. Meanwhile, the supervised methods, i.e., MMC and LDA, suffer degraded classification accuracy due to overfitting or overtraining.
(4) As shown in Tables 2-3, SSMFA clearly performs much better than the other semi-supervised methods, and MFA achieves better performance than the other supervised methods. This is because many methods, e.g., LDA and MMC, assume that the data of each class follows a Gaussian distribution; MFA and SSMFA provide a proper way to overcome this limitation. Furthermore, SSMFA outperforms MFA in most cases: MFA can use only the labeled data, while SSMFA makes full use of both labeled and unlabeled data points to discover the discriminant and geometrical structure of the data manifold.

CONCLUSIONS
The application of machine learning to pattern classification in hyperspectral imagery is rapidly gaining interest in the community. Selecting a minimal and effective subset from the large number of bands is a challenging issue of urgent importance.
In this paper, we present a semi-supervised dimensionality reduction method, called semi-supervised marginal Fisher analysis (SSMFA), for hyperspectral image classification. SSMFA has two prominent characteristics. First, it is designed to achieve good discrimination ability by utilizing both labeled and unlabeled data points to discover the discriminant and geometrical structure of the data manifold. In SSMFA, the labeled points are used to maximize the inter-manifold separability between data points from different classes, while the unlabeled points are used to minimize the intra-manifold compactness between data points belonging to the same class or between neighbors. Second, it overcomes the limitation of many dimensionality reduction methods that assume a Gaussian distribution for the data of each class, which further improves hyperspectral image recognition performance. Experimental results on synthetic data and two well-known hyperspectral image data sets demonstrate the effectiveness of the proposed SSMFA algorithm.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume I-3, 2012, XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia

Figure 1: The optimal projection directions of different methods on the synthetic data. Circles and triangles are the data samples of the two classes; filled and unfilled symbols are labeled and unlabeled samples, respectively; solid and dashed lines denote the one-dimensional embedding spaces found by the different methods (onto which the data samples are projected).

Experiments on the Real Hyperspectral Image Datasets
In this subsection, classification experiments are conducted on two HSI datasets, i.e., the Washington DC Mall dataset (Biehl, 2011) and the AVIRIS Indian Pines dataset (Biehl, 2011), to evaluate the performance of SSMFA.

Data Description
(I) The Washington DC Mall hyperspectral dataset (Biehl, 2011) is a section of a scene taken over the Washington DC Mall (1280 × 307 pixels, 210 bands, and 7 classes composed of water, vegetation, man-made structures, and shadow) by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. This sensor collects data in 210 contiguous, relatively narrow, uniformly spaced spectral channels in the 0.40-2.40 µm region. A total of 19 channels can be identified as noisy (1, 108-111, 143-153, 208-210) and safely removed as a preprocessing step. This data set has been manually annotated in (Biehl, 2011), and there are 60 patches of seven classes with ground truth. The hyperspectral image in RGB color and the corresponding labeled field map based on the available ground truth are shown in Figure 2(a) and Figure 2(b), respectively. In the experiments, we use these annotated data points to evaluate the performance of SSMFA.
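The band-removal preprocessing described above can be sketched as follows; the channel indices are the 19 noisy HYDICE bands listed in the text (1-based), and the function name is illustrative.

```python
import numpy as np

# 1-based indices of the 19 noisy HYDICE channels listed in the text.
NOISY_BANDS = [1] + list(range(108, 112)) + list(range(143, 154)) + list(range(208, 211))

def remove_noisy_bands(cube):
    """Drop the 19 noisy channels from a (rows, cols, 210) HYDICE cube,
    returning a (rows, cols, 191) array -- a sketch of the preprocessing
    step described above."""
    keep = [b for b in range(1, cube.shape[2] + 1) if b not in NOISY_BANDS]
    return cube[:, :, [b - 1 for b in keep]]   # convert to 0-based indices
```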
Here $\sigma$ is the local scaling parameter of the heat kernel. The nearest-neighbor graph $G$ with weight matrix $A$ characterizes the local geometry of the data manifold. However, this graph fails to discover the discriminant structure in the data. To discover both the geometrical and the discriminant structure of the data manifold, we construct two graphs: the between-manifold graph $G_b$ and the within-manifold graph $G_w$. In both graphs, the $i$th node corresponds to the data point $x_i$. For the between-manifold graph $G_b$, we put an edge between nodes $i$ and $j$ if $x_i$ and $x_j$ have different class labels. For the within-manifold graph $G_w$, we put an edge between nodes $i$ and $j$ if $x_i$ and $x_j$ have the same class label, or if $x_i$ or $x_j$ is unlabeled but they are close, i.e., $x_i$ is among the $k_3$ nearest neighbors of $x_j$ or $x_j$ is among the $k_3$ nearest neighbors of $x_i$.

Table 2:
Comparison of different methods on the Washington DC Mall data set (mean ± std (dim)). 'Lab' and 'Unlab' denote the numbers of labeled and unlabeled samples, respectively. Bold values indicate that the corresponding method obtains the best performance under the given conditions; the same notation is used for each table.