A NOVEL OBJECT OF INTEREST EXTRACTION METHOD FOR MOBILE MAPPING SYSTEMS

In this paper, we present a novel object of interest (OOI) extraction scheme that works robustly for digital measurable image (DMI) sequences collected by mobile mapping systems (MMS). The proposed method integrates tracking and segmentation in a unified framework. We incorporate a new object-shaped kernel into the scale invariant mean shift algorithm to track the OOI through the DMI sequence and thus maintain temporal consistency. The well-known GrabCut approach for static 2D image segmentation is generalized to the DMI sequence for OOI segmentation. Experimental results on real DMI sequences collected by the VISAT MMS demonstrate that the proposed approach is robust to challenges such as low frame rate, large inter-frame displacement of the OOI and background clutter.


INTRODUCTION
Mobile mapping systems (MMS) have been recognized as the optimal tool for digital navigation map production, map amendment and GIS data collection. Digital measurable image (DMI) sequences gathered by MMS, which cover both geoinformation and street view images, have been widely applied to provide location-based services (Petrie, 2010; Wang, C. et al., 2008a). Extraction and update of GIS features, such as fire horns, guideboards and traffic lights, is the key post-processing step for MMS. Up to now, it has been highly dependent on human annotation and is hence the bottleneck of MMS-based applications. Given the complex background and the large number of GIS feature categories, a computer-aided strategy is a promising way to improve productivity. Therefore, many researchers and practitioners try to extract GIS objects of interest (OOI) in DMI sequences with as little human labor as possible (Hinz, 2008). Here the OOI refers to the selected GIS feature. A DMI sequence can be regarded as a special kind of video sequence, but with a much lower frame rate (typically 0.5-2 fps) and a much larger frame size (typically larger than 1600×1200), collected by a moving surveying vehicle in unconstrained natural scenes.
Although noticeable success has been achieved in video object extraction, state-of-the-art video OOI extraction methods cannot be directly applied to OOI extraction from DMI sequences because of the new challenges these sequences bring. First, the low frame rate of DMI sequences often results in large scaling of the OOI through the sequence, which requires more robust estimation of the scale factor; the scale change is often assumed to be relatively small between successive frames (Bae et al., 2009; Wang, C.H. et al., 2008b). Second, the low frame rate often causes large inter-frame displacement of the OOI, which violates the tight assumption of spatiotemporal continuity of the OOI made in previous work (Criminisi et al., 2006; Wang, J. et al., 2005). Third, the imaging platform, i.e. the surveying vehicle, is moving, so the static background assumption (Li et al., 2005; Wu et al., 2009) no longer holds. This challenge is compounded by the complexity of the background. Addressing these challenges is the aim of our work in this paper.
In this paper, we propose a novel GIS OOI extraction approach that integrates tracking and segmentation in a unified framework. The diagram of the proposed method is shown in Figure 1. The manual interaction is limited to a bounding box around the OOI in a keyframe. The method makes two major contributions: first, a scale invariant mean shift with an object-shaped kernel is proposed for OOI tracking; second, a generalized GrabCut approach is proposed for object segmentation through the DMI sequence. The experimental results demonstrate the effectiveness and robustness of the proposed method.
Figure 1. Diagram of the proposed scheme.
The rest of the paper is organized as follows: Section 2 presents a brief review of previous work; Section 3 proposes an object-shaped kernel mean shift tracking scheme; the proposed generalized GrabCut video object segmentation method is presented in Section 4; experimental results are provided in Section 5; and Section 6 concludes the paper.

RELATED WORK
We first review previous graph cuts-based video OOI extraction approaches (Boykov, Y. et al., 2006; Boykov, Y.Y. et al., 2001) that are directly related to our work. Then, we revisit the mean shift-based tracking algorithms.

2.1 Graph Cuts-based OOI extraction
Video OOI extraction methods can be categorized into 3D volume-based and 2D frame-by-frame based groups.
The 3D spatiotemporal volume (STV)-based approach unifies the analysis of spatial and temporal information by constructing a volume of spatiotemporal data in which consecutive images are stacked to form a third, temporal dimension (Boykov, Y. et al., 2006). Wang et al. (Wang, J. et al., 2005) present an interactive video cutout system that cuts out dynamic foreground objects from a video sequence; it provides a novel user interface that lets the user paint clue strokes on the foreground and background. The system proposed by Li et al. (Li et al., 2005) first oversegments each frame into atomic regions using the watershed transform and then performs 3D graph cuts to bi-label each atomic region as foreground or background. Such methods are quite limited in real-world applications because they require a long preprocessing time for oversegmentation and a labor-intensive user interface.
In 2D single frame-based methods, objects are segmented in image/video sequences frame by frame, which makes these methods more feasible in practical applications. In general, pixels in a frame are labelled foreground or background according to color, intensity, texture, temporal consistency, motion or a mixture of them (Bae et al., 2009; Malcolm et al., 2007). State-of-the-art methods integrate such information into the energy terms of graph cuts and seek the globally optimal segmentation of the object. The segmentation results, often in the form of a labelled trimap, are then propagated to the following frames as the prior of the object. Once some pixels are segmented and labelled incorrectly, the error is propagated through the sequence and there is no chance to correct it.
In order to keep temporal consistency, tracking is often employed (Comaniciu et al., 2003), as discussed below.

2.2 Mean shift object tracking
The mean shift algorithm is a nonparametric mode-seeking method based on kernel density estimation (KDE) (Comaniciu et al., 2002). It has been widely and successfully applied to visual object tracking in video and image sequences owing to its efficiency and simplicity (Mena, 2003; Petrie, 2010; Yi et al., 2008). However, the original mean shift algorithm is also well known for its limitations: background clutter inside a radially symmetric kernel, and the inability to handle scale change and large displacement of objects. To overcome the first limitation, Yilmaz proposed an asymmetric level set kernel to exclude background pixels (Yilmaz, 2007). To overcome the second limitation, Collins (Collins, 2003) proposed to compute the bandwidth of the kernel in scale space. A better alternative is to estimate the position and scale simultaneously (Yilmaz, 2007).
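The mode-seeking behaviour described above can be illustrated with a minimal Python sketch. It is a toy example on 1D samples rather than the tracker used in this paper: the function name `mean_shift_mode`, the sample values and the bandwidth are assumptions made for illustration. With an Epanechnikov kernel the derivative of the profile is constant inside the window, so each iteration reduces to averaging the samples that fall within the bandwidth.

```python
def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Seek a density mode of 1D samples, starting from `start`.

    Assumes an Epanechnikov kernel, whose profile has a constant
    derivative inside the window, so every step is the plain mean of
    the samples lying within `bandwidth` of the current estimate.
    """
    y = start
    for _ in range(max_iter):
        window = [p for p in points if abs(p - y) <= bandwidth]
        if not window:
            break  # no samples in the window: nothing to shift towards
        y_new = sum(window) / len(window)
        if abs(y_new - y) < tol:
            return y_new  # converged to a mode
        y = y_new
    return y


# Two clusters of samples; the mode that is found depends on the start.
samples = [1.0, 1.2, 0.8, 1.1, 5.0, 5.1, 4.9, 5.2]
mode_low = mean_shift_mode(samples, start=0.0, bandwidth=1.5)
mode_high = mean_shift_mode(samples, start=6.0, bandwidth=1.5)
```

The dependence on the starting point and on a fixed bandwidth is what makes large inter-frame displacement and scale change difficult for the original algorithm.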

OBJECT-SHAPED KERNEL SCALE INVARIANT MEAN SHIFT
In mean shift tracking (Comaniciu et al., 2003), the appearance of the object is represented by its probability density function in a given feature space, such as a weighted color histogram $\hat{h}$, which is chosen because of its weak dependence on scaling and rotation, its robustness to partial occlusion and its low computational cost.
Given the weighted color histograms $\hat{h}_m$ and $\hat{h}_c$ generated from the object model and from the candidate region within kernel $K$ in image $I$, each pixel $x_i$ in the candidate region is given a weight by Eq. (1). Let the initial hypothesized position of the object candidate region be $\hat{y}_{old}$, the new position be $\hat{y}_{new}$, and the pixels inside the candidate region be $\{x_i\}$. Then, using the weights in Eq. (1), the mean shift vector moves the candidate region from $\hat{y}_{old}$ to $\hat{y}_{new}$.

3.1 The Object-Shaped Kernel
In traditional mean shift object tracking (Comaniciu et al., 2003), the kernel is commonly chosen as a primitive geometric shape and cannot describe the object's shape accurately. The histogram calculated inside the kernel is inaccurate because the non-object regions residing inside the kernel are treated as part of the object.
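As an illustration of the pixel weighting and position update described at the start of this section, the following Python sketch performs one mean shift step. It rests on several assumptions that are not taken from this paper: the weight is the square root of the ratio of model to candidate histogram values, which is the standard form in Comaniciu et al. (2003) and is only assumed to correspond to Eq. (1); the histograms are plain (unweighted) normalized color histograms; and the kernel profile derivative is uniform, so the new position is the weighted mean of the pixel coordinates. All function names are hypothetical.

```python
import math


def bin_index(rgb, bins=8):
    """Map an 8-bit RGB triple to one of bins**3 histogram bins."""
    step = 256 // bins
    r, g, b = (v // step for v in rgb)
    return (r * bins + g) * bins + b


def color_histogram(colors, bins=8):
    """Normalized color histogram (unweighted, for simplicity)."""
    hist = [0.0] * bins ** 3
    for rgb in colors:
        hist[bin_index(rgb, bins)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def mean_shift_step(coords, colors, model_hist, bins=8):
    """One position update from the pixels of the candidate region.

    Assumed weight: sqrt(model_hist[u] / cand_hist[u]) for the bin u of
    each pixel. Returns None if no pixel matches the model at all.
    """
    cand_hist = color_histogram(colors, bins)
    sum_x = sum_y = sum_w = 0.0
    for (x, y), rgb in zip(coords, colors):
        u = bin_index(rgb, bins)
        # cand_hist[u] > 0 because this pixel itself falls into bin u.
        w = math.sqrt(model_hist[u] / cand_hist[u])
        sum_x += w * x
        sum_y += w * y
        sum_w += w
    if sum_w == 0.0:
        return None
    return (sum_x / sum_w, sum_y / sum_w)


# A red object model; the candidate region is half blue background.
red, blue = (255, 0, 0), (0, 0, 255)
model_hist = color_histogram([red, red])
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
colors = [blue, blue, red, red]
new_position = mean_shift_step(coords, colors, model_hist)
```

In this toy case the background pixels receive zero weight, so the estimate moves to the centre of the two red pixels. Background pixels that share colors with the model would not be suppressed in this way, which is the motivation for a kernel that follows the object shape.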
To overcome this problem, we propose an object-shaped kernel generated from an accurate object shape mask. The object shape mask is given by the segmentation results. The centroid of the object is positioned at $x_c$ and, for any point $x_i$ in the mask, $\theta_i$ denotes the angle the point makes with the $x$-axis. In a similar way, the maximal distance from $x_c$ in the direction $\theta_i$ is defined as the radius in $\theta_i$. In this formalism, the density mode is sought by mean shift in the spatial dimension using the Epanechnikov kernel of Eq. (4), where $c_d$ is the volume of the unit $d$-dimensional sphere. In the mean shift procedure in the spatial dimension, we set $d = 2$. In the level set kernel proposed by Yilmaz (Yilmaz, 2007), the value of the kernel is the distance from the boundary of the mask, which may not guarantee a convex and monotonically decreasing profile in some situations, as shown in Figure 2(b). In contrast, the object-shaped Epanechnikov kernel generated from Eq. (4) has a convex and monotonically decreasing profile, so strict convergence is ensured (Comaniciu et al., 2002). Moreover, the object-shaped kernel has a very simple derivative: the derivative of the Epanechnikov kernel's profile is uniform everywhere, so the mean shift vector can be derived by Eq. (5).
... is defined based on the assumption that the incremental updates to each dimension are independent of each other. Using the kernel given in Eq. (7) ...
For object segmentation from video sequences, graph cuts-based video object segmentation approaches generally treat the segmentation results in the previous frame as the foreground label in the current frame to guide the segmentation, i.e., to build the trimap in the $i$-th frame (Boykov, Y. et al., 2006). The graph cuts-based methods are hindered by the following limitations: (1) They require the frame rate of the sequence to be high enough to ensure the spatiotemporal consistency of the object appearance in consecutive frames (Bae et al., 2009). Unfortunately, the OOIs undergo large displacement and scaling in DMI sequences. Direct propagation of the object segmentation results may place foreground labels on background pixels and vice versa, causing segmentation errors. (2) Segmentation errors are propagated through the following frames and there is no chance to correct them. (3) Initialization of the graph cuts-based methods requires both the foreground and the background to be specified by tedious sketches. A more flexible labelling and propagation strategy is desired for OOI extraction through DMI sequences in MMS applications.

4.2 Generalized GrabCut for Video Object Segmentation
Inspired by the GrabCut approach (Rother et al., 2004) for segmenting static images, we propose a generalized GrabCut framework for video object segmentation.
In contrast to the graph cuts-based approaches, which build a trimap $T$, we construct a dual map $D = \{D_U, D_B\}$ for the current $i$-th frame, where $D_U$ and $D_B$ are the pixels selected as unlabelled and background respectively, by inheriting the segmentation result of the $(i-1)$-th frame. For the $i$-th frame, the image consists of $N$ pixels $z_n$ in RGB color space, and the color distribution is modelled by two Gaussian mixture models (GMMs), $G_F^i$ and $G_B^i$, for the unlabelled pixels and the background respectively. The GMMs are taken to be full-covariance mixtures with $K$ components (typically $K = 5$). The Gibbs energy for segmentation is formulated as the sum of a region term $R$ and a boundary term $B$. The data term $R$ is defined by taking account of the color GMMs, using the posterior of the appearance distribution of the $i$-th frame based on the segmentation results. The boundary term $B$ is computed using the Euclidean distance in color space, with its constant taken to be the expectation over an image sample.
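To make the boundary term more concrete, the following Python sketch computes pairwise smoothness weights in the style of the original GrabCut formulation (Rother et al., 2004), where the weight between neighbouring pixels $z_m$ and $z_n$ is $\gamma \exp(-\beta \|z_m - z_n\|^2)$ and $\beta$ is the inverse of twice the mean squared color difference over the image. This is assumed, not confirmed, to match the boundary term used here; the 4-neighbourhood, the value $\gamma = 50$ and the function names are illustrative choices.

```python
import math


def color_dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def neighbor_pairs(height, width):
    """Yield each horizontal and vertical 4-neighbour pair once."""
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                yield (r, c), (r, c + 1)
            if r + 1 < height:
                yield (r, c), (r + 1, c)


def boundary_weights(image, gamma=50.0):
    """Map each neighbour pair to gamma * exp(-beta * ||z_m - z_n||^2).

    `image` is a list of rows of RGB triples. beta is set from the
    expectation of the squared color difference over the image.
    """
    height, width = len(image), len(image[0])
    pairs = list(neighbor_pairs(height, width))
    d2 = [color_dist2(image[a[0]][a[1]], image[b[0]][b[1]]) for a, b in pairs]
    mean_d2 = sum(d2) / len(d2) if d2 else 0.0
    beta = 1.0 / (2.0 * mean_d2) if mean_d2 > 0 else 0.0
    return {pair: gamma * math.exp(-beta * d) for pair, d in zip(pairs, d2)}


# A 1x3 strip: two identical dark pixels followed by a white pixel.
strip = [[(0, 0, 0), (0, 0, 0), (255, 255, 255)]]
weights = boundary_weights(strip)
smooth_edge = weights[((0, 0), (0, 1))]
sharp_edge = weights[((0, 1), (0, 2))]
```

Cutting between similar pixels is expensive (`smooth_edge`), while cutting across a strong color edge is cheap (`sharp_edge`), which encourages the segmentation boundary to follow object contours.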
Having all the parameters defined, the binary segmentation $\alpha$ in the $i$-th frame can be formulated as the estimation of a global minimum, which is solved by the max-flow min-cut algorithm.
In the proposed scheme, the energy minimization in Eq. (13) is run and the color distributions $G_F^i$ and $G_B^i$ in frame $i$ are updated iteratively until convergence. We adopt morphological post-processing to eliminate isolated segments and propagate $G_F^i$ and $G_B^i$ to the next frame on-line. By these means, the generalized GrabCut approach is endowed with the ability to segment objects from dynamic image sequences.
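The post-processing and propagation steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: keeping the largest 4-connected component stands in for the morphological post-processing, whose exact operators are not specified here; the dual map is obtained by growing the cleaned mask by a hypothetical `margin` of pixels; and the shift and scaling of the candidate region supplied by the tracker are ignored.

```python
from collections import deque


def largest_component(mask):
    """Keep only the largest 4-connected foreground region of a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out


def dual_map(mask, margin=1):
    """Label pixels within `margin` of the mask 'U' (unlabelled), others 'B'.

    No pixel is forced to be foreground, so a wrong label in the
    previous mask cannot be inherited as a hard constraint.
    """
    h, w = len(mask), len(mask[0])
    labels = [['B'] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for ny in range(max(0, r - margin), min(h, r + margin + 1)):
                    for nx in range(max(0, c - margin), min(w, c + margin + 1)):
                        labels[ny][nx] = 'U'
    return labels


previous_mask = [
    [0, 0, 0, 0, 1],  # isolated false positive in the corner
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
clean_mask = largest_component(previous_mask)
labels = dual_map(clean_mask, margin=1)
```

A larger `margin` tolerates more object motion between frames at the cost of leaving more pixels for the energy minimization to resolve.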

EXPERIMENTAL RESULTS ON MMS DATA
In this section, we evaluate our OOI extraction method on real-world low frame rate (LFR) image sequences collected in Calgary, Canada, by the VISAT™ MMS. The size of the frames is 1600×1238, and the frame rate is 1 fps. Only the most significant results are displayed here owing to the limit on paper length. To further demonstrate the proposed approach, MATLAB demos and more experimental results for this paper are available at http://chwang.xmu.edu.cn/works/demo_ooi_extraction.htm.

5.1
To evaluate the performance of the proposed object-shaped kernel scale invariant mean shift algorithm, we performed comparisons with the standard adaptive mean shift algorithm (Comaniciu et al., 2003) and a recent scale adaptive mean shift algorithm, SOAMST (Ning et al., 2011), which adaptively selects the scale by computing moments. In all experiments, the RGB color space was taken as the feature space and quantized to 8×8×8 bins. The OOI is initialized by a bounding box drawn by the user in the first frame and then automatically segmented by the GrabCut algorithm (Rother et al., 2004) to generate the object mask for our improved mean shift tracker. The accuracy is evaluated by the distance between the tracked object centroid and the ground truth.
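The accuracy measure can be sketched in a few lines of Python. The sketch assumes that the distance is the Euclidean distance in pixels between the centroid of the tracked object mask and a ground-truth centroid; the mask representation (rows of 0/1 values) and the function names are illustrative assumptions.

```python
import math


def mask_centroid(mask):
    """Centroid (x, y) of the non-zero pixels of a binary mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no object pixels")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def centroid_error(tracked, truth):
    """Euclidean distance in pixels between two centroids."""
    return math.hypot(tracked[0] - truth[0], tracked[1] - truth[1])


tracked_mask = [
    [0, 1, 1],
    [0, 1, 1],
]
tracked = mask_centroid(tracked_mask)
error = centroid_error(tracked, truth=(4.5, 4.5))
```

Plotting this error per frame for each tracker gives an accuracy curve of the kind reported for the signboard sequence.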
Figure 4 shows the results of tracking a brown rectangular signboard; the tracking details are magnified in the top-left rectangle. The tracking accuracy for the brown signboard is shown in Figure 5. It can be seen that our method outperforms the other two methods, especially when the OOI is moving fast and changing its size dramatically. We have also introduced a Kalman filter module and tested its performance. As shown in Figure 10, the accuracy was about the same as that without a Kalman filter. However, in practical MMS applications, a more accurate initial state of the Kalman filter module can be estimated from the measurable GIS information, which is beyond the scope of this paper. The introduction of a Kalman filter therefore remains promising.

5.2
We obtain the candidate region of the OOI in the current frame as the result of tracking. Then we segment the object in the candidate region using the proposed generalized GrabCut approach. We present some segmentation results for the traffic light, red guide board and brown guide board sequences in Figure 6. In order to perform an objective evaluation of our approach, we first manually segmented the reference objects (ground truths) for the test DMI sequences. The segmentation performance is measured by the precision, recall and F-score defined in Eq. (14) and compared to graph cuts-based methods (Boykov, Y.Y. et al., 2001). We have not performed matting as the authors do in the GrabCut approach (Rother et al., 2004). As shown in Tables 1, 2 and 3, our method performs better and more stable segmentation than the graph cuts-based methods. However, our method reports a better average recall and the two other measures differ very little. Moreover, the total interactions involved in our method are much fewer than in graph cuts-based methods.
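For reference, the three measures can be computed from a predicted mask and a ground-truth mask as in the following Python sketch. It uses the standard pixel-wise definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F-score = 2PR / (P + R)); these are assumed to agree with Eq. (14), and the function name is hypothetical.

```python
def segmentation_scores(predicted, truth):
    """Pixel-wise precision, recall and F-score of a binary segmentation."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # object pixel correctly segmented
            elif p and not t:
                fp += 1      # background labelled as object
            elif t and not p:
                fn += 1      # object pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


predicted = [
    [1, 1, 1],
    [0, 1, 0],
]
truth = [
    [1, 0, 0],
    [0, 1, 1],
]
precision, recall, f_score = segmentation_scores(predicted, truth)
```

Averaging these per-frame values over a sequence yields summary figures of the kind listed in the tables.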
In summary, our approach achieves satisfactory tracking and segmentation results in most sequences. In some extreme cases, where the object undergoes full occlusion, our method may fail. However, introducing prior geo-information is a promising measure to solve this problem and will be investigated in future work.

CONCLUSIONS
In this paper, we propose a novel method to extract the OOI from DMI sequences collected by MMSs. The method integrates object-shaped kernel scale invariant mean shift tracking and generalized GrabCut segmentation in a unified framework and needs only one bounding box in the keyframe as initialization.
The proposed object-shaped kernel is applied to mean shift tracking with adaptive scale selection to keep the temporal consistency of the OOI through the DMI sequence. We generalize the GrabCut method to object segmentation from image sequences by propagating the color distribution through the sequence. The effectiveness and robustness of the proposed OOI extraction method are demonstrated by experiments on real-world DMI sequences collected by the VISAT™ MMS.

Figure 2. Object and object-shaped kernel. (a) An irregular-shaped object. (b) The level set kernel. (c) The proposed kernel.

3.2 Object-Shaped Scale Invariant Mean Shift

Graph cuts-based interactive image segmentation methods (Boykov, Y.Y. et al., 2001) first generate a trimap $T = \{T_F, T_B, T_U\}$, where $T_F$ and $T_B$ are the pixels selected by the user as foreground and background respectively, and $T_U$ is the remaining set of unknown pixels. The segmentation is formulated as estimating the global minimum and is solved by the max-flow min-cut algorithm. The segmentation result is a mask with pixels labelled as foreground and background; the unknown pixels are partitioned once while the labels of the other pixels remain unchanged.

Figure 3. (a) The yellow brush pot as an instance of an OOI, (b) the OOI mask, (c) the trimap of the graph cuts-based method, where white, gray and black indicate foreground, unlabelled and background pixels respectively, and (d) the dual map of the proposed method.
for current frame.As shown in Figure 3(d), no foreground pixels are specified.Since there is no mandatory labelling of the foreground, the mislabelling error propagation can be avoided by setting the U D properly larger than original object mask.Although OOI in the DMI sequence changes their locations and sizes dramatically in consecutive frames, their appearance remain relatively invariant.Therefore the dual map U Di and B Di in frame i can inherit the segment

$K = 5$). In order to deal with the GMM tractably, in the optimization

Figure 5. Tracking accuracy of the brown signboard.

Figure 6. OOI segmentation results for real-world DMI sequences in MMS.
The precision, recall and F-score values are expected to be close to 1. The average precision, recall and F-score are shown in Tables 1, 2 and 3.

Table 1. Average precision, recall and F-score of the brown

Table 2. Average precision, recall and F-score of the traffic light sequence

Table 3. Average precision, recall and F-score of the red