ROBUST LOCALLY WEIGHTED REGRESSION FOR GROUND SURFACE EXTRACTION IN MOBILE LASER SCANNING 3 D DATA

A new robust way for ground surface extraction from mobile laser scanning 3D point cloud data is proposed in this paper. Fitting polynomials along 2D/3D points is one of the well-known methods for filtering ground points, but it is evident that unorganized point clouds consist of multiple complex structures by nature so it is not suitable for fitting a parametric global model. The aim of this research is to develop and implement an algorithm to classify ground and non-ground points based on statistically robust locally weighted regression which fits a regression surface (line in 2D) by fitting without any predefined global functional relation among the variables of interest. Afterwards, the z (elevation)-values are robustly down weighted based on the residuals for the fitted points. The new set of down weighted z-values along with x (or y) values are used to get a new fit of the (lower) surface (line). The process of fitting and down-weighting continues until the difference between two consecutive fits is insignificant. Then the final fit represents the ground level of the given point cloud and the ground surface points can be extracted. The performance of the new method has been demonstrated through vehicle based mobile laser scanning 3D point cloud data from urban areas which include different problematic objects such as short walls, large buildings, electric poles, sign posts and cars. The method has potential in areas like building/construction footprint determination, 3D city modelling, corridor mapping and asset management.


INTRODUCTION
Terrestrial Mobile Laser Scanning (TMLS) has the advantage of 3D data acquisition of the environment and physical objects in the vicinity of the mapping vehicle.It collects reasonably accurate geo-referenced point sets (clouds) quickly and safely.This type of data has different applications in many fields such as autonomous navigation, change detection, three dimensional (3D) city modelling, Digital Terrain or Surface or Elevation Modelling (DTM/DSM/DEM), environmental monitoring, corridor mapping and constructions management (Garouani and Alobeid 2013, Hebert and Vandapel 2003, Pu et al. 2011, Yang et al. 2013).A laser scanning point cloud consists of both ground and non-ground points and many application areas require classifying points into ground and non-ground points.For example, in corridor mapping, road assets management and for segmentation of urban street scenes (Pu et al. 2011) separating ground points is necessary and useful for point cloud post processing.Extracting the ground surface also helps to minimize and make the remaining analysis (e.g.region growing, segmentation and feature extraction) easy, time and cost saving.Accordingly, many methods have been developed for DTM/DSM/DEM, filtering (removing non ground points from a point cloud) etc. for years, mainly introduced from multidisciplinary research including statistics, computer vision, pattern recognition, photogrammetry and remote sensing (e.g.Bartles et al. 2006, Zhou et al. 2012).To meet the challenges associated with classification/ground filtering (separating ground points) methods have been introduced over the last two decades (e.g.Belton and Bae 2010, Brovelli et al. 2004, Crosilla et al. 2011, Lindenberger 1993, Kraus and Pfeifer 1998, Nasser et al. 2005).A comparative analysis among different methods conducted by the ISPRS Working Group (WGIII/3; Sithole and Vosselman 2004) show that no method is sufficiently good for every dataset.Most of them do not perform well in the presence of multiple structures like ramps, sharp edges, steep slopes and isolated ground points.Hence there is much interest in developing a new general method to get better results.
It is known that parametric polynomials estimate parameters that fit the data best for a pre-specified family of functions.In many cases, this method yields easily interpretable models that do a good job of explaining the variation in the data, but it is not always true.The chosen family of functions can be overlyrestrictive for some types of data (Avery 2012).Fan and Gijbels (1996) show even a 4th-order polynomial fails to give visually satisfying fits.As an alternative, higher order fits may be attempted, but this may leads to numerical instability.As a remedy, the Locally Weighted Regression (LWR) approach can be used.We choose LWR because it satisfies many desirable statistical properties (most importantly, it adapts well to bias problems at boundaries and in regions of high curvature; Cleveland and Loader, 1996).In this paper, we propose a new algorithm based on Robust LWR (RLWR).Local fitting by means of a locally weighted interpolation function based on local neighbourhood finds fine details about point cloud by smoothing.Fitting within a local neighbourhood considers local point density accurately, which is not always possible for global model (polynomial) fitting for the whole dataset.We know that significant point density variation is typical in laser scanner point cloud data.In particular, for steep slopes, this type of global parametric model fitting may lead to misclassification results and local fitting typically gives better results.
The proposed algorithm is an iterative process, where for every iteration a predefined robust weight function imposed according to the residual values, which are the deviations between the current fits and the z values.Inclusion of a robust weight function in the proposed algorithm makes estimates robust and down-weights the height error of the points w.r.t. the fit for the intermediate steps in a robust fashion.Moreover, it reduces the outliers' influence on the fits.The remaining arrangement of the paper is as follows.Section 2 gives a short review of the relevant literature.Section 3 details the proposed algorithm along with related principles and methods.In Section 4 the algorithm is demonstrated and evaluated using three sets of real terrestrial mobile laser scanning data.The conclusion follows in Section 5.

A SHORT LITERATURE REVIEW
In this section we review some well-known methods that are relevant here and show historical advancement in this area.The methods that are developed over the years can be categorized in general as follows.
In mathematical morphologic filtering (Lindenberger 1993), initially, a rough ground surface is extracted by using a seed point that is the lowest based on the assumption that the lowest point belongs to the ground.Then the rough terrain is refined with an auto-regression process.This algorithm is vulnerable to the size of the structure element (Shan and Sampath, 2005).Later Kilian et al. (1996) used different morphologic operators and Vosselman (2000) developed a slope based filter incorporating the idea of height difference between two points as a function of the distance between the points.It is known that morphologic filtering is efficient for point cloud data in areas of small elevation difference.Zaksek and Pfeifer (2004) noticed that although the algorithm is effective in areas with small differences it is not so good in areas with steep slopes.Segmentation and clustering are processes for separating point clouds into homogeneous regions.This approach classifies points based on local geometrical relations like height, slope or curvature in a certain neighbourhood (Sithole and Vosselman 2005).Tovari and Pfeifer (2005) proposed a two-step segmentation algorithm that starts from a seed point for region growing that examines k neighbourhood points to see whether they fulfil certain criteria, and then uses robust interpolation for point groups.The authors require more information about segmentation parameters and explicit break-line information to get more accurate filter results.Edge based clustering is introduced by Brovelli et al. (2004).This detects edges by using a threshold of the gradient.Points inside the closed edges are considered as the object points and the rest are considered as the terrain points.Robust interpolation (Kraus and Pfeifer 1998) is an iterative process based on linear prediction.It uses the concept of new height to each measured point using a local interpolation.Zaksek and Pfeifer (2004) claim that a robust interpolation method is more efficient than morphologic filtering in steep slopes covered by forest.Akel et al. (2007) propose an algorithm based on orthogonal polynomials for extracting terrain points from LiDAR data.We know higher order fits may lead to numerical instability (Fan and Gijbels 1996).Skewness balancing introduced by Bartles et al. ( 2006) is mainly a segmentation algorithm based on the central limit theorem where the statistical measure Skewness is chosen to describe the characteristics of the point cloud distribution and has been used as a termination criterion in a segmentation algorithm.Later this algorithm has been developed combining with the Kurtosis measure by others (Crosilla et al. 2011).

PROPOSED ALGORITHM
The ground filtering algorithm proposed in this section for ground surface points extraction is a robust interpolation method.It couples locally weighted regression and the robustification of weighted regression.It works as a classification method to distinguish points into ground (terrain) and non-ground points (buildings, trees, walls etc.).

Locally Weighted Regression
Local regression is a nonparametric approach introduced in the statistical literature in the late 1970s (Cleveland 1979) and later developed by many others (e.g.Jacoby 2000, Loader 2004).It is used to model regression functions or surfaces between explanatory (independent) variable(s) and response (dependent) variable without any prior specified functional relation between the variables.Locally weighted regression is usually termed 'lowess' (LOcally WEighted Scatterplot Smoother) or 'loess'.It is a procedure in which a regression surface is determined by fitting parametric functions locally in the space of the independent variables using weighted least squares in a moving fashion.This is similar to the way that a time series is smoothed by moving averages (Cleveland and Grosse 1991).Let y i and x i = (x i1 , x i2 ,…, x ip ), i=1,2,…,n be the measurements of dependent and independent variables respectively.Assume that the dataset is modelled as where are independent and normally distributed with mean 0 and variance , and is a smooth function of x i .LWR gives an estimate (x i ) at any value of x i in the space of independent variables.LWR is nonparametric in the sense that it does not specify the functional form of the whole dataset and no specific assumption is made globally for (x) but locally around a point x i .We can assume that (x) can be well approximated by a member of a simple class of parametric functions (e.g. according to Taylor's theorem, any differentiable function can be approximated locally by a straight line).To estimate (x) at a point x i , LWR uses a local neighbourhood (N(x)) of k ( ) observations in the x space which are closest to x i .A smoothing parameter ( ) determines the size of k, which gives the proportion of points that is to be used in each neighbourhood (local regression).A larger local neighbourhood (i.e.larger ) makes the fit smoother.Every point in the local neighbourhood is weighted according to its distance to the interest point x i .Alternatively, a local neighbourhood can also be considered as a bandwidth (fixed distance) h (x), and a smoothing window may be used for fitting a point x i .If the same number of observations is on either side of the nearest point, the weight function is symmetric, otherwise it is asymmetric.A linear or non-linear polynomial (e.g.quadratic) function of the independent variables can be used to fit the model using Weighted Least Squares (WLS) method.If locally quadratic fitting is used, the fitting variables are the independent variable(s), their squares, and their cross-products.Locally quadratic fitting tends to perform better than linear fitting where the regression surface has substantial curvature (Cleveland and Devlin 1988) where is the distance between x i and x j in x-space.The value of is a maximum for the point closest to x i and becomes 0 for the k th nearest x j to x i .Points that are too far away with 0 weights will be classified as outliers and deemed useless to the analysis.Figure 1 depicts the shape of the tricube weight function.
Figure 1.Tricube and bisquare weight functions Finally, the estimates of the parameters of Eq. ( 1) are the values of the parameters that minimize . (3) The coefficients from each local neighbourhood are used to estimate the fitted values at x i , (x i ).Then the ordered pairs of give the fitted regression line for the whole dataset.Figure 2 gives the relevant ideas about the fitting parameters in LWR.

Robustification of Locally Weighted Regression
Like classical regression, LWR may be strongly influenced by outliers because of its least squares nature and hence can give inaccurate non robust results.The problems of outliers are compounded by the fact that the local regressions typically involve a subset of the complete dataset.Therefore, any erroneous data point will compromise a significant proportion of the points used in the local estimation and their degree of influence may cause false estimates (Jacoby 2000).To avoid the outlier effects and to get a robust fitted model we can use a robust weight for each data point in the neighbourhood.The well-known bisquare weight function can be used for this purpose, which is defined based on the residuals as: where , MAD is the median of the , and . (5) The shape of the bisquare weight is shown in Figure 1.To estimate the new set of Robust LWR (RLWR) coefficients, the bisquare weight function is used, and the following function is minimized: .
The newly estimated coefficients are used to obtain a new set of fitted values of (x i ).This robustness steps are repeated until the values of the estimated coefficients converge.

Implementation
The laser scanned point clouds considered in this paper are acquired along transport corridors using vehicle mounted laser scanners.In such a case, the long dataset is typically sliced into manageable "stripes" for processing and then the results merged.Ground surfaces such as the road and pavements are usually considered as the lowest features locally.That means, ground points can be defined as the points on the lowest, smooth, nominally horizontal surface (Belton and Bae 2010).
Based on this important property our algorithm proceeds to find the lowest level of the respective local region for every point in a stripe.A local region (neighbourhood) is defined for every point in a stripe.Searching the local neighbourhood for a given point in an unstructured point cloud is not trivial.Two wellknown local neighbourhood determination methods are Fixed Distance Neighbourhood (FDN) and k Nearest Neighbourhood (kNN).It is known that point density variation may misrepresent the real shape of a surface; hence, we consider kNN to avoid the problem with FDN for point density variation.This is a general phenomenon because of the movement of the data acquisition vehicle.For each stripe, the algorithm used here processes the two dimensional orthogonal profiles X-Z and Y-Z.The method is performed by iterating over two main steps.In the first step, robust locally weighted regression is used to get a robust polynomial fit for the whole stripe.We use locally quadratic fitting for every local neighbourhood of size k.The second step combines four related tasks as follows.
Step (i): calculation of residuals (differences between the z-values and the z-values from the current robust fit).
Step (ii): classification of points into two categories: points above the fitted RLWR line and points on or below the RWLR function (line).
Step (iii): bisquare robust weight function (Eq.4) used to down-weight the z-values of the points which are above the polynomial function, while the rest of the points are given weight 1 (i.e. points on or beneath the fitted line will be unchanged).If, in any case, the value of a fitted z (after down-weighting) becomes less than the lowest z-value of the respective neighbourhood then the new z-value is replaced by the lowest one to make it meaningful, since in a local neighbourhood the lowest features are generally regarded as the ground surface.
Step (iv): the new set of z-values is used to get the next RLWR polynomial function.The above two steps will be continued until the difference between the two Root Mean Squared Errors (RMSE) from the two latest consecutive fitted polynomials is insignificant (Δ, a very small value).We consider Δ=0.005 in our experiments.The final RLWR fit is considered as the lowest (ground) level for the concerned stripe and the points between a band created by the lowest level and lowest level added with a predefined threshold (based on the data examined) are considered as ground surface points from the profile.Finally, common points that are identified as ground points from the results of X-Z and Y-Z profiles are classified as the ground points for the stripe.Note that the threshold values for X-Z and Y-Z may vary because the X and Y axis measures different directions (for example in the case of mobile mapping through road corridors, the Y axis is the direction along the road and the X axis is vertical direction w.r.t. the road) in the point cloud.Therefore, the thresholds for a X-Z stripe depend on the difference between the points from the two opposite sides of the road and the threshold for Y-Z depends on the difference between the points of the two most distant positions on the road.Hence a smaller stripe has the advantage of enabling the fixing of the thresholds accurately.The algorithm is summarized in Figure 3.

EXPERIMENS, EVALUATION AND DISCUSSION
In this section, the proposed algorithm is demonstrated and evaluated through experiments on three datasets.We consider datasets consist of many different types of complex objects on and close to the road in urban areas.We assess the results visually and compare them with those for a recently proposed region growing based robust segmentation algorithm (Nurunnabi et al. 2012), and the necessary saliency features (e. g. normal and curvature) used in the segmentation algorithm are estimated using the robust method in Nurunnabi et al. (2013).
The segmentation algorithm begins the region growing with a seed point that has the least curvature value in the dataset and grows regions using surface point criteria (coherence and proximity) in a local neighbourhood, NP i of size k of the i th seed point p i .The algorithm uses Orthogonal Distance (OD) for the distance of point p i to its best-fit-plane, Euclidian Distance (ED) between the seed point p i and one of its neighbours p j in NP i , and the angle difference (θ) between the seed point and its neighbours.The angle θ is defined as: where 1 n and 2 n are the two unit normals for the seed point (p i ) and one of its neighbours (p j ).The robust saliency features estimation method (Nurunnabi et al. 2013) introduces a Mahalanobis type robust distance for finding outliers in a local neighbourhood, in combination with the ideas of orthogonal distance from a point to the best-fit-plane and local surface point consistency to get Maximum Consistency with Minimum Distance (MCMD).
Dataset 1.The first dataset (Figure 4 (a)) of 9,373 points consists a tree, part of a road side wall, part of a roof that enters into the tree, and road surfaces.We perform RLWR for the two bi-dimensional X-Z and Y-Z profiles.We use local quadratic fitting and the weight functions for every point within local neighbourhoods of size 300.We calculate residuals and perform the down-weighting based on bisquare robust weight to reduce the influences of off-terrain points.The iteration process continues until the difference between two Root Mean Squared Errors (diff RMSE) from two consecutive fits is < 0.005.Only after 6 iterations we get the ground level (magenta line; We compare our results with the above mentioned robust segmentation algorithm.We set the required parameters to be k=50, θ th =10°, and minimum region size R min =0.The segmentation results are in Figure 4 (g).Points in the segments (I), (II) and (III) in Figure 4 (g) are clearly visible and can be considered as ground points.In table 1, we see 1,901 points are identified as ground points from the segmentation method, which is the 98.24% similar to the ground surface points from the proposed algorithm.Besides robustness, the main advantage of the proposed algorithm is: it takes only 32.49 seconds which is 16 times faster than the time (503.79sec.) for the segmentation algorithm for the required filtering task.
Dataset 2. Meng et al. (2010) points that errors are mainly found in difficult to recognize features, such as bushes, short walls, and on the boundaries of the ground and non-ground objects.Moreover, it is more difficult to identify ground points in an area covered by dense urban features, such as power poles, flags and cars.Our 2 nd dataset (Figure 5(a)) of 104,301 points has been taken in such an urban area where short walls, a car, a power pole and sign posts are present.
We perform RLWR for the two X-Z and Y-Z profiles.We fit quadratic local polynomial and use weight functions for every point within their local neighbourhood of size 500.We calculate residuals and perform the down-weighting based on the bisquare robust weight function as for the previous experiment to reduce the influences of off-terrain points.The iteration process terminates when the difference between the two root mean squares from two consecutive fits is < 0.005.After only 5 and 6 iterations for X-Z and Y-Z profiles respectively we get the ground levels.Points within 0.35 and 1.35 meters vertical (Z) distance for X-Z and Y-Z respectively from the estimated ground level are treated as ground surface points.The common points (98,917 points) from X-Z and Y-Z profiles shown in grey colour in Figure 5(b) are finally extracted as the ground surface points.To get the ground surface from the segmentation algorithm, we set the required parameters to be the same as for Dataset 1. Figure 5 (c) shows the segmentation results.Segmentation extracts the ground surface as 98,554 points.All of them match the ground points from the proposed method: 99.63% of the ground points of the RLWR method.We see RLWR takes only 801.26 sec, which is 11 times less than the segmentation method to classify the Dataset 2. All the times are for non-optimized MATLAB @ code.Rates of times are of significance, not the actual times.Dataset 3. Our 3 rd dataset is considered as a large dataset of 1,703,315 points (Figure 6); and consists of large trees, buildings, small walls, signposts, power poles and different types of complex objects.We processed 10 stripes along the Yaxis.Figures 6 (a

CONCLUSIONS
The locally weighted regression (LWR) based proposed statistically robust approach can extract ground surfaces in urban areas.Most urban features such as complex large buildings, large trees, short walls, sign posts, power poles, traffic signals, vehicles are efficiently separated from the ground surfaces.Although the method is an iterative process using local weights it performs its task with a very low number of iterations that minimizes its time.Therefore the method is fast and applying the proposed ground surface extraction technique the method can reduce the cost of any required postprocessing tasks.Moreover, our algorithm depends on only a few parameters that can be learned automatically.Quality assessment based on comparing with a recently introduced robust segmentation algorithm shows highly (> 98%) correct classification rate of the ground surface points.The method is ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013 ISPRS Workshop Laser Scanning 2013, 11 -13 November 2013, Antalya, Turkey very useful to exclude majority of non-ground points while it is not necessary to filter out points belonging to small vertical surfaces like road kerbs.Further work will be concerned with the development of an automatic process for ground surface extraction and for more specific classification and recognition of non-ground objects.
. The local parametric function should be chosen to produce an estimate that is sufficiently smooth without distorting the underlying pattern of the data.LWR uses a weight function for the least squares fit.A common function is the tricube weight function, defined as: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-5/W2, 2013 ISPRS Workshop Laser Scanning 2013, 11 -13 November 2013, Antalya, Turkey ,

Figure 3 .
Figure 3. Proposed algorithm; X-Z profile Figure 4. (a) Dataset 1, (b) RLWR fits in X-Z profile (c) RLWR fits in Y-Z profile (d) classification of ground and non-ground points for X-Z profile (e) classification of ground and nonground points for Y-Z profile (f) final ground and non-ground surface points from the proposed algorithm (g) segmentation results (a) Dataset 2 (b) ground and non-ground surface points from the proposed algorithm (g) segmentation results ) and (b) from two different views show that the proposed LWR based robust method efficiently classifies ground and non-ground surface points in areas covered by dense urban features.Dataset 3, Ground and non-ground surface points from the proposed algorithm (a) front view (b) back view

Table 1 .
Results from RLWR and segmentation algorithms