ROAD ORTHOPHOTO / DTM GENERATION FROM MOBILE LASER SCANNING

This paper proposes a pipeline to produce road orthophoto and DTM from Mobile Laser Scanning (MLS). For the ortho, modern laser scanners provide a reflectance information allowing for high quality grayscale images, at a much finer resolution than aerial photography can offer. For DTM, MLS offers a much higher accuracy and density than aerial products. This increased precision and resolution leverages new applications for both ortho and DEM. The first task is to filter ground vs non ground, then an interpolation is conducted to build image tiles from the filtered points. Finally, multiple layers are registered and blended to allow for seamless fusion. Our proposed approach achieves high quality products and scaling up is demonstrated.


INTRODUCTION 1.1 CONTEXT
Orthophotography and Digital Terrain Models (DTM) are ubiquitous and very standard products in the aerial photogrammetry pipeline.Their applications are numerous in remote sensing, geographical information and earth observation sciences, mapping, urban planning,...These products are traditionally built from satellite and aerial imagery, with a recent trend towards UAVS to increase resolution at the cost of lower productivity.However, to study urban cores in fine details, these products are limited by their resolution, accuracy, and occlusions (mainly by trees and tall buildings).In this paper, we advocate that ortho and DTM can be built from Mobile Laser Scanning (MLS) to overcome some of these limitations.The possible applications of such products are: • Precise mapping of road marks (Hervieu et al., 2015), road limits (McElhinney et al., 2010) or curbs (El-Halawany et al., 2011) (Zhao and Yuan, 2012) (Hervieu and Soheilian, 2013b),...
• Road inventory (Pu et al., 2011) • Road surface modelling (Hervieu and Soheilian, 2013a) and quality analysis • Itinerary computations and public building accessibility for soft mobilities (disabled, strollers, ...) (Serna and Marcotegui, 2013) • Mobile mapping registration on aerial images (Tournaire et al., 2006) • Image based localization using ground landmarks (Qu et al., 2015) • Water flow simulations In some European countries in particular, recent legislation calls for a subdecimetric accuracy mapping of buried objects (water and gas pipes, wires, ...) because the lack of precise data cause accidents and delays in many public construction works.This mapping calls for a precise occlusion free orthophoto of the ground surface.

RELATED WORKS
To our knowledge, the problem of generating an orthophoto from the intensity of a laser scan has scarsely been investigated, but it is mainly a problem of dense interpolation from sparse points that we will discuss in this Section.On the other hand, the problem of extracting a DTM from laser scanning is widely studied, the main challenges being the registration of overlapping strips, the filtering of ground vs off ground points, and the interpolation required to produce a dense image from sparse unstructured points.
Registration of MLS is usually based on ICP which alternates making matches and finding the transformation that minimizes distance between these.They can be separated depending on the sought transformation: either the data is split into rectilinear blocks that are rigidly registered (Gressin et al., 2012a), or a a non rigid transformation is sought for, the popular choice being a translation linearly interpolated in time (Takai et al., 2013) (Monnier et al., 2013).The matches used for this registration are point-to-point or point-to-plane (or both), and matches are filtered based on a distance criterion that can be enhanced using local feature (Gressin et al., 2012b).
Ground vs off ground filtering is a classisal task in Aerial Laser Scanning (ALS) processing.On forested areas, the ability of ALS to traverse the trees and hit the ground is exploited by simply keeping the last echo and applying standard filtering techniques (Wack and Wimmer, 2002).On urban areas however, this is not sufficient as the terrain must be interpolated below the rooftops, so the ground points filtering can be done using more advanced filtering tools (Kraus and Pfeifer, 2001) or posed as a classification problem (Rottensteiner and Briese, 2002).From Mobile Laser Scanning (MLS), this extraction can be done based on elevation and gradient features (Zhao and Yuan, 2012), based on extracting large planes at a certain distance below the 3D trajectory of the laser scanner and analysing them (Pu et al., 2011) or with Robust Locally Weighted Regression (Nurunnabi et al., 2013).
Concerning the interpolation problem, the most popular approach in ALS is Kriging (Kraus and Pfeifer, 2001) as it has a strong statistical support, but other popular approaches such as Moving Least Squares and Inverse Distance Weighting (Shan and Toth, 2008) are often used for their simplicity.If the desired output is a Triangular Irregular Network (TIN), then a Delaunay Triangulation should be performed instead of interpolation (Xiaobo et al., 1999).For MLS, (Bitenc et al., 2011) propose a linear interpolation based on plane fitting with the purpose of error propagation, but this is only suited to their sandy coast monitoring application where the pixel size if very large (1m) compared to ours (4cm).In this paper, we chose a Poisson interpolation (Pérez et al., 2003) as our pixel size is very small and sampling density is very variable, with large holes from occlusions that need to be filled.

METHOD OVERVIEW
A major issue in producing a DTM is the filtering of on ground vs off ground points which will be done based on both a global height estimation and local planarity priors.This will be discussed in Section 2.. Then a Poisson interpolation method used for both ortho and DTM will be presented in Section 3..The nonrigid altimetric registration problem will be tackled in Section 4.. Then the tiles blending for multiple passages will be described in Section 5.. Results will be presented and discussed at each step of the pipeline (Sections 2. to 5.).Finally conclusions will be drawn in Section 6..The full pipeline is illustrated in Figure 1.

DATA PRESENTATION
The data used in this study is a Mobile Laser Scan (MLS) acquired by the mobile mapping system described in (own citation).The scan was performed in around 3 hours in a very dense urban environment, covering around 15 linear kilometres of city streets.The sensor is a RIEGL LMS-Q120i that was mounted transversally in order to scan a plane orthogonal to the trajectory.It rotates at 100 Hz and emits 3000 pulses per rotation, which corresponds to an angular resolution around 0.12 • .Each pulse can result in 0 to 8 echoes producing an average of 250 thousand points per second.The resulting scan is very anisotropic because while points are very dense along scanlines (a few mm on the road below the sensor), scanlines can be separated by several cms according to the vehicle speed (5cm at a typical acquisition speed of 5m/s = 18km/h).In addition to the (x, y, z) coordinates (in sensor space), the sensor records multiple information for each pulse (direction, time of emission) and echo (amplitude, range).The amplitude being dependant on the range, it is corrected into a relative reflectance.This is the ratio of the received power to the power that would be received from a white diffuse target at the same distance expressed in dB.The reflectance represents a range independent property of the target.
The Stereopolis mobile mapping system is also composed of an Applanix POS-LV220 georeferencing system combining D-GPS, an inertial unit and an odometer.However, the GPS masks that frequently occur in urban areas induce drifts that can reach one meter for a 2 minutes mask.This initial georeferencing is corrected based on tie points with aerial photography resulting in around 10cm standard deviation in planimetry, and 15cm in altimetry.
The laser sensor is calibrated in order to recover the transformation between the sensor and Applanix coordinate systems which allows to transform the (x, y, z) coordinates from sensor to a geographic coordinate system.The result, with a colormap on reflectance, is displayed in Figure 2.

GROUND FILTERING
In order to produce a Digital Terrain Model (DTM), the off-ground points need to be filtered off.This filtering can be performed either directly on the points cloud or in the image space in which we wish to represent our DTM.We chose an image based filtering for efficiency, so we first need to discuss how we produce images from our MLS.

TILES AND LAYERS
Our input for DTM/Orthophoto generation is a mobile mapping point cloud, which has several specificities: • The coverage of the studied area is much less uniform than with Aerial Laser Scanning (ALS): only the roads where the MMS has driven are covered.• The overlaps are very specific: they happen either at road intersections, or along road sections that the vehicle has covered more than once (which often happens as it is not avoidable when an extensive coverage of the roads is required).
As our aim is to generate raster based products (DTM and Orthophoto), we need to tile the area of study in a way that takes these specificities into account.First, we use 50m by 50m tiles which size is the order of the efficient range of the laser scan.This is a good compromise between the number of tiles and the unused area (area covered by a tile but not by the scan).Concerning the pixel resolution, we chose a 4cm pixel (1250 x 1250 px tiles) which is below our expected absolute accuracy (10cm), and a good compromise between storage requirement and accuracy.It is not easy to adapt the pixel size to the scan horizontal density as it is very variable according to the scene geometry and distance from sensor.It is also very anisotropic because points are very dense along scanlines (a few mm on the road below the sensor), but scanlines can be separated by several cms according to the vehicle speed (5cm at a typical acquisition speed of 5m/s = 18km/h).That latter consideration also advocates for a pixel size around 4cm (close to the distance between scanlines).A view of this tiling over the whole area of study is given in Figure 3.
As we need to convert to image space at one point of our pipeline anyways, we would like to do it as soon as possible such all the processing is done with image tools, which are generally faster that point cloud processing tools (in particular thanks to trivial neighbourhoods).So the first step of the DTM/Ortho production pipeline could be to project all the points from the laser scan in such tiles.However, we prefer to keep the various passages sepa-rated in order to allow for a fine altimetric registration (cf Section 4.), that is to generate not one image for each tile, but as many as the number of times that the MMS has acquired points over that tile.This is done based on time: points are projected in the order that they were acquired, and an active layer is created when a point projects on a tile where there is no active layer.After a certain time without points projecting in it, an active layer become inactive (and is stored to the disk as a complete layer).This way, if the acquisition comes back to the same tile, a new layer will be created.This process produces frequently 2 to 3 layers on some tiles, and up to 5 in our experiments.On our area of study, this process generated 1022 layers (1.6.10 9 pixels) on 412 tiles.
Finally, we need to detail how we convert the point cloud in a raster during the accumulation.For each pixel, if no point projects in it we store a "no-data" information defining a data mask M data .If one point or more projects in a pixel, we keep the lowest one P low (because our aim is to model the ground) and store in the pixel: • The z component of P low • The range of P low (distance between P low and the laser center) • The relative reflectance of the target at P low .
• The acquisition time of P low all of which will be used in latter steps.An illustration of this rasterization for the relative reflectance image is displayed in Figure 4(a).After this rasterization step, all the subsequent steps detailed in this paper are performed in image space (2.5D for DTM, 2D for the other features).

GROUND FILTERING
Now that the points are projected in independent layers, the main remaining issue is to define what is a ground point and what isn't.
A simple definition for the ground in urban areas is that it consists of points which lie on a locally planar and horizontal surface patch.However this definition does not tackle points on flat roof surfaces, so we also need either a building detection, or more simply a good local approximation of where the ground is.For a mobile laser scan (MLS), there is a simple such approximation that we describe in Section 2.2.1.For the horizontal planarity, we propose a simple filter in Section 2.2.2.

ELEVATION FILTERING
With MLS, we have an easy tip on where is the ground: the vehicle acquiring the point cloud is lying on it (more precisely on the road surface).Knowing the height hsensor of the laser sensor referential Rsensor above the ground, and the 3D points coordinates in Rsensor, we can simply filter out points according to their vertical component in Rsensor, by keeping points satisfying: where the parameter δ needs to be chosen according to the specificities of the studied area, and in particular the type of locally planar horizontal surfaces that might exists close to the ground.

LOCAL FILTERING
The previous filter ensures that the selected points are close enough to the road surface.However, urban scenes contain numerous objects that still need to filtered out in order to keep only the ground: mobile objects (cars, pedestrians, 2 wheelers,...), static objects (poles, trees, faades, street furniture, ...) that might still have a part in the previously selected points.To filter out these points, we use the strong prior that the ground is locally a plane close to horizontal by analysing two vertical cylindrical neighbourhoods with the same radius r cyl for each point P .The first cylinder C∞ is infinite in z while the second cylinder C δ 2 is limited to [Pz − δ2, Pz + δ2].If a sufficient proportion of points in C∞ are also in C δ 2 , this means that P belongs to a sufficiently horizontal planar surface so we keep it, else we remove it.Note that this filtering is computed from the z component of each layer, but also applied to the other component such that their interpolation will only be performed on ground points.

RESULTS
The ground points filtering described here gives very satisfactory results as the local filter achieves to remove most points on objects lying on the ground to keep only points on the ground itself.Concerning timings, this filtering is very fast to compute.It took a few minutes on our 1022 layers in single thread on an Intel Core i7 CPU at 3.33GHz.

POISSON INTERPOLATION
A DTM is a raster representation of a surface without overhangs.
The filtered point cloud from the previous section is sparse, in the sense that it does not cover all the pixels of the DTM, which calls for an interpolation of the height component.However, with mobile mapping, one cannot hope for an integral coverage of some area (defined by a bounding box for instance), but only of a limited strip along the trajectory of the mobile mapping system that acquired the data, which precise footprint depends on the scene and acquisition geometries.In order to interpolate between the points, the first question we have to answer is thus where do we want to interpolate ?We will answer explicitly to this question by creating a mask defining where we want to interpolate (Section 3.1), then explain how the interpolation is performed (Section 3.2).

INTERPOLATION MASK DEFINITION
Interpolation allows to fill holes and gaps in the data.The mask should be adapted to the application of the DTM.In our case, the targeted application was producing slope maps to assess the accessibility of public buildings to wheelchairs.Thus we need to smoothly interpolate the DTM everywhere on the side-walk, and in particular in its occlusion by cars, pedestrians,... On the other hand, we also want to remove isolated points that are useless for that purpose.This is why we chose to apply a succession of morphological dilation/erosions: • A dilation of 10px (40cm) to connect neighbouring points.
• A closure (erosion then dilation) of 30px (1.2m) to remove isolated points.Because of the previous dilation, it removes isolated point groups of size around 80cm.
• A dilation of 90px (3.6m) to fill occlusions by objects of the size of a car.
The result is called the interpolation mask Minter.Its boundary is very smooth and it extends 4m around the areas where the points density is sufficient (less than 40cms between two points).Optionnaly, if a database of the buildings' footprints is available on the area of study (such as a cadaster), it can be integrated in this mask to avoid extrapolating the ground inside the buildings.

INTERPOLATION
Poisson interpolation expresses simply the interpolation problem as that of finding a function u that approximates a function f as closely as possible where f is defined and is as smooth as possible where is is not, which is exactly the behaviour that we want for DTM generation.In our case, this is done by minimizing: where S f is the support of f (pixels where at least one point projects) and: and i, j are 4-connected} is the set of (4-connected) neighbours in Minter (computed in Section 3.1).
The solution to (2) is obtained by putting it in matrix form: where: • f is the (known) vector of data pixel values of size n data • u is the (unknown) vector of interpolated pixel values of size ninter • A data is the (n data , ninter) matrix of value ai,j = 1 if ui and fj correspond to the same pixel (else 0).It is created from the chosen indexation for pixels from the data and interpolation masks.
• G is the (n data , n neigh ) is the gradient matrix of term g i,(i,j) = 1, g j,(i,j) = −1 for all neighbours (i, j) ∈ N4(Minter) The minimizer of ( 3) is given by: In practice, we used the LDL t factorization from the Open source library Eigen, which takes around 20s per tile.Once computed, (4) can be calculated very efficiently for all data f that we want to interpolate (z, reflectance, range, time).

RESULTS
The data involved in interpolation (MLS projection in a tile, mask and Poisson interpolation in the mask) is shown in figure 4, where the reflectance is used to illustrate f (top) and u (bottom).The interpolation result for the other layers of the same tile are shown in Figure 5.Note that the same interpolation process is also applied to the z, range and time components (not shown here but used in the next Sections).
This step of the pipeline is the most time consuming because of the factorization.It is however very easy to parallelize as the computation is independent on every tile.The whole interpolation process (including mask generation and interpolation of z, reflectance, time and range) took around 2 hours on 10 threads an Intel Core i7 CPU at 3.33GHz.
A drawback however of the tiling is that we do not ensure continuity across tiles.This is however minor because the interpolation is based on the MLS, so if the MLS is continuous across the tile boundary, so will its interpolation.However, if an occlusion running across a tile boundary needs to be filled, continuity will not be ensured.For our application, we used the DTM to compute slopes (gradient) which is also tile based, so a discontinuity across boundary will not generate a high slope.If discontinuities are a problem for the desired application, then borders should be added to the tiles in order to enable a seamless blending.

NON RIGID ALTIMETRIC REGISTRATION
As explained in Section 2.1, we chose to split the tiles in layers according to the passages of the vehicle in order to allow for a fine registration.As explained in Section 1.4, the vertical abolute accuracy of our MLS is around 15cm.For our building accessibility application, 15 cm is not sufficient as blending tiles might induce a false 15cm step that cannot be climbed by a wheelchair resulting in an erroneous decision.This is why we propose a non rigid registration procedure inspired by (Monnier et al., 2013).We propose an adaptation and simplification of (Monnier et al., 2013) in order to register the layers in Z only (which is our main issue for DTM generation).The method relies on the definition of a non-rigid deformation defined by a translation given at nT control times Tc and linearly interpolated in between.The first difference is we only look for a translation in the z direction, so our unknowns are simply nT shifts in z: δ z c .

ICP BASED REGISTRATION
The registration is then done following the principle of ICP: 1. Match points between layers 2. Find the deformation that minimizes distance between matched points 3. Iterate 1 and 2 until some convergence criterion is met For the matching, the tile structures makes it obvious: for each tile, we match pixels from each pair of layers in their intersection.For robustness, we keep only the matches for which the difference in z is below the mean absolute difference over the tile intersection.This avoids the necessity to choose a threshold and decide how it will decrease and naturally adapts to all inputs.Thus a match m is characterized by the tile it belongs to, the two layers i and j involved in the tile, the pixel p on which it occurs and the values zi(p), zj(p) and ti(p), tj(p) of the DTM and time from the two layers on that pixel p, resulting from the interpolation of Section 3..Once the matches are selected, we minimize: where δ z is the linear interpolation of the translation in z: and c − (t) and c + (t) are the indices of the control times before and after t. γ is a rigidity parameter reflecting the confidence we have in the shape of the trajectory given by the georeferencing system.Equation ( 5) can be written in matrix form: where: • z is the vector of shifts in z δ z i • M is the (nT ,n match ) matches matrix which non null terms are: for each match m.
• ∆ z is the vector of term zi − zj for each match.
The minimum of Equation ( 6) is given by: Finally, for the stopping criterion, we simply chose a 1mm threshold on the average |δ z i | which is more than sufficient for our purpose and is usually reached in less than 5 iterations.

RESULTS
In our experiments, we take one control time every second for a 3 hours acquisition such that nT ≈ 10 4 .Computing (7) requires inverting a sparse symmetric matrix of size nT , which takes much less than a second using the Open source library Eigen.The most time consuming step is in fact the matching because we need two passes for each layer (one to compute the average and one to compute the matches), which can be optimized by subsampling, yielding a total computation time for the registration of 10 minutes (for one ICP iteration) for the 1022 layers in single thread on an Intel Core i7 CPU at 3.33GHz.
A close-up on slices of two DTM layer is shown before and after registration on Figure 6.On this example, the registration reduces Figure 6: Registration result for two DTM layers (one textured and one in white wireframe).Top: before registration.Bottom: after registration.The scale is given by the step of the wireframe (4cm) the shift in Z from 30cm to below 2cm.In average, over the whole 20kms of the acquisition, our non-rigid registration lowers the average shift in Z from 16cm to 4.3cm in the first iteration and 3.7cm in the second and ends at 3.5cm at iteration 4, for a total computation time of 40 minutes.

EXPONENTIAL BLENDING
Once the layers are registered, they can be blended in order to produce the final tiles.That blending needs to be adapted to the specific geometry of a MLS.In particular, we want smooth transitions between layers, and we want to give more weight to the points which are the closest to the sensor (because the resolution, precision and reflectance information are better for points closer to the sensor).To fulfil these requirements, we found that a good choice was an exponential weighting in the range, which is why we interpolate it in Section 3.: and the final z and reflectance values at each pixel are a simple weighted means over all layers defined at this pixel.The φ parameter controls the rate of decay of the weights so a lower φ implies smoother transitions, at the cost of sharpness: because registration is not perfect (10cm=2.5pxplanimetric accuracy) blended areas look less sharp.In practice, we chose φ = 1m ensuring a good compromise between transition smoothness and sharpness.

RESULTS
A results of this blending procedure for reflectance is given on Figure 7.We see the vehicle trajectories as highlights on the ground because the road has a specular component that makes the backscattered intensity higher for near orthogonal incidence angles.Despite multiple crossing or overlapping trajectories in the same area, the blending achieves a smooth result without visible seams or blending artefacts.This result has a much better quality than an aerial orthophoto and ability to see under the vegetation as illustrated in Figure 8.The laser sensor used emits in near infra-red, so the resulting orthoimage can be seen as an infrared image that is very independent of the lighting conditions (no shadow) because the sensor is active.Another possible output of this pipeline is a DTM surface textured with the reflectance ortho as shown in Figure 9.Because both information come from the same data (the MLS) geometry and radiometry are perfectly aligned, resulting in a high quality textured ground model.
Blending with this procedure is extremely fast and took below one minute to generate 412 final tiles (DTM and ortho) from the 1022 layers in single thread on an Intel Core i7 CPU at 3.33GHz.Overall, the whole Ortho/DTM generation took around 3 hours for a 3 hours acquisition, which shows that they can be produced as fast as the data is acquired on a single laptop which is very promising for a possible industrialization.

CONCLUSIONS AND FUTURE WORK
The DTM produced by our pipeline has been successfully integrated in a GIS that exploited this information to assess accessibility of public buildings to the disabled and provide an itinerary computation service for soft mobilities (wheelchairs, strollers,...) The reflectance image was used for visualization only in this scenario.However, we believe that this highly accurate DTM/ortho from MLS can be used in a variety of other applications such as precise image based localization, road surface quality assessment and defaults detection, water flow simulations, road surface modelling, curb detection and road marks extraction.
In the future, we plan on exploiting all the information from the lidar scan (not only ground points) in order to produce a more informative "lidar orthophoto" where all vertical elements (walls, Figure 9: Digital Terrain Model textured with the reflectance orthophoto poles, ...) would appear with color codes to distinguish them, and with transparency to handle multiple layers.The result would be an orthophoto where all the scanned elements would appear even if not visible from above.

Figure 1 :
Figure 1: Full ortho/DTM production pipeline from MLS. Framed rectangles are processing steps, unframed rectangles are exchanged data types.The pipeline input/output is in yellow.

Figure 2 :
Figure 2: A view of the Mobile Laser Scan (MLS) used in this study, coloured with reflectance.

Figure 3 :
Figure 3: Mobile laser scan reflectance projected into the 412 tiles of our area of study.A single tile is shown on Figure 4 (top) Figure 4: Illustration of Poisson interpolation on the reflectance component

Figure 5 :
Figure 5: Illustration of Poisson interpolation on the other two layers of the tile of Figure 4 4.1 NON RIGID DEFORMATION MODEL

Figure 7 :
Figure 7: Blending result for reflectance image on an area around the tile which layers are displayed in Figures 4(b) and 5 and a close up near a trajectory crossing.