ADVANCED DTM GENERATION FROM VERY HIGH RESOLUTION SATELLITE STEREO IMAGES

This work proposes a simple filtering approach that can be applied to digital surface models in order to extract digital terrain models. The method focusses on robustness and computational efficiency and is in particular tailored to filter DSMs that are extracted from satellite stereo images. It represents an evolution of an existing DTM generation method and includes distinct advancement through the integration of multi-directional processing as well as slope dependent filtering, thus denoted “MSD filtering”. The DTM generation workflow is fully automatic and requires no user interaction. Exemplary results are presented for a DSM generated from a Pléiades tri-stereo image data set. Qualitative and quantitative evaluations with respect to highly accurate reference LiDAR data confirm the effectiveness of the proposed algorithm.


INTRODUCTION
For many remote sensing applications it is beneficial or even mandatory to have a digital terrain model (DTM) in addition to a digital surface model (DSM) for the region of interest.Having information of object and ground height allows to generate a normalized digital surface model (nDSM), representing the relative height above ground.Such products further can be employed to calculate e.g. the forest height and thus its biomass (Hyyppä et al., 2000) or the height of buildings and other manmade structures (Yu et al., 2010;Steinnocher et al., 2014).
While it is a standard procedure to extract a DTM from airborne LiDAR data, this is not the case for DSMs derived from satellite stereo data.With upcoming novel satellites holding highresolution optical sensors, which provide the ability to capture stereo or even tri-stereo imagery, the question arises if stereobased DSMs could replace LiDAR data for certain applications (Durand et al., 2013;Jacobsen, 2013).However, when relative height information is needed, a suitable algorithm that is able to generate a DTM purely from the DSM needs to be available.This work aims to present a DTM extraction approach that is especially suited for DSMs that are generated by a state-of-theart photogrammetric workflow from very-high-resolution satellite stereo or tri-stereo images.One constraint of such DSMs is that 3D breaklines in such DSMs are not always clearly defined, which can be traced back to occluded areas that cannot be reconstructed.The underlying idea is to extend the algorithm of (Meng et al., 2009), which is designed for LiDAR data, to be slope dependent and to be really multi-directional, i.e. 8-directional to span the 2D image space.A focus nonetheless is put onto simplicity, robustness and computational efficiency to follow integration needs into automatic end-to-end workflows.
Tests are performed with DSMs with 1 m spacing derived from tri-stereo Pléiades images.The special interest in using Pléiades data comes from the fact that this sensor is able to acquire tri-stereo panchromatic images in one single over flight (single pass along track stereo capacity) with 0.7 m GSD at nadir (Astrium, 2012).Several works show that it is possible to derive highly accurate DSMs from such stereo or tri-stereo images (Stumpf et al., 2014;Perko et al., 2014;Berthier et al., 2014).However, the presented method is not limited to Pléiades-based DSMs.Images of any optical sensors which provide a ground sampling distance (GSD) in the range of 0.5 to 1.0 meters may be utilized, with the same parameter settings (e.g.Ikonos, GeoEye, WorldView, Pléiades).For DSMs with lower or higher resolution the processing parameters very likely have to be appropriately tuned to achieve optimal filtering results.

STATE-OF-THE-ART
In the literature DTM generation approaches are described for different kind of input data.Most of them are designated for airborne LiDAR data, some for DSMs extracted from highresolution airborne optical images and from spaceborne optical imagery.A general comparison between photogrammetry and airborne laser scanning can be found in (Baltsavias, 1999).
Airborne LiDAR Data: There are many publications on DTM generation from LiDAR data, where first and last pulse data exists.Good review papers are (Sithole and Vosselman 2004;Meng et al., 2010).In regions of vegetation the first pulse measurement corresponds to the top height of the vegetation while the last pulse is reflected from bare-earth, thus defining the terrain height.Therefore, shrubs, trees and whole forests can be easily filtered out based upon LiDAR full-pulse data.Manmade objects like buildings, however, are present in both LiDAR measurements and still have to be filtered out by appropriate algorithms to get a correct DTM.Since LiDAR provides very accurate height measurements, 3D breaklines are in general very well preserved in LiDAR DSMs.When moving along a 1D profile from street level to a building a sharp steep height jump is observed, which can be used as a distinct feature for the DTM generation process (cf.e.g.(Axelsson, 2000)).One representative algorithm is described in (Meng et al., 2009) which will serve as the baseline for the algorithmic evolution presented in this paper.(Krauss and Pfeifer, 2001) introduced a method based on hierarchical robust linear prediction employing LiDAR data.
Airborne Optical Data: For very high-resolution airborne cameras (Wiechert and Gruber, 2010) report on a method developed for UltraCam images (Leberl et al., 2003).They perform a classification of the ortho images into 15 classes, both using texture and height information.The classification is then employed within the DTM generation process.The main difference to spaceborne data is that the underlying DSMs are of significantly higher resolution and of higher accuracy such that an image classification also works very robust -which would not be the case for spaceborne data.(Unger et al., 2009) present a variational approach which uses a DSM as single input.The basic concept is to extract a very smooth surface using a strong regularization weight within a variational formulation.This over-smoothed surface is used to determine potential points on ground (called "detection mask") which are then used to interpolate a DTM by employing the same variational formulation as before.However, problems occur on large buildings that are detected as belonging to the ground.Thus, the authors proposed to add a segmentation based on maximally stable extremal regions and a manual interaction step.
Spaceborne Optical Data: For satellite based stereo DSMs (Krauss et al., 2011) evaluated three DTM generation approaches based on morphology, geodesic dilation and steep edge detection on simulated synthetic urban scenes.(Arefi et al., 2011) presented an approach based on iterative geodesic reconstruction tested on Cartosat-1 stereo images.(Tian et al., 2014) reported on DTM generation in forested regions also from Cartosat-1 stereo images by classifying the panchromatic image into semantic classes, like buildings, low forest, high forest or ground, and applying different filtering on DSM regions according to the class.The proprietary commercial software PCI Geomatica 2013 employs algorithms based on filtering.First, minimal and maximal values are determined within a given spatial extent (e.g. 100 x 100 square meters).Second, a moving polynomial function is fitted to these values, directly yielding a DTM.Unfortunately, more details on this algorithm are not available.

METHOD
First, we sum up the problem statement.Second, we give a recap on DTM generation of the reference method and describe its main drawbacks.Third, our multi-directional slope dependent (MSD) DTM generation approach is explained in detail.

Problem Statement
When DSMs are derived from spaceborne optical stereo images, only one height measurement exists per discretized location.In addition, sharp 3D breaklines are often modelled as smooth transitions, especially if the gap next to the step edge is not visible in one of the two stereo images due to occlusions.Depending on the viewing angles of the satellite during acquisition a multitude of breaklines are only observed from one direction.As the Plèiades and other satellites are also able to change their view angle in across track direction, in many stereo or tri-stereo images one side of each building is never visible.Thus, it is inevitable that many 3D breaklines are modelled as more or less smooth transitions from ground level to building level.This aspect poses a difficulty and causes LiDAR based DTM generation algorithms to fail on such spaceborne DSMs.

Recap on DTM Filtering
Overall, the main concepts of DSM to DTM filtering can be summarized as: 1. Determine points in the DSM that are located on bareearth regions.2. Remove all other non-bare-earth regions.3. Fill the resulting holes by means of DTM interpolation.
Obviously, the crucial and most difficult step is the first one.
Based upon the literature study the most promising algorithm to start with is the directional filtering method by (Meng et al., 2009;Meng et al., 2010).Since it works on 1D image profiles, hereby denoted as scanlines, it yields fast runtimes.However, when applying single scanline processing only, the complete 2D context into which an object is embedded is lost.In addition, the given DSM can have unsharp 3D edges in some scanline directions.Another bottleneck is inherent to the negligence of the local slope of the underlying terrain, such that no useful results can be expected on tilted surfaces or in mountainous areas.Also many other published algorithms are not considering the local terrain slope (cf.(Sitehole and Vosselman, 2004)).
A short recap on directional filtering according to (Meng et al., 2009) can be given as follows: The basic idea is to process each line of the given DSM separately, e.g. from left to right with a sliding window of given extension.First, the minimal value in this window is determined, which is considered to represent the bare-earth terrain at the minimal position.In this step, it becomes obvious, that an object to be filtered out has to be smaller than the filter extent.Then, if the current pixel under examination has a large difference to the minimal value w.r.t. a given height threshold it is considered as a non-ground point.If this is not the case and the slope between the current pixel and the next one in scanline direction is larger than a given slope threshold, the pixel is also considered as a non-ground pixel.If the slope is positive and smaller than this threshold the pixel gets the same label as the previous pixel.If the slope is negative, then the distance to the closest ground point is used to decide whether the pixel is classified as ground or as nonground.
The main drawback is the fact that the local slope of the terrain is not considered in the processing.If this algorithm is used to process regions which are not flat, incorrect filtering occurs.This issue is sketched in Figure 1 where a 1D profile of an artificial DSM to be filtered is given in (a).It shows a tilted surface with some noise and a building.The potential filter extent is visualized as the blue dashed line.The minimal height value within the filter extent is marked as the black dot.It is always dragged towards the terrain fall and thus is incorrect (in the example a height of 95.8 m).If we apply a robust terrain slope fitting, the green dashed line in (b) is received.Using that tilt, the initial profile could be slope corrected (c) and the new minimal value can now be correctly extracted (99.2 m in this example).In addition to such an incorrect minimal value also the slope estimate of two adjacent DSM values should be corrected w.r.t. the local terrain slope.

MSD DTM Generation Approach
Using these simple insights, two main extensions to the directional filtering concept are proposed for development and implementation of our novel multi-directional slope dependent (MSD) DTM generation method: 1. Incorporation of the local terrain slope.
2. Extension of the local horizontal scanline approach to multiple scanlines, thus spanning the 2D image space.
The first algorithmic extension is very intuitive and sketched in Figure 1.In the workflow a robust fit of the height values within the filter extent must be performed.An obvious solution would be to use an iterative weighted least squares method with bisquare (Tukey) or Huber weighting function (cf.(Fox and Weisberg, 2010)).However, this robust fit has to be performed for each pixel and for all scanline directions and thus would slow down the process.Experiments have shown that a simple fit via smoothing with a huge kernel size yields very similar and thus satisfactory results.Therefore, a 2D Gaussian smoothing was implemented with a spatial sigma of 25.0 meters and a kernel size of 101x101 meters 2 .Then the pixel difference of the central pixels defines the local terrain slope value.
Figure 2 shows a real example for the downward scanline.
Given are a subset of the DSM and the local slope estimate as achieved by smoothing.The plot shows the original 1D DSM profile (black), the robust fit of the terrain slope (green), the simple fit of the terrain slope (cyan) as well as the robustly and simply corrected data (red and blue).It is obvious, that the simple and fast Gaussian-based approximation yields very similar results in comparison to the iterative bisquare weighted least squares solution.Actually, initial tests revealed that it is faster by a factor of at least four magnitudes for a filter size of 91 pixel.
The second algorithmic extension concerns the data processing scheme in terms of scanlines.As stated above, consideration of only one scanline direction from left to right yields a local solution and the filtering result could abruptly change within two adjacent pixels from two neighbouring scanlines.Instead we propose to use 8 scanlines and fuse the results for final pixel classification.The principal concept to solve a problem in 2D by fusing multiple 1D solutions is based on the work of (Hirschmüller, 2008).In the presented case the fusion is simply based on a majority voting.If more than 5 scanlines classify the current point as a ground point, this point is classified as ground point, else as non-ground point.This aspect is very important for satellite stereo DSMs, as some scanlines may classify a nonground point incorrectly (e.g.due to a smooth height transition) while the combination of classifications achieved from 8 directions certainly helps to improve final pixel classification accuracy.The whole novel algorithm is described in Algorithm 1 for one single scanline direction resulting in a ground / non-ground label image.Instead of processing each scanline sequentially, the image could be scanned in 2 passes, first from upper-left to lower-right corner and second from lower-right to upper-left corner, processing four scanlines at each pass (cf. Figure 3).Doing so, image data has to be read only once and instead of storing eight label images, one label image can be used to sum up all ground pixels, followed by a thresholding as stated above.This specific processing reduces the memory consumption and results in a speed-up.

DSM local downward slope
The main reason for the simplicity of the MSD algorithm is the fact, that the robust fitting was replaced by a smoothing done beforehand.Now the terrain slope fit is directly gathered using the difference value of this smoothed input DSM.
After removing the non-ground points, the resulting holes are filled employing a triangulation based linear interpolation method.
The algorithm uses three parameters, i.e. the filter extent in pixels, the height threshold in meters and the slope threshold in degrees.For all tests the values given in

Data Set
An Austrian alpine test site covering the city of Innsbruck was chosen as region of interest.A Pléiades image triplet, acquired with platform PHR 1A in acquisition mode PX and spectral processing PA, was used to test the DSM generation and DTM extraction workflow.Acquisition parameters of the image data, like along, across and overall incidence angle are summarized in Table 2.The test site covers urban, rural and mountainous terrain, with ellipsoidal heights ranging from 560 to 2750 meters a.s.l. and spans over an area of 212 km 2 .In this study only the VHR panchromatic bands are used, which originally are acquired with a GSD of 0.7 to 0.77 m, but always are delivered with an upsampled GSD of 0.5 m (2.0 m for multispectral bands).Using the workflow presented in (Perko et al., 2014) a DSM with 1 m spacing is extracted that serves as input for the DTM generation process.An overview of the scene is given in Figure 4, which shows the true-ortho image (a), the DSM (b) and the advanced MSD processed DTM as presented before (c).For quantitative evaluation, an airborne LiDAR DSM and its corresponding DTM are employed, which have been collected in 2006 and 2007 at 2.0 points/m 2 .The LiDAR data is available as raster data in UTM 32 North map projection and WGS84 datum at 1 m spacing with ellipsoidal heights.

Qualitative Evaluation
For visual comparison the results of several ground pixel filtering methods are given in Figure 5 for a small area of 721x361 m 2 , showing a residential area at a hillside.The Figure shows the pansharpened true-ortho image (a) along with the input DSM (b).Further, it shows the ground masks as resulting from DSM filtering by using only two directions (left and right; c), using 8 directions (d) and using 8 directions as well as slope dependent processing (e).It is clearly visible that due to the hillside location the first two methods filter out too many points due to negligence of the local terrain slope, whereas our advanced MSD method clearly yields a highly plausible and reliable segmentation.Forest areas and houses are removed while many ground points are correctly detected.Figure 6 shows a comparison of 3D views of the input DSM (a) and the resulting DTM (b), which makes the removal of trees and houses very well visible.

Quantitative Evaluation with LiDAR Data
First, the Pléiades tri-stereo based DSM is compared to the LiDAR DSM.Due to the large temporal gap between LiDAR and Pléiades data acquisitions only selected areas were analysed, which are not affected by temporal change due to construction, vegetation growth or cloud cover.Mean values as well as standard deviation of the height differences are summarized in Table 3 for an area of 25.4 km 2 .Here the Pléiades based DSM has a bias of 0.64 m and thus is actually too high.However, this is within the height uncertainty given for this sensor (Stumpf et al., 2014;Perko et al., 2014;Berthier et al., 2014).Consequently, the differences of LiDAR and Pléiades based DTMs will have the same bias, which however is not the fault of the presented DTM generation algorithm.
Second, the difference of the reference LiDAR DTM to the Pléiades DTM, which has been extracted using the proposed algorithm, is analysed.The results are also given in Table 3.
The mean bias between both DTMs indeed is very similar to the one achieved for the DSMs.Due to the fact, that not all nonground points are perfectly removed during the Pléiades DTM generation, the achieved DTM is locally above the LiDAR DTM, resulting in an additional mean height difference of 0.11 meters.However, this bias is really small and it is below the absolute accuracy of LiDAR height measurement as well as Pléiades stereo height measurement.Figure 9 shows DSMs and DTMs as generated from LiDAR and Pléiades data for two small areas of 100x100 m 2 each, as well as profiles for selected objects, representing a skyscraper (left) and a round building (right), respectively.Analysis of the plotted profiles gives indication, that small structures, like the buildup on the first skyscraper, or the hole in the center of the round building, are not reconstructed using Pléiades data.The DTM generation is able to remove buildings and locally our DTM looks even better than the LiDAR DTM, e.g. on the left border of the round building where the LiDAR DTM is even below ground level.

Discussion
Overall, the resulting DTMs are visually appealing, which is also confirmed in the quantitative evaluation w.r.t.LiDAR reference data.Nonetheless, as long as the extraction of a DTM solely relies on the corresponding DSM, dedicated problem features and areas are inherent to the filtering algorithm being applied: 1. First, large non-ground objects may not be filtered out.In particular if an object is larger than the DTM filtering size in all eight directions, it will remain in the DTM. 2. Second, this obviously holds for larger forests, where no ground point is visible.Such forest stands will also remain in the DTM. 3. Third, the presented robust slope estimation will fail on mountain peaks such that the minimal value will be incorrectly determined.Then, the peaks will be cut off with the size of approximately half of the filter kernel size.
Currently no post-processing is applied to the filtered DTMs.
Obviously the final quality could be increased by removing very small patches in the ground mask (Figure 5), since e.g.single pixels are more unreliable than others.Also the resulting interpolated DTM could be filtered to get smoother surfaces, e.g. using a Gaussian, a median or a bilateral filter.

CONCLUSIONS
This work presents a simple, robust and computationally efficient filtering method that can be applied to a single digital surface model in order to extract a terrain model.The method is based on the DTM generation algorithm in (Meng et al., 2009) where distinct advancements were presented through the multidirectional slope dependent (MSD) filtering.In contrast to the original work, these extensions allow the DTM generation process being applicable on DSMs extracted from highresolution satellite stereo optical imagery.The presented DTM generation workflow is fully automatic and requires no user interaction.Exemplary results were shown for a DSM generated from a Pléiades tri-stereo image data set.Qualitative and quantitative evaluations to highly accurate reference LiDAR data demonstrated the functionality of the proposed algorithm.
DSM profiles for an artificial building on a tilted surface with added noise: (a) Original DSM profile and the minimal height value (black) within the window (blue).(b) Robust fit of the surface (green).(c) Slope corrected DSM profile and the according minimal height value.

Figure 2 .Figure 3 .
Figure 2. Comparison of robust terrain slope fit to the proposed simple fit.

FigureFigure 4 .Figure 5 .
Figure 7 and Figure 8 show detailed views of an urban and a suburban region.

Figure 6 .
Figure 6.3D view of the DSM shown in Figure 5 (left) and the resulting DTM after applying the proposed filtering method (right).

Figure 9 .
Figure 9. Profile comparisons of DSMs and DTMs from LiDAR and Pléiades for two small scenes.

Table 1 .
Parameters used in all tests.

Table 2 .
Acquisition parameters of the Pléiades data.

Table 3 .
Statistics of height differences between LiDAR and Pléiades based DSM and DTM.