TOWARDS AUTOMATIC VALIDATION AND HEALING OF CITYGML MODELS FOR GEOMETRIC AND SEMANTIC CONSISTENCY

A steadily growing number of application fields for large 3D city models have emerged in recent years. Like in many other domains, data quality is recognized as a key factor for successful business. Quality management is mandatory in the production chain nowadays. Automated domain-specific tools are widely used for validation of business-critical data but still common standards defining correct geometric modeling are not precise enough to define a sound base for data validation of 3D city models. Although the workflow for 3D city models is well-established from data acquisition to processing, analysis and visualization, quality management is not yet a standard during this workflow. Processing data sets with unclear specification leads to erroneous results and application defects. We show that this problem persists even if data are standard compliant. Validation results of real-world city models are presented to demonstrate the potential of the approach. A tool to repair the errors detected during the validation process is under development; first results are presented and discussed. The goal is to heal defects of the models automatically and export a corrected CityGML model.


INTRODUCTION
Application and analysis of geo data is moving from traditional GIS applications with 2D map data towards deployment of real 3D data. Virtual 3D city models are becoming more and more available for urban areas. More sophisticated tools for data analysis and information extraction are under development. Quality assessment becomes mandatory because reliable and reproducible processing results can only be obtained with correct original data. Different views on the term "correctness" exist; existing standards such as ISO 19107 or the CityGML specification provide a good starting point. This is not sufficient, however, for an unambiguous definition of modeling guidelines. Consequently, a discussion of guidelines for modelers and users, and of methods to check a data set for compliance with these specifications, is necessary.
A general overview of the concept of data quality in the geographic domain is included in (Kresse & Fadaie 2004), which offers a comprehensive summary of the relevant standards, notably of the ISO 19100 series. The paper of (Akca et al. 2010) focuses on geometric accuracy with respect to the generation process of a model from Lidar data. Discussing the problems of polygonal models, (Krämer et al. 2007) define quality measurements for 3D city models. Some simple algorithms for quality assessment and healing of geometries are presented. (Campen et al. 2012) provide an extensive collection of typical defects of polygonal 3D models and existing techniques for processing and repair with respect to different fields of application. A detailed analysis of completeness and separation issues in city models is presented by (Zhao et al. 2012). They consider typical properties of semi-automatically generated models and their insufficiencies and develop a generalization method. However, other geometric errors are not investigated.
Limited research has been done so far regarding healing of 3D city models. Approaches to repair triangle meshes such as (Liepa 2003) and (Attene & Falcidieno 2006) exist but can only be tied loosely to our approach. We map CityGML features to an internal data structure which is designed to maintain links to the semantic properties of the original model. Using volumetric techniques, as suggested by (Nooruddin & Turk 2003), requires conversion to a voxel representation, which creates difficulties in maintaining model-inherent semantics. An alternative approach was presented recently by (Ledoux 2013). A top-down approach is described as favorable because it enables repairing a model in one single step. The implementation shows that a hierarchical processing of the model is necessary before the actual volume-based approach for healing solid defects can be performed.
We present an overview of our research results leading to the definition of certain quality criteria for CityGML models and the development of an automated validation tool. A quality report is the result of this processing step. It includes detailed descriptions of all detected errors. This information is used as input of a healing process which tries to repair as many errors automatically as possible. The healing procedures are described in detail and experiences with the tool are discussed.

VALIDATION RULES
A validated data set is expected to be clean, correct and useful for a given application. This implies that different sets of validation rules exist, depending on the intended application. We separate the validation process into two general steps: first a schema validation for CityGML data to assure schema-conformant input to the second step, the geometric and semantic validation of the data set. Only the second step is discussed here because XML schema validation is a standard procedure for which sophisticated tools are available. For the basics of geodata validation we refer to the explanations in (Wagner et al. 2013). The Special Interest Group 3D has developed guidelines for modeling of 3D city models. The goal is to clearly specify valid alternatives and recommend one of them for general usage. This should lead to city models with known specifications, in contrast to the situation today where only the modeler knows how certain features are reproduced. These recommendations are the base for geometric validation rules which have been developed and implemented as part of the research project CityDoctor at University of Applied Sciences Stuttgart, Germany. The geometric model as described by (Gröger & Plümer 2011) is used.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-2/W1, ISPRS 8th 3DGeoInfo Conference & WG II/2 Workshop, 27-29 November 2013, Istanbul, Turkey
In addition, some geometric-semantic rules resulting from CityGML requirements are included as plausibility checks for consistency of the data set. A short listing of the checks is given in the following; more detailed explanations are given in (Wagner et al. 2013). The algorithms and the underlying data structure are suitable for CityGML LODs 1 and 2.
Polygon checks
1. A linear ring must consist of a minimum of 4 points.
2. First and last point of a linear ring are identical.
3. All points of a linear ring R are different, with exception of first and last point.
4. Two edges can intersect only in one start-/end point. Other points of intersection or touching are not allowed (to account for rounding errors or polygons which are not perfectly planar, a small tolerance is allowed).
5. All points of the polygon must be located in a plane (a small tolerance is allowed).
NB: since we consider only outer rings, polygons with holes are currently not processed (they occur only rarely in LOD 1 or 2).
Solid checks
6. The minimum number n of polygons to define a solid is four. They must be situated in different planes.
7. A valid intersection of two polygons of a solid either contains a common edge, a common point of a linear ring, or is empty. Common edges and points must be elements of both polygons.
8. Each edge of a linear ring defining a polygon is used by exactly one neighboring polygon.
9. Consistent orientation of polygons of a solid such that common edges according to check 8 are used in opposite direction.
10. The normal vectors of the polygons must point towards the outside of the solid.
11. All parts of a solid must be connected.
12. The graph $G_S = (V_P, E_P)$ of polygons and edges which meet in point $p_i$ is connected for all $p$. Each
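The ring-level checks 1 to 3 can be illustrated with a short sketch. It is not the CityDoctor implementation; the function name `check_ring` and the representation of a ring as a list of coordinate tuples are assumptions made for illustration, and points are compared exactly, without the tolerance a production check would use. The returned error codes are the ones used in the healing section.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def check_ring(ring: List[Point]) -> List[str]:
    """Return the codes of the violated ring checks 1 to 3 (hypothetical helper)."""
    errors = []
    # Check 1: a linear ring needs at least 4 points (3 distinct ones plus closure).
    if len(ring) < 4:
        errors.append("CP_NUMPOINTS")
    # Check 2: first and last point must be identical.
    closed = bool(ring) and ring[0] == ring[-1]
    if not closed:
        errors.append("CP_CLOSE")
    # Check 3: apart from the closing point, no point may occur twice.
    interior = ring[:-1] if closed else ring
    if len(set(interior)) != len(interior):
        errors.append("CP_DUPPOINT")
    return errors
```

A closed triangle passes all three checks, while an open ring of three points violates checks 1 and 2 at the same time.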

HEALING
For each error detected during the validation process, a specific error object contains all necessary parameters for healing. Our approach assumes that all errors should be healed hierarchically, according to the dependency of the respective checks. An iterative approach assures that after an error is healed, the geometry is checked again for new errors which might have been introduced during the last healing step. This allows the original model to be manipulated in a controlled and reproducible way. In cases where problems cannot be solved by the healing algorithms within a user-defined maximum number of iterations, an error object is returned. Healing is done in two phases. First all polygons are healed; then, if the polygons pass the validation process, solid errors are healed. The healing process is illustrated in figure 3.
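The iterative scheme described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the tool's code: `heal_iteratively`, the `validate` callback (assumed to return error codes ordered by check dependency) and the `healers` mapping are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


def heal_iteratively(geometry,
                     validate: Callable[[object], List[str]],
                     healers: Dict[str, Callable[[object], object]],
                     max_iterations: int = 10) -> Tuple[object, List[str]]:
    """Heal one error per iteration and re-validate after every step."""
    for _ in range(max_iterations):
        errors = validate(geometry)
        if not errors:
            return geometry, []          # geometry is valid
        healer = healers.get(errors[0])  # first error in dependency order
        if healer is None:
            return geometry, errors      # no automatic healing available
        geometry = healer(geometry)
    # Iteration limit reached: report whatever is still wrong.
    return geometry, validate(geometry)


# Toy usage: a ring of vertex ids with a single possible error, CP_CLOSE.
def toy_validate(ring):
    return [] if ring[0] == ring[-1] else ["CP_CLOSE"]


toy_healers = {"CP_CLOSE": lambda ring: ring + [ring[0]]}
healed, remaining = heal_iteratively([1, 2, 3], toy_validate, toy_healers)
```

Returning the remaining error codes together with the geometry mirrors the idea that an error object is handed back when healing does not succeed within the iteration limit.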

POLYGON HEALING
In this phase one error is healed at a time. That means each iteration heals one error and validates the result again.
CP_CLOSE: The first and the last vertex of a polygon must be the same; healing therefore simply copies the first vertex to the end of the point list. Consider a Linear Ring that contains four vertices in the sequence $\{P_1, P_2, P_3, P_4\}$, where the last point and the first point are not the same. The healed point list is $\{P_1, P_2, P_3, P_4, P_1\}$.
CP_NUMPOINTS: A Linear Ring must contain a minimum of four vertices in the sequence, where the first and the last point are the same. A closed Linear Ring with fewer than 4 vertices is either a line or a point but not a valid polygon; in this case healing deletes the polygon. If, however, a polygon contains 3 different vertices and the first and the last vertex are not the same, it will be healed by the previous healing process, and the healed polygon will then have 4 vertices.
CP_DUPPOINT: In a Linear Ring only the first vertex is allowed to repeat, at the end of the point sequence. No other vertex is allowed to repeat within the sequence at any position. In Figure 4 the first Linear Ring X contains the point sequence $\{P_1, P_2, P_3, P_4, P_5, P_3, P_1\}$, where $P_3$ occurs twice. In the second Linear Ring Y the point sequence is $\{P_1, P_2, P_3, P_4, P_4, P_1\}$, where $P_4$ occurs twice. In the third Linear Ring Z the point sequence is $\{P_1, P_2, P_3, P_4, P_2, P_1\}$, where $P_2$ occurs twice. In all three Linear Rings a vertex other than the closing point occurs more than once, so each of them has a duplicate point error. These errors are healed in two different ways. In Linear Ring Y vertex $P_4$ occurs twice in direct succession, so only one instance is kept and the other one is deleted from the point list. For X and Z, however, deleting one instance of the vertex would change the shape of the polygon. Here loops are searched within the point list. For X the loops are $\{P_3, P_4, P_5, P_3\}$ and $\{P_3, P_1, P_2, P_3\}$, and for Z they are $\{P_2, P_3, P_4, P_2\}$ and $\{P_2, P_1, P_2\}$. The polygons are then split into multiple polygons according to the loops found.
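The CP_CLOSE and CP_DUPPOINT healing steps can be sketched as follows. This is an illustrative sketch, not the authors' code; `close_ring` and `split_at_duplicates` are hypothetical names, vertices are represented by hashable ids, and `split_at_duplicates` assumes a ring that is already closed.

```python
from typing import Hashable, List

Ring = List[Hashable]


def close_ring(ring: Ring) -> Ring:
    """CP_CLOSE: append a copy of the first vertex if the ring is open."""
    return list(ring) if ring and ring[0] == ring[-1] else list(ring) + list(ring[:1])


def split_at_duplicates(ring: Ring) -> List[Ring]:
    """CP_DUPPOINT: drop back-to-back duplicates, split at revisited vertices."""
    path: Ring = []
    loops: List[Ring] = []
    for v in ring[:-1]:                  # ignore the closing vertex
        if path and path[-1] == v:
            continue                     # consecutive duplicate: keep one instance
        if v in path:
            i = path.index(v)
            loops.append(path[i:] + [v])  # loop closed at the repeated vertex
            del path[i + 1:]              # continue with the remaining outer path
        else:
            path.append(v)
    loops.append(path + path[:1])        # whatever is left forms the last loop
    return loops


X = ["P1", "P2", "P3", "P4", "P5", "P3", "P1"]
Y = ["P1", "P2", "P3", "P4", "P4", "P1"]
Z = ["P1", "P2", "P3", "P4", "P2", "P1"]
```

For X the sketch yields the loops P3, P4, P5, P3 and P1, P2, P3, P1; the latter is the loop P3, P1, P2, P3 from the text with a different start vertex. For Z the second loop has only two distinct vertices, so it would subsequently be removed by the CP_NUMPOINTS healing.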
Figure 4. Healing CP_DUPPOINT error
CP_SELFINT: Two edges are allowed to intersect only at a start or end point of the edges; any other intersection is considered an error. In Figure 5 the first polygon has the point sequence $\{P_1, P_2, P_3, P_4, P_1\}$, where edge $(P_2, P_3)$ and edge $(P_4, P_1)$ intersect at a point which does not belong to the point sequence. In the second polygon the point sequence is $\{P_1, P_2, P_3, P_4, P_5, P_1\}$, where edge $(P_2, P_3)$, edge $(P_4, P_5)$ and edge $(P_5, P_1)$ intersect at a point which does not belong to the point sequence. Both polygons therefore have a self-intersection error of edges. There are two healing options. One is to rearrange the point sequence, which sometimes works well for simple polygons. The other is to compute the intersection points, create new vertices from them and insert each new vertex into both intersecting edges. The first polygon then becomes $\{P_1, P_2, P_x, P_3, P_4, P_x, P_1\}$, where $P_x$ is the new vertex. This is not a valid polygon, but there is no self-intersection error any more; the resulting duplicate point error is healed in the next iteration by its own healing process.
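The second healing option, inserting the intersection point into both crossing edges, can be sketched in 2D. The sketch assumes the polygon has already been projected into its plane, handles only the first proper crossing found and ignores parallel or collinear overlaps; `segment_intersection` and `insert_first_crossing` are hypothetical names.

```python
from typing import List, Optional, Tuple

Point2 = Tuple[float, float]


def segment_intersection(a: Point2, b: Point2, c: Point2, d: Point2,
                         eps: float = 1e-9) -> Optional[Point2]:
    """Return the point where ab and cd cross in their interiors, else None."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]
    if abs(denom) < eps:
        return None                       # parallel or collinear: not handled here
    qp = (c[0] - a[0], c[1] - a[1])
    t = (qp[0] * s[1] - qp[1] * s[0]) / denom
    u = (qp[0] * r[1] - qp[1] * r[0]) / denom
    if eps < t < 1 - eps and eps < u < 1 - eps:
        return (a[0] + t * r[0], a[1] + t * r[1])
    return None                           # touching at end points is allowed


def insert_first_crossing(ring: List[Point2]) -> List[Point2]:
    """Insert the first crossing point found into both intersecting edges."""
    n = len(ring) - 1                     # number of edges of the closed ring
    for i in range(n):
        for j in range(i + 2, n):         # skip the directly following edge
            x = segment_intersection(ring[i], ring[i + 1], ring[j], ring[j + 1])
            if x is not None:
                return ring[:i + 1] + [x] + ring[i + 1:j + 1] + [x] + ring[j + 1:]
    return list(ring)


# P1, P2, P3, P4, P1 with edges (P2, P3) and (P4, P1) crossing at (1, 1).
bowtie = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (0.0, 0.0)]
healed_bowtie = insert_first_crossing(bowtie)
```

The result has the same structure as the sequence with two occurrences of the new vertex described in the text, so a subsequent duplicate point healing step is still required.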
Figure 5. Healing CP_SELFINT error
CP_PLAN: This error is very common and difficult to heal. All vertices of a polygon must lie within the same plane, up to a user-specified tolerance. If a polygon has the point sequence $\{P_1, P_2, \ldots, P_n, P_1\}$, all vertices must lie within the plane formed by any three non-collinear vertices of the sequence, and the normals at all vertices of the surface must be parallel. In Figure 6 the polygon has the point sequence $\{P_1, P_2, P_3, P_4, P_1\}$, where, for example, vertex $P_3$ does not lie within the plane formed by $P_1$, $P_2$ and $P_4$. Sometimes the error is very small, for instance less than 1 mm. Such errors are most probably caused by measurement inaccuracies or floating-point rounding. In this case a small adjustment of the vertices might heal the polygon. Another healing option is to triangulate the polygon and split it into multiple triangular polygons. However, it is difficult to decide how to triangulate, because there is always more than one possibility and only one is correct. Some customization according to the error pattern of the model therefore helps considerably. For example, when repairing a vertical non-planar polygon of a wall surface, only vertical triangles are accepted as newly triangulated polygons.
Healing of the GroundSurface: The healing of the GroundSurface is identical for LoD1 and LoD2. We identify the GroundSurface of a LoD1 geometry as the surface with the smallest z-coordinate and the least deviation with respect to the direction of the normal vector $n_{xy}$ of the xy-plane. All points belonging to the linear ring of the GroundSurface are projected onto a plane parallel to the xy-plane and passing through the minimal z-value of the ring. See figure 7 for an example: the blue, non-planar polygon is projected onto the green, planar one.
Healing of the WallSurfaces: As above, we do not distinguish between LoD1 and LoD2 during the healing process of WallSurfaces. We assume that each WallSurface shares a common edge with the GroundSurface and that each surface of a LoD1 geometry adjacent to the GroundSurface is a WallSurface. Let $W_i$ be the i-th WallSurface, $e_i$ its common edge with the GroundSurface and $n_i$ the normal vector of the least squares plane through all points of the linear ring of $W_i$. If the deviation of $W_i$ from the vertical, measured via $n_i$, is smaller than a given threshold, then all points of the linear ring are projected into the plane spanned by the directional vector of $e_i$ and $n_{xy}$. With this approach we leave out walls that exceed the given angle of slope. An example is shown in figure 7.
Healing of the RoofSurfaces: There are two algorithms for healing the RoofSurfaces. The first one handles LoD1 roofs and LoD2 flat roofs. The RoofSurface of a LoD1 building is determined similarly to its GroundSurface: it is the polygon with the least deviation with respect to $n_{xy}$ and the maximal z-value. Additionally, it is not adjacent to the GroundSurface. We project all points of the linear ring of the RoofSurface into a plane parallel to the xy-plane, passing through the average z-value of all points in the ring, as can be seen in figure 8. In the second algorithm, each point of a linear ring of the corresponding RoofSurface is projected along the z-axis into the least squares fitting plane of its linear ring. This procedure is repeated until all RoofSurfaces are planar or a maximum number of iterations is reached. Note that the points are only projected along the z-axis. Hence the healed WallSurfaces remain planar, even if shared points of RoofSurfaces are moved. See figure 8 for an example.
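Two of the projections described above can be sketched compactly: flattening a GroundSurface to its minimal z-value, and moving the points of a RoofSurface along the z-axis onto a fitted plane. The sketch makes assumptions that the text does not fix: the plane is fitted as z = a·x + b·y + c by minimizing vertical residuals, a single ring is treated in isolation (so the repeated passes over rings with shared points are omitted), and the names `heal_ground`, `fit_plane_z` and `heal_roof` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def heal_ground(ring: List[Point]) -> List[Point]:
    """Project all points onto the horizontal plane through the minimal z-value."""
    z_min = min(p[2] for p in ring)
    return [(x, y, z_min) for x, y, _ in ring]


def fit_plane_z(points: List[Point]) -> Optional[Tuple[float, float, float]]:
    """Least squares fit of z = a*x + b*y + c; None for (near-)vertical point sets."""
    n = len(points)
    xc = sum(p[0] for p in points) / n
    yc = sum(p[1] for p in points) / n
    zc = sum(p[2] for p in points) / n
    # Centering keeps the 2x2 normal equations well conditioned for large coordinates.
    sxx = sum((p[0] - xc) ** 2 for p in points)
    syy = sum((p[1] - yc) ** 2 for p in points)
    sxy = sum((p[0] - xc) * (p[1] - yc) for p in points)
    sxz = sum((p[0] - xc) * (p[2] - zc) for p in points)
    syz = sum((p[1] - yc) * (p[2] - zc) for p in points)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    return a, b, zc - a * xc - b * yc


def heal_roof(ring: List[Point]) -> List[Point]:
    """Move every point along the z-axis onto the fitted plane; x and y stay fixed."""
    plane = fit_plane_z(ring[:-1])        # closing point is not counted twice
    if plane is None:
        return list(ring)
    a, b, c = plane
    return [(x, y, a * x + b * y + c) for x, y, _ in ring]
```

Because x and y are never changed, a vertical wall that shares points with the roof keeps its footprint, which is the property the text relies on.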

SOLID HEALING
In this phase one error is healed at a time. That means each iteration heals one error and validates the result again.

CS_NUMFACES:
A minimum of four surfaces is required to form a solid. Any solid with fewer than four valid polygons has an insufficient number of faces error. A solid with fewer than 4 polygons cannot be repaired; the only exception is a triangular pyramid with one missing triangle. In all other cases the solid is invalid and is deleted from the model by the healing process.
CS_SELFINT: Polygons of a solid must meet each other only at edges. Any other intersection of polygons is considered a self-intersection error of the solid. We assume that the polygons of a solid are planar in the sense of CP_PLANAR; hence the user-defined tolerance for the intersection algorithm should not be greater than the one used for CP_PLANAR. Otherwise false positives and/or implausible results might occur. The tolerance is used, for example, to determine whether the intersection of two polygons is a line or only a point (the length of the line segment is below the tolerance). Furthermore, each intersection is classified by its type: partially embedded edge, fully embedded edge, partially embedded polygon, fully embedded polygon, normal intersection and undefined intersection. In figure 9, for partially and fully embedded edge errors such as A and B, the overlapping edges $(P_1, P_3)$ and $(P_2, P_4)$ are merged into three edges $(P_1, P_2)$, $(P_2, P_3)$ and $(P_3, P_4)$, and the point lists of the respective polygons are rearranged as in H. For partially embedded polygon errors such as Y, the overlapping regions are trimmed from the overlapped polygons and new polygons are created from the overlapping regions, as in R. For fully embedded polygon errors such as X, the overlapping region is trimmed only from the bigger polygon.
Figure 9. Healing CS_SELFINT error
For a normal intersection as in figure 10, healing does not yield a valid result. If the intersecting polygons are split into multiple polygons, an overused edge error occurs which is very difficult to heal. Nevertheless, splitting does resolve the self-intersection itself.
Figure 10. Healing complex CS_SELFINT error
CS_OUTEREDGE: Every edge of a solid must bound exactly two polygons. An edge bounding fewer polygons causes an incorrect number of polygons per edge error, and there is a hole somewhere in the solid. First, all error edges are searched for loops, and new polygons are formed from the loops found. Incomplete parts of loops are left unhealed.
CS_OVERUSEDEDGE: Any edge of the solid bounding more than two polygons causes a topological error. In figure 10, if the self-intersection error is healed, there will be an edge shared by 4 polygons. This type of error cannot be healed automatically; the possible options are to edit the solid manually or to delete the polygons.
CS_FACEORIENT: Each edge must bound two polygons, and the orientation of the edge must be opposite in the two polygons. In figure 11a two polygons, P $\{P_1, P_2, P_4, P_3, P_1\}$ with anti-clockwise orientation and Q $\{P_6, P_4, P_3, P_5, P_6\}$ with clockwise orientation, are bounded by the edge $(P_4, P_3)$. The order should, however, be $(P_4, P_3)$ in one polygon and $(P_3, P_4)$ in the other. If all or most of the edges of a polygon have the wrong orientation, then the polygon is wrongly oriented and healing consists of reversing the order of its point list. So if Q is wrongly oriented, the healing process corrects its point list to $\{P_6, P_5, P_3, P_4, P_6\}$ and the orientation becomes anti-clockwise.
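The three edge-based solid checks above all reduce to counting how often, and in which direction, each edge is used. The following sketch illustrates this for polygons given as closed rings of vertex ids; the function names `classify_edges` and `reverse_ring` are hypothetical, and degenerate edges with identical end points are assumed to have been removed by the polygon healing.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ring = List[str]


def classify_edges(polygons: List[Ring]) -> Dict[str, List[Tuple[str, str]]]:
    """Sort edges into outer, overused and wrongly oriented ones."""
    uses = defaultdict(list)              # undirected edge -> directed occurrences
    for ring in polygons:
        for a, b in zip(ring, ring[1:]):
            uses[frozenset((a, b))].append((a, b))
    report = {"CS_OUTEREDGE": [], "CS_OVERUSEDEDGE": [], "CS_FACEORIENT": []}
    for edge, directed in uses.items():
        key = tuple(sorted(edge))
        if len(directed) < 2:
            report["CS_OUTEREDGE"].append(key)      # bounds only one polygon
        elif len(directed) > 2:
            report["CS_OVERUSEDEDGE"].append(key)   # bounds more than two polygons
        elif directed[0] == directed[1]:
            report["CS_FACEORIENT"].append(key)     # same direction in both polygons
    return {code: sorted(edges) for code, edges in report.items()}


def reverse_ring(ring: Ring) -> Ring:
    """Heal a wrongly oriented polygon by reversing its point list."""
    return list(reversed(ring))


# Polygons P and Q of figure 11a, sharing edge (P4, P3) in the same direction.
P = ["P1", "P2", "P4", "P3", "P1"]
Q = ["P6", "P4", "P3", "P5", "P6"]
before = classify_edges([P, Q])
after = classify_edges([P, reverse_ring(Q)])
```

For the two polygons of figure 11a the shared edge is reported as a CS_FACEORIENT error before the reversal and no longer afterwards; all other edges are outer edges because the example is not a closed solid.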
Figure 11. Healing CS_FACEORIENT error
CS_FACEOUT: If every polygon of a solid is wrongly oriented, the solid still passes the face orientation check, because each edge finds an opposite pair. Therefore every surface normal of a solid must additionally point outwards. Even after healing the face orientation error it is not guaranteed that all polygons face outward. In figure 11b the red polygon has a face out error. This error is healed by reversing the orientation of the polygon.
CS_CONCOMP: If a solid contains a disconnected polygon, it has an error which is detected by the outer edge check. But if the polygons defined in a solid form two valid solids as in figure 12, the solid passes all checks up to the connected component check. This error is healed by converting each disconnected part into its own solid data structure and deleting the original solid.
Only 496 buildings were found valid. 8474 buildings were healed by the iterative healing process, so 82% of the invalid buildings were healed. Buildings which could not be healed were replaced by their original geometry. Some of their errors can be healed, but if a building still contains errors afterwards, the automated process cannot tell whether it has reduced the errors or made them more complex.
Figure 14. Error distribution before healing
For example, if there were 50 outer edge errors and 49 of the edges were found in different loops, the remaining edge would still cause an error, and we would not know whether the new polygons were drawn correctly until someone inspected them manually.

Figure 15. Error distribution after healing
There were also some difficulties while healing the Rotterdam model. Some corner buildings of a series of houses were modeled in an unusual way, as in figure 16: as two buildings joined together by a wall surface, but with the opposite walls of each building removed (like an H-shaped cross section). These buildings cannot be healed at the moment, because the wall surface lying in the middle causes overused edge errors which cannot be healed.
Figure 16. WallSurface splitting a building
Another commonly found type of error was an additional wall between two buildings without apparent reason, as in figure 17. It is not clear whether such walls should be modeled within the building or should have their own geometry in a different building or building part. The buildings have a complete structure without that wall, but together with it the same type of overused edge error arises, which makes healing difficult.
Figure 17. Extra WallSurface attached to the building
Some overused edge errors were also caused by the structure of the building itself. In figure 18, for example, a height difference of the roof results in 4 polygons sharing an edge. There is nothing wrong with the modeling; the original building was built like this.

CONCLUSION
All of the checks and most of the healing processes have already been implemented and tested, and the results have been discussed here. New problems always arise with a new model. In most cases a model shows similar types of errors in each building. Computation time and system requirements also need to be mentioned: normally the process runs fast, but this depends on how big the model is and on the iteration limit that has been set.
Figure 1. Umbrella check
14. measuredHeight in same range as height of building geometry
15. numberOfStoreysAboveGround plausible for height of the building geometry
16. numberOfStoreysBelowGround plausible for height of underground geometry of the building
17. Relationship of Building and BuildingPart

Figure 6. Healing CP_PLAN error
The healing of non-planar surfaces of a building with the first method is divided into three phases: Healing of the GroundSurface:

Figure 8. Healing of the LOD1 and LOD2 RoofSurface
The second one handles all other LoD2 roof types and calculates the least squares fitting plane for all RoofSurfaces.

Figure 12. Healing CS_CONCOMP error
CS_UMBRELLA: Healing of this error is still in progress. One option is to split the adjacent polygons into groups within which they are connected by edges, then create a new vertex with the same coordinates for each group and move those vertices slightly away from each other, as in figure 13.
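The grouping step of the proposed CS_UMBRELLA healing can be sketched as a connected component search among the polygons that meet in a vertex. This is only an illustration of the idea under stated assumptions: polygons are closed rings of vertex ids, the vertex occurs at most once per ring, and `umbrella_groups` is a hypothetical name. Two polygons are treated as connected at p if they share an edge incident to p, that is, a common neighbour of p.

```python
from typing import Dict, List, Set


def umbrella_groups(polygons: List[List[str]], p: str) -> List[List[int]]:
    """Group the indices of polygons around vertex p by edge adjacency at p."""
    neighbours: Dict[int, Set[str]] = {}
    for i, ring in enumerate(polygons):
        open_ring = ring[:-1]                       # drop the closing vertex
        if p in open_ring:
            k = open_ring.index(p)
            neighbours[i] = {open_ring[k - 1], open_ring[(k + 1) % len(open_ring)]}
    groups: List[List[int]] = []
    unvisited = set(neighbours)
    while unvisited:
        stack = [unvisited.pop()]
        group = set(stack)
        while stack:
            i = stack.pop()
            for j in list(unvisited):
                if neighbours[i] & neighbours[j]:   # shared edge (p, q)
                    unvisited.remove(j)
                    group.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return sorted(groups)


# Two triangle fans that touch only in vertex p.
fan_1 = [["p", "a", "b", "p"], ["p", "b", "c", "p"], ["p", "c", "a", "p"]]
fan_2 = [["p", "d", "e", "p"], ["p", "e", "f", "p"], ["p", "f", "d", "p"]]
groups = umbrella_groups(fan_1 + fan_2, "p")
```

More than one group at a vertex indicates an umbrella error; each group would then receive its own copy of the vertex, as suggested in the text.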

Figure 13. Healing CS_UMBRELLA error

RESULT AND DISCUSSION
Some real-world models have been validated and healed using this tool, among them LOD1 and LOD2 models of Stuttgart, Ludwigsburg, Dusseldorf and Rotterdam. An overview of the validation results is given in Table 1. All models are in LOD2. Here A represents 1 building with 580 polygons, J represents 61 buildings with 3455 polygons, L represents 4 buildings with 69 polygons and X represents 1922 buildings with 32546 polygons.

Figure 18.Two corner edges of two box shaped parts of a building touching each other at an edge.


Table 1. Results for geometry validation
In 16-99-HOOGVLIET-ZUID of the Rotterdam model there are 10828 buildings with around 11000 ground, 23000 roof and 68000 wall surfaces. After validation with all geometric checks, approximately 118000 CS_OUTEREDGE, 33000 CS_SELFINT and 2000 CP_DUPPOINT errors were found in 10332 buildings, as shown in figure 14.