3D MODELING FROM GNOMONIC PROJECTIONS

The paper presents a strategy to derive and process high resolution images created by means of gnomonic projections. The implemented pipeline can be split into two phases: first, the sensor resolution of the camera is physically increased by acquiring and merging a set of images with a rotating camera equipped with a long focal length lens; then the new set of gnomonic projections is processed with a 3D reconstruction methodology able to deal with very large images. Several issues are addressed in the paper, from image acquisition up to 3D modelling. Gnomonic projections have been demonstrated to be powerful tools when traditional pinhole images do not allow the reconstruction of small and fine details. Examples and comparisons aimed at verifying the correctness of the mathematical approach for image orientation are illustrated as well.


INTRODUCTION
The possibility of obtaining accurate and detailed 3D models of close-range and architectural objects through image-based processing has been widely demonstrated in recent years. Nowadays all traditional photogrammetric and computer vision tasks can be carried out automatically: camera calibration, image orientation, and 3D surface reconstruction through dense image matching. Existing methods very often integrate traditional techniques from both photogrammetry and computer vision (Barazzetti et al., 2011). As a result, these solutions can today compete with close-range 3D scanners, which are usually more expensive and cumbersome, although simpler to use at the data acquisition stage.
On the other hand, some problems related to the physical characteristics of imaging sensors still need to be addressed. Modern CCD and CMOS sensors capture images with a geometric resolution above 20 Mpx and a radiometric resolution higher than 16 bits. The level of detail of a 3D modelling project strictly depends on the ground sample distance (GSD), and the reconstruction of fine details requires ad-hoc datasets in which all small parts are clearly visible. This is usually achieved by reducing the camera-object distance, which in turn increases the total number of images to be processed.
Another solution is based on Super Resolution (SR), which can be understood as any procedure that increases image resolution (Milanfar, 2010). One possible way is to reduce the pixel size while preserving the metric sensor size, which however tends to worsen the signal-to-noise ratio. Alternatively, the chip size could be increased, but capacitance increases and storage problems arise. Another method uses a set of low resolution (LR) images that are then merged into an SR mosaic: images are acquired with sub-pixel shifts and then processed by a system capable of fusing all the different information. The literature on this topic is impressive (some examples are Bascle et al., 1996; Berthod et al., 1994; Dellaert et al., 1998; Elad and Feuer, 1999; Irani et al., 1992; Numnonda et al., 1993; Shekarforoush et al., 1996). The workflow is often quite similar: after a preliminary image alignment, images are combined to extract a 'sharp' image with a higher resolution. In some cases, shifts can be replaced by multi-focus data (Elad and Feuer, 1997), where multiple shots with different focus points are acquired and merged to obtain a sharp image.
The key concept is a decomposition of the source images, whose components are then integrated to obtain a composite reconstruction; the sharp image is finally created with an inverse multi-resolution transform. One of the most remarkable advantages of these methods is the opportunity to use standard cameras, with a consequent reduction of costs. Typical applications include the medical domain, microscopy, micro-mineralogy, macro-photography, and satellite imagery, among others.
In this paper we present an alternative solution (now encapsulated in a complete data processing pipeline) where the actual camera sensor size is virtually increased by using long focal length lenses coupled with the gnomonic projection to fuse standard pinhole images. A synthetic flowchart is shown in Figure 1. Starting from the intuition of Kauhanen et al. (2009), the metric pixel size and focal length of the original images are transferred to the final mosaic, whereas the sensor size is increased as a function of the field of view covered during image acquisition. In addition, the barrel or pincushion distortion of the new virtual sensor is removed for subsequent matching and surface reconstruction. After the automatic creation of several gnomonic projections with variable sensor sizes, an automated methodology for image orientation was implemented to handle images with very large resolution, preserving the original data with a multi-resolution matching approach. Then, starting from the estimated camera poses, diverse algorithms for dense multi-view matching can be run to obtain a 3D surface reconstruction. In the current implementation a modified release of the Multiphoto Geometrically Constrained Matching (MGCM+) algorithm was used (Previtali et al., 2011). Finally, the surface can be meshed and textured.
The experiments carried out showed sub-pixel precision in bundle adjustment, confirming the correctness of the adopted gnomonic camera model. The advantages of the method will be demonstrated by the products that can be created from only a limited number of high resolution gnomonic projections: 3D models, DEMs, orthophotos and true-orthophotos. The method allowed the creation of synthetic cameras with a sensor size two or three times that of standard matrix sensors, which is not, however, the physical limit of the method. Indeed, the method could provide gigapixel images with very long telephoto lenses, although memory issues arise and CPU time increases.

Generation of high resolution images through gnomonic projections
Multiple images taken with a rotating camera can be registered and stitched with a homographic transformation:
$$\tilde{\mathbf{x}}_i = \mathbf{H}_{ij}\,\tilde{\mathbf{x}}_j, \qquad \mathbf{H}_{ij} = \mathbf{K}\,\mathbf{R}_i\,\mathbf{R}_j^{T}\,\mathbf{K}^{-1}$$
where $\tilde{\mathbf{x}}$ are homogeneous image coordinates and $\mathbf{K}$ is the camera calibration matrix (Hartley and Zisserman, 2004):
$$\mathbf{K} = \begin{bmatrix} f & 0 & x_0 \\ 0 & f & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$
The rotation matrices $\mathbf{R}_i$ and $\mathbf{R}_j$ of two generic images i and j can be parameterized as proposed in Brown and Lowe (2007):
$$\mathbf{R}_i = e^{[\boldsymbol{\theta}_i]_\times}, \qquad [\boldsymbol{\theta}_i]_\times = \begin{bmatrix} 0 & -\theta_{i3} & \theta_{i2} \\ \theta_{i3} & 0 & -\theta_{i1} \\ -\theta_{i2} & \theta_{i1} & 0 \end{bmatrix}$$
A rotating camera is here intended as a standard pinhole sensor able to rotate around its perspective centre. For this purpose we built an ad-hoc rotating head consisting of a cardanic joint that turns the camera around the perspective centre. Once the head is calibrated (e.g. by checking the alignment of several vertical wires in pictures taken with different camera attitudes), parallax errors are removed from the dataset. In addition, the perspective centre is aligned with a pole on top of the head, where a prism or a GNSS receiver can be placed in order to geo-reference the survey in a geodetic reference system (see Fig. 3). The coordinates measured with these external sensors (theodolite or GNSS) give the location of the perspective centre of the final gnomonic projection. The vertical shift between the pole and the camera can be easily estimated with a calibration project and can therefore be assumed constant for all images. These 3D coordinates are then used directly as pseudo-observations in the bundle adjustment (see next Section) in order to control block deformations.
In general, a map projection is a mapping of the Earth onto a flat surface; here, the scene around the camera plays the role of the globe. Gnomonic projections are obtained by projecting points on the globe from the centre of the sphere onto a plane tangent to it (Fig. 2). The equations for a projection (unit sphere) with central latitude φ0 and longitude λ0 are:
$$\cos c = \sin\varphi_0 \sin\varphi + \cos\varphi_0 \cos\varphi \cos(\lambda - \lambda_0)$$
$$x = \frac{\cos\varphi \, \sin(\lambda - \lambda_0)}{\cos c}, \qquad y = \frac{\cos\varphi_0 \sin\varphi - \sin\varphi_0 \cos\varphi \cos(\lambda - \lambda_0)}{\cos c}$$
where c is the angular distance of the point (φ, λ) from the centre of the projection.
During the last step (re-projection), a gain compensation is first applied to reduce intensity differences between overlapping images (Uyttendaele et al., 2001). Then a multi-band blending algorithm (Burt and Adelson, 1983) removes the remaining image edges while avoiding blurring of high-frequency details.
Fig. 3 shows an example: 28 images were acquired with a Nikon D700 (4256×2832 pix, pixel size 8.4 µm) equipped with a 90 mm lens. Matching is carried out with the SIFT operator (Lowe, 2004) to extract a set of features from the images (other methods use the SURF operator). All descriptors are compared with a kd-tree search in O(n log n) time. The matched image points then allow the estimation of the unknown parameters in a bundle adjustment based on the Levenberg-Marquardt algorithm (Brown and Lowe, 2007). Images can then be mapped with a gnomonic projection to obtain a high resolution mosaic. Image distortion can be removed beforehand from the original data with a proper calibration project (Remondino and Fraser, 2006). The final mosaic is a (distortion-free) gnomonic projection with a focal length equal to that of the original images. The sensor is virtually enlarged (in this case the final mosaic is 6512×8900 pix) but the pixel size is preserved. In other words, while the original sensor size was 36 mm × 23.9 mm, the new projection is virtually acquired with a camera whose sensor measures about 54.7 mm × 74.76 mm (6512 × 8.4 µm by 8900 × 8.4 µm).
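The rotating-camera model, x_i ~ K R_i R_j^T K^-1 x_j with R = exp([θ]×) as in Brown and Lowe (2007), and the forward gnomonic equations can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: it assumes square pixels with zero skew, rotations that map world directions into the camera frame (x ~ K R d), and evaluates the exponential map with Rodrigues' formula. All function names are hypothetical.

```python
import numpy as np


def rotation_from_vector(theta):
    """R = exp([theta]_x), evaluated with Rodrigues' formula."""
    theta = np.asarray(theta, dtype=float)
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    k = theta / angle
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * kx + (1.0 - np.cos(angle)) * (kx @ kx)


def calibration_matrix(f_px, x0, y0):
    """Calibration matrix with square pixels and zero skew (assumption)."""
    return np.array([[f_px, 0.0, x0],
                     [0.0, f_px, y0],
                     [0.0, 0.0, 1.0]])


def rotation_homography(K, R_i, R_j):
    """H_ij = K R_i R_j^T K^-1 maps homogeneous pixels of image j onto image i."""
    return K @ R_i @ R_j.T @ np.linalg.inv(K)


def gnomonic(phi, lam, phi0, lam0, f_px=1.0):
    """Forward gnomonic projection centred on (phi0, lam0), scaled by f_px."""
    cos_c = (np.sin(phi0) * np.sin(phi)
             + np.cos(phi0) * np.cos(phi) * np.cos(lam - lam0))
    x = np.cos(phi) * np.sin(lam - lam0) / cos_c
    y = (np.cos(phi0) * np.sin(phi)
         - np.sin(phi0) * np.cos(phi) * np.cos(lam - lam0)) / cos_c
    return f_px * x, f_px * y


def virtual_sensor_mm(width_px, height_px, pixel_um):
    """Metric size of the virtual sensor: pixel count times the preserved pixel size."""
    return width_px * pixel_um / 1000.0, height_px * pixel_um / 1000.0


# Church mosaic: 6512 x 8900 pix with 8.4 um pixels gives about 54.7 mm x 74.76 mm
print(virtual_sensor_mm(6512, 8900, 8.4))
```

Scaling the gnomonic coordinates by the focal length in pixels makes the projection behave like a single pinhole image centred on (φ0, λ0), which is why the mosaic inherits the focal length of the source images.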

Processing of multiple gnomonic projections
As previously mentioned, the 3D reconstruction pipeline is based on a block of gnomonic projections virtually acquired with different sensors. Although the sensor sizes differ, the focal length and pixel size of all projections are the same, and radial and tangential distortions can be compensated for during the generation of the mosaic. Once these distortions have been removed, the principal point coincides with the centre of the projection. This means that the calibration parameters can be fixed automatically from the size (in pixels) of each new projection and the metric pixel size of the single pinhole shots.
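As a minimal sketch of this idea, the interior orientation of a projection can be derived from its pixel dimensions alone. The dataclass and function names are hypothetical; placing the principal point at (width/2, height/2) assumes a corner-origin pixel convention and a projection centred on its tangent point, and distortion is taken as zero because it was compensated during mosaic generation.

```python
from dataclasses import dataclass


@dataclass
class InteriorOrientation:
    f_px: float         # focal length in pixels (same for all projections)
    x0_px: float        # principal point, column coordinate
    y0_px: float        # principal point, row coordinate
    sensor_w_mm: float  # virtual sensor width
    sensor_h_mm: float  # virtual sensor height


def projection_interior_orientation(width_px, height_px, focal_mm, pixel_mm):
    """Interior orientation of a distortion-free gnomonic projection."""
    return InteriorOrientation(
        f_px=focal_mm / pixel_mm,
        x0_px=width_px / 2.0,   # assumption: principal point at the image centre
        y0_px=height_px / 2.0,
        sensor_w_mm=width_px * pixel_mm,
        sensor_h_mm=height_px * pixel_mm,
    )


io = projection_interior_orientation(6512, 8900, focal_mm=90.0, pixel_mm=0.0084)
print(io)
```

Two projections of different size then share f_px and differ only in principal point and sensor size, so these parameters can be held fixed rather than estimated.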
Obviously, the reader might ask why the original images are not used for data processing, as they are standard pinhole images and 3D reconstruction could be achieved with the normal procedures reported in the literature. First, a gnomonic projection can be created from images with limited overlap (shared by at least 2 images), whereas image orientation requires points matched on multiple shots (at least 3 images). Therefore, replacing a set of many images with a single projection improves the robustness of image orientation and reduces the overall number of views. Second, in the authors' experience, if many images share the same perspective centre, convergence problems might arise during a standard photogrammetric bundle block adjustment (except for bundle implementations supplied with an appropriate set of initial values), and it can be difficult to generate these initial approximations for image blocks with complex configurations. Finally, as demonstrated by Stamatopoulos and Fraser (2011), the standard collinearity model is not always appropriate when long focal length lenses (field of view less than 10°) are employed. Here, although images are acquired with telephoto lenses, the final projection is virtually acquired with a normal lens relative to the new sensor size.
An example of 3D processing is the main façade of the church shown in Figures 3 and 4, which was reconstructed from a set of only 5 SR images. One of the main problems in data processing is the final image size, which makes coarse-to-fine approaches indispensable. The SURF operator and robust estimators are first run to detect a set of corresponding points on sub-sampled images (generally 25% of the original size); then Least Squares Matching (LSM) refines them on the full resolution images (Baltsavias, 1991). A free-network bundle adjustment (Granshaw, 1980) is used to recover camera poses (Fig. 4), giving an estimated sigma-naught of about 0.6 pixels. This value confirms the correctness of the mathematical model for image orientation; more details are given in the following Section.
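The refinement step can be illustrated with a deliberately reduced form of least squares matching that estimates only a 2D shift by Gauss-Newton iterations. The LSM of Baltsavias (1991) also estimates affine geometric and radiometric parameters, which are omitted here. Following the coarse-to-fine scheme above, the initial position is a match found on the 25% image multiplied by 4. Function names and the coarse coordinates are hypothetical; only NumPy is used.

```python
import numpy as np


def bilinear(img, xs, ys):
    """Sample img at real-valued column (xs) and row (ys) coordinates."""
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    dx, dy = xs - x0, ys - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1]
            + (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])


def lsm_shift(template, search, x_init, y_init, iterations=30, tol=1e-4):
    """Shift-only least squares matching (Gauss-Newton).

    Model: template[r, c] ~ search(r + y, c + x); returns the refined (x, y).
    """
    h, w = template.shape
    rows, cols = np.mgrid[0:h, 0:w].astype(float)
    t = template.astype(float)
    s = search.astype(float)
    x, y = float(x_init), float(y_init)
    for _ in range(iterations):
        patch = bilinear(s, cols + x, rows + y)    # resample the search window
        g_row, g_col = np.gradient(patch)          # approximate d/dy and d/dx
        A = np.column_stack([g_col.ravel(), g_row.ravel()])
        residual = (t - patch).ravel()
        delta, *_ = np.linalg.lstsq(A, residual, rcond=None)
        x, y = x + delta[0], y + delta[1]
        if np.hypot(delta[0], delta[1]) < tol:
            break
    return x, y


# Coarse-to-fine: a position found on the 25% image is scaled by 4
x_coarse, y_coarse = 2.5, 3.2   # hypothetical SURF match at the coarse level
x_init, y_init = 4 * x_coarse, 4 * y_coarse
```

Only the small full-resolution window around each approximate match is ever resampled, which is what keeps the refinement tractable on very large projections.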
Finally, the surface of the object can be reconstructed with the dense multi-photo matcher proposed in Previtali et al. (2011). This matcher can handle SR images at their original size and provided the results shown in Figure 4, consisting of a high resolution textured 3D model. Metric rectification is a very common product for this category of objects and can be easily achieved by using the central projection taken in front of the façade. In this case, the photogrammetric project provided a set of image-to-object point correspondences for the estimation of the rectifying homography.
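The rectifying homography can be estimated from at least four image-to-object correspondences with the normalized Direct Linear Transformation (DLT) described by Hartley and Zisserman (2004). The sketch below is a generic NumPy implementation, not the authors' code; the function names are hypothetical, and no robust or iterative refinement is included.

```python
import numpy as np


def _normalize(pts):
    """Similarity transform moving points to zero mean and mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return homog, T


def estimate_homography(src, dst):
    """Normalized DLT: H with dst ~ H src, from at least 4 point pairs."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    ps, Ts = _normalize(src)
    pd, Td = _normalize(dst)
    rows = []
    for (x, y, w), (u, v, z) in zip(ps, pd):
        # two independent rows of the cross product dst x (H src) = 0
        rows.append([0, 0, 0, -z * x, -z * y, -z * w, v * x, v * y, v * w])
        rows.append([z * x, z * y, z * w, 0, 0, 0, -u * x, -u * y, -u * w])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    Hn = vt[-1].reshape(3, 3)              # null vector = normalized homography
    H = np.linalg.inv(Td) @ Hn @ Ts        # undo the normalizations
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map 2D points with H and return inhomogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

Here `src` would hold the image coordinates of the points and `dst` their coordinates in the façade plane; the estimated H is then used to resample the projection into the rectified image.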
Interestingly, rectifying the image directly from sets of parallel and perpendicular lines would be a mistake for this church: elements such as pillars or beams are not perfectly vertical or horizontal (an effect not clearly visible by simple visual inspection). Results are shown in Fig. 5: the final image is 6512×8900 pix with a GSD of 1 mm. This value is about three times better than that achievable with a standard 35 mm lens, which would have been the authors' first choice for standard pinhole images.
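Since a gnomonic projection keeps the focal length and pixel size of the source images, its GSD follows the usual pinhole relation GSD = p·D/f. The short sketch below assumes the same camera-object distance for both lenses (the 10 m value is hypothetical) and the 90 mm lens of the church example; the GSD ratio then reduces to the focal length ratio 90/35 ≈ 2.6, roughly the factor of three quoted above.

```python
def gsd_mm(pixel_mm, distance_mm, focal_mm):
    """Ground sample distance p * D / f of a frontal pinhole (or gnomonic) image."""
    return pixel_mm * distance_mm / focal_mm


pixel = 0.0084          # Nikon D700 pixel size, mm
distance = 10_000.0     # hypothetical camera-object distance, mm
gsd_90 = gsd_mm(pixel, distance, 90.0)
gsd_35 = gsd_mm(pixel, distance, 35.0)
print(round(gsd_90, 2), round(gsd_35, 2), round(gsd_35 / gsd_90, 2))
```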
Obviously, a detailed 3D model (a DEM or a mesh for this 2.5D object) provides not only rectified images, but also orthophotos and true-orthophotos, which correct the position of elements lying outside the chosen object plane.

EXPERIMENTS
In this section a comparison between data extracted from gnomonic projections and benchmark datasets is reported.
The aim was to analyse the metric performance and to validate experimentally the mathematical models for image orientation.

Comparison with accurate metric data
A common way to assess the accuracy of a reconstruction is to compare image-based measurements with a corresponding (more accurate) dataset (Fig. 6). If the goal is the analysis of bundle adjustment accuracy, a possible solution is the use of independent check points. In this case we are interested not only in an accuracy evaluation, but also in the bundle statistics (sigma-naught and covariance matrices), to check the correctness of all implemented algorithms. The reference dataset consists of 28 photogrammetric targets (white dot on a black background with a cross in the middle) placed on a building façade (9×12 m). Their 3D coordinates were measured with a Leica TS30 theodolite and a geodetic network of three stations (multiple intersection). The adjustment provided points with a precision of about σX = σZ = ±0.3 mm (façade plane) and σY = ±0.5 mm (depth). Three convergent projections were then created with a Nikon D700 equipped with a 50 mm lens, and image coordinates were measured manually (with an assumed precision of ±1 pix). The bundle adjustment also included all projection centres (measured with the theodolite after placing a 360° prism on top of the head) and 8 control points (a priori sigma-naught of ±1 pix). This registers both projects in the same reference system and makes it possible to analyse accuracy using the remaining check points. As the head was calibrated with an optical alignment to a precision of a few millimetres, perspective centres were weighted with a precision σX0 = σY0 = σZ0 = ±1 cm, whereas the 3D coordinates of the control points were weighted with the precision given by the geodetic network adjustment (always better than 1 mm). The difference between the perspective centre coordinates measured with the theodolite (360° prism on top of the head, Z coordinates corrected with the known vertical shift) and the adjusted coordinates is also interesting: it is less than 1 cm, confirming the correct calibration of the head, as shown in Table 2.
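Check-point difference statistics are usually reported as per-axis mean, standard deviation and RMSE of the residuals between adjusted and reference coordinates. The sketch below shows one common way to compute them with NumPy; the function name and the toy coordinates are illustrative only and are not the data of this experiment.

```python
import numpy as np


def checkpoint_statistics(estimated, reference):
    """Per-axis mean, standard deviation and RMSE of (estimated - reference)."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(reference, dtype=float)
    return {
        "mean": diff.mean(axis=0),
        "std": diff.std(axis=0, ddof=1),          # sample standard deviation
        "rmse": np.sqrt((diff ** 2).mean(axis=0)),
    }


# Toy example (mm): three check points, X/Y/Z columns
reference = np.zeros((3, 3))
estimated = np.array([[1.0, 0.0, 0.5],
                      [-1.0, 0.0, -0.5],
                      [0.0, 2.0, 0.0]])
stats = checkpoint_statistics(estimated, reference)
print(stats)
```

A non-zero mean points to a residual systematic shift (e.g. in the datum defined by control points and perspective centres), while the RMSE combines that bias with the random scatter.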
Fig. 6. Projections, geodetic network with error ellipses (targets, image locations and stations), and 3D view of projection poses.
Table 1. Difference statistics on a set of 20 check points.

Surface measurement
Figure 7 shows the orientation and 3D modelling results for a set of 5 projections. The object is a portion of the Basilica of San Pietro al Monte (Civate, Italy) and the camera is a Nikon D700 with a 90 mm lens. Image size varied from 46.8 to 62.7 Mpix, and the final reconstruction was scaled with a known distance measured with a graduated tape. The surface was then reconstructed using the coarse-to-fine approach offered by MGCM+. This approach is needed to preserve all the information contained in the gnomonic projections and to process the images at the finest level (their original size). Indeed, image matching on subsampled images without a coarse-to-fine strategy would defeat the purpose of the gnomonic projection. MGCM+ is a dense image matching algorithm developed to deal with high resolution images, and the final model is obtained by processing the images at full size so that the high quality of the result is maintained. At the coarsest level, the tie points extracted in the orientation phase are meshed to obtain an approximate model of the surface. This initial model is then refined with MGCM+. In the next step the resulting quasi-dense point cloud is meshed again and the new surface is used as the initial model for the following iteration.
In practice, three iterations were used in the reported experiments.
The matching algorithm used here is based on Multi-Photo Geometrically Constrained Matching (MGCM; Baltsavias, 1991), adapted to the dense reconstruction of 3D surfaces (not only 2.5D) for close-range applications.
For this reason, it has been renamed MGCM+. Although the MGCM algorithm is more than twenty years old, it is probably still one of the most precise and reliable methods for image coordinate measurement. However, to improve its performance with large and complex 3D objects, some modifications were needed: a proper selection method for the 'master image', the choice of the images to be matched together, the definition of reasonable approximate parameters for the linearized least squares problem, etc. Using the coarse-to-fine approach with an approximate model defined above, some of these problems can be solved as described in Previtali et al. (2011).
Starting from a coarse model of the surface, the 'master image' can be selected by considering the image scale and the convergence angle. Similarly, the images to be matched together can be chosen using visibility considerations, preventing blunders due to occluded parts. The problem of approximate parameters can be partially overcome using OE parameters and approximate models refined over the iterations.
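One possible way to turn the scale and convergence-angle criteria into code is sketched below. It is a hypothetical rule, not the selection method of Previtali et al. (2011): among the images viewing a surface element at an angle below a threshold (60° here, an arbitrary choice) from its normal, the one with the finest object-space pixel size is taken as master. The pixel footprint is approximated as distance / focal length in pixels, and neither visibility within the image frame nor occlusion is checked.

```python
import numpy as np


def select_master_image(point, normal, centres, focal_px, max_angle_deg=60.0):
    """Index of the hypothetical master image for one surface element, or None."""
    point = np.asarray(point, dtype=float)
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    cos_limit = np.cos(np.radians(max_angle_deg))
    best, best_pixel = None, np.inf
    for idx, (centre, f) in enumerate(zip(centres, focal_px)):
        ray = np.asarray(centre, dtype=float) - point   # surface point -> perspective centre
        dist = np.linalg.norm(ray)
        if ray @ normal / dist <= cos_limit:
            continue                                    # too oblique, or seen from behind
        pixel_size = dist / f                           # approximate object-space pixel footprint
        if pixel_size < best_pixel:
            best, best_pixel = idx, pixel_size
    return best
```

With a coarse mesh available, the same loop can be run per triangle, using the triangle centroid and normal.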
The point cloud of the basilica was compared with a laser scanning dataset acquired with a Riegl LMS-Z420i (www.riegl.com), obtaining a discrepancy of about ±7 mm after alignment with the ICP algorithm (Besl and McKay, 1992; Fig. 8). This value is close to the nominal precision of the laser scanner employed.
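A minimal point-to-point variant of ICP is sketched below: nearest-neighbour correspondences alternate with a closed-form (SVD-based) rigid transformation. It is a generic illustration of the idea of Besl and McKay, not the software used for Fig. 8; the function names are hypothetical, and the brute-force neighbour search is only suitable for small clouds (a k-d tree would normally replace it).

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t (SVD / Kabsch solution)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs


def icp(src, dst, iterations=50, tol=1e-10):
    """Align point cloud src to dst; returns total R, t and the aligned cloud."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iterations):
        # brute-force nearest neighbours of each current point in dst
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, nearest)
        cur = cur @ R.T + t
        err = np.sqrt(d2.min(axis=1).mean())   # RMS distance before this update
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    R_total, t_total = best_rigid_transform(src, cur)
    return R_total, t_total, cur
```

After convergence, the distances between the aligned cloud and its nearest neighbours in the reference cloud give discrepancy maps of the kind shown in Fig. 8.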

CONSIDERATIONS AND CONCLUSIONS
The use of gnomonic projections seems a promising field of research and does not significantly modify the traditional processing methods based on pinhole images. The required hardware is certainly more cumbersome, as a tripod and a rotating head are mandatory to obtain precise results and remove parallax errors. The direct use of a hand-held digital camera is not considered here, although this case deserves future investigation. At the current development stage, gnomonic projections are created with image matching techniques, which require overlap between consecutive images and well-textured objects. This limitation can be overcome with motorized heads able to provide the rotation matrices of all single pinhole images. In this case it is possible to obtain projections with the same set of calibration parameters (including the sensor size), since the same acquisition procedure can be replicated for all poses. Moreover, this kind of head is already available on the market for less than 1000 €. An alternative solution could be a robotic theodolite coupled with a camera, in which case the theodolite could be used directly for image registration into a geodetic reference system.
It is important to underline that all calibration parameters of each single projection are known: the focal length f and the pixel size p are constant, whereas the sensor size increases depending on the rotation during image acquisition. If we consider a full frame sensor (e.g. a Nikon D700 with a pixel size of 8.4 µm) and a 35 mm lens, which is a quite standard configuration for real projects, a gnomonic projection created with a 200 mm lens (covering a similar field of view) gives a new image of about 23800×23800 pix. This simple consideration stresses the potential of the method and makes it easier to understand that very long focal length lenses (e.g. 600 mm, about 71400×71400 pix) can produce images that cannot be easily processed with standard PCs (for this reason no lens longer than 90 mm was used in the reported experiments).
The head is also useful for direct geo-referencing. If at least three non-collinear station points are available, a GNSS antenna provides coordinates in a global reference system. RTK positioning is well known to provide a precision of a few centimetres, which can be improved to a few millimetres with static surveys. Setting up the camera on a geodetic tripod and using several tripods allows one to interchange different sensors (camera, GNSS, theodolite, terrestrial laser scanner), as in a standard survey. The adjustment is not intended here as an absolute orientation with a similarity transformation; instead, pseudo-observations are used (i) to remove the rank deficiency and (ii) to control block deformations, as shown in the example with theodolite data (Subsection "Comparison with accurate metric data").
The pipeline for 3D processing follows a coarse-to-fine approach to exploit the full potential of these images. Our solution uses the LSM algorithm during both the orientation and surface reconstruction phases: initial matches extracted from low resolution images with the SURF operator are taken as approximate locations and then refined. The sub-pixel precision after bundle adjustment confirmed the validity of the processing algorithms and the correctness of the mathematical models.
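The image sizes quoted above can be checked approximately with a back-of-the-envelope calculation. For a gnomonic projection centred on the optical axis, a field of view θ spans 2 f tan(θ/2)/p pixels, so the mosaic side grows linearly with f. The quoted 23800 and 71400 pix correspond to f/p, i.e. to 2 tan(θ/2) = 1 (θ ≈ 53°); the 54.4° horizontal field of view of a 35 mm lens on a 36 mm wide sensor gives slightly more, about 24500 and 73500 pix. The function names are hypothetical and the memory estimate assumes an uncompressed 8-bit RGB image.

```python
import math


def fov_deg(sensor_side_mm, focal_mm):
    """Angular field of view of a pinhole camera along one sensor side."""
    return math.degrees(2.0 * math.atan(sensor_side_mm / (2.0 * focal_mm)))


def gnomonic_side_px(focal_mm, pixel_mm, fov):
    """Pixels along one side of a gnomonic projection spanning `fov` degrees about its centre."""
    return 2.0 * focal_mm * math.tan(math.radians(fov) / 2.0) / pixel_mm


def rgb_bytes(width_px, height_px, channels=3):
    """Uncompressed size of an 8-bit image (assumption: one byte per channel)."""
    return width_px * height_px * channels


pixel = 0.0084                      # Nikon D700 pixel size, mm
fov_35 = fov_deg(36.0, 35.0)        # horizontal field of view of a 35 mm lens
for focal in (200.0, 600.0):
    side = gnomonic_side_px(focal, pixel, fov_35)
    print(focal, round(side), round(focal / pixel))
print(rgb_bytes(71400, 71400) / 1e9, "GB")
```

At 71400×71400 pix an uncompressed 8-bit RGB image occupies about 15.3 GB, which illustrates why such projections are hard to handle on a standard PC.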
The method is attractive not only for 3D modelling, but also for metric rectification, especially of building façades: many shots can be reduced to a few (even just one) projections that can be processed quickly. Other interesting applications include the analysis of nearly flat objects (e.g. paintings), for which high resolution orthophotos can be produced.
To conclude, the direct use of this technique could overcome many limitations of traditional pinhole images. It is also noteworthy that a combined bundle adjustment (pinhole and gnomonic) is feasible, as the general formulation does not change significantly. This means that gnomonic projections could be employed to reconstruct fine details, whereas pinhole shots could provide a general network around the object. Further experiments will be carried out to assess the feasibility of this combined adjustment.

Fig. 2. Generation of a gnomonic projection, i.e. a non-conformal mapping where great circles are mapped to straight lines. It replicates the effect of image acquisition by means of spherical lenses.

Fig. 3. Upper rows: the 28 images used to create a high resolution mosaic of an ancient church in Tresivio (Valtellina, Italy). Lower row, from left to right: the calibrated head for image acquisition and the global image (6512×8900 pix), with a zoom showing the achieved level of detail.

Fig. 8. Comparison between the point cloud extracted from the projections and the laser scanner data: the discrepancy is 7 mm. The colour bar ranges from -0.03 to 0.03 m.