UAV LARGE OBLIQUE IMAGE GEO-LOCALIZATION USING SATELLITE IMAGES IN THE DENSE BUILDINGS AREA

For UAV large oblique image geo-localization in dense buildings areas, two main challenges remain. One is the presence of obvious occlusion and large viewpoint differences in UAV images. The other is that reference images, particularly orthographic satellite images, lack the façade information of man-made structures (such as buildings and roads), which is prominent in UAV large oblique images. Most existing image-based geo-localization methods address only the first challenge and neglect the interference brought by the second, especially in dense buildings areas. Motivated by both challenges, we propose a novel method for UAV large oblique image geo-localization in dense buildings areas, built on segments direction statistics (SDS) features and their histogram descriptors. By considering both the local and global features of man-made structures, the proposed method effectively addresses the significant information difference encountered in cross-view image matching. We conducted experiments on both the public UAV image dataset University-1652 and our own collected dataset of UAV large oblique long focal whiskbroom (LO-LF-W) images. Comparative analysis with state-of-the-art (SOTA) methods demonstrated that the proposed method improves the geo-localization accuracy by approximately 10%. Furthermore, the proposed method exhibits greater robustness to noise and changing orientation of reference images, making it particularly well-suited for dense buildings areas that pose challenges for existing methods.


INTRODUCTION
Image-based geo-localization determines the approximate location of UAV images, and it plays a crucial role in various UAV-based applications such as border management, building monitoring, and urban modelling (Berton et al., 2022; Zhou et al., 2021). However, existing image-based geo-localization methods face challenges when applied to UAV large oblique images, especially in urban areas with dense buildings. There are several reasons for this limitation. Firstly, UAV oblique images differ significantly from orthographic images in terms of viewpoint, scale and resolution. Secondly, orthophotos lack the detailed façade information of buildings that is present in UAV oblique images. This absence of façade information hampers the matching of cross-view images, leading to reduced accuracy in UAV image-based geo-localization.
The image-based geo-localization methods for UAV images can be categorized into two main groups: methods utilizing handcrafted features and methods leveraging deep learning (Yao et al., 2019). Existing handcrafted features are often designed for specific types of scenes, making them less applicable to unknown or novel scenes. Consequently, there is a lack of suitable handcrafted features to address the challenges of image-based geo-localization in dense buildings areas. Moreover, gathering training datasets specifically for UAV large oblique images poses significant challenges. The scarcity of such datasets limits the accuracy of existing deep learning networks, as they are predominantly trained on a limited number of publicly accessible datasets.
The term "UAV large oblique image" in this paper refers to images captured at an oblique angle greater than 32 degrees. Specifically, in our own collected Dataset2, the oblique angle of UAV images is approximately 60 degrees.
We mainly made the following three contributions in this paper:
1. We have introduced feature filters to remove the interference segments caused by façade information, which is a crucial aspect of our method for solving geo-localization in dense buildings areas.
2. We have designed SDS features and histogram descriptors as stable features for UAV large oblique image geo-localization with obvious viewpoint and scale gaps.
3. Instead of storing the original reference images, we opted to store the extracted feature descriptors, which significantly reduces the memory and computation costs on the online platform.

RELATED WORK
The airborne POS (Position and Orientation System) has emerged as a popular strategy for UAV positioning. However, the large oblique angle of UAV images amplifies the influence of POS data errors on image-based geo-localization. Additionally, the limited load capacity of UAVs often prevents them from carrying high-precision POS equipment. Therefore, UAV image geo-localization has gained significant attention, focusing on image matching between UAV images and orthographic reference images with geo-tags, typically satellite-view images or Digital Ortho Maps (DOMs).
Early research on image-based geo-localization primarily focused on ground-view images collected by mobile platforms, which allowed for image localization in environments where GNSS (Global Navigation Satellite System) signals are unavailable or unreliable (Zamir et al., 2014). Compared to ground-view images, satellite-view images offer a view range that aligns more closely with UAV images. Moreover, satellite-view images are less affected by occlusion and moving targets, making them valuable reference images for UAV image geo-localization (Couturier et al., 2021). Due to significant differences in flight height, acquisition time and shooting pose between UAV and satellite images, existing image-based geo-localization methods that rely on handcrafted features demonstrate poor performance. As a result, researchers have directed their efforts towards enhancing these traditional features or exploring approaches that combine multiple features for effective cross-view image matching. Goel et al. (2022) combined extracted SURF (Speeded-Up Robust Features) features with the FLANN (Fast Library for Approximate Nearest Neighbours) feature matcher to improve the accuracy of geo-localization. Hasheminasab et al. (2021) improved SIFT (Scale-Invariant Feature Transform) with an image consistency check to realize UAV image geo-localization with DOM. However, the above improvements on existing handcrafted features still retain their limitations for images with large viewpoint differences. Xu et al. (2021) designed a novel contour line feature descriptor, which can be used for image matching between deformed UAV images and reference images. While handcrafted features typically demonstrate high geo-localization accuracy and efficiency for small oblique images in non-building areas, they often encounter limitations when applied to UAV large oblique images due to the presence of façade information. Consequently, it becomes necessary to develop novel feature patterns, descriptors and matching strategies that are specifically tailored for UAV large oblique images.
Since deep learning networks are able to obtain more advanced features and descriptors, learning-based UAV image geo-localization has become a popular research topic. Ding et al. (2021) transformed image-based geo-localization into an image classification task. Zhang et al. (2020) extracted feature maps generated by RCF (Richer Convolutional Features) and determined the similarity of UAV-view and satellite-view images. Meanwhile, to reduce the memory cost, many studies take pretraining or coding of reference images into account. Bianchi et al. (2021) encoded Google maps automatically and compressed them into dimensional vectors and key features. Since CNN-based methods lose information and features with down-sampling, feature extraction and matching networks based on the Transformer framework have been applied in image-based geo-localization. Zhuang et al. (2022) proposed a Transformer-based network to extract more image context features, and matched them by the semantic guidance module (SGM). As another kind of strategy, Tian et al. (2021) transformed UAV oblique images into nadir-view images closer to satellite images by PPT (Perspective Projection Transformation), and used the local feature pattern network for image-based geo-localization. However, the application of the aforementioned deep learning networks is challenging due to the scarcity of datasets specifically tailored for UAV oblique image geo-localization. As a result, these networks face limitations in terms of generalizability and may struggle to perform effectively in scenarios beyond the scope of publicly available datasets.
In summary, both the handcrafted-feature-based and the learning-based geo-localization methods above follow the development trend towards low memory and computation costs (Ma et al., 2021). Most existing methods mainly solve the problems caused by viewpoint differences. However, in densely built-up areas, façade interference information can also easily lead to mislocalization, which is rarely considered in existing methods. To overcome these difficulties, we designed SDS features and their histogram descriptors for accurate and efficient UAV large oblique image geo-localization.

METHODOLOGY
For UAV large oblique images, image-based geo-localization faces two primary challenges. Firstly, the large oblique viewpoints result in image deformation, leading to poor matching results. Extensive research has been conducted to address this challenge. Secondly, UAV large oblique images contain abundant façade information that is rarely found in orthographic reference images. This façade information poses a significant interference to the topological consistency of corresponding targets, especially in dense buildings areas. To overcome these challenges, particularly the latter, the proposed UAV large oblique image geo-localization method consists of three main components: contour segments extraction, SDS feature and descriptor, and image-based geo-localization.

Contour Segments Extraction
For extracting contour segments from images, the LSD (Line Segment Detector) algorithm is widely employed due to its favourable performance in accuracy and efficiency. In the proposed method, the target feature segments are the contour segments of building top surfaces and roads. As a result, the segments extracted by LSD serve as the initial extracted features. To enhance accuracy, filters are designed to eliminate interference segments, specifically the vertical lines of buildings, from the initial extraction results.

Segments Length Filter:
Due to the influence of noise, occlusion and shadow, there are plenty of scattered short segments in the initial extracted results. They greatly interfere with the statistics and description of features and need to be screened out. A segment length filter is usually used in this case, and it generally uses the average length or the maximum length to set the filter threshold. However, the contour segments extracted from UAV large oblique images contain both building and road contours with large length differences. Therefore, existing length filters could wrongly screen out many valuable contour segments because of the long road contour segments. Since the interference segments still account for a small proportion of the initial extracted results, a length filter based on the median length is constructed, as shown in formula (1), where $L_i$ = the length of the $i$-th segment, $L_m$ = the median length of the extracted segments, and the remaining parameter is the set filtering threshold.
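The median-based length filter can be illustrated with a minimal Python sketch. Because formula (1) is not reproduced here, the comparison form (keep a segment when its length is at least `ratio` times the median) and the default `ratio` value are assumptions made only for illustration; the function names are hypothetical.

```python
import math
from statistics import median


def segment_length(seg):
    """Euclidean length of a segment given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    return math.hypot(x2 - x1, y2 - y1)


def median_length_filter(segments, ratio=0.5):
    """Keep segments whose length is at least ratio * median length.

    Assumption: the threshold is a multiple of the median length L_m;
    the exact form of the paper's formula (1) may differ.
    """
    if not segments:
        return []
    lengths = [segment_length(s) for s in segments]
    l_m = median(lengths)
    return [s for s, l in zip(segments, lengths) if l >= ratio * l_m]


segments = [
    (0, 0, 100, 0),  # long road contour, length 100
    (0, 0, 0, 10),   # building edge, length 10
    (5, 5, 5, 17),   # building edge, length 12
    (1, 1, 2, 1),    # noise fragment, length 1
    (3, 3, 3, 5),    # noise fragment, length 2
]
kept = median_length_filter(segments, ratio=0.5)
```

In this toy input the median length is 10 while the mean is 25, so the same ratio applied to the mean would also discard both building edges. This is the failure mode of average-based thresholds that the text describes.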

Facades Segments Filter:
In the extracted segments of UAV large oblique images, the interference features also include contour segments of architectural façades, which do not exist in the orthophotos. Architectural façade information is widely used in the field of building reconstruction. However, this façade information on UAV large oblique images strongly interferes with image-based geo-localization using orthographic reference images. It is necessary to filter out these façade segments from the initial extracted ones before cross-view image matching. Meanwhile, although geo-localization directly assisted by low-precision POS shows poor accuracy with UAV large oblique images, the POS data can still provide constraints for the architectural façade segments filter. Vertical segments are the significant feature of façade information, and they should satisfy constraints such as ∆X⁄∆Z→0 and ∆Y⁄∆Z→0. In the proposed method, ratio thresholds are set for the façade segments filter, as in formula (2), where the two subscripted parameters (indices 1 and 2) are the set filtering thresholds and ∆X, ∆Y, ∆Z = the coordinate differences of the segment endpoints.
Since the façade constraint belongs to the WCS (World Coordinate System) while the segment features are located in the IPCS (Image Plane Coordinate System), the transformation between these two coordinate systems is determined with the collinear condition equation, as shown in formula (3). Due to the poor geometric intersection conditions of UAV-based large oblique images, the locations solved directly are not accurate. However, we find that the spatial relationship between these locations is still stable, which can be used to filter out the façade information. With only the contour segments of building top surfaces retained, the façade segments are screened out.
So far, based on the initial segments extracted by LSD, the designed filters can screen out interference such as short segments and architectural façade segments. Contour segments of roads and building top surfaces are extracted, as shown in Fig. 1.
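The façade constraint ∆X⁄∆Z→0 and ∆Y⁄∆Z→0 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes the world coordinates of both endpoints have already been obtained (for example through the collinearity condition with POS data), and the threshold names `eps_x`, `eps_y` and their default values are hypothetical.

```python
def is_facade_segment(p1, p2, eps_x=0.1, eps_y=0.1):
    """Return True if the segment p1-p2 is near-vertical in world space.

    p1, p2: (X, Y, Z) world coordinates of the two endpoints.
    A segment is treated as a facade edge when |dX/dZ| and |dY/dZ|
    both fall below the thresholds (assumed form of the ratio test).
    """
    dx = abs(p2[0] - p1[0])
    dy = abs(p2[1] - p1[1])
    dz = abs(p2[2] - p1[2])
    if dz == 0:
        return False  # no vertical extent, so it cannot be a facade edge
    return dx / dz < eps_x and dy / dz < eps_y


def remove_facade_segments(segments_3d, eps_x=0.1, eps_y=0.1):
    """Keep only the segments that are not classified as facade edges."""
    return [
        (p1, p2)
        for p1, p2 in segments_3d
        if not is_facade_segment(p1, p2, eps_x, eps_y)
    ]


segments_3d = [
    ((0.0, 0.0, 30.0), (20.0, 0.0, 30.0)),  # roof edge, horizontal
    ((0.0, 0.0, 0.0), (0.5, 0.2, 30.0)),    # building corner, near-vertical
    ((0.0, 0.0, 0.0), (10.0, 0.0, 10.0)),   # sloped edge, ratio 1.0
]
retained = remove_facade_segments(segments_3d)
```

Only relative coordinate differences enter the test, which matches the observation in the text that the spatial relationship between the solved locations stays stable even when their absolute accuracy is poor.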

SDS Feature and Descriptor
Due to the substantial disparities in viewpoint, resolution and scale between UAV large oblique images and orthographic images, the existing features of the same target exhibit noticeable differences, resulting in reduced accuracy in geo-localization. To address these limitations, this paper introduces the SDS (Segments Direction Statistics) feature and its corresponding descriptor. The SDS feature comprises both a local part and a global part, specifically designed to mitigate the impact of these significant differences on UAV large oblique image geo-localization.

Local SDS:
With large viewpoint and scale gaps, there are significant differences in the length and direction of extracted homonymous segments. It is difficult to apply segment features to cross-view image matching directly, but the statistics of segment directions remain stable under image translation, rotation and other deformations. Therefore, SDS is designed in this paper as the feature to be matched. For the extracted contour segments, the direction statistics are collected within image windows. Since there is a large resolution difference between UAV images and orthographic images, the window size should be set inversely proportional to the resolution, which ensures the scale invariance of the local SDS feature.
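A windowed direction histogram of this kind can be sketched in Python as below. The sketch follows the description in the paper (each window contributes its dominant direction, counted into 16 intervals), but several details are assumptions: directions are folded into [0°, 180°), a segment is assigned to a window by its midpoint, ties go to the lowest bin, and the histogram is normalized. All function names are hypothetical.

```python
import math
from collections import Counter

N_BINS = 16  # the paper reports that 16 intervals gave the best results


def direction_bin(seg, n_bins=N_BINS):
    """Quantize the direction of (x1, y1, x2, y2) into n_bins over [0, 180)."""
    x1, y1, x2, y2 = seg
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0
    return min(int(angle / (180.0 / n_bins)), n_bins - 1)


def local_sds_histogram(segments, window, n_bins=N_BINS):
    """Histogram of the dominant direction bin of each image window.

    window: window side length in pixels, chosen per image by the caller
    so that it compensates for the resolution difference.
    """
    per_window = {}
    for seg in segments:
        x1, y1, x2, y2 = seg
        # Assumption: a segment belongs to the window containing its midpoint.
        key = (int((x1 + x2) / 2 // window), int((y1 + y2) / 2 // window))
        per_window.setdefault(key, Counter())[direction_bin(seg, n_bins)] += 1
    hist = [0] * n_bins
    for counts in per_window.values():
        # Dominant direction of the window; ties resolved to the lowest bin.
        dominant = min(counts, key=lambda b: (-counts[b], b))
        hist[dominant] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


segments = [
    (0, 10, 50, 12),    # near-horizontal, first window
    (0, 20, 60, 23),    # near-horizontal, first window
    (30, 0, 26, 40),    # near-vertical, first window
    (150, 0, 146, 60),  # near-vertical, second window
]
hist = local_sds_histogram(segments, window=100)
```

Counting one value per window, rather than one per segment, keeps densely built regions from dominating the descriptor through repeated counting.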

Global SDS:
The local SDS feature ensures robustness when there are only simple deformations and scale differences between images. But for UAV large oblique images, the deformation is more complex. Wu et al. (2021) verified that the angle between feature lines can be used as a stable feature for matching images with large deformation. Therefore, it can be applied to construct the global SDS feature in the proposed method. The angle between contour segments can be determined with formula (7), where the quantity indexed by $i$ is the angle between contour segments. Due to specific segment slopes, there are several cases where this angle needs to be determined separately, as shown in formula (8), where nan indicates that the parameter cannot be obtained correctly. Similarly, the angles between contour segments are counted to form the global SDS feature, which is described by its histogram descriptor, as shown in formula (9):
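The global part can be illustrated with the following Python sketch. Note the substitution: instead of the slope-based expression and its special cases in formulas (7) and (8), the sketch computes the angle from direction vectors, which avoids infinite slopes; a degenerate zero-length segment returns `None`, playing the role of nan. Using all segment pairs, 16 bins over [0°, 90°] and a normalized histogram are assumptions, and the function names are hypothetical.

```python
import math
from itertools import combinations


def angle_between(seg_a, seg_b):
    """Acute angle in degrees between two segments (x1, y1, x2, y2).

    Returns None for a zero-length segment, where no angle is defined.
    """
    ax, ay = seg_a[2] - seg_a[0], seg_a[3] - seg_a[1]
    bx, by = seg_b[2] - seg_b[0], seg_b[3] - seg_b[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return None
    cos_t = abs(ax * bx + ay * by) / (na * nb)
    return math.degrees(math.acos(min(1.0, cos_t)))


def global_sds_histogram(segments, n_bins=16):
    """Normalized histogram of pairwise angles between contour segments."""
    hist = [0] * n_bins
    for a, b in combinations(segments, 2):
        theta = angle_between(a, b)
        if theta is None:
            continue  # skip pairs whose angle cannot be obtained
        hist[min(int(theta / (90.0 / n_bins)), n_bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


segs = [(0, 0, 10, 0), (0, 0, 10, 3), (0, 0, 1, 10)]
hist = global_sds_histogram(segs)
```

Because only angles between segments enter the histogram, rotating the whole image leaves the descriptor unchanged, which is the property that makes it useful for reference images of unknown orientation.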

Image-Based Geo-Localization
In this paper, UAV large oblique image geo-localization obtains the location of UAV images by matching them with the orthographic reference images. Based on the SDS feature and its descriptor, the similarity measure is designed as formula (10). By determining the similarity between UAV images and orthographic reference images with formula (9), the geo-tag with the highest similarity is obtained as the location of the UAV image, which completes the geo-localization. However, with some interference features in UAV large oblique images caused by viewpoint and scale gaps, mismatching is hard to avoid in image-based geo-localization. With UAV sequence images, a bilateral matching strategy is designed to improve the robustness of the proposed method, as in formula (12). Since $S_{UAV,SAT}$ is the main factor and $S_{SAT,UAV}$ plays a fine-tuning role, the relaxation factor can usually be set within 0.5. Similarity with bilateral matching can be used as a better geo-localization evaluation criterion.
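The matching step can be outlined with a Python sketch under strong assumptions, since formulas (10) to (12) are not reproduced here. Histogram intersection stands in for the paper's similarity measure, the weights `p_l` and `p_g` combine the local and global parts linearly, and the bilateral score is assumed to be the forward similarity plus a relaxation factor `lam` times the reverse similarity. None of these forms is confirmed by the source, and all names are hypothetical.

```python
def hist_intersection(h1, h2):
    """Histogram intersection of two normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def sds_similarity(uav, sat, p_l=0.5, p_g=0.5):
    """Weighted similarity of local and global SDS histogram descriptors.

    uav, sat: dicts with keys "local" and "global" holding histograms.
    """
    return (p_l * hist_intersection(uav["local"], sat["local"])
            + p_g * hist_intersection(uav["global"], sat["global"]))


def bilateral_score(s_uav_sat, s_sat_uav, lam=0.3):
    """Forward similarity fine-tuned by the reverse one (assumed linear form)."""
    return s_uav_sat + lam * s_sat_uav


def geo_localize(uav, references, p_l=0.5, p_g=0.5):
    """Return the geo-tag of the reference with the highest similarity."""
    return max(
        references,
        key=lambda tag: sds_similarity(uav, references[tag], p_l, p_g),
    )


query = {"local": [0.5, 0.5, 0.0, 0.0], "global": [0.0, 1.0, 0.0, 0.0]}
references = {
    "tile_A": {"local": [0.5, 0.5, 0.0, 0.0], "global": [0.0, 1.0, 0.0, 0.0]},
    "tile_B": {"local": [0.0, 0.0, 0.5, 0.5], "global": [1.0, 0.0, 0.0, 0.0]},
}
best = geo_localize(query, references)
```

Only descriptors are held in `references`, which mirrors the design choice of storing extracted feature descriptors instead of the original reference images.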

Experimental Datasets
Dataset1: University-1652:
University-1652 is a UAV oblique image dataset provided by Zheng et al. (2020), and it is a popular public dataset for image classification, matching and image-based geo-localization. It contains images of 1652 buildings from 72 universities around the world. Each target includes images from three viewpoints: UAV-view, satellite-view and street-view, as shown in Fig. 3. The UAV-based image geo-localization experiments in this paper mainly use the UAV-view and satellite-view images, usually with 54 UAV oblique images and one satellite-view image of each target.

Dataset2: UAV LO-LF-W Images:
UAV large oblique images collected in practice differ significantly from the UAV-view images in University-1652. In order to verify the reliability of the proposed method in real applications, UAV large oblique long focal whiskbroom images (Ye et al., 2022) are collected for geo-localization experiments. As shown in Fig. 4, the oblique angle of these images is about 66°, and the flight height is more than 4 km, with a smaller scale and lower spatial resolution than Dataset1. Therefore, image-based geo-localization is much more challenging on this actual image dataset. This dataset was collected in Weinan city, Shaanxi province, China. Similarly, Google Map products are obtained as the orthographic images with geo-tags.

Evaluation Indicators:
In most existing image-based geo-localization research (Patel et al., 2022), Recall@K is a popular accuracy evaluation indicator, which reflects whether the correct geo-localization result occurs in the top K results ranked by matching score. K is often set to 1, 5 and 10. The average precision of image retrieval, AP, is another popular evaluation indicator for image-based geo-localization. It represents the area under the precision-recall curve and can be determined with formula (14), where $P_i$ = the precision of the top $i$ images, TP = the number of correctly geo-located images, and FP = the number of incorrectly geo-located images.
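Both indicators can be computed with a short Python sketch. Recall@K is taken as the number of queries whose correct reference appears in the top K results divided by the total number of queries. For AP, the sketch uses the common retrieval definition (the mean of the precision values at each rank where a relevant item appears), which is assumed here to correspond to formula (14). The function names are hypothetical.

```python
def recall_at_k(ranked_lists, truths, k):
    """Fraction of queries whose true reference is in the top-k results.

    ranked_lists: one ranked list of reference IDs per query, best first.
    truths: the correct reference ID for each query.
    """
    hits = sum(1 for ranked, t in zip(ranked_lists, truths) if t in ranked[:k])
    return hits / len(truths)


def average_precision(ranked, relevant):
    """Average of precision@i over the ranks i that hold a relevant item."""
    hits = 0
    total = 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision of the top-i results
    return total / len(relevant) if relevant else 0.0


ranked_lists = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truths = ["a", "a", "a"]
r1 = recall_at_k(ranked_lists, truths, 1)
r2 = recall_at_k(ranked_lists, truths, 2)
ap = average_precision(["a", "x", "b"], {"a", "b"})
```

When each query has exactly one correct reference, as with one satellite-view image per target, AP for a query reduces to the reciprocal of the rank at which that reference appears.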

Geo-Localization Accuracy Evaluation on Dataset1:
Three UAV-view images and their top five satellite-view images are selected from Dataset1 to show the geo-localization results of the proposed method, as shown in Fig. 5. The images with a green border are the correctly matched satellite-view images, and the images with a red border are the mismatched results.
On Dataset1, the geo-localization accuracy of the proposed method is evaluated and compared with Zheng's model (Zheng et al., 2020) and Ding's model (Ding et al., 2021). The experiments are carried out with 37854 UAV-view images, and the geo-localization accuracy is listed in Tab. 1. Compared with Zheng's model, the proposed method shows improvements of 16.66%, 8.80% and 16.89% on Recall@1, Recall@5 and AP. Compared with Ding's model, the proposed method shows improvements of 8.50%, 3.96%, 4.01% and 9.20% on Recall@1, Recall@5, Recall@10 and AP. On the Recall@1 and AP indicators, the proposed method shows obvious advantages over the other two models. To achieve automatic geo-localization of UAV images, Recall@1 needs to be close to 100%. However, due to the large differences between oblique images and orthographic images, it is still difficult for existing methods, including the proposed method, to reach this ideal goal.
Recall@5 and Recall@10 of the proposed method reach about 90% and 95%, which means that the vast majority of correctly geo-located results occur in the top 10 or even the top 5 images ranked by matching score. For man-machine interactive UAV image geo-localization, compared with selecting among a large number of UAV images, the efficiency and accuracy of geo-localization can be greatly improved with the proposed method.

Geo-Localization Accuracy Evaluation on Dataset2:
Similarly, three UAV LO-LF-W images and their top five orthographic images are selected from Dataset2 to show the geo-localization results of the proposed method, as shown in Fig. 6. With the proposed method, the geo-localization accuracy of 150 UAV image blocks in the UAV LO-LF-W images is evaluated, and the results are shown in Tab. 2. Due to the larger viewpoint gap of the images in Dataset2, the scene is more complex and the resolution differs greatly from that of the orthographic images. The geo-localization accuracy on this dataset is significantly lower than that on Dataset1, with an obvious decrease of about 20% on Recall@1. However, most methods (including Zheng's model and Ding's model) that perform well on public UAV large oblique image datasets fail entirely on Dataset2. Therefore, on the challenging Dataset2, the geo-localization accuracy of the proposed method is relatively high. Moreover, the Recall@10 of the proposed method on Dataset2 can still reach 85%. Compared with the existing UAV-based image geo-localization methods used in engineering, such as SIFT, ASIFT and HAPCG, the accuracy and efficiency of the proposed method have been improved.

Orientation and noise of nonstandard reference images:
The two experimental datasets in this paper use the satellite-view images provided by Google Maps, which have consistent orientation and quality. However, real-world scenarios often involve nonstandard reference images that exhibit varying orientations and significant noise. To assess the geo-localization accuracy of the proposed method under such conditions, the reference images in the two datasets have been rotated and subjected to noise augmentation. These modified reference images, featuring nonstandard orientations and Gaussian noise, then replace the original reference images in each dataset.
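The augmentation of reference images can be sketched in Python as below. The rotation angles and noise levels used in the experiments are not stated, so the 90° rotation, the value of `sigma` and the 8-bit grey-level range are assumptions chosen only to keep the example short; the function names are hypothetical.

```python
import random


def rotate90(image):
    """Rotate a 2D list-of-rows image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip to the 8-bit range [0, 255]."""
    return [
        [min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row]
        for row in image
    ]


reference = [
    [10, 20, 30],
    [40, 50, 60],
]
rng = random.Random(0)  # fixed seed so the augmentation is repeatable
augmented = add_gaussian_noise(rotate90(reference), sigma=5.0, rng=rng)
```

The augmented tile keeps its geo-tag, so the evaluation can reuse the same ground truth as the original dataset.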
Subsequently, the geo-localization accuracy of the proposed method has been evaluated, as shown in Fig. 7. The results indicate that the geo-localization accuracy of the proposed method experiences minimal fluctuations, about 1%, when reference images are subjected to rotation and Gaussian noise. Interestingly, the accuracy demonstrates both slight improvements and declines. This highlights the robustness of the proposed method in handling variations in reference image orientation and noise. Consequently, the method exhibits favourable suitability for practical applications involving multi-source reference images with diverse orientations and image qualities, facilitating UAV large oblique image geo-localization.

Artificial feature abundance:
Since the designed SDS feature is constructed from the contour segments of buildings and roads, the abundance of these artificial features should be one of the important influencing factors. Since the UAV images in Dataset1 are taken around a single or a few buildings, the artificial feature abundance is much higher. However, for the images of Dataset2, the artificial feature abundance is lower. Therefore, it is assumed that the artificial feature abundance is the main reason for the lower geo-localization accuracy on Dataset2. The artificial feature abundance is described by the proportion of artificial feature pixels in UAV images, determined with formula (15):
$$abd = \frac{PN_a}{PN} \qquad (15)$$
where $abd$ = the artificial feature abundance of the image, and $PN_a$, $PN$ = the number of artificial feature pixels and the number of pixels in the whole image. The artificial constructions in the UAV oblique image are segmented, and the number of pixels they contain is counted.
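The abundance measure is a simple pixel ratio, as the following Python sketch shows. It assumes that a binary mask of artificial structures is already available from a segmentation step; the half-open interval boundaries used to group images into the four abundance ranges of the experiment are also an assumption, and the function names are hypothetical.

```python
def artificial_feature_abundance(mask):
    """abd = PN_a / PN for a binary mask given as a list of rows.

    Truthy entries mark pixels that belong to artificial structures.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    artificial = sum(1 for row in mask for v in row if v)
    return artificial / total


def abundance_interval(abd):
    """Assign an abundance value to one of four 25%-wide intervals."""
    labels = ["0~25%", "25%~50%", "50%~75%", "75%~100%"]
    for edge, label in zip((0.25, 0.5, 0.75), labels):
        if abd < edge:
            return label
    return labels[-1]


mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
abd = artificial_feature_abundance(mask)  # 3 artificial pixels out of 8
interval = abundance_interval(abd)
```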
According to the artificial feature abundance, the UAV images of Dataset2 can be divided into four intervals: {0~25%, 25%~50%, 50%~75%, 75%~100%}. The average and standard deviation of the geo-localization accuracy in the different feature abundance intervals have been obtained, as shown in Fig. 8. When the artificial feature abundance lies within 0~25%, the geo-localization accuracy is below 10%. That is, it is difficult for the proposed method to obtain accurate geo-localization results for UAV large oblique images of areas with few buildings. However, when the feature abundance is higher than 50%, the geo-localization accuracy of the proposed method is higher than that on the entire Dataset2. Moreover, when the feature abundance reaches 75%, the geo-localization accuracy evaluation index Recall@10 can reach 90%, indicating that when the feature abundance is high enough, the proposed method can well meet the geo-localization requirements of challenging UAV LO-LF-W images. This shows that the abundance of artificial features is the key influencing factor of the proposed geo-localization method.
In summary, for UAV large oblique images, the proposed method shows high accuracy and robustness, especially in challenging scenes with dense buildings. On the public dataset University-1652, the geo-localization accuracy of the proposed method is much higher than that of existing methods. On the actually collected UAV LO-LF-W images, the proposed method can still obtain reliable geo-localization results, while many other methods that perform well on the public dataset can hardly solve image-based geo-localization on these images. Moreover, for reference satellite-view images with different orientation and noise, the geo-localization results obtained by the proposed method are very stable, which shows that it can use challenging multi-source nonstandard reference images to realize UAV large oblique image geo-localization. Finally, the accuracy of geo-localization depends on the abundance of artificial features.

CONCLUSIONS
Accurately geo-locating UAV large oblique images remains an immensely challenging task due to the substantial viewpoint differences and the presence of considerable façade interference information, particularly in dense buildings areas. To address these challenges, this paper presents a novel method for UAV large oblique image geo-localization. The proposed method incorporates feature filters to eliminate interference features, introduces SDS feature patterns for extraction and description, and employs a bilateral similarity measure strategy for matching and geo-localization. By considering both local and global features, the designed features remain robust in image matching across varying viewpoints and scales. In addition to the significant viewpoint differences addressed by existing methods, the proposed method also accounts for the interference caused by architectural façade information, resulting in improved geo-localization accuracy. Experimental results show that the proposed method can better serve the geo-localization of UAV large oblique images, especially in dense buildings areas.
Meanwhile, since the proposed method is based on the segment features of artificial structures, it is hard to obtain accurate geo-localization results for natural landforms or areas with sparse buildings and roads. Therefore, as future work, we will consider extending the method to scenes with more complex or sparse features.
The IPCS is established with the lower left corner as the origin point. The slopes of these segments are obtained as shown in formula (5), where $x$, $y$ = the endpoints of the segments. Due to the large number of segments and their uneven distribution density, segments in dense building areas are counted repeatedly. Therefore, the images are divided by set windows, and the major value of the segment direction in the window is determined as the local SDS feature. The local SDS features are counted in 16 intervals (through many experiments, SDS with 16 intervals shows the best geo-localization results), and each window provides the value with the largest number of statistics. The local SDS features are counted to establish the histogram descriptor, as shown in formula (6), whose terms are the histogram descriptor of local SDS and the local SDS feature of the $j$-th image window. A histogram descriptor of global SDS is defined in the same way. For extracted contour segments, the SDS feature, including the local and global parts, is described with these two histogram descriptors.
The similarity measure compares the feature histograms of UAV oblique images and satellite orthographic images. In order to highlight the effective features and suppress the interference information, two weight parameters $p_L$ and $p_G$ are introduced into the similarity measure, as shown in formula (11). In the bilateral matching strategy, $S_{UAV,SAT}$ and $S_{SAT,UAV}$ = the matching similarities, and the remaining parameter is the set relaxation factor.

The proposed method can also reduce the labour cost in the man-machine interactive UAV-based image geo-localization task.
For Recall@K, $N$ = the total number of geo-localization images and $N_{Recall}$ = the number of correctly geo-located images.

Figure 7. Geo-localization accuracy on original and augmented datasets.

Figure 8. Geo-localization accuracy on images with different artificial feature abundance (abd).

Table 2. Comparison of geo-localization accuracy of our method on Dataset1 and Dataset2.