Adaptive parameter local consistency automatic outlier removal algorithm for area-based matching

Due to image differences and the limitations of matching methods, the control points extracted for geometric calibration of remote sensing images inevitably contain outliers. Moreover, existing local-constraint outlier rejection methods have limitations that make it challenging to automatically remove relatively small gross errors. This paper introduces an adaptive parameter local consistency automatic outlier removal algorithm, referred to as APLC. First, we construct the k-nearest neighbors of each pair of matching points and derive the distance and topological uncertainties from the accuracy of point matching. Subsequently, we cross-validate the uncertainty between the two pairs of vectors formed by points within the neighborhood to achieve parameter adaptation. Finally, a cost function is defined to assess the consistency of local structures. Through a two-stage outlier removal strategy, matching points that do not maintain local structural consistency are eliminated. To assess the effectiveness of the proposed algorithm, we conduct experimental comparisons using area-based initial matching results from the FY-3D remote sensing dataset, demonstrating its superiority over three state-of-the-art methods.


Introduction
Image matching seeks an optimal set of corresponding points between two images with overlapping regions, and it is widely used in remote sensing tasks such as image stitching, image fusion, change detection, and 3D reconstruction. Matching methods are typically categorized into feature-based and area-based approaches (Zitová and Flusser, 2003). Feature-based methods describe detected features and compare the similarity of their descriptors to construct putative correspondences. Area-based methods search for matches within a window of a given size in the overlapping region of the two images, working directly on the original pixel intensities. In geometric calibration of remote sensing images, high-precision control point matching is often crucial (Heikkila, 2000). Area-based methods are typically employed to obtain high-precision control points, but calibration is sensitive to matching outliers, and even small matching errors can easily lead to unsatisfactory calibration results. Moreover, area-based matching methods typically produce outliers that deviate only slightly from the correct location, usually within about ten pixels or fewer, depending on the window size and the modal differences in the image data, as illustrated in Figure 1. Such outliers are non-rigid and largely random, which makes it difficult to find a unified transformation either globally or locally. Removing these outlier matches is therefore a significant challenge.
Due to the influence of image quality, modal differences, and the robustness of matching algorithms, mismatches in varying proportions are inevitably generated in the matching process. The automatic elimination of mismatched points is a crucial step, as it directly impacts the accuracy of post-processing for remote sensing products. Existing methods can be categorized into two types: global geometric constraint methods and local geometric constraint methods (Jiang et al., 2020a). Among global constraint methods, Random Sample Consensus (RANSAC) (Fischler and Bolles, 1981) is the most typical representative. It iteratively selects a random subset of the input data, fits a model, and returns the model with the highest support. Such hypothesize-and-verify methods have been successfully applied to various vision and remote sensing tasks. Many modifications of RANSAC have since been proposed, and a comprehensive review of them, together with the USAC framework, was provided by Raguram et al. (Raguram et al., 2013), whose discussion of these adjustments is relevant to robust outlier removal in remote sensing image matching. The idea of local optimization is already incorporated in the latest methods. Graph-Cut RANSAC (GCRANSAC) employs the graph-cut algorithm in the local optimization step to separate inliers and outliers, aiming to find a better model (Barath and Matas, 2018). Among other global constraint elimination methods, Aguilar et al. introduced a point matching approach known as Graph Transformation Matching (GTM) (Aguilar et al., 2009). This method searches through a consensus graph derived from the initial matches, iteratively eliminating suspicious matches and enhancing the similarity between the two graphs. In remote sensing image matching, such graph matching techniques iteratively remove outliers globally by calculating confidence probabilities from the graph nodes. However, this often demands a high computational cost.
Global constraint methods are mostly sensitive to the proportion of correct matches in the initial samples, and they require estimating a universal transformation model across the entire dataset. They are less flexible and general than local constraint approaches. On the one hand, local geometric constraints can describe variations in different local contexts, as they rely solely on the local spatial relationships between adjacent feature points. On the other hand, local geometric constraints do not require estimating the transformation model from an entire set of initial matches dominated by outliers, which can enhance the efficiency of outlier removal (Jiang and Jiang, 2019). Numerous methods have been proposed based on local geometric constraints (Ma et al., 2015; Ma et al., 2018a), among which Locality Preserving Matching (LPM) (Ma et al., 2018b) stands out as the most representative: it is concise, robust, and effectively eliminates erroneous matches. LPM defines a local topological consensus from the length ratio and the angle between the vectors of corresponding feature points in the two images, and uses it as a cost for filtering out matches whose spatial neighborhood structures are inconsistent. This method has proven effective in enhancing the robustness and accuracy of matching. To enhance the discriminative power of local constraints for similar local structures, Shao et al. employed a descriptor to identify repetitive patterns, ambiguous features, and similar local structures (Shao et al., 2021). Ma et al. took into consideration the stable neighborhood topology of potential true matches, further improving matching performance in scenarios of severe data degradation (Ma et al., 2022). Jiang et al. designed a local graph structure that preserves geometric topology; through local graph structure consensus (LGSC), it effectively removes outliers introduced by feature matching (Jiang et al., 2022). However, these methods are mostly designed to handle feature-based initial matches, whose erroneous matches, particularly those generated by methods like the Scale-Invariant Feature Transform (SIFT) (Lowe, 2004), tend to deviate significantly from their correct positions. In a few instances, researchers observed that feature matches could deviate slightly from their correct positions during feature detection. Dusmanu et al. corrected for this deviation during the adjustment process, although at a significant computational overhead (Dusmanu et al., 2020). A learning-based method, called Mismatch Reduction Learning (LMR) (Ma et al., 2019), was proposed that uses consensus within local neighborhood structures and treats mismatch reduction as a binary classification problem. All of these algorithms rely on the similarity of local structures to identify consistent KNN graphs for two structures with similar patterns. Jiang et al. transformed feature matching into a spatial clustering problem with outliers, introducing Robust Feature Matching using Spatial Clustering (RFM-SCAN) (Jiang et al., 2020b).
Figure 1. Parallax comparison before and after matching.
In this paper, we observe that local constraint methods such as LPM face challenges when the matching outliers lie in the vicinity of their correct positions. This situation often arises with initial matches obtained through area-based matching methods. The threshold parameters for the topological structure in these algorithms are difficult to adapt to all local scenarios, as shown in Figure 2, so these erroneous matches cannot be identified. Therefore, starting from the precision of the matching points, we derive the uncertainties associated with the distance and topological constraints. By cross-verifying two pairs of matching points within the neighborhood, we calculate a cost penalty for the central point. This approach realizes parameter self-adaptation for local constraint consistency and effectively identifies abnormal matches. It lays the foundation for subsequent geometric calibration, bundle adjustment, and the generation of surveying products.

Methodology
Using an area-based matching method such as least squares matching (LSM) (Gruen, 1985), two initial corresponding point sets are extracted from two remote sensing images. Due to the physical constraints of a small region around each point, the local neighborhood structures between feature points cannot change freely (Ma et al., 2018a). This implies that the transformed points should preserve the locally corresponding structures. For each point pair in the point sets, the 8 nearest neighbors are constructed using the K-Nearest Neighbor (KNN) method, as shown in Figure 2.
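The neighborhood construction described above can be sketched in a few lines of Python. This is only an illustrative brute-force version: the function name `knn_indices` and the toy grid are assumptions made here, not part of the original method description, and for the tens of thousands of points in a typical scene a spatial index such as a k-d tree would likely be preferable.

```python
import math

def knn_indices(points, k=8):
    """Return, for each point, the indices of its k nearest neighbours (brute force)."""
    neighbours = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        dists = [(math.hypot(xj - xi, yj - yi), j)
                 for j, (xj, yj) in enumerate(points) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

# Hypothetical 3 x 3 grid of control points; the centre point has index 4.
grid = [(float(x), float(y)) for y in range(3) for x in range(3)]
centre_neighbours = knn_indices(grid, k=8)[4]
```

In this toy grid the centre point has exactly eight other points around it, so all of them are returned as its neighborhood, ordered by increasing distance.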
As proposed in LPM, there exists a consensus in the neighborhood topological structure, as illustrated in Figure 2. Ma et al. defined the consensus of neighborhood topology by combining the length ratio of $v_i$ and $v_j$ with the angle between them:
$$ s(v_i, v_j) = \frac{\min\{|v_i|, |v_j|\}}{\max\{|v_i|, |v_j|\}} \cdot \frac{\langle v_i, v_j \rangle}{|v_i| \, |v_j|} $$
They then defined a quantified distance between $v_i$ and $v_j$ with a predefined threshold $\tau$, where $d(v_i, v_j)$ represents the cost associated with neighboring points under the topological consensus condition:
$$ d(v_i, v_j) = \begin{cases} 0, & s(v_i, v_j) \geq \tau \\ 1, & s(v_i, v_j) < \tau \end{cases} $$
For an initial set of $n$ matched point pairs, although the locally corresponding structures are consistent for each pair of points in their constructed neighborhoods, there are subtle differences between different point pairs. The neighborhood topological structure is sensitive to these differences, and the threshold parameter $\tau$ is difficult to adapt to all structural variations. As indicated by the red neighboring points in Figure 2, a fixed threshold $\tau$ is problematic because the angle between $v_i$ and $v_j$ varies. Consequently, it becomes difficult to distinguish the red points, which have a slight displacement, from the correctly matched green points, and the small gross errors typical of area-based matching often go undetected. Our objective is to make the threshold parameter $\tau$ adaptive, so that it accurately characterizes the consistency of local structures across different local neighborhoods.
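To make the difficulty with a fixed threshold concrete, the following minimal sketch evaluates an LPM-style consensus (length ratio multiplied by the cosine of the angle between two vectors) and a fixed-threshold cost. The function names, the handling of zero-length vectors, and the value `tau=0.9` are illustrative assumptions, not the authors' implementation.

```python
import math

def topo_consensus(v_i, v_j):
    """s(v_i, v_j): length ratio times cosine of the angle between two 2-D vectors."""
    len_i = math.hypot(*v_i)
    len_j = math.hypot(*v_j)
    if len_i == 0.0 or len_j == 0.0:
        # Assumed convention: two zero vectors agree, a zero and a non-zero vector do not.
        return 1.0 if len_i == len_j else 0.0
    ratio = min(len_i, len_j) / max(len_i, len_j)
    cosine = (v_i[0] * v_j[0] + v_i[1] * v_j[1]) / (len_i * len_j)
    return ratio * cosine

def fixed_threshold_cost(v_i, v_j, tau=0.9):
    """d(v_i, v_j): 0 when the consensus reaches tau, 1 otherwise."""
    return 0 if topo_consensus(v_i, v_j) >= tau else 1

# The same one-pixel deviation, seen against a long and a short vector.
s_long = topo_consensus((10.0, 0.0), (10.0, 1.0))   # 100 / 101, about 0.99
s_short = topo_consensus((1.0, 0.0), (1.0, 1.0))    # 0.5
```

With these toy vectors, an identical one-pixel deviation yields a consensus of about 0.99 when the vectors are ten pixels long and 0.5 when they are one pixel long, so a single fixed value of the threshold accepts the first case and rejects the second. This is the kind of scale dependence that motivates an adaptive threshold.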

Parameter Adaptation for Local Consistency
In an ideal scenario, if only translational transformations exist between the images, the threshold $\tau$ should be approximately equal to 1. However, due to varying local deformations, which can be either rigid or non-rigid, and the influence of matching accuracy, establishing a fixed and uniform threshold becomes challenging. Our core idea is to begin with the precision of the matches, deriving the uncertainties in neighborhood topological consistency and neighborhood distance consistency from the uncertainty in coordinate accuracy. Subsequently, by cross-checking two pairs of vectors within the neighborhood, as illustrated in Figure 3, the parameter is made applicable to all neighborhood structures and better describes the consistency of the neighborhood structure. In Figure 3, $p_i$ and $q_i$ represent a matched point pair, $p_j$ and $q_j$ are the corresponding neighboring feature points, and $v_i$, $v_j$, $v_{p_{ij}}$, $v_{q_{ij}}$ are the vectors formed by the respective corresponding points. Here, we define the coordinates $p_i(x_{p_i}, y_{p_i})$, $q_i(x_{q_i}, y_{q_i})$, $p_j(x_{p_j}, y_{p_j})$, and $q_j(x_{q_j}, y_{q_j})$. The precisions of the row and column coordinates of the matched homologous points are denoted as $\sigma_x$ and $\sigma_y$, respectively. Some symbols are predefined:
$$ dx_{p_i q_i} = x_{q_i} - x_{p_i}, \quad dy_{p_i q_i} = y_{q_i} - y_{p_i}, \quad dx_{p_j q_j} = x_{q_j} - x_{p_j}, \quad dy_{p_j q_j} = y_{q_j} - y_{p_j} $$
The neighborhood topological consistency $s(v_i, v_j)$ can be calculated using the coordinates as:
$$ s(v_i, v_j) = \frac{\min\{|v_i|, |v_j|\}}{\max\{|v_i|, |v_j|\}} \cdot \frac{dx_{p_i q_i} \, dx_{p_j q_j} + dy_{p_i q_i} \, dy_{p_j q_j}}{|v_i| \, |v_j|}, \quad |v_i| = \sqrt{dx_{p_i q_i}^2 + dy_{p_i q_i}^2}, \quad |v_j| = \sqrt{dx_{p_j q_j}^2 + dy_{p_j q_j}^2} $$
Linearizing and taking the first-order partial derivative with respect to each coordinate yields:
$$ ds(v_i, v_j) = \sum_{u \in \{p_i, q_i, p_j, q_j\}} \left( \frac{\partial s}{\partial x_u} \, dx_u + \frac{\partial s}{\partial y_u} \, dy_u \right) $$
Then, by utilizing the matching precision and the law of error propagation, we can solve for the uncertainty of $s(v_i, v_j)$:
$$ \sigma_{s(v_i, v_j)}^2 = \sum_{u \in \{p_i, q_i, p_j, q_j\}} \left[ \left( \frac{\partial s}{\partial x_u} \right)^2 \sigma_x^2 + \left( \frac{\partial s}{\partial y_u} \right)^2 \sigma_y^2 \right] $$
Similarly, the uncertainty of $s(v_{p_{ij}}, v_{q_{ij}})$ can also be determined. The consistency between the two is theoretically established, allowing constraints and penalties to be imposed at the local level:
$$ d_s(v_i, v_j, v_{p_{ij}}, v_{q_{ij}}) = \begin{cases} 0, & |s(v_i, v_j) - s(v_{p_{ij}}, v_{q_{ij}})| \leq N \sigma_s \\ 1, & |s(v_i, v_j) - s(v_{p_{ij}}, v_{q_{ij}})| > N \sigma_s \end{cases} $$
where $\sigma_s$ denotes the propagated uncertainty, $N \sigma_s$ is utilized for adaptive consistency thresholding in different local contexts, and $N$ is a positive integer, typically taken as 3 or 5.
Similarly, the neighborhood distance consistency $t(v_i, v_j)$ can be calculated using the coordinates. Linearizing and taking the first-order partial derivative with respect to each coordinate yields:
$$ dt(v_i, v_j) = \sum_{u \in \{p_i, q_i, p_j, q_j\}} \left( \frac{\partial t}{\partial x_u} \, dx_u + \frac{\partial t}{\partial y_u} \, dy_u \right) $$
Through the law of error propagation, we can calculate the uncertainty of $t(v_i, v_j)$ based on the precision of the coordinates. We calculate the uncertainty of $t(v_{p_{ij}}, v_{q_{ij}})$ using the same method for cross-verification. The theoretical consistency between $t(v_i, v_j)$ and $t(v_{p_{ij}}, v_{q_{ij}})$ is valid, and a penalty is applied within the neighborhood:
$$ d_t(v_i, v_j, v_{p_{ij}}, v_{q_{ij}}) = \begin{cases} 0, & |t(v_i, v_j) - t(v_{p_{ij}}, v_{q_{ij}})| \leq N \sigma_t \\ 1, & |t(v_i, v_j) - t(v_{p_{ij}}, v_{q_{ij}})| > N \sigma_t \end{cases} $$
where $\sigma_t$ denotes the propagated uncertainty, $N \sigma_t$ is utilized for adaptive consistency thresholding in different local contexts, and $N$ is a positive integer, typically taken as 3 or 5.
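The error-propagation idea of this section can be illustrated numerically. The sketch below is not the authors' implementation and rests on several assumptions stated here: the consistency measure is taken to be the LPM-style length ratio times cosine, coordinate errors are treated as independent, the partial derivatives are approximated by central differences rather than derived analytically, the two uncertainties are combined in quadrature, and the penalty compares the absolute difference of the two consistency values with N times the combined uncertainty. All function names are hypothetical.

```python
import math

def consensus(v_a, v_b):
    """Length ratio times cosine of the angle between two 2-D vectors."""
    la, lb = math.hypot(*v_a), math.hypot(*v_b)
    if la == 0.0 or lb == 0.0:
        return 1.0 if la == lb else 0.0
    dot = v_a[0] * v_b[0] + v_a[1] * v_b[1]
    return (min(la, lb) / max(la, lb)) * dot / (la * lb)

def s_displacement(c):
    """s(v_i, v_j) from coordinates c = [xpi, ypi, xqi, yqi, xpj, ypj, xqj, yqj]."""
    return consensus((c[2] - c[0], c[3] - c[1]), (c[6] - c[4], c[7] - c[5]))

def s_neighbour(c):
    """Same measure on the vectors p_i -> p_j and q_i -> q_j (assumed v_pij, v_qij)."""
    return consensus((c[4] - c[0], c[5] - c[1]), (c[6] - c[2], c[7] - c[3]))

def propagated_sigma(func, coords, sigma_x, sigma_y, step=1e-4):
    """First-order error propagation using a central-difference Jacobian."""
    variance = 0.0
    for k in range(len(coords)):
        plus, minus = list(coords), list(coords)
        plus[k] += step
        minus[k] -= step
        partial = (func(plus) - func(minus)) / (2.0 * step)
        # Even indices hold x coordinates, odd indices hold y coordinates.
        sigma = sigma_x if k % 2 == 0 else sigma_y
        variance += (partial * sigma) ** 2
    return math.sqrt(variance)

def adaptive_penalty(coords, sigma_x, sigma_y, n=3):
    """0 if the two consistency values agree within n combined sigmas, else 1."""
    diff = abs(s_displacement(coords) - s_neighbour(coords))
    combined = math.hypot(propagated_sigma(s_displacement, coords, sigma_x, sigma_y),
                          propagated_sigma(s_neighbour, coords, sigma_x, sigma_y))
    return 0 if diff <= n * combined else 1

# Toy cases: a pure translation, and a neighbour displaced by 30 pixels.
translation = [0.0, 0.0, 5.0, 0.0, 10.0, 0.0, 15.0, 0.0]
gross = [0.0, 0.0, 5.0, 0.0, 10.0, 0.0, 15.0, 30.0]
```

Because the threshold is computed from the local geometry and the assumed coordinate precision, it changes from one neighborhood to the next instead of being fixed in advance. Note that the min/max ratio is not differentiable where the two vectors have equal length, so a first-order propagation is only approximate near that configuration.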

Problem Formulation
If the point sets $P$ and $Q$ are perfectly matched, their neighborhood structures should overlap. The problem is therefore to find the correspondences whose local structures are consistent in the two graphs, which can be expressed as:
$$ I^{*} = \arg\min_{I} C(I; S) $$
where $I$ is the inlier set whose local structures are most consistent, $S$ represents the putative initial correspondences, and $I^{*}$ is the optimal solution. The cost values of the cost function $C$ can be calculated using Equations (6) and (9).

Outlier Removal Strategy
Based on the threshold parameter adaptation and the cost function, a two-stage outlier removal strategy is designed and outlined in the algorithm. In each stage, two KNN graphs are established using the putative correspondences. After the cost of each putative correspondence is calculated, correspondences whose cost exceeds the threshold are considered outliers and removed. These steps are repeated until the cost of every remaining correspondence is below the given threshold.
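The control flow of this strategy can be sketched as follows. The loop structure mirrors the description above, but the cost used here is only a stand-in (the share of neighbors whose displacement differs by more than a tolerance), the neighbors are searched in the first image only, and the stage thresholds `(0.9, 0.5)`, the tolerance, and all function names are hypothetical choices for illustration.

```python
import math

def knn(points, i, k):
    """Indices of the k points nearest to points[i] (brute force)."""
    order = sorted((math.hypot(x - points[i][0], y - points[i][1]), j)
                   for j, (x, y) in enumerate(points) if j != i)
    return [j for _, j in order[:k]]

def neighbour_cost(matches, i, k, tol):
    """Stand-in cost: share of neighbours whose displacement differs by more than tol."""
    nbrs = knn([m[0] for m in matches], i, k)
    if not nbrs:
        return 0.0
    (px, py), (qx, qy) = matches[i]
    bad = 0
    for j in nbrs:
        (pjx, pjy), (qjx, qjy) = matches[j]
        if math.hypot((qx - px) - (qjx - pjx), (qy - py) - (qjy - pjy)) > tol:
            bad += 1
    return bad / len(nbrs)

def two_stage_removal(matches, k=8, tol=1.0, thresholds=(0.9, 0.5)):
    """Iteratively drop matches whose cost exceeds the threshold of each stage."""
    kept = list(matches)
    for lam in thresholds:                    # one pass of this loop per stage
        while True:
            costs = [neighbour_cost(kept, i, k, tol) for i in range(len(kept))]
            survivors = [m for m, c in zip(kept, costs) if c <= lam]
            if len(survivors) == len(kept):   # nothing removed: the stage has converged
                break
            kept = survivors
    return kept

# Toy data: a 5 x 5 grid shifted by (3, 0), with one match shifted by (3, 6) instead.
toy = [((10.0 * x, 10.0 * y), (10.0 * x + 3.0, 10.0 * y))
       for y in range(5) for x in range(5)]
toy_outlier = ((20.0, 20.0), (23.0, 26.0))
toy[12] = toy_outlier
cleaned = two_stage_removal(toy)
```

Rebuilding the neighborhoods after every removal matters: once an outlier is gone it no longer contributes to the cost of the correct matches around it, which is why each stage is repeated until no correspondence is removed.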

Experiment and Analysis
In the experiments, the performance of the proposed algorithm was evaluated using six typical scenes of FY-3D's MERSI-II images and simulated images. Because the initial matches obtained through LSM have different error proportions and local geometric distortions, and the errors are confined within the matching window range, excluding the matching outliers posed a significant challenge. All algorithms were implemented on a computer equipped with an Intel i7-127000F processor (2.800 GHz, 16 GB memory). In this section, the proposed algorithm was compared with three other error removal algorithms: RANSAC, GCRANSAC, and LPM. Qualitative matching results are presented first, followed by quantitative comparisons of the results from the six scenes. Performance evaluation criteria included precision, recall, and root-mean-square error (RMSE) (Liu et al., 2012). The threshold was set to 0.9, while 2 was set to 0.5, 1 was set to 0.3, while 2 was set to 0.3. To construct the K-nearest neighbors (KNN) graph, k was set to 8. Calculating local consistency and uncertainty requires a minimum of four nearest neighboring feature points.
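For reference, the three evaluation criteria can be computed as in the sketch below. It assumes the usual definitions (precision as the share of retained matches that are correct, recall as the share of correct matches that are retained) and takes RMSE over two-dimensional residual vectors in pixels; the function names and the way residuals are supplied are assumptions, since the exact evaluation code is not given in the text.

```python
import math

def precision_recall(kept, true_inliers):
    """Precision and recall of a retained set of match indices."""
    kept, true_inliers = set(kept), set(true_inliers)
    tp = len(kept & true_inliers)             # correct matches that were retained
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(true_inliers) if true_inliers else 0.0
    return precision, recall

def rmse(residuals):
    """Root-mean-square magnitude of 2-D residual vectors (dx, dy)."""
    if not residuals:
        return 0.0
    return math.sqrt(sum(dx * dx + dy * dy for dx, dy in residuals) / len(residuals))

# Toy example: four matches retained, three of them among the four true inliers.
p, r = precision_recall([0, 1, 2, 3], [0, 1, 2, 5])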

Datasets
The MERSI-II data of FY-3D is freely available from the National Satellite Meteorological Center. The Level 1 data with a 250 m resolution consists of six co-registered 250 m bands of Earth view data, along with a Geolocation Table (GLT) with a 20-pixel interval. Each file covers a 5-minute interval and is stored in the Hierarchical Data Format (HDF5), resulting in an image size of 8192 pixels in columns and 8000 pixels in rows. Additionally, a separate GLT file named "GEOQK" is provided, which includes latitude and longitude information for each pixel. The geolocation processing requires ephemeris, attitude, and time information, which are stored in the onboard calibrator (OBC) file. The eighth band of the Sentinel-2 Multispectral Instrument (MSI), with a central wavelength of 832.8 nm, was obtained from the Google Earth Engine and used as the cloud-free reference image. The geolocation accuracy of the MSI non-refined Level-1C products is approximately 10 m at a 94.45% confidence level (Bouzinac et al., 2018). The spatial resolution of the reference images was degraded to 240 m by averaging pixel values, which is slightly finer than the spatial resolution of MERSI-II nadir images. However, because the Ground Sampling Distance (GSD) of MERSI-II changes with the view angle, the original images suffer from panoramic distortions. To mitigate these geometric distortions, a new reference image with a Geolocation Table was simulated from the reference images. The RefSB4 band of MERSI-II was then matched with the simulated images.
The initial corresponding points were found using the normalized cross-correlation algorithm, and their coordinates were refined using LSM to achieve subpixel accuracy. A matching window size of 15 × 15 pixels was used to balance matching accuracy and error details. Figure 4 illustrates the initial matching results of the six scene images under different local deformations. Among them, the images of scenes 0445, 0520, 1120, and 1125 are results of matching before geometric calibration and exhibit certain systematic errors. Conversely, the images of scenes 1905 and 0805 are matching validation results after geometric calibration, in which the systematic errors have been eliminated. The number of matched points varies from ten thousand to fifty thousand, with a mismatch ratio ranging from approximately 0.4 to 0.9. Specific details are depicted in Figure 4.
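The initial matching step mentioned above relies on normalized cross-correlation (NCC). The following minimal sketch shows the measure and an exhaustive integer-offset search on a tiny synthetic image; it does not include the LSM subpixel refinement, and the function names, the list-of-rows image representation, and the toy data are assumptions for illustration only.

```python
import math

def ncc(window_a, window_b):
    """Normalized cross-correlation of two equally sized 2-D windows (lists of rows)."""
    a = [v for row in window_a for v in row]
    b = [v for row in window_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0    # flat windows carry no information

def best_offset(image, template, search):
    """Integer (row, col) offset in `image` that maximizes NCC with `template`."""
    h, w = len(template), len(template[0])
    best_score, best_pos = -2.0, (0, 0)
    for r in range(search + 1):
        for c in range(search + 1):
            window = [row[c:c + w] for row in image[r:r + h]]
            score = ncc(window, template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Toy image with a small bright feature; the template is a 3 x 3 crop taken at
# offset (2, 1), with a gain and an offset applied to mimic radiometric differences.
image = [[0.0] * 6 for _ in range(6)]
image[3][2] = 9.0
image[3][3] = 4.0
template = [[2.0 * v + 10.0 for v in row[1:4]] for row in image[2:5]]
offset, score = best_offset(image, template, search=3)
```

Since NCC subtracts the window means and normalizes by the window energies, it is insensitive to a linear change of brightness, which is why the scaled and shifted template is still located at its original position in this toy example.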

Qualitative Matching Result Analysis
In this section, the matching results of scenes 0455, 1905, 1120, and 1125 are presented in Figures 5 and 6 as the qualitative evaluation outcomes. Because the results are based on area-based matching, drawing matching lines does not effectively distinguish correct from incorrect matches. The figures therefore depict the disparity between the two matched points, with darker colors indicating larger disparities, which often signify incorrect matches, and lighter colors suggesting correct matches. Additionally, the accuracy of the matching points can be inferred from the direction of these disparities. Points exhibiting locally consistent disparity magnitudes and directions are more likely to be correctly matched. In Figures 5 and 6, the four methods performed outlier removal on LSM initial matches with a significant error ratio, and the removal effects are compared for four scenes. The initial matches include numerous erroneous points deviating only slightly from their correct positions, as well as erroneous matches with inconsistent local disparity directions. In the results of RANSAC and GCRANSAC, not all outliers were removed, and many erroneous matches with inconsistent directions remained. LPM yielded the worst results among the four algorithms, possibly due to the non-universality of the local threshold. The original threshold parameter might be too stringent, resulting in the removal of almost all points. On these area-based initial matching results, LPM struggled to distinguish correct from incorrect matches. Among all methods, APLC produced the best results, preserving matches with consistent local structures. The figures show that APLC retained matches with relatively consistent local disparity magnitudes and directions, while the other three methods failed to remove all outliers.

Quantitative Evaluation
Using FY-3D and its simulated images, preliminary results of area-based matching were obtained for six scenes, and the proposed algorithm was quantitatively compared with three state-of-the-art point matching outlier removal methods (LPM, RANSAC, and GCRANSAC). In the area-based initial matching results of scenes 0455, 0520, 1120, 1125, 1905, and 0805, the outlier ratios were 0.545, 0.916, 0.583, 0.356, 0.782, and 0.415, respectively. The average outlier ratio was 0.605, with an average of 33,140 initial matching point pairs. We focused on comparing the recall, precision, and root mean square error (RMSE) of the four algorithms, as shown in Table 1. In comparison, our proposed APLC algorithm outperforms the other three algorithms in all metrics. This indicates that the matching results of APLC are more accurate and discriminative, while RANSAC, GCRANSAC, and LPM retain more outliers. Faced with varying proportions of outliers, the precision and RMSE of the other three methods deteriorate, while APLC maintains stable robustness and accuracy. In terms of recall, APLC is comparable to RANSAC and GCRANSAC, while outperforming LPM. After outlier removal, APLC achieves a precision consistently above 0.99, while the other methods hover around 0.65. The RMSE of APLC is also less than half of that of the other methods, highlighting the significant advantage of APLC in removing mismatched points produced by area-based approaches. This underscores the role of APLC in outlier removal for area-based matching, showcasing its reliability and wide applicability in high-precision geometric calibration and other applications.

Conclusion
In this paper, we propose an adaptive parameter local consistency automatic outlier removal algorithm (APLC). The method starts from the precision of the matching points and derives the uncertainties in the consistency of topological structure and distance within the neighborhood. Through cross-validation using two pairs of vectors, it mitigates the impact of local deformations to some extent. Finally, by computing cost values, APLC removes erroneous matches in two stages. The application and validation of APLC in the geometric calibration of FY-3D satellite images demonstrate a significant improvement in outlier removal capability for area-based matching. The main contributions of this paper can be summarized as follows:
(1) We show that a locally fixed threshold parameter is difficult to adapt to all situations, and we address the judgment of local consistency in the presence of small matching errors, distance variations, and deformations that affect topological consistency.
(2) We derive the uncertainties in the consistency of neighborhood topological structure and distance from the precision of the matches.
(3) By cross-validating two vectors in the local neighborhood, we mitigate the impact of local deformations and achieve adaptive parameterization for finer detection and removal of outliers.
Figure 4. Initial matching data results: disparity results of the initial matching points for scenes (a) 0445, (b) 0520, (c) 1120, (d) 1125, (e) 1905, and (f) 0805.
Figure 5. Comparison of the effects of four algorithms on error matches removal in scenes 0455 and 1905.
Figure 6. Comparison of the effects of four algorithms on error matches removal in scenes 1120 and 1125.

Table 1. Quantitative metrics comparison of four methods. RMSE is a statistical measure of the magnitude of disparity, while recall and precision are the average values across the data of the six scenes.