ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9050

Copernicus Publications

Göttingen, Germany

10.5194/isprs-annals-XI-2-2026-179-2026

Differentiable deep consistency for point cloud registration

Zhang

Tian

Filin

Sagi

Mapping and Geo-Information Engineering, Technion – Israel Institute of Technology, Haifa, Israel

03 07 2026

Volume XI-2-2026, pages 179–186

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/179/2026/isprs-annals-XI-2-2026-179-2026.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/179/2026/isprs-annals-XI-2-2026-179-2026.pdf

Point cloud registration is a key enabler of mapping, autonomous driving, and robotic applications. Current neural pipelines focus on learning view-consistent descriptors for correspondence matching, typically followed by geometric verification, which checks that distances and angles are preserved and aids transformation estimation. Though beneficial, pairwise correspondence verification scales quadratically with the number of correspondences, creating a computational bottleneck. Moreover, because matching and verification are optimized separately, verification cannot guide descriptor learning or foster geometric awareness. To address both limitations, we introduce an end-to-end neural registration framework that merges correspondence learning and verification into a single differentiable formulation. We propose a consistency-driven cross-attention module that dynamically correlates cross-scan neighborhoods to suppress inconsistent matches and reinforce inter-scan feature coherence, generating robust, discriminative descriptors without the quadratic cost of explicit pairwise verification. Our formulation integrates seamlessly into two state-of-the-art architectures, GeoTransformer and RoITr, without additional supervision or post-processing. Results demonstrate superiority in challenging setups, where competing methods either produce few correct correspondences or fail entirely. On the 3DMatch, 3DLoMatch, and KITTI benchmarks, our method consistently achieves higher inlier ratios and the lowest registration errors, improving registration recall by up to 2.6%. Beyond accuracy, our model converges faster during training and achieves the fastest inference among state-of-the-art methods, reflecting the value and soundness of our differentiable formulation.
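To make the quadratic cost of explicit verification concrete, the following is a minimal sketch of a classic distance-preservation check over putative correspondences. It illustrates the kind of post-hoc verification the abstract refers to, not the proposed cross-attention module. The function names (`pairwise_consistency`, `support`), the tolerance `tol`, and the toy points are assumptions introduced for illustration only.

```python
import math
from itertools import combinations


def pairwise_consistency(src, tgt, tol=0.1):
    """Build an n x n 0/1 compatibility matrix for correspondences src[i] <-> tgt[i].

    Two correspondences i and j are marked compatible when the distance between
    their source points matches the distance between their target points within
    `tol` (a rigid transformation preserves distances). `tol` is a hypothetical
    threshold chosen for this toy example.
    """
    n = len(src)
    compat = [[0] * n for _ in range(n)]
    # Every unordered pair is visited once: n * (n - 1) / 2 checks, i.e. O(n^2).
    for i, j in combinations(range(n), 2):
        d_src = math.dist(src[i], src[j])
        d_tgt = math.dist(tgt[i], tgt[j])
        if abs(d_src - d_tgt) <= tol:
            compat[i][j] = compat[j][i] = 1
    return compat


def support(compat):
    """Count, for each correspondence, how many others it is compatible with."""
    return [sum(row) for row in compat]


# Toy data: the first three correspondences follow a pure translation by
# (5, 5, 5); the fourth is a deliberately wrong match.
src = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tgt = [(5, 5, 5), (6, 5, 5), (5, 6, 5), (9, 9, 9)]

scores = support(pairwise_consistency(src, tgt))
print(scores)  # the wrong match receives no support from the others
```

In this sketch the number of distance comparisons grows as n(n − 1)/2, which is the quadratic bottleneck named above; the check also runs after matching, so it cannot influence how the descriptors are learned.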