TriCo-Net: Learning Semantically Aware Local Features via Triple Consistency
Keywords: Local Feature Matching, Semantic-Aware Descriptors, Knowledge Distillation, Multi-Scale Consistency, Visual Localization
Abstract. Local feature matching in complex scenes is hindered by semantic ambiguity, where detectors often latch onto transient or repetitive patterns. We present TriCo-Net, which learns semantically aware and discriminative local features by enforcing a Triple Consistency (TriCo) principle across implicit semantics, scale, and spatial context. During training, an Implicit Semantic Strategy (ISS) distills cues from a segmentation teacher to modulate keypoint reliability and descriptor learning, while introducing no overhead at inference. A Scale-wise Semantic Harmonizer (SSH) aligns and fuses feature-pyramid levels to ensure cross-scale coherence, and a Global Context Propagator (GCP) broadcasts scene-level dependencies to resolve local ambiguities. On Aachen Day–Night v1.1, TriCo-Net achieves strong and consistent gains in visual localization, particularly under night conditions, and exhibits robustness to blur, noise, and large homographies. Ablations show complementary benefits from ISS, SSH, and GCP, with ISS contributing most at tight thresholds and at night. TriCo-Net narrows the day–night performance gap while maintaining mid-range throughput, offering a practical trade-off between robustness and efficiency.
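As an illustration of the training-only distillation idea behind ISS, the following is a minimal sketch, not the actual TriCo-Net implementation. It assumes that the segmentation teacher and a student semantic head both produce per-keypoint class logits, that the distillation term is a KL divergence between their softmax distributions, and that keypoint reliability is scaled by a per-class stability weight. The function names (`distillation_loss`, `modulated_reliability`) and the `class_stability` weights are hypothetical and introduced only for this example. Because the teacher appears only in the loss, it can be dropped at inference.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over keypoints (assumed loss form).

    Each argument is a list of per-keypoint class-logit lists.
    """
    losses = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(losses) / len(losses)


def modulated_reliability(reliability, teacher_logits, class_stability):
    """Scale each keypoint's reliability by the expected stability of its
    semantic class under the teacher distribution.

    class_stability is a hypothetical per-class weight in [0, 1],
    e.g. high for buildings and low for sky or vehicles.
    """
    out = []
    for r, t in zip(reliability, teacher_logits):
        probs = softmax(t)
        expected = sum(p * w for p, w in zip(probs, class_stability))
        out.append(r * expected)
    return out
```

In this sketch, a keypoint that the teacher assigns to a low-stability class has its reliability target reduced during training, which is one plausible way semantic cues could suppress detections on transient content.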
