LoD2-Former: Multi-Modal Transformer-Based 3D Building Wireframe Reconstruction

Abdelhedi, Youssef; Panangian, Daniel; Amrullah, Chaikal; Chaabouni-Chouayakh, Houda; Bittner, Ksenia

doi:10.5194/isprs-annals-XI-2-2026-187-2026

Articles | Volume XI-2-2026

https://doi.org/10.5194/isprs-annals-XI-2-2026-187-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

03 Jul 2026

Keywords: LoD2 Building Reconstruction, 3D Building Modeling, Transformer, Deep Learning, Multi-Modal Remote Sensing

Abstract. This paper presents LoD2-Former, a multi-modal Transformer architecture for end-to-end 3D roof wireframe reconstruction from light detection and ranging (LiDAR) point clouds and aerial imagery. Unlike existing methods that rely solely on point clouds, LoD2-Former leverages complementary geometric and visual information to address the challenges posed by sparse and incomplete airborne LiDAR data. State-of-the-art methods for 3D roof wireframe reconstruction typically explore the search space from 3D to 2D by first generating 2D heatmaps of roof corner probabilities from point cloud features, lifting the predicted corners back to 3D, and then inferring edge connections. While effective, these purely point-cloud-driven approaches leave substantial information unexploited, particularly from complementary 2D data sources. In this work, we investigate how integrating aerial optical imagery can improve reconstruction accuracy, and we provide insights into optimal multi-modal fusion strategies, highlighting the advantages and limitations of combining geometric and visual cues. We also introduce a robust pipeline for collecting, cleaning, and matching aerial images with LiDAR point clouds, enabling the reconstruction of complete 3D roof wireframes. Experiments on two datasets demonstrate that LoD2-Former surpasses state-of-the-art baselines and mitigates the challenges posed by sparse or incomplete point clouds. To allow further comparisons with our methodology, the dataset has been made available at https://github.com/KseniaBittner/LoD2-Former.
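
The heatmap-then-lift strategy that the abstract attributes to existing point-cloud-only methods can be illustrated with a minimal sketch. This is not the LoD2-Former implementation or any published baseline: the function names `heatmap_peaks` and `lift_corners`, the 3x3 local-maximum rule, the threshold, and the median-height lifting over the `k` nearest points are all assumptions chosen for illustration, and the final edge-inference step is omitted.

```python
import numpy as np


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col) indices of 3x3 local maxima above `threshold`.

    Hypothetical helper: ties on a plateau are all reported as peaks.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each pixel's 3x3 neighborhood.
    neighborhood = np.stack(
        [padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)]
    )
    is_peak = (heatmap >= neighborhood.max(axis=0)) & (heatmap > threshold)
    return np.argwhere(is_peak)


def lift_corners(peaks_rc, points, origin_xy, cell_size, k=3):
    """Lift 2D corner candidates to 3D (assumed rule: median z of k nearest points).

    peaks_rc : (P, 2) integer array of (row, col) heatmap indices
    points   : (N, 3) array of x, y, z point cloud coordinates
    """
    # Pixel centers to planar coordinates: x from the column, y from the row.
    xy = np.asarray(origin_xy, dtype=float) + (peaks_rc[:, ::-1] + 0.5) * cell_size
    # Squared planar distance from every candidate to every point.
    d2 = ((xy[:, None, :] - points[None, :, :2]) ** 2).sum(axis=2)
    k = min(k, len(points))
    nearest = np.argsort(d2, axis=1)[:, :k]
    z = np.median(points[nearest, 2], axis=1)
    return np.column_stack([xy, z])
```

With sparse or incomplete point clouds, the nearest points to a predicted corner may lie far away or on a different roof plane, so a height estimate of this kind degrades. That is the weakness the abstract says complementary aerial imagery is meant to mitigate.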
