ISPRS-Annals

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.

2194-9050

Copernicus Publications

Göttingen, Germany

10.5194/isprs-annals-XI-2-2026-187-2026

LoD2-Former: Multi-Modal Transformer-Based 3D Building Wireframe Reconstruction

Youssef Abdelhedi ¹ ³, Daniel Panangian ¹, Chaikal Amrullah ¹, Houda Chaabouni-Chouayakh ², Ksenia Bittner

¹ Remote Sensing Technology Institute, German Aerospace Center (DLR), Wessling, Germany

² Sm@rts Laboratory, Digital Research Center of Sfax, Sfax, Tunisia

³ Higher School of Communication of Tunis, Ariana, Tunisia

03 07 2026

Volume XI-2-2026, pages 187–195

2026

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/

This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.html

The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.pdf

This paper presents LoD2-Former, a multi-modal Transformer architecture for end-to-end 3D roof wireframe reconstruction from both light detection and ranging (LiDAR) point clouds and aerial imagery. Unlike existing methods that rely solely on point clouds, LoD2-Former leverages complementary geometric and visual information to address the challenges posed by sparse and incomplete airborne LiDAR data. State-of-the-art methods for 3D roof wireframe reconstruction typically explore the search space from 3D to 2D: they first generate 2D heatmaps of roof corner probabilities from point cloud features, then lift the predicted corners back to 3D, and finally infer edge connections. While effective, these purely point-cloud-driven approaches leave substantial information unexploited, particularly from complementary 2D data sources. In this work, we investigate how integrating aerial optical imagery can improve reconstruction accuracy, and we provide insights into optimal multi-modal fusion strategies, highlighting the advantages and limitations of combining geometric and visual cues. We also introduce a robust pipeline for collecting, cleaning, and matching aerial images with LiDAR point clouds, enabling the reconstruction of complete 3D roof wireframes. Experiments on two datasets demonstrate that LoD2-Former surpasses state-of-the-art baselines and mitigates the challenges posed by sparse or incomplete point clouds. To allow further comparison with our method, the dataset has been made available at https://github.com/KseniaBittner/LoD2-Former.
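The heatmap-then-lift strategy attributed above to prior point-cloud-only methods can be illustrated with a minimal sketch. It is not the LoD2-Former implementation, and it does not reproduce any specific baseline: the function names `detect_corners` and `lift_corners`, the grid parameters (`cell_size`, `origin`), the 0.5 threshold, and the median-height lifting rule are all assumptions chosen for illustration. Edge inference is omitted.

```python
import numpy as np


def detect_corners(heatmap, threshold=0.5):
    """Return (row, col) indices of strict local maxima above `threshold`.

    Hypothetical helper: a simple 8-neighbourhood peak picker standing in
    for whatever corner extraction a real method applies to its heatmap.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # Pad with -inf so border cells can still be compared to 8 neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    neighbours = [
        padded[1 + dr : 1 + dr + h, 1 + dc : 1 + dc + w]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]
    is_peak = np.all([heatmap > n for n in neighbours], axis=0)
    return np.argwhere(is_peak & (heatmap >= threshold))


def lift_corners(corners_rc, points, cell_size=1.0, origin=(0.0, 0.0), radius=1.0):
    """Lift 2D corner cells to 3D using nearby LiDAR points.

    Assumption: the corner height is the median z of all points within
    `radius` (in the XY plane) of the cell centre; NaN if none are found.
    """
    points = np.asarray(points, dtype=float)
    lifted = []
    for row, col in corners_rc:
        # Cell centre in world coordinates (x along columns, y along rows).
        x = origin[0] + (col + 0.5) * cell_size
        y = origin[1] + (row + 0.5) * cell_size
        dist = np.hypot(points[:, 0] - x, points[:, 1] - y)
        near_z = points[dist <= radius, 2]
        z = float(np.median(near_z)) if near_z.size else float("nan")
        lifted.append((x, y, z))
    return np.array(lifted, dtype=float).reshape(len(lifted), 3)


# Toy example: two corner peaks on a 5 x 5 grid and three LiDAR points.
heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9
heatmap[3, 3] = 0.8
points = np.array([[1.4, 1.6, 10.0], [1.6, 1.5, 10.0], [3.5, 3.4, 12.0]])

corners_2d = detect_corners(heatmap)
corners_3d = lift_corners(corners_2d, points)
```

In this toy setup a corner with no LiDAR points within `radius` receives a NaN height, which is one way to see why sparse or incomplete point clouds make the lifting step fragile and why an additional image modality may help.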