<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Annals</journal-id>
<journal-title-group>
<journal-title>ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Annals</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9050</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-annals-XI-2-2026-187-2026</article-id>
<title-group>
<article-title>LoD2-Former: Multi-Modal Transformer-Based 3D Building Wireframe Reconstruction</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Abdelhedi</surname>
<given-names>Youssef</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Panangian</surname>
<given-names>Daniel</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Amrullah</surname>
<given-names>Chaikal</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chaabouni-Chouayakh</surname>
<given-names>Houda</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Bittner</surname>
<given-names>Ksenia</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Remote Sensing Technology Institute, German Aerospace Center (DLR), Wessling, Germany</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Sm@rts Laboratory, Digital Research Center of Sfax, Sfax, Tunisia</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>Higher School of Communication of Tunis, Ariana, Tunisia</addr-line>
</aff>
<pub-date pub-type="epub">
<day>03</day>
<month>07</month>
<year>2026</year>
</pub-date>
<volume>XI-2-2026</volume>
<fpage>187</fpage>
<lpage>195</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2026 Youssef Abdelhedi et al.</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri"  xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.html">This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.html</self-uri>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.pdf">The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/187/2026/isprs-annals-XI-2-2026-187-2026.pdf</self-uri>
<abstract>
<p>This paper presents LoD2-Former, a multi-modal Transformer architecture for end-to-end 3D roof wireframe reconstruction from both light detection and ranging (LiDAR) point clouds and aerial imagery. Unlike existing methods that rely solely on point clouds, LoD2-Former leverages complementary geometric and visual information to address the challenges posed by sparse and incomplete airborne LiDAR data. State-of-the-art methods for 3D roof wireframe reconstruction typically explore the search space from 3D to 2D: they first generate 2D heatmaps of roof corner probabilities from point cloud features, lift the predicted corners back to 3D, and then infer edge connections. While effective, these purely point-cloud-driven approaches leave substantial information unexploited, particularly from complementary 2D data sources. In this work, we investigate how integrating aerial optical imagery can improve reconstruction accuracy, and we provide insights into optimal multi-modal fusion strategies, highlighting the advantages and limitations of combining geometric and visual cues. We also introduce a robust pipeline for collecting, cleaning, and matching aerial images with LiDAR point clouds, enabling the reconstruction of complete 3D roof wireframes. Experiments on two datasets demonstrate that LoD2-Former surpasses state-of-the-art baselines and mitigates the challenges posed by sparse or incomplete point clouds. To allow further comparisons with our methodology, the dataset has been made available at <ext-link ext-link-type="uri" xlink:href="https://github.com/KseniaBittner/LoD2-Former">https://github.com/KseniaBittner/LoD2-Former</ext-link>.</p>
</abstract>
<counts><page-count count="9"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>