<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Annals</journal-id>
<journal-title-group>
<journal-title>ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Annals</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9050</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-annals-XI-2-2026-729-2026</article-id>
<title-group>
<article-title>Automatic Detection Models for Building Exterior Wall Cracks in Drone Imagery Based on CNN And Transformer</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shang</surname>
<given-names>Yaoling</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ge</surname>
<given-names>Ying</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ma</surname>
<given-names>Yuqing</given-names>
</name>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhang</surname>
<given-names>Yingying</given-names>
</name>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Lv</surname>
<given-names>Shilin</given-names>
</name>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>National Quality Inspection and Testing Center for Surveying and Mapping Products, People&apos;s Republic of China</addr-line>
</aff>
<aff id="aff2">
<label>2</label>
<addr-line>Hohai University, People&apos;s Republic of China</addr-line>
</aff>
<aff id="aff3">
<label>3</label>
<addr-line>State Grid Zhejiang Electric Power Co., Ltd. Logistics Service Company, People&apos;s Republic of China</addr-line>
</aff>
<aff id="aff4">
<label>4</label>
<addr-line>State Grid Zhejiang Electric Power Co., Ltd. Logistics Service Company, People&apos;s Republic of China</addr-line>
</aff>
<aff id="aff5">
<label>5</label>
<addr-line>State Grid Zhejiang Electric Power Co., Ltd. Logistics Service Company, People&apos;s Republic of China</addr-line>
</aff>
<pub-date pub-type="epub">
<day>03</day>
<month>07</month>
<year>2026</year>
</pub-date>
<volume>XI-2-2026</volume>
<fpage>729</fpage>
<lpage>740</lpage>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2026 Yaoling Shang et al.</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri"  xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/XI-2-2026/729/2026/isprs-annals-XI-2-2026-729-2026.html">This article is available from https://isprs-annals.copernicus.org/articles/XI-2-2026/729/2026/isprs-annals-XI-2-2026-729-2026.html</self-uri>
<self-uri xlink:href="https://isprs-annals.copernicus.org/articles/XI-2-2026/729/2026/isprs-annals-XI-2-2026-729-2026.pdf">The full text article is available as a PDF file from https://isprs-annals.copernicus.org/articles/XI-2-2026/729/2026/isprs-annals-XI-2-2026-729-2026.pdf</self-uri>
<abstract>
<p>This study constructs an evaluation framework comprising six representative models: standard U-Net, Resnet34-U-Net, UNet-Attention, UNet-Residual, HybridUNet, and TransUNet. We performed systematic ablation experiments to analyse the contributions of different architectural components, including residual connections, attention mechanisms, and Transformer modules. The models were trained and validated on a dedicated dataset of building exterior crack images captured by drones, with careful consideration of the challenges posed by complex backgrounds, varying lighting conditions, and fine crack features. Three loss functions (F1 Loss, Focal-Dice-Loss, and BCE-Dice-Loss) were evaluated to determine their impact on model performance. Performance was assessed using Accuracy, F1 Score, IoU, Precision, Recall, and Loss values.</p>
<p>Experimental results demonstrate that TransUNet achieved the best overall performance, with an F1 Score of 87.66%, Precision of 90.43%, and Recall of 89.99%, leveraging the global context modelling capability of its Transformer module. In loss function comparisons, F1 Loss yielded the most balanced performance on TransUNet with an F1 Score of 87.50%, while Focal-Dice-Loss showed exceptional optimization stability, with the lowest loss value (0.1008) and high recall (96.05%). Interestingly, the performance gap among the six models was relatively small: the difference in F1 Score between the best-performing TransUNet and the baseline standard U-Net was less than 0.5%. Qualitative analysis revealed that while complex models like TransUNet excel in overall metrics, simpler architectures like UNet-Attention and UNet-Residual demonstrate better robustness in challenging scenarios with complex textures, highlighting the importance of context-specific model selection.</p>
<p>This research provides insights into deep learning approaches for building exterior crack detection. TransUNet with F1 Loss emerges as the optimal solution for high-accuracy requirements, while standard U-Net and its attention-enhanced variants offer cost-effective alternatives for large-scale applications. The minimal performance gap among different architectures suggests that model complexity alone does not guarantee superior performance for this specific task. The study emphasizes the importance of balancing accuracy needs with computational efficiency in practical engineering applications. These findings offer guidance for model selection in real-world building maintenance scenarios and contribute to the advancement of intelligent detection technologies in structural health monitoring. Future work should focus on enhancing model robustness across diverse environmental conditions and optimizing computational efficiency for broader implementation.</p>
</abstract>
<counts><page-count count="12"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>