High Resolution Multi-View Image-based Building Type Classification Using Deep Learning

Tavakoligargari, Mohammad Hassan; Ghasemzadeh, Maryam; Hazrati, Nima; Arefi, Hossein

doi:10.5194/isprs-annals-X-4-W8-2025-793-2026

Articles | Volume X-4/W8-2025

https://doi.org/10.5194/isprs-annals-X-4-W8-2025-793-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.

29 May 2026

Keywords: High-Resolution Remote Sensing, Street-Level Imagery, Deep Multi-View Fusion, Stacking Ensemble Learning, CNN, Urban Building Classification

Abstract. Classifying building types supports urban planning, disaster management, and sustainable development. This study presents a multi-view deep learning approach that achieves an overall accuracy of 75.8% in distinguishing building types. Using OpenStreetMap (OSM) building tags as ground-truth labels, a multi-view image dataset of 10,360 buildings from the German states of Baden-Württemberg and Rhineland-Palatinate was generated. Each building is represented by aerial images at multiple zoom levels and by street-view images, and is assigned to one of four categories: commercial, industrial, public, or residential. Two convolutional neural network (CNN) architectures, VGG16 and InceptionV3, are trained separately on each view. All CNN models were pretrained on ImageNet before being fine-tuned on the building images. The predictions of the separately trained models were fused using model blending to identify the best combination, followed by a stacking ensemble framework with a Random Forest meta-model for the final classification. Experimental results show that this model fusion yields a 16% relative improvement in classification accuracy compared with the individually trained models. The paper highlights the value of combining different view types, state-of-the-art CNN architectures, and model fusion methods for urban building classification. Future research will focus on improving the model fusion techniques and possibly enriching the classification with statistical data on population, income distribution, and infrastructure.
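The blending step described in the abstract can be illustrated with a minimal sketch. It assumes that blending means averaging the per-class probabilities of the selected models, which the abstract does not specify, and it uses invented view names (`aerial_close`, `aerial_wide`, `street`) and made-up probabilities for four toy buildings. The helper names `blend`, `accuracy`, and `best_blend` are hypothetical and do not come from the paper.

```python
from itertools import combinations

# The four categories named in the abstract.
CLASSES = ["commercial", "industrial", "public", "residential"]

def blend(prob_sets):
    """Average the per-class probabilities of several models (assumed blending rule)."""
    n_models = len(prob_sets)
    return [
        [sum(p[i][c] for p in prob_sets) / n_models for c in range(len(CLASSES))]
        for i in range(len(prob_sets[0]))
    ]

def accuracy(probs, labels):
    """Fraction of buildings whose highest-probability class matches the label."""
    preds = [max(range(len(CLASSES)), key=row.__getitem__) for row in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def best_blend(view_probs, labels):
    """Evaluate every non-empty subset of views and return (accuracy, views) of the best."""
    best = (0.0, ())
    names = sorted(view_probs)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            acc = accuracy(blend([view_probs[v] for v in combo]), labels)
            if acc > best[0]:
                best = (acc, combo)
    return best

# Toy validation data: one true class index per building (illustrative only).
LABELS = [0, 1, 2, 3]

# Made-up softmax outputs of three per-view models (hypothetical view names).
VIEW_PROBS = {
    "aerial_close": [[0.1, 0.6, 0.2, 0.1], [0.2, 0.4, 0.2, 0.2],
                     [0.5, 0.2, 0.2, 0.1], [0.2, 0.2, 0.2, 0.4]],
    "aerial_wide":  [[0.6, 0.2, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1],
                     [0.1, 0.1, 0.3, 0.5], [0.1, 0.1, 0.1, 0.7]],
    "street":       [[0.2, 0.1, 0.1, 0.6], [0.1, 0.6, 0.2, 0.1],
                     [0.1, 0.1, 0.7, 0.1], [0.2, 0.1, 0.1, 0.6]],
}

if __name__ == "__main__":
    print(best_blend(VIEW_PROBS, LABELS))
```

In this toy setup each single view misclassifies at least one building, while one pair of views complements each other. A stacking stage, as named in the abstract, would go one step further and train a meta-model (a Random Forest in the paper) on the concatenated per-view probabilities instead of averaging them.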
