Classifying Roof Types from Orthophotos: A Swin Transformer-Based Approach
Keywords: Rooftop types, Orthophotos, Classification, Transformer
Abstract. Rooftop type classification is the process of identifying and categorizing the structural geometry of building roofs from geospatial data. It plays a critical role in urban analysis, supporting applications such as 3D city modeling, solar potential estimation, infrastructure planning, and post-disaster damage assessment. This research applies a transformer-based deep learning model, the Swin Transformer, to rooftop classification in order to handle complex roof shapes, small or similar-looking structures, and variation in roof types, especially in areas with unplanned construction. The model is trained on orthophoto-derived GeoTIFF images covering four roof types: flat, gable, complex, and bug. Images were resized to 256×256 pixels and processed in batches of 128. The dataset is split into 2528 training images, 544 testing images, and 545 validation images. The Swin Transformer reaches a test accuracy of 75%, performing best on the gable class (F1-score of 85.23%) and the complex class (F1-score of 70.75%), while the flat and bug classes show moderate performance due to lower recall. With early stopping and a learning rate scheduler added, precision for the bug class rose from 60.94% to 66.67%, whereas precision for the flat class went from 78.41% to 66.93%; overall accuracy remained comparable (75.00% versus 74.63%) and predictions became more balanced across classes. The proposed architecture is also compared with other state-of-the-art models and can support future applications such as distinguishing roofs from other urban features, improved geospatial analysis, and smart city development.
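The abstract mentions early stopping combined with a learning rate scheduler but does not give their settings. The following is a minimal, framework-free sketch of how such a pair is commonly wired to the validation loss. The class name PlateauController, the function run, and every default value (initial learning rate, reduction factor, both patience values) are illustrative assumptions, not the configuration used in this work.

```python
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Hypothetical helper: reduce the LR on a plateau and signal early stopping.

    All defaults below are assumptions for illustration; the paper's abstract
    does not state the actual hyperparameters.
    """
    lr: float = 1e-4          # assumed initial learning rate
    factor: float = 0.5       # assumed multiplicative LR reduction
    lr_patience: int = 2      # reduce LR after every 2 epochs without improvement
    stop_patience: int = 5    # stop after 5 consecutive epochs without improvement
    min_delta: float = 0.0    # minimum decrease that counts as an improvement
    best: float = float("inf")
    bad_epochs: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        if self.bad_epochs % self.lr_patience == 0:
            # Plateau: shrink the learning rate.
            self.lr *= self.factor
        return self.bad_epochs >= self.stop_patience


def run(val_losses, controller):
    """Feed per-epoch validation losses; return the index of the last epoch run."""
    for epoch, loss in enumerate(val_losses):
        if controller.step(loss):
            return epoch
    return len(val_losses) - 1
```

In a real training loop, controller.lr would be written back to the optimizer after each call to step, and the model weights from the epoch with the best validation loss would typically be restored once stopping is triggered.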
