Attention-guided Multi-Scale Deep Learning Approach for Tree Health Detection Using Very High-Resolution Aerial Imagery
Keywords: Tree Health Monitoring, Deep Learning, Forest Management, Attention Mechanism, Remote Sensing
Abstract. Monitoring tree health is essential for detecting early signs of stress, defoliation, and potential mortality, and it supports effective forest management, ecosystem conservation, and early warning systems. Advances in deep learning have enabled automated analysis of trees in remote sensing imagery through object detection methods that leverage both spectral and spatial information. However, assessing tree defoliation remains challenging because the visual differences between defoliation levels are subtle, which makes accurate classification difficult. To address this, we propose the hybrid ResNet-Swin Transformer, an object detection architecture built on the Faster R-CNN framework that uses a fused ResNet and Swin Transformer backbone with attention-based feature fusion. This design captures rich multi-scale representations by combining convolutional and transformer-based features, and it progressively refines them through channel-wise attention blocks for robust detection and classification. The architecture was evaluated on a very high-resolution aerial dataset from Switzerland, partially annotated with five classes: Conifer (healthy), Conifer (defoliated), Broadleaf (healthy), Broadleaf (defoliated), and Dead. Comparative experiments against state-of-the-art object detection and classification methods show that the proposed approach achieves higher accuracy and robustness, highlighting its potential for precise and reliable automated tree health monitoring.
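A minimal sketch can illustrate the idea of channel-wise attention fusion of convolutional and transformer features. The abstract does not specify the exact block design, so this example assumes a squeeze-and-excitation style gate. CNN and Swin feature maps at the same scale are concatenated along the channel axis and reweighted by sigmoid gates computed from globally pooled channel statistics. They are then projected back to the original channel count with a 1x1 convolution. The function names (`channel_attention`, `fuse_features`), the reduction ratio `r`, and the random weights are hypothetical stand-ins for learned parameters. This is not the authors' implementation.

```python
import numpy as np


def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel attention (assumed design).

    x:  feature map of shape (C, H, W)
    w1: (C // r, C) weights of the channel-reduction layer
    w2: (C, C // r) weights of the channel-expansion layer
    Returns x rescaled per channel, plus the per-channel gates in (0, 1).
    """
    squeezed = x.mean(axis=(1, 2))                    # (C,) global average pooling
    hidden = np.maximum(w1 @ squeezed, 0.0)           # (C // r,) ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # (C,) sigmoid gates
    return x * gates[:, None, None], gates


def fuse_features(cnn_feat, swin_feat, w1, w2, w_proj):
    """Hypothetical fusion: concatenate CNN and Swin maps, gate channels, project back."""
    stacked = np.concatenate([cnn_feat, swin_feat], axis=0)  # (2C, H, W)
    refined, _ = channel_attention(stacked, w1, w2)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,chw->ohw", w_proj, refined)         # (C, H, W)


# Toy example with random weights standing in for learned parameters.
rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4
cnn_feat = rng.standard_normal((C, H, W))    # e.g. one ResNet stage output
swin_feat = rng.standard_normal((C, H, W))   # e.g. the matching Swin stage output
w1 = rng.standard_normal((2 * C // r, 2 * C)) * 0.1
w2 = rng.standard_normal((2 * C, 2 * C // r)) * 0.1
w_proj = rng.standard_normal((C, 2 * C)) * 0.1

fused = fuse_features(cnn_feat, swin_feat, w1, w2, w_proj)
print(fused.shape)  # (8, 16, 16)
```

In a Faster R-CNN setting, a block like this could be applied at each feature-pyramid level before the features reach the region proposal network. Stacking several such blocks would give a form of progressive refinement. That placement is also an assumption rather than a detail stated in the abstract.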
