Deep Learning-Based Model for Paddy Diseases Classification by Thermal Infrared Sensor: An Application for Precision Agriculture

Abstract: The rice plant is an extremely valuable food crop worldwide. Paddy diseases not only reduce rice yields but, more significantly, contribute to environmental damage. Identifying paddy diseases before the onset of any visible signs has gained attention with the development of deep learning (DL) and thermal infrared sensors. According to previous investigations, certain internal alterations occur in the paddy before signs of infection become apparent. Such alterations cannot be seen by visible light sensors. Thermal infrared sensors, on the other hand, may be able to detect these variations, which would aid in predicting illness at earlier phases. However, there are few research articles on this topic. This study proposes a DL-based model for identifying paddy diseases from thermal images. In contrast to earlier approaches for classifying plant diseases, the proposed DL-based model uses three convolutional neural networks (CNNs) with distinct configurations. In addition, instead of relying solely on spatial data like current models, it uses the discrete wavelet transform (DWT) to obtain a time-frequency representation of the spatial deep features gathered from the three CNNs and uses it to build the classification models. Furthermore, it merges the spatial-time-frequency features of the three CNNs and applies a feature selection method based on Relief-F to choose the most beneficial attributes and reduce the dimensionality of the feature space. The outcomes of the proposed DL-based model show that spatial-time-frequency representations are preferable to spatial data alone. The results additionally demonstrate that integrating high-level features from various CNNs can improve classification performance, reaching an accuracy of 96.5% using a cubic support vector machine (SVM) classifier. Furthermore, the findings attained in this study outperform current DL-based models for plant disease classification.


INTRODUCTION
Rice is a particularly significant grain crop because it is an essential food source for numerous countries, nearly 75% of which consume rice. Paddy cultivation, which is widespread but especially popular in Egypt and East Asian countries, is one of the primary crops affected by global warming. Paddy diseases have been identified as one of the greatest threats to rice cultivation, reducing agricultural yields and increasing financial losses (Vishnoi et al., 2021). Paddy diseases not only cause farmers to lose money but also decrease the quality of the end product. Hence, quick and precise identification of the disease is crucial to prevent the loss of grain and to raise overall product quality (Phadikar et al., 2013; Radhakrishnan, 2020). Typically, rice diseases are recognized through either physical examination or laboratory evaluation. Visual assessment is laborious and can only be performed by a professional. Laboratory experiments, on the other hand, require chemical reagents and are an exhausting procedure (Türkoğlu and Hanbay, 2018; Sethy et al., 2020). Thus, it is important to automate the detection of rice diseases in order to overcome the above-mentioned limitations and achieve effective and accurate detection of paddy diseases.
Recent advances in image analysis, computer vision, and artificial intelligence, especially deep learning (DL) techniques, have achieved great success in multiple domains, including medicine (Attallah, 2023a, 2023b), healthcare (Attallah, 2023c, 2023d), renewable energy (Attallah et al., 2022), the petroleum industry (Rashad et al., 2022), and gas leakage detection (Attalah, 2023). Inspired by these accomplishments, such technologies could support the development of computer-assisted frameworks to help identify rice diseases and other issues in the agricultural sector (Jiang et al., 2020). The majority of existing computer-assisted frameworks for plant disease recognition rely on pictures taken with visible light cameras (Batchuluun et al., 2022a; Attallah, 2023e; Haridasan et al., 2023; Salamai et al., 2023; Thangaraj et al., 2022; Vishnoi et al., 2021). Each plant illness typically results in both internal and external changes in the affected crops. Visible signs show up a few days after infection; however, by then the disease has propagated and the crop quality has declined, which results in a substantial decrease in productivity. Therefore, because crops remain in an incubation phase before the first signs of the illness appear, visible camera-based images fail to provide a timely diagnosis, as they cannot accurately predict the infection before lesions emerge. However, internal chemical alterations start to show up right away. Owing to those internal alterations, infected crops experience temperature fluctuations that are undetectable to the naked eye (Chen and Shakhnovich, 2010). Since it records infrared radiation from the surface of an object in a wavelength range of 7500 to 14,000 nm (Bhakta et al., 2023), a thermal infrared sensor is highly responsive to this sort of temperature fluctuation within the object's body, unlike visible light sensors, which operate in the 380 to 700 nm range (Batchuluun et al., 2022a; Vadivambal and Jayas, 2011).
There are only a few research efforts on the identification of plant diseases using images captured by thermal infrared cameras. One of these scarce studies is (Batchuluun et al., 2022b), which employed a convolutional neural network (CNN) and explainable artificial intelligence to enhance the performance of paddy disease classification. Furthermore, (Bhakta et al., 2023) introduced a novel three-layer CNN to identify rice diseases, whereas (Anasta et al., 2021) employed standard image processing techniques to detect disease in banana fruit. In addition, (Sachan et al., 2022) utilized four CNNs separately with a support vector machine (SVM) to identify paddy diseases using a thermal camera. Similarly, (Bompilwar et al., 2022) adopted multiple CNNs to detect plant diseases. On the other hand, (Banerjee et al., 2018) employed an SVM classifier to estimate leaf area in wheat crops.
Some of the studies discussed above are based on traditional machine learning approaches, which require several preprocessing phases such as segmentation and handcrafted feature extraction; these phases are time-consuming and may be prone to error. Others utilized DL models with a single CNN to perform disease identification. Even the studies that employed several CNNs used them individually to perform recognition. However, combining multiple CNNs of different structures merges their advantages and could enhance performance. Furthermore, all of the previous studies relied solely on the spatial information obtained from the CNNs to detect plant diseases. Acquiring time-frequency information in addition to spatial information is likely to improve detection. Also, none of them employed a feature selection step to lower feature dimensionality and thus reduce classification complexity. Therefore, this study proposes a DL-based model that relies on three CNNs of different constructions (Inception, Mobile, and DenseNet-201) for paddy disease identification from thermal images. Rather than relying on the spatial data obtained from each CNN alone, the proposed model acquires a time-frequency representation using the discrete wavelet transform (DWT), which is applied to the spatial deep features obtained from each CNN separately. Afterward, the spatial-time-frequency features of the three CNNs are concatenated, and a feature selection approach is adopted to reduce dimensionality and pick the most significant features.
The remaining parts of the paper are structured as follows. Section 2 introduces the materials and methods. Section 3 presents the performance measures. Section 4 discusses the results of the proposed model. Section 5 concludes the article.

Paddy Thermal Image Dataset
A FLIR camera was used to capture the paddy leaves in this dataset. The rice crop used in this database is Asian rice, frequently referred to as "Oryza sativa". The database is available on Kaggle. It includes thermal photographs of Oryza sativa leaves in both healthy and unhealthy conditions. The photos have a resolution of 320 x 240 pixels. The database contains 636 pictures covering five classes of paddy diseases and a healthy leaves class. The paddy disease classes are blast, bacterial leaf blight, hispa, leaf folder, and leaf spot. The distribution of the photos among classes and samples of images in each class are shown in Table 1.

Discrete Wavelet Transform
The discrete wavelet transform (DWT) is a well-established technique for analyzing signals in the time-frequency domain. The DWT splits the original signal into several frequency sub-bands (Mallat, 1989). In the first decomposition level of the 1D DWT, the signal is passed through a high-pass filter and down-sampled to produce the detail coefficients, and through a low-pass filter and down-sampled to produce the approximation coefficients. The procedure is then repeated on the approximation coefficient sub-band to obtain the multi-level decomposition of the DWT.
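The filtering and down-sampling described above can be illustrated with the Haar wavelet, whose filters reduce to scaled pairwise sums and differences. The following is a minimal sketch rather than the implementation used in this study; the function names `haar_dwt` and `haar_wavedec` and the padding rule for odd-length signals are assumptions made for illustration.

```python
import math

def haar_dwt(signal):
    """One level of the 1D Haar DWT; returns (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # assumption: pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums, already down-sampled by 2.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # High-pass branch: scaled pairwise differences, already down-sampled by 2.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_wavedec(signal, levels):
    """Multi-level decomposition: iterate on the approximation sub-band."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details

approx, detail = haar_dwt([4.0, 6.0, 10.0, 12.0])
```

Each level halves the number of coefficients in the approximation sub-band, and for the Haar wavelet the total energy of the signal is preserved across the approximation and detail coefficients.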

Proposed DL-Based Model

Paddy Thermal Images Preprocessing
The thermal images produced by the infrared camera differ in size from the input layer dimensions that each CNN model accepts. To match the sizes these models allow, the picture resolution is changed to 224 x 224 x 3 for DenseNet-201 and Mobile, and to 299 x 299 x 3 for Inception. After that, a number of augmentation techniques are used to increase the total number of photographs available for training the CNN models. Augmentation typically improves the learning of these models and helps prevent overfitting.
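The resizing and augmentation step can be sketched as follows. This is an illustrative sketch only: the interpolation method and the specific augmentation operations are not stated here, so nearest-neighbour sampling and a horizontal flip are assumptions, as are the helper names and the synthetic 240 x 320 frame.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2D image (list of rows) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def to_three_channels(image):
    """Replicate a single-channel image into an H x W x 3 array."""
    return [[[p, p, p] for p in row] for row in image]

def flip_horizontal(image):
    """Mirror each row; one assumed example of an augmentation."""
    return [row[::-1] for row in image]

# Hypothetical single-channel thermal frame: 240 rows x 320 columns.
frame = [[(r + c) % 256 for c in range(320)] for r in range(240)]

# Resize to the 224 x 224 input size, then build an original and a flipped copy.
resized = resize_nearest(frame, 224, 224)
batch = [
    to_three_channels(resized),
    to_three_channels(flip_horizontal(resized)),
]
```

The same `resize_nearest` call with a different target size would serve a network with a larger input layer.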

CNN Model Formulation
The three pre-trained CNNs employed in the proposed DL-based model are Mobile, DenseNet-201, and Inception. They are three state-of-the-art DL models with different architectures. The three pre-trained CNNs are modified using transfer learning so that the final output layer has a size of 6, which equals the total number of paddy disease and healthy classes in the dataset. Two additional hyper-parameters have been adjusted: the mini-batch size is set to 4 and the learning rate to 0.0002. Additionally, the number of epochs is set to 100. Subsequently, the thermal pictures are used to retrain these CNNs.
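For reference, the fine-tuning settings listed above can be collected in a small configuration object. This is only a sketch: the class name `FineTuneConfig`, the helper method, and the training-set size of 500 images are hypothetical, since the number of images after augmentation is not stated.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class FineTuneConfig:
    num_classes: int = 6           # five disease classes plus the healthy class
    mini_batch_size: int = 4
    learning_rate: float = 0.0002
    max_epochs: int = 100

    def iterations_per_epoch(self, num_training_images: int) -> int:
        """Number of mini-batches needed to pass over the training set once."""
        return math.ceil(num_training_images / self.mini_batch_size)

config = FineTuneConfig()
# 500 is a hypothetical training-set size used only to illustrate the arithmetic.
total_updates = config.max_epochs * config.iterations_per_epoch(500)
```

With a mini-batch of 4, each epoch involves many small weight updates, which is one reason a low learning rate is a common choice when fine-tuning pre-trained networks.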

Feature Extraction and Time-Frequency Interpretation
After the training of the three CNNs is complete, feature extraction is performed. Using transfer learning, high-level features are extracted from the pooling layer that precedes the fully connected layer. These features carry only spatial information. The Inception, DenseNet-201, and Mobile CNNs produce feature vectors of length 2048, 1920, and 1280, respectively. These features are then analyzed further using the DWT, which yields a time-frequency representation of the features and hence spatial-time-frequency representations. The DWT also halves the number of features, to 1024, 960, and 640 for the Inception, DenseNet-201, and Mobile models, respectively. Note that one level of DWT is applied with the "Haar" wavelet, and the approximation coefficients are used as features.
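The halving of the feature vectors follows directly from keeping only the approximation coefficients of a one-level Haar DWT. The sketch below checks the arithmetic on placeholder vectors; the random values stand in for the real pooled features, and the function name `haar_approx` is an assumption.

```python
import math
import random

def haar_approx(features):
    """Approximation coefficients of a one-level Haar DWT (even-length input)."""
    s = math.sqrt(2.0)
    return [(features[i] + features[i + 1]) / s for i in range(0, len(features), 2)]

random.seed(0)
spatial_lengths = {"Inception": 2048, "DenseNet-201": 1920, "Mobile": 1280}

reduced = {}
for name, length in spatial_lengths.items():
    spatial = [random.random() for _ in range(length)]  # placeholder deep features
    reduced[name] = haar_approx(spatial)

reduced_lengths = {name: len(vec) for name, vec in reduced.items()}
```

Here `reduced_lengths` evaluates to 1024, 960, and 640 for Inception, DenseNet-201, and Mobile, matching the dimensions reported above.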

Integration and Feature Selection
The high-level features obtained from each CNN in the previous step, after applying the DWT, are concatenated. The resulting feature vector has 2624 dimensions (1024 + 960 + 640), which is large. Thus, reducing the size of this feature vector is essential to lower the complexity of classification. Feature selection is the process of choosing a smaller set of significant features that influence classification, which lowers classification complexity and helps avoid overfitting (Attallah et al., 2017). In this step, Relief-F feature selection (Urbanowicz et al., 2018) is applied to the integrated high-level features. ReliefF uses the Manhattan distance when determining the feature weights, which can be positive or negative. Negative weights indicate uninformative or redundant features. The default Matlab threshold is used for the weighting procedure.
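The weighting idea behind ReliefF can be sketched in a few lines. The following is a simplified, pure-Python illustration on toy data, not the Matlab routine used in this study; the function name `relieff`, the choice of `k`, and the toy feature matrix are assumptions.

```python
def relieff(X, y, k=3):
    """Simplified ReliefF: one weight per feature (higher means more relevant)."""
    n, d = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(d)]
    hi = [max(row[f] for row in X) for f in range(d)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(d)]  # avoid division by zero

    def diff(a, b):
        # Per-feature differences, normalised to [0, 1].
        return [abs(a[f] - b[f]) / span[f] for f in range(d)]

    priors = {c: y.count(c) / n for c in set(y)}
    weights = [0.0] * d
    for i in range(n):
        by_class = {}
        for j in range(n):
            if j != i:
                dj = diff(X[i], X[j])
                # Manhattan distance is the sum of the per-feature differences.
                by_class.setdefault(y[j], []).append((sum(dj), dj))
        for c, neighbours in by_class.items():
            neighbours.sort(key=lambda t: t[0])
            nearest = neighbours[:k]
            # Nearest hits lower the weight; nearest misses raise it,
            # scaled by the prior of the miss class.
            scale = -1.0 if c == y[i] else priors[c] / (1.0 - priors[y[i]])
            for _, dj in nearest:
                for f in range(d):
                    weights[f] += scale * dj[f] / (n * len(nearest))
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
ranked = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

On this toy data the informative feature receives a positive weight and the noisy one a negative weight, so keeping the top-ranked features discards the noisy one.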

Paddy Disease Classification
To accomplish the paddy disease classification step, two SVM classifiers with distinct kernels, quadratic and cubic, are employed. The sequential minimal optimization algorithm is used to train these SVMs. 5-fold cross-validation is used to validate the performance of the proposed DL-based model. Specifically, the entire dataset is split at random into five sub-datasets of comparable size. Four of the sub-datasets are used to train the classifiers, while the remaining one is set aside for testing, where the classification accuracy is computed. This procedure is repeated five times, using a different sub-dataset for testing each time.
The classification accuracies of the five test sub-datasets are averaged to determine the final accuracy.
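The cross-validation protocol described above can be sketched independently of the classifier. In the sketch below, a 1-nearest-neighbour rule stands in for the SVM purely to keep the example self-contained, and the kernel form (1 + x.z)^d for the quadratic (d = 2) and cubic (d = 3) cases is an assumption; all function names and the toy data are hypothetical.

```python
import random

def poly_kernel(x, z, degree):
    """Assumed polynomial kernel (1 + x.z)^degree; 2 = quadratic, 3 = cubic."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** degree

def k_fold_indices(n, k=5, seed=0):
    """Randomly split the indices 0..n-1 into k folds of comparable size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit_predict, k=5, seed=0):
    """Average test accuracy over k folds."""
    accuracies = []
    for test in k_fold_indices(len(X), k, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

def one_nn(train_X, train_y, test_X):
    """Stand-in classifier (not an SVM): label of the nearest training point."""
    preds = []
    for x in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds

# Toy data: two well-separated clusters of ten points each.
X = [[i * 0.01, 0.0] for i in range(10)] + [[5.0 + i * 0.01, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
mean_accuracy = cross_val_accuracy(X, y, one_nn, k=5)
```

Any classifier exposing the same `fit_predict(train_X, train_y, test_X)` signature, including an SVM with a polynomial kernel, could be passed in place of `one_nn`.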

ASSESSMENT METRICS
This section presents the assessment metrics used to evaluate the effectiveness of the proposed DL-based model. These measures are accuracy, sensitivity, precision, F1-score, and specificity. Formulas (1-5) are employed for calculating the measures.
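Assuming the five measures take their standard forms, and numbering them in the order in which they are listed, they can be written as:

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1}\\
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{2}\\
\text{Precision}   &= \frac{TP}{TP + FP} \tag{3}\\
\text{F1-score}    &= \frac{2\,TP}{2\,TP + FP + FN} \tag{4}\\
\text{Specificity} &= \frac{TN}{TN + FP} \tag{5}
\end{align}
```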
Where the term "true positive" (TP) represents the number of correctly identified positive instances. The term "true negative" (TN) designates the number of correctly identified negative instances. The term "false negative" (FN) refers to the number of positive instances that were wrongly identified as negative, while "false positive" (FP) refers to the number of negative instances that were erroneously detected as positive.
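For a multi-class problem, these counts can be derived from a confusion matrix by treating each class in turn as the positive class. The sketch below assumes macro-averaging over classes, which is not stated here, and uses a hypothetical three-class confusion matrix; the function names are illustrative.

```python
def per_class_counts(cm, c):
    """One-vs-rest TP, TN, FP, FN for class c (rows = true, columns = predicted)."""
    n = len(cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(cm[r][c] for r in range(n)) - tp
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, tn, fp, fn

def macro_metrics(cm):
    """Overall accuracy plus macro-averaged sensitivity, precision, F1, specificity."""
    n = len(cm)
    total = sum(map(sum, cm))
    sums = {"sensitivity": 0.0, "precision": 0.0, "f1": 0.0, "specificity": 0.0}
    for c in range(n):
        tp, tn, fp, fn = per_class_counts(cm, c)
        # Assumes every class has at least one true and one predicted sample.
        sums["sensitivity"] += tp / (tp + fn)
        sums["precision"] += tp / (tp + fp)
        sums["f1"] += 2 * tp / (2 * tp + fp + fn)
        sums["specificity"] += tn / (tn + fp)
    metrics = {name: value / n for name, value in sums.items()}
    metrics["accuracy"] = sum(cm[c][c] for c in range(n)) / total
    return metrics

# Hypothetical confusion matrix for three classes.
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 1, 9]]
metrics = macro_metrics(cm)
```

Specificity tends to be high in multi-class settings because the negatives for each class pool the samples of all other classes.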

Spatial Deep Features Results
This section provides the results of the SVM classifiers trained with the spatial deep features of each CNN separately. Table 2 shows the results of the SVMs trained with these spatial features. Table 2 indicates that both the quadratic SVM (Q-SVM) and the cubic SVM (C-SVM) reached an accuracy of 89% using the spatial features of DenseNet-201. The spatial features of the Mobile CNN led to accuracies of 88.4% and 87.6% for the Q-SVM and C-SVM, respectively, while the spatial features of Inception achieved accuracies of 87.1% and 88.1% for the Q-SVM and C-SVM classifiers, respectively.

Spatial-Time-Frequency Features Results
This section discusses the results obtained after applying the DWT to the spatial deep feature set of each CNN independently. The results are displayed in Figure 2. As shown in Figure 2, the classification accuracy after the DWT process reaches 89.5% and 89.2% for the Mobile CNN, 87.6% and 87.9% for the Inception CNN, and 90.4% and 89% for the DenseNet-201 CNN using the Q-SVM and C-SVM classifiers, respectively, which matches or improves on the spatial features in nearly all cases. Note that for the spatial-time-frequency features of the DenseNet-201 CNN, the C-SVM attained the same accuracy as with the spatial features; however, the number of spatial-time-frequency features (960) is much lower than that of the spatial features (1920). Likewise, the number of spatial-time-frequency features is lower for Mobile (640) and Inception (1024) compared to 1280 and 2048 for the spatial features. These results indicate that the spatial-time-frequency representation is superior to spatial data alone. The results also show that the DWT is capable of enhancing performance with a smaller number of features.

Feature Selection Results
This section presents the results after the integration and feature selection processes. Table 3 shows the assessment metrics after these processes. The results in Table 3 indicate that the accuracy, sensitivity, specificity, precision, and F1-score are (95.92%, 96.5%), (95.55%, 95.94%), (99.11%, 99.17%), (95.98%, 97.08%), and (95.69%, 96.47%) for the Q-SVM and C-SVM, respectively. These results confirm that integrating high-level features from multiple CNNs can improve classification performance. These results are obtained with only 1000 features, which is much lower than the number of concatenated features (2624). Thus, feature selection can successfully reduce and select significant features while enhancing performance. The receiver operating characteristic (ROC) curves for the Q-SVM and C-SVM classifiers are also plotted and shown in Figure 3.
The proposed DL-based model has five cascaded steps: paddy thermal image preprocessing, CNN model formulation, feature extraction and time-frequency interpretation, integration and feature selection, and paddy disease classification. In the first step, the paddy thermal photos are resized to fit each CNN's input layer dimensions and then augmented. Secondly, three CNN models, namely Mobile, DenseNet-201, and Inception, are built with transfer learning. Thirdly, deep features are acquired from each of the CNNs, followed by a DWT process that produces a time-frequency representation. Fourthly, the spatial-time-frequency features gathered in the previous step are integrated and a feature selection procedure is applied to them. Finally, in the classification step, two SVM classifiers are built. Figure 1 summarizes the steps of the proposed DL-based model.

Figure 1. Summary of the steps of the proposed DL-based model.

Figure 2. Classification accuracy (%) for the SVM classifiers trained with spatial-time-frequency features compared to spatial features alone.
CONCLUSION
The majority of existing studies have focused on classifying paddy diseases via images obtained by visible light sensors. Thermal infrared sensors, on the other hand, have only recently begun to be used for recording patterns and characteristics of the paddy's exterior and interior that cannot be observed by visible light sensors. There are few studies, though, that use thermal infrared sensors to classify paddy images. This paper proposed a DL-based model for paddy disease classification using thermal images. In contrast to previous methods for plant disease classification, the proposed DL-based model employed three CNNs with distinct structures. Furthermore, it employed the DWT to provide a time-frequency representation of the spatial deep features acquired from the three CNNs in order to train the classification models. In addition, it integrated the spatial-time-frequency features of the three CNNs and applied a feature selection approach based on Relief-F to pick the most useful features and lower the dimensionality of the feature space. The results of the proposed DL-based model indicated the superiority of spatial-time-frequency representations over spatial data alone. The outcomes also demonstrated that the DWT could improve performance with fewer features. Moreover, the findings illustrated how combining high-level features from different CNNs could enhance classification performance. These results were obtained using only 1000 features, compared to 2624 combined features. Thus, feature selection was capable of eliminating non-essential features and choosing the important ones while improving performance. In addition, the comparison of the proposed DL-based model with recent methods for plant disease classification confirmed that it outperforms them.

Table 1. The distribution of the photos among classes and samples of images in each class.

Table 2. The classification accuracy (%) of the SVM classifiers learned with spatial deep features of each CNN.

Table 3. The assessment metrics (%) of the SVM classifiers after the feature selection process.
Figure 3. ROC curves after the integration and feature selection processes (panels: Q-SVM and C-SVM).

Comparisons with Previous Work
The results of the developed DL-based model are compared with current models that utilize thermal imaging and DL techniques, as shown in Table 4, to illustrate and confirm its competitive capability. The outcomes in Table 4 attest to the superiority of the developed DL-based model: it achieves a higher classification accuracy, 96.5%, than existing methods that employ thermal imaging sensors and DL techniques. Furthermore, all other assessment metrics achieved using the proposed DL-based model are higher than those of existing models.

Table 4. Comparisons with existing DL-based models based on the paddy thermal images dataset.