ESTIMATION OF WHEAT KERNEL MOISTURE CONTENT IN-FIELD BASED ON PLANETSCOPE AND SENTINEL-2 SATELLITE IMAGES

Strict limits on wheat kernel moisture content (KMC) are set in wheat trading, because KMC determines grain quality, storage safety, and economic efficiency. Timely and precise in-field wheat KMC data are a vital component of harvest management: they enable farmers to harvest grain that meets industry standards, optimize their financial returns, and safeguard food resources. However, efficient monitoring methods have so far remained elusive. To address this gap, this study used satellite remote sensing imagery from PlanetScope (PS) and Sentinel-2 (S2). Using the bands and vegetation indices sensitive to wheat KMC extracted from PS and S2, the study constructed wheat KMC estimation models with Random Forest Regression (RFR) and achieved high accuracy (R² > 0.85). Furthermore, different spectral feature combinations were evaluated to optimize the retrieval quality of wheat KMC mapping. The results revealed that band B5 on PS was the most effective original band for wheat KMC monitoring, while B11 and B12 on S2 performed well but were susceptible to soil background interference along field edges. Among the vegetation indices, the Plant Senescence Reflectance Index (PSRI) proved a reliable monitoring indicator. This study provides a dependable and convenient tool for monitoring wheat KMC in-field and a scientific basis to assist harvest decision-making.


INTRODUCTION
Wheat kernel moisture content (KMC) has attracted significant attention in previous research because of its crucial role in wheat harvest management, grain storage safety, and the resulting grain quality for consumption and seeding (Atzema, 1993; Nasir et al., 2004; Wang et al., 2020). The currently recommended standard for wheat KMC at harvest time is 13.5% (Loewer et al., 1994). Artificially drying wheat that has not met this standard incurs additional costs and wastes thermal resources. Proper methods for monitoring wheat KMC in-field before harvest can support scientific decisions in agricultural management. Such approaches can improve economic benefits and protect grain resources, particularly on large-scale automated farmland aimed at enhancing production efficiency, where these approaches are currently lacking.
Wheat KMC changes in several distinct stages, which are of crucial significance to wheat farmers. The initial lag phase is marked by a rapid increase in wheat KMC, which can exceed 70% (Pepler et al., 2006). After the physiological maturity stage is reached, at a wheat KMC of about 40%, the growth of wheat plants gradually stops, and environmental drying processes primarily drive the decrease in wheat KMC (Schnyder and Baum, 1992; Calderini et al., 2000). Consequently, wheat KMC declines rapidly, taking only about 10 days to reach the standard level of 13% (Celestina et al., 2021). Farmers need to pay close attention to changes in wheat KMC in-field after their crops have reached maturity to ensure timely harvesting during this brief period.
Currently available measurements of wheat KMC primarily cater to post-harvest grain trading and can be categorized into direct and indirect methods. Direct methods involve crushing grains or altering their properties during measurement, for example through drying or chemical methods (Grabe, 1989; Klomklao et al., 2017). Although such methods are reliable and accurate, they are not efficient. Indirect methods analyse the moisture content inside kernels by exploiting their physical, chemical, and optical characteristics without changing their state or chemical properties, for example through dielectric properties (Brain, 1970), acoustic methods (Amoodeh et al., 2006), or spectral measurement (Gergely and Salgó, 2003; Nath K and Ramanathan, 2017). These methods are non-destructive, responsive, and easy to operate (Reid et al., 2010; Nath K and Ramanathan, 2017). However, using current methods to monitor the dynamic change of wheat KMC on-site in large-scale farmland is too cumbersome, time-consuming, and labor-intensive (Li et al., 2021): these non-destructive methods offer only single-point observations and cannot be effectively applied to fields before harvest. Therefore, exploring high-throughput, high-precision, non-destructive methods for estimating wheat KMC in-field can effectively address the inadequate data support for agricultural decision-making and research.
Remote sensing is a powerful method for monitoring plant phenotypes (Tariq et al., 2020). From the received radiation information, quantitative models can be established to describe the relationship between remote sensing signals and the characteristics and properties of crops on the ground (Wójtowicz et al., 2016). Satellite remote sensing images provide an effective source of radiation information for large-scale crop phenotype retrieval, quickly supplying the parameters needed for agronomic decision-making (Dalla Marta et al., 2015). Multispectral satellites are widely used to estimate various phenotype parameters of wheat. Xianfeng Zhou et al. compared several methods, including vegetation indices, machine learning, a spectral transmission model lookup table, and a mixed model, to explore the potential of Landsat 8 multispectral data for retrieving wheat chlorophyll content (Zhou et al., 2020). Haitao Zhao et al. constructed vegetation indices from canopy hyperspectral and multispectral data to establish a wheat kernel protein content retrieval model using a multi-stage linear regression method (Zhao et al., 2019). In short, spectral analysis is a powerful tool for monitoring and estimating crop parameters. However, there is currently limited research on estimating wheat KMC in-field using satellite remote sensing. This study aims to evaluate the potential of multispectral satellite images for this purpose. We analysed the bands and vegetation indices of PlanetScope (PS) and Sentinel-2 (S2) that are sensitive to wheat KMC and constructed retrieval models in the framework of random forest regression (RFR) to map the in-field distribution of wheat KMC on different dates. Furthermore, the mapping retrieval quality of wheat KMC based on different spectral feature combinations was compared to better characterize the ability of these features.

Study Site
Henan is located in central China, belongs to the warm temperate monsoon climate zone, and is the primary growing area for winter wheat. The research was carried out on highly standardized farmland (35°22′N, 114°12′E, as shown in Figure 1) with a total cultivation area of 3,000,000 m² located in Qi County, Hebi City, Henan Province, China. The crop structure in this area is simple, and the main crop is wheat. The experiment was conducted from 25th May to 6th June 2022, when the wheat had reached the physiological maturity stage. During the experiment, the weather was mostly sunny or cloudy, with daytime temperatures around 30 °C.

Ground Measurements of Reference Wheat KMC
Three wheat fields were selected as sample-collecting fields, and a total of 50 sampling points were set up in the fields (Figure 1 (a)). Wheat KMC data were collected daily throughout the period from 25th May to 6th June 2022. Owing to limited manpower, a different subset of the sampling points was sampled each day. Wheat KMC was measured by the oven-drying method, which is considered the most accurate approach. Drying was performed in two steps: first, the wheat was wilted at a drying temperature of 130 °C for 2 hours; second, the oven temperature was lowered to 85 °C and the wheat was dried until it reached a constant weight. The wheat KMC $M$ was calculated as follows:
$$M = \frac{W_1 - W_2}{W_1} \times 100\%$$
where $W_1$ is the weight of the sample before drying (fresh weight) and $W_2$ is the weight of the sample after drying (dry weight).
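As a worked illustration of the oven-drying calculation, the short Python sketch below computes KMC from a fresh weight and a dry weight. It assumes the conventional wet-basis definition (mass lost on drying divided by fresh weight); the function name `kernel_moisture_content` and the sample weights are hypothetical and are not taken from the study.

```python
def kernel_moisture_content(fresh_weight: float, dry_weight: float) -> float:
    """Return wet-basis moisture content as a fraction of the fresh weight.

    Assumption: moisture is expressed relative to the fresh (pre-drying) weight.
    """
    if fresh_weight <= 0:
        raise ValueError("fresh_weight must be positive")
    if not 0 <= dry_weight <= fresh_weight:
        raise ValueError("dry_weight must lie between 0 and fresh_weight")
    return (fresh_weight - dry_weight) / fresh_weight


# Hypothetical sample: 20.0 g before drying, 17.3 g at constant weight.
kmc = kernel_moisture_content(20.0, 17.3)
print(f"KMC = {kmc:.1%}")
```

Returning a fraction rather than a percentage keeps the value on the same scale as the RMSE figures reported later in the paper, which appear to be expressed as fractions.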

Satellite Data
The multispectral satellite data sources investigated in this study are S2 and PS. S2 is well known for its outstanding temporal, spatial, and spectral performance and is widely used for vegetation monitoring and precision agriculture (Segarra et al., 2020). Its revisit cycle is 5 days, and it provides 13 imaging bands covering the visible, NIR, and shortwave infrared (SWIR) regions at spatial resolutions ranging from 10 m to 60 m; the cloud detection band B10 is excluded from the L2A product. The S2 images were obtained from the Copernicus Open Access Hub (European Space Agency, 2023), which provides top-of-atmosphere reflectance products (S2 L1C). The Sen2Cor plugin was used to perform atmospheric correction on all bands, resulting in the surface reflectance (S2 L2A) product. Additionally, all bands were resampled to a resolution of 10 m. PS is a commercial constellation of microsatellites operated by Planet Labs that offers greatly improved spatial resolution compared with the S2 mission. The PS constellation currently includes over 180 satellites and achieves global coverage almost every day. In March 2022, PS launched new products with eight spectral bands covering the visible and NIR regions, in addition to the original four-band products. PS has added a yellow band (600-620 nm, B5) in the visible range but, compared with S2, cannot provide shortwave infrared bands or more extensive NIR information. The PS images were acquired from Planet (Planet Labs PBC, 2023), where surface reflectance products with a spatial resolution of 3 m can be downloaded. During the study period, PS collected 7 cloud-free images of the study area, while S2 collected 3 images (Table 1).

Regression Approach
In this study, the Random Forest Regression (RFR) framework was used to investigate the sensitive feature domain and the high-precision retrieval of wheat KMC. The RFR model was built with the RandomForestRegressor class of the scikit-learn (sklearn) package for Python, which also provides Gini importance as a measure of feature importance. The random forest (RF) algorithm was proposed by Breiman (Breiman, 2001) and can effectively reduce variance while keeping bias low. RFR is an ensemble learning method that combines the results of all trees to obtain the regression prediction (Smith et al., 2013). To compare the retrieval performance of wheat KMC between PS and S2, separate RFR models were constructed from the PS and S2 spectral data extracted at the positions of the sampling points. The candidate feature space for PS has 18 dimensions (8 original bands and 10 vegetation indices) with n = 200 samples, while that for S2 has 27 dimensions (12 original bands and 15 vegetation indices) with n = 74 samples. The two key parameters of the RFR algorithm, n_estimators and max_depth, were set to 20 and 7, respectively. Optimized feature subsets for wheat KMC estimation were then selected from the candidate feature space using the RFR model.
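The model configuration described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the data are synthetic stand-ins with the same shape as the PS sample table (200 samples, 18 candidate features), and the relationship between the first feature and the target is an assumption made only so that the example produces a meaningful ranking. Only `n_estimators=20` and `max_depth=7` come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the PS sample table: 200 samples x 18 candidate features.
n_samples, n_features = 200, 18
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))

# Hypothetical target: KMC driven mainly by the first feature, plus small noise.
y = 0.5 - 0.4 * X[:, 0] + rng.normal(0.0, 0.02, size=n_samples)

# Hyperparameters as reported in the text; random_state is added for repeatability.
model = RandomForestRegressor(n_estimators=20, max_depth=7, random_state=0)
model.fit(X, y)

# Impurity-based importances, which scikit-learn exposes as feature_importances_.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
print("Top three feature indices:", ranking[:3])
```

The importances are normalized to sum to one, so a value such as 0.74 for a single feature means that feature accounts for most of the impurity reduction across the forest.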

Wheat KMC Spectral Analysis
The spectral characteristics of the wheat canopy are significantly affected by wheat KMC (Figure 2). As wheat KMC decreases, the trend of spectral change is similar to that of leaf drought stress: reflectance increases in the yellow and red bands while decreasing in the NIR bands (Caturegli et al., 2020). Figure 3 illustrates the gradient trend of the vegetation indices after standardization. On PS, only PSRI exhibits a clear gradient trend, whereas on S2 all vegetation indices do. The performance of the same vegetation index on different satellites is therefore not entirely consistent. There are two possible reasons for this: first, the difference in the number of samples between PS (200 samples) and S2 (74 samples); second, because PS has a higher temporal resolution, the fluctuations caused by changes in sensor imaging conditions are more significant.
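To make the index computation and standardization step concrete, the sketch below computes PSRI and a z-score standardization with NumPy. It assumes the commonly used definition PSRI = (R678 − R500) / R750, approximated with red, blue, and red-edge reflectance; the text does not state which bands were used for each sensor, so the band choice, the function names, and the reflectance values are all illustrative assumptions.

```python
import numpy as np


def psri(red, blue, red_edge):
    """Plant Senescence Reflectance Index: (red - blue) / red_edge."""
    red, blue, red_edge = (np.asarray(a, dtype=float) for a in (red, blue, red_edge))
    return (red - blue) / red_edge


def standardize(values):
    """Z-score standardization: zero mean and unit standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# Hypothetical canopy reflectances for four samples ordered from wet to dry kernels.
red = np.array([0.05, 0.08, 0.12, 0.16])
blue = np.array([0.03, 0.04, 0.05, 0.06])
red_edge = np.array([0.40, 0.38, 0.35, 0.32])

index = psri(red, blue, red_edge)
z = standardize(index)
print(index.round(3), z.round(2))
```

In this toy example PSRI rises as red reflectance increases and red-edge reflectance falls, which is the direction expected for a senescing canopy.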

Feature Performance
To reduce the complexity of the model and obtain the optimal feature domain, we quantitatively evaluated the contribution of each feature to the model using the Gini importance of RFR. Features were added to the RFR model step by step in order of importance, and R² was observed at each step on a train-test split of the dataset (with a test set ratio of 0.2). Figure 4 shows the feature importance ranking and the retrieval steps. For PS, the most significant features are B5 (importance = 0.74) and PSRI (importance = 0.11). For S2, the most significant features are B11 (importance = 0.33), B12 (importance = 0.26), and PSRI (importance = 0.09). As features were gradually added to the model, the R² of the training and test sets first increased and then stabilized or decreased; the model reached relatively high precision at the inflection point of the accuracy curve, which indicated the optimal feature domain. Therefore, the optimized feature domain for PS is B5 and PSRI (R² = 0.97/0.87 and RMSE = 0.0220/0.0428 on the training/test sets), and for S2 it is B11, B12, and PSRI (R² = 0.99/0.94 and RMSE = 0.0116/0.0292). Compared with PS, R² on S2 was 2% higher on the training set and 8% higher on the test set, while RMSE was 47% lower on the training set and 32% lower on the test set.
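The stepwise procedure described above can be sketched in Python as follows. This is one plausible reading of the procedure, not the authors' code: features are ranked once by the impurity-based importance of a model fitted on the training split, and a fresh model is then refitted for each prefix of that ranking. The helper name `stepwise_r2`, the synthetic data, and the choice to rank on the training split only are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def stepwise_r2(X, y, test_size=0.2, seed=0):
    """Add features in order of importance; return the order and (train, test) R2 per step."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    def make_model():
        return RandomForestRegressor(n_estimators=20, max_depth=7, random_state=seed)

    # Rank all candidate features once with a model fitted on the training split.
    full_model = make_model().fit(X_train, y_train)
    order = np.argsort(full_model.feature_importances_)[::-1]

    scores = []
    for k in range(1, len(order) + 1):
        cols = order[:k]  # the k most important features
        model = make_model().fit(X_train[:, cols], y_train)
        scores.append((
            r2_score(y_train, model.predict(X_train[:, cols])),
            r2_score(y_test, model.predict(X_test[:, cols])),
        ))
    return order, scores


# Synthetic data: two informative features and four noise features.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))
y = 0.5 - 0.3 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0, 0.02, 200)

order, scores = stepwise_r2(X, y)
for k, (r2_train, r2_test) in enumerate(scores, start=1):
    print(f"{k} feature(s): train R2 = {r2_train:.2f}, test R2 = {r2_test:.2f}")
```

Plotting the test R² against the number of features gives the accuracy curve whose inflection point marks the feature domain to keep.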

Wheat KMC Retrieval
Cloud-free PS and S2 images of the study area were used to map wheat KMC with the trained RFR models. Field boundaries were drawn manually, and Figure 5 shows the distribution of wheat KMC on different dates. The results indicate a consistent decrease in wheat KMC over time, which supports the reliability of the trained RFR models. However, the S2-based wheat KMC predictions were significantly lower at the edges of fields. Satellite images with higher spatial resolution display more detail of wheat KMC at different locations within fields. Wheat KMC data were extracted pixel by pixel, and the dynamic changes of wheat KMC within the research area were plotted over time (Figure 6). Wheat KMC decreased from approximately 50% to around 10% within 12 days, following an inverted-S-shaped curve with a faster initial decrease and a slower decline later on. On 27th May and 1st June, PS and S2 gave similar distribution ranges of moisture content, which supports the consistency of the RFR models across data sources. However, owing to its higher temporal resolution, PS captured the dynamic changes in wheat KMC more clearly.
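Pixel-by-pixel mapping with a trained model can be sketched as below. The sketch assumes the image is already loaded as a NumPy array of shape (bands, rows, cols) and that the manually drawn field boundaries have been rasterized into a Boolean mask; reading the imagery and rasterizing the boundaries are outside its scope. The function name `predict_kmc_map` and all data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def predict_kmc_map(model, image, field_mask):
    """Predict KMC for every pixel inside the fields; pixels outside become NaN.

    image: array of shape (bands, rows, cols) holding surface reflectance.
    field_mask: Boolean array of shape (rows, cols), True inside field boundaries.
    """
    bands, rows, cols = image.shape
    pixels = image.reshape(bands, -1).T  # one row per pixel, one column per band
    kmc_map = np.full(rows * cols, np.nan)
    inside = field_mask.ravel()
    if inside.any():
        kmc_map[inside] = model.predict(pixels[inside])
    return kmc_map.reshape(rows, cols)


# Hypothetical two-feature model trained on synthetic samples.
rng = np.random.default_rng(2)
X_train = rng.uniform(size=(100, 2))
y_train = 0.5 - 0.4 * X_train[:, 0]
model = RandomForestRegressor(n_estimators=20, max_depth=7, random_state=0)
model.fit(X_train, y_train)

# Hypothetical 2-band, 4 x 5 pixel image with a small rectangular field.
image = rng.uniform(size=(2, 4, 5))
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True

kmc = predict_kmc_map(model, image, mask)
print(np.round(kmc, 2))
```

The band order of the image must match the feature order used when the model was trained; per-date boxplots such as those in Figure 6 can then be drawn from the non-NaN values of each map.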

Additional Effective Features
The random forest model has been shown to have excellent predictive performance, and it embeds Gini importance as a measure of the input features (Mutanga et al., 2012; Smith et al., 2013; Belgiu and Drăguţ, 2016). In the feature selection step, all features were ranked by Gini importance and gradually added to the RFR model, and the effective feature domain was determined from the inflection point of the accuracy curve. The importance of B5 on PS was much higher than that of the other PS features, as was the importance of B11 and B12 on S2. The selected domain contained B5 and PSRI on PS, and B11, B12, and PSRI on S2.
However, a disadvantage of Gini importance is that, when several features are correlated, it may underrate some of them, leading to biased feature importance rankings (Strobl et al., 2007; Nicodemus et al., 2010). To delineate the feature domains for wheat KMC more accurately, B5 on PS and the SWIR bands on S2 were excluded to determine whether other effective features for estimating wheat KMC remained (Figure 7). The same process was executed, and alternative feature domains were selected. The alternative feature domains are B1, B3, B7, PSRI, and EVI on PS (R² = 0.95/0.86, RMSE = 0.0269/0.0450) and PSRI, NDRE1, VARI, RESI, and B4 on S2 (R² = 0.99/0.89, RMSE = 0.0158/0.0381). The alternative domains still achieved high accuracy (R² > 0.85), although the accuracy of the models was slightly lower than that of the original models: R² decreased by 2%/3% and RMSE increased by 22%/5% on PS, while R² decreased by 0%/5% and RMSE increased by 36%/30% on S2. Furthermore, wheat KMC was mapped with the RFR models constructed from the alternative feature domains. The mapping results for two representative days, 27th May and 1st June, were compared (Figure 8). On PS, the new feature domain produced obvious noise in the mapping, possibly because the new combination of features was less sensitive than the original features and because combining more features brought more noise into the model. On S2, the original features significantly underestimated the predicted moisture content at the edge of fields; the alternative feature domain significantly reduced this underestimation, but the predicted moisture content in the centre of fields became markedly heterogeneous.
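The exclusion experiment can be illustrated with a small sketch: a dominant feature is removed, the model is refitted, and the remaining features are re-ranked. The feature names mirror those in the text, but the data are synthetic, the correlation between the "PSRI" and "B5" columns is an assumption made for illustration, and `rank_without` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rank_without(X, y, names, excluded, seed=0):
    """Refit the forest without the excluded features and rank the remaining ones."""
    keep = [i for i, name in enumerate(names) if name not in excluded]
    model = RandomForestRegressor(n_estimators=20, max_depth=7, random_state=seed)
    model.fit(X[:, keep], y)
    pairs = zip((names[i] for i in keep), model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


names = ["B5", "PSRI", "B1", "B3"]
rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 4))
X[:, 1] = X[:, 0] + rng.normal(0, 0.05, 200)  # "PSRI" column correlated with "B5"
y = 0.5 - 0.4 * X[:, 0] + rng.normal(0, 0.02, 200)

ranked = rank_without(X, y, names, excluded={"B5"})
for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```

With the dominant feature present, a correlated feature can receive a small importance simply because the trees split on the dominant one first; removing it lets the correlated feature's contribution show, which is the motivation for the exclusion test.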

Explanation for Sensitive Features
Existing research shows that the spectral reflectance of the wheat canopy changes gradually from the filling stage to the maturity stage, with two absorption valleys disappearing in the visible spectrum and reflectance gradually increasing in the infrared spectrum (Sharabian et al., n.d.). As the wheat matures, both chlorophyll and moisture content decrease rapidly (Gergely and Salgó, 2003; Islam et al., 2014). Chlorophyll absorbs strongly in the visible range (chlorophyll a: 430 nm, 660 nm; chlorophyll b: 460 nm, 640 nm) (Curran, 1989), and the most dramatic changes occur in the range of 600-700 nm. Therefore, B5 on PS and B4 on S2 are sensitive to wheat KMC, and other bands and indices related to chlorophyll also contribute to wheat KMC estimation. Similarly, SWIR bands such as B11 and B12 on S2, which lie within the water absorption range (Hamrouni et al., 2022), are strong indicators of changes in wheat KMC. Among the vegetation indices, PSRI is sensitive to the ratio of carotenoids to chlorophyll and indicates increasing canopy stress, which effectively characterizes crop maturity, so it also shows strong sensitivity to changes in wheat KMC. In addition, when infrared bands are used to retrieve water content, there is noticeable under-prediction at the edge of the field. Figure 9 shows that, in the infrared region, the spectrum at the edge of the field deviates from that in the middle of the field towards the spectrum of bare soil. A possible reason is that the longer the wavelength, the more diffraction occurs (Glenar et al., 1994). Therefore, where wheat is sparse at the edge of the field, infrared light is more likely to pass through the plants and reach the ground, so the signal carries more information resembling bare soil.

DISCUSSION
Current wheat KMC measurement methods are passive and lagging: they only judge the quality of grain that has already been harvested and cannot guarantee a high qualification rate. This study introduced multispectral imagery to retrieve wheat KMC in-field before harvest. Sentinel-2, a widely used multispectral satellite, and the recently introduced PS images were selected as data sources. Notably, the high revisit frequency and superior spatial resolution of PS make it an outstanding data source for monitoring rapidly changing wheat KMC, although commercial satellite imagery can be costly to acquire. Thus, provided that precise predictive models are available, public Sentinel-2 images, despite their lower spatial and temporal resolution, may offer a cost-effective means of accurate harvest-time guidance.
It is important to acknowledge the limitations of this study. Firstly, the method relies on multispectral satellite imagery, which is susceptible to cloud cover and may therefore hinder the timely acquisition of in-field wheat KMC distribution information. Secondly, the use of the RFR algorithm to directly invert wheat KMC requires further validation to establish the model's transferability over time and space. Lastly, in-field drying is a dynamic interaction between wheat and multiple environmental factors, yet this study limits its consideration to surface temperature and relative humidity and neglects other potentially influential factors such as rainfall, wind, and atmospheric pressure. To overcome these limitations, extensive samples covering longer periods and broader regions, as well as a more in-depth exploration of the underlying mechanisms, are needed to support the precise simulation and prediction of wheat KMC. This could help to improve the moisture module of crop growth models and provide reliable tools for making optimal harvest decisions.

CONCLUSION
This study explores the ability of multispectral images to monitor the dynamics of wheat KMC in-field. Based on PS and S2 images, features sensitive to wheat KMC were selected from the original bands and vegetation indices to build estimation models with RFR. The approach generated spatial distribution maps of wheat KMC at 3 m and 20 m resolutions within the study area, providing an efficient solution for monitoring rapid dynamic changes of wheat KMC over large areas.
From the results, the following major conclusions are drawn:
1) The bands sensitive to wheat KMC differ between satellite platforms: B5 on PS, and B11 and B12 on S2. The vegetation index PSRI exhibited strong sensitivity on both data sources, making it an excellent feature for monitoring wheat KMC. Using bands in the SWIR range for retrieval can easily cause deviations in wheat KMC at the edge of fields.
2) Based on the selected features, the RFR model achieved high inversion accuracy on both data sources (R² > 0.85), with slightly higher accuracy on S2.
3) In terms of depicting the dynamics of wheat KMC in-field accurately, S2 is weaker than PS because of its lower spatial and temporal resolution.

Figure 1. Location of the study area and layout of the sampling points. (a) Three fields were selected as sample-collecting fields. (b) Location details of the study area.

Figure 4. Feature importance ranking and RFR retrieval precision plots.

Figure 5. Estimated wheat KMC maps of the study area over the study period based on the prediction models of PS (a) and S2 (b).

Figure 6. Boxplot of dynamic changes of wheat KMC within the research area.

Figure 7. Features without B5 on PS and SWIR spectrum part on S2 were evaluated by Gini importance and then input to the RFR in order of values of importance.

Figure 8. Comparison of wheat KMC results based on original and alternative domains. (a) On PS. (b) On S2.

Table 1. Dates of the collected images.