Exploring the prospects of UAV-Remotely sensed data in estimating productivity of Maize crops in typical smallholder farms of Southern Africa

This study estimated maize grain biomass, and grain biomass as a proportion of the absolute maize plant biomass, using UAV-derived multispectral data. Results showed that UAV-derived data could accurately predict yield, with R² ranging from 0.80 to 0.95, RMSE ranging from 0.03 to 0.94 kg/m² and RRMSE ranging from 2.21% to 39.91% based on the spectral datasets combined. Results of this study further revealed that the VT-R1 (56-63 days after emergence) vegetative growth stage was the optimal stage for the early prediction of maize grain yield (R² = 0.85, RMSE = 0.1, RRMSE = 5.08%) and proportional yield (R² = 0.92, RMSE = 0.06, RRMSE = 17.56%), with the Normalized Difference Vegetation Index (NDVI), Enhanced Normalized Difference Vegetation Index (ENDVI), Soil Adjusted Vegetation Index (SAVI) and the red edge band being the best prediction variables. The grain yield models produced more accurate results in estimating maize yield when compared to the biomass and proportional yield models. The results demonstrate the value of UAV-derived data in predicting maize yield on smallholder farms, a task that was previously challenging with coarse spatial resolution satellite sensors.


INTRODUCTION
Agriculture continues to be the bedrock of food systems, especially in sub-Saharan Africa, where population growth and demand for food are rapidly increasing. About 50%-90% of the population in developing countries is dependent on agriculture for employment, livelihood and income. Between 70% and 90% of these are smallholder farmers surviving on subsistence farming in fields characterised by infertile soils and exposed to climate change-related shocks. Most of these fields are less than a hectare in size and are poorly endowed with the resources needed to withstand climate shocks (Giller et al., 2021; Jin et al., 2019). In a recent sub-national census, Jin et al. (2019) showed that 50% of food calories in the region were produced on farms of less than 5 ha in size. However, despite the sector's fundamental role in the region's economies and food security, a plethora of challenges is accelerating food and nutrition insecurity in this region. The principal cause is the decline in the production of staple crops (Giller et al., 2021). Specifically, drastically decreasing yields of critical food crops such as maize are attributed to, among other factors, the use of rudimentary farming practices, the low inputs that characterize conventional farming systems, and the lack of incentives and appropriate technologies to optimize production, especially on smallholder farms (Giller et al., 2021; Tan et al., 2020).
Traditionally, several approaches that include ground observations, surveys and measurements have been adopted in crop monitoring (Mditshwa, 2017). However, these approaches are limited by their high labour and financial costs and are therefore not ideal for continuous and time-efficient crop monitoring (Jégo et al., 2012). The utilization of multispectral satellite datasets in crop monitoring and yield estimation in smallholder farms is, in turn, limited by their relatively coarse spatial and temporal resolutions (Stratoulias et al., 2017). Whereas there are numerous satellite images with high spatial resolutions (e.g. SPOT, WorldView, QuickBird and PlanetScope), these are not cost-effective for monitoring smallholder crops. Moreover, they are often associated with processing complexities, which makes them unsuitable for monitoring and estimating maize yield across the growing season at a farm scale (Chivasa et al., 2020; Jin et al., 2019).
On the other hand, unmanned aerial vehicles (UAVs), also known as drones, have emerged as a prospective alternative source of remotely sensed data suitable for mapping and monitoring crop productivity at a farm-to-field scale (Maes et al., 2018). With advancements in technology, the weight and size of multispectral cameras have been drastically reduced to ease mounting on UAVs for use in precision agriculture (Candiago et al., 2015). UAV systems provide high spatial resolution remotely sensed data at user-defined revisit frequencies and areas of interest, hence enabling time-efficient and cost-effective agricultural applications such as yield modelling (Schut et al., 2018; Ziliani et al., 2018). Furthermore, in estimating maize crop yield using temporal remotely sensed datasets, it is not very clear whether the actual grain biomass (excluding the foliage and stem) or the biomass of grain yield as a proportion of ultimate plant biomass gives more accurate yield estimates. This has further compounded the challenge of using remotely sensed data to estimate the yield of crops such as maize when compared with crops such as cabbages and spinach (Abdel-Rahman et al., 2014), where biomass is derived from the foliage, which in turn directly interacts with the spectral signatures used in yield estimation. In this regard, very few studies have utilized UAV-derived data in estimating maize yield at smallholder farms in sub-Saharan Africa. Hence, there is a need to test the utility of multispectral and thermal drone-derived remotely sensed datasets to estimate maize yield in smallholder farms of the southern African region. Testing drone-derived remotely sensed data in estimating maize yield is important for optimizing agricultural production, a challenge when using coarse spatial resolution image data. Therefore, this study aimed to test the utility of UAV-derived data in estimating maize yield across the growing season in a smallholder farm. To address this overarching objective, the study sought to: i) predict maize yield using UAV remotely sensed data in conjunction with the random forest (RF) algorithm and determine the optimal growth stage for yield prediction, and ii) compare the performance of using the actual grain biomass (excluding the foliage), and the biomass of grain yield as a proportion of ultimate plant biomass, in estimating maize yield. To achieve this, the combination of spectral bands and vegetation indices (VIs) was used with the RF regression ensemble.

Study area
This study was conducted on a smallholder farm located in Swayimane, KwaZulu-Natal, South Africa. The farm is located at 29°31'24'' S and 30°41'37'' E (Figure 3.1). The area has a sub-humid climate with an average temperature of 20 ℃ and average precipitation of 900-1200 mm per annum (Miya et al., 2018). The study was conducted on a 2699.005 m² maize field where the maize was sown in November, with a growing season of approximately 160 days.
The maize growth stages were divided into two subgroups: the vegetative growth stages, which are the early growth stages covering the V8-V10, V12-V14 and VT-R1 growth stages, and the reproductive growth stages, covering the R2-R3 and R3-R4 growth stages. For details regarding the maize growth stages considered in this study, see Buthelezi et al. (2023).

Agricultural practices
Maize seeds were sown by hand in February 2021 and weeds were continually removed by hand throughout the growing season. Cow manure, instead of chemical fertilizers, was used to optimize soil fertility. The maize crops in the study area were rain-fed.

Sampling strategy for yield measurements
To optimize the sampling procedure, a polygon of the entire experimental field was generated in Google Earth Pro and imported into ArcGIS 10.6. Subsequently, 63 point locations were generated inside the experimental field plot polygon based on stratified random sampling to determine the sampling points for yield data collection. These points were then uploaded into a Trimble handheld GPS with a sub-meter accuracy of 30 cm. The GPS was then used to locate and navigate to the sampling points in the field. At each location, a square meter plot was established and maize plants in proximity to each sample point were selected for yield estimation. To determine the absolute maize plant biomass, the sample plants (the entire stalk and the cob with the grains) were harvested manually during the reproductive stage R3-R4, which marked the end of the growing season of maize. These were lightly shredded to fit in brown bags and appropriately labelled. The entire plant was first oven-dried at 60 ℃ for 48 hours and then weighed to determine the entire plant biomass before separating the cobs from the plant. After the separation, the cobs were shelled to determine the grain yield biomass. The dry grains were weighed and grain yield was calculated as the weight in kg/m². The dry grain weight was then divided by the absolute plant biomass to determine the proportional yield. These weights were then recorded in an Excel spreadsheet together with the coordinates of each sampling point.
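
The two yield variables described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the sample weights are hypothetical and are not taken from the study, and the plot area defaults to the one square meter plot described in the text.

```python
def grain_yield_kg_m2(dry_grain_kg: float, plot_area_m2: float = 1.0) -> float:
    """Dry grain mass per unit ground area (kg/m^2)."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return dry_grain_kg / plot_area_m2


def proportional_yield(dry_grain_kg: float, dry_plant_biomass_kg: float) -> float:
    """Dry grain mass as a fraction of the whole-plant dry biomass."""
    if dry_plant_biomass_kg <= 0:
        raise ValueError("plant biomass must be positive")
    return dry_grain_kg / dry_plant_biomass_kg


# Hypothetical sample plot: illustrative weights, not measurements from the study.
sample = {"dry_grain_kg": 1.2, "dry_plant_biomass_kg": 3.0}

gy = grain_yield_kg_m2(sample["dry_grain_kg"])
py = proportional_yield(sample["dry_grain_kg"], sample["dry_plant_biomass_kg"])
print(f"grain yield: {gy:.2f} kg/m^2, proportional yield: {py:.2f}")
```

Computed this way, the proportional yield is a ratio of two masses, so it is bounded between 0 and 1 whenever the grain is weighed as part of the whole plant.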

UAV system and imaging sensor
A DJI Matrice 300 UAV was used in this study for acquiring remotely sensed data (Figure 3.2 a). The DJI M300 flight controller was used for autonomous flights and a DJI Data Link was used to transmit flight parameters to the controller and to remotely control the UAV. A MicaSense Altum multispectral camera in conjunction with a DLS 2 was used for UAV spectral imaging of the study site (Figure 3.2 b). The MicaSense Altum sensor is equipped with a DLS 2 GPS to determine image coordinates during the acquisition period. The device acquires images simultaneously at a 5.2 cm spatial resolution in the blue (475-559 nm), green (560-667 nm), red (668-716 nm), red edge (717-839 nm), NIR (840 nm) and thermal (8-14 µm) regions of the electromagnetic spectrum (EMS).

Image acquisition and pre-processing
A polygon was digitized on Google Earth and exported as a kml file. The polygon was then imported into the controller and used to establish the flight plan, flight altitude and speed parameters for image acquisition. Prior to image acquisition, the sensor was calibrated by acquiring images of the radiometric calibration panel before and after the reconnaissance flight. Five images were acquired at different times across the growing season between February and May of 2021 (35, 49, 62, 78 and 94 days after emergence). These images, covering the V8 to R3-R4 growth stages, were acquired under clear sky conditions between 10:00 AM and 1:00 PM local time, which is the period of the day when changes in solar zenith angle are minimal and the radiation from the sun is at its maximum. The images of the calibration targets were used in calibrating and correcting the reflectance of the images. The calibrated images were then exported alongside all the other images into Pix4D for stitching and radiometric correction. To accurately retrieve georeferenced orthomosaicked images of the study plot for the different growth stages, the Altum camera was set to 80% overlap mode using the sensor's Wi-Fi. This facilitated the stitching of the images using Pix4D. After transferring the images into Pix4D fields, they were calibrated, radiometrically corrected and stitched to create orthoimages for the entire study site. Geometric correction was done in QGIS 3.12.3 using field-collected ground control points.

Calculation of VIs
The UAV-derived image bands were used to compute VIs, and both spectral bands and indices were used to predict maize yield. The Normalized Difference Vegetation Index (NDVI), Enhanced Normalized Difference Vegetation Index (ENDVI), Soil Adjusted Vegetation Index (SAVI), Optimized Soil Adjusted Vegetation Index (OSAVI) and Simple Ratio (SR) were computed and used in conjunction with RF to estimate yield.
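
The index formulas are not given in the text, so the sketch below uses the commonly cited formulations as an assumption: a soil adjustment factor L = 0.5 for SAVI, the constant 0.16 for OSAVI without the 1.16 scaling factor that some authors apply, and the blue/green/NIR form of ENDVI. The function names and the reflectance values are illustrative only.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def endvi(nir: float, green: float, blue: float) -> float:
    """Enhanced NDVI (assumed form): ((NIR + G) - 2B) / ((NIR + G) + 2B)."""
    return ((nir + green) - 2 * blue) / ((nir + green) + 2 * blue)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil Adjusted Vegetation Index with adjustment factor L (assumed 0.5)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def osavi(nir: float, red: float) -> float:
    """Optimized SAVI using the fixed soil constant 0.16 (assumed form)."""
    return (nir - red) / (nir + red + 0.16)


def simple_ratio(nir: float, red: float) -> float:
    """Simple Ratio: NIR reflectance divided by red reflectance."""
    return nir / red


# Hypothetical surface reflectance of one vegetated pixel (not study data).
pixel = {"blue": 0.04, "green": 0.08, "red": 0.05, "nir": 0.45}

indices = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "ENDVI": endvi(pixel["nir"], pixel["green"], pixel["blue"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "OSAVI": osavi(pixel["nir"], pixel["red"]),
    "SR": simple_ratio(pixel["nir"], pixel["red"]),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

In practice the same arithmetic would be applied per pixel to the calibrated orthomosaic bands rather than to single values.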

Correlation between grain yield and the entire plant biomass:
A correlation between the grain and the biomass data was determined to evaluate whether there was a link between the accumulated biomass and the actual yield at the R3-R4 growth stage. A Pearson product-moment correlation test was conducted in this regard following a data normality test, which indicated that the data did not significantly deviate from the normal distribution.
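
As a minimal sketch of the Pearson product-moment coefficient referred to above, the following computes r from paired samples. The function name and the paired values are hypothetical, and the preceding normality test is not reproduced here.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements in kg/m^2 (not study data).
plant_biomass = [1.0, 2.0, 3.0, 4.0]
grain_yield = [0.5, 1.0, 1.5, 2.0]

r = pearson_r(plant_biomass, grain_yield)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}")
```

Squaring r gives the proportion of variance shared by the two variables, which is how a correlation can be reported as an R² value.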

Maize yield prediction and accuracy assessment:
To test the relationship between biomass, grain yield and proportional yield determined at the R3-R4 stage, the 63 collected yield samples and UAV data (i.e. the combination of bands and VIs) were divided into training (70%) and test (30%) datasets to derive models using the RF algorithm in the R statistical package. The RF algorithm was adopted in this study as it is a nonparametric statistical technique that uses a bagging-based approach to build an ensemble of regression trees while ranking important variables and producing an independent measure of prediction error (Prasad et al., 2006). In R, the ntree and mtry parameters were optimized using the doBest function. The function selected the ntree and mtry values with the lowest RMSE. These parameters were tuned to 600 for ntree and five for mtry. In addition, the growth stage at which the combination of bands and VIs was most highly correlated with the yield was assessed to determine the most suitable period to predict maize yield before harvest. The test data (30%) were used to evaluate the performance of the derived models. Performance indicators such as R², RMSE and RRMSE were determined and used to assess the accuracy of each model. The RF Gini impurity index was employed to select optimal spectral features for yield estimation.
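
The data split and the accuracy measures named above can be sketched in plain Python. The definitions are assumptions, since the text does not spell them out: RRMSE is taken as RMSE divided by the mean observed value and expressed as a percentage, and R² as the coefficient of determination. The RF model itself is not reproduced, and all names and values are illustrative.

```python
import math
import random


def split_samples(samples: list, train_fraction: float = 0.7, seed: int = 42):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root mean square error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(observed: list[float], predicted: list[float]) -> float:
    """RMSE relative to the mean observation, in percent (assumed definition)."""
    return 100 * rmse(observed, predicted) / (sum(observed) / len(observed))


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# 63 placeholder sample identifiers, mirroring the number of yield samples.
train, test = split_samples(list(range(63)))
print(len(train), len(test))

# Hypothetical observed and predicted yields in kg/m^2 (not study data).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(f"RMSE={rmse(obs, pred):.3f} RRMSE={rrmse(obs, pred):.2f}% "
      f"R2={r_squared(obs, pred):.3f}")
```

With 63 samples, a 70/30 split under this rounding rule leaves 44 samples for training and 19 for testing.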

Descriptive statistics
The highest maize grain yield and proportional yield were 4.4 kg/m² and 0.76 kg/m² and the lowest were 0.16 kg/m² and 0.04 kg/m², respectively. There was considerable variation in maize yield samples in the study. The standard deviation was 1.08 and 0.15 for grain yield and proportional yield, respectively. Furthermore, a strong (R² of 0.74) positive correlation between the grain yield samples and the overall biomass of the maize plants was attained.

Derived maize yield prediction models and their accuracies
Figure 4 illustrates the model accuracies obtained in predicting the grain yield and proportional yield based on the RF algorithm. The accuracies of the prediction models varied greatly across the maize growing season.
The V8-V10 model demonstrated the lowest prediction accuracy in estimating the grain yield (R² = 0.85 and RMSE = 0.6 kg/m²). This was followed by V12-V14 and VT-R1 with an R² of 0.89 and RMSE of 0.12 kg/m², and an R² of 0.85 and RMSE of 0.1 kg/m², respectively. The prediction accuracy increased significantly with the R2-R3 model (R² = 0.95 and RMSE = 0.09 kg/m²). The R3-R4 model optimally predicted the grain yield with the lowest RMSE = 0.03 kg/m² and R² = 0.92 (Figure 3.4 e). The variables that had the highest influence in the grain yield model were ENDVI, NIR, NDVI and the red edge band, in ascending order of importance (Figure 3.5 e).
When predicting the proportional yield, the V12-V14 model produced the lowest prediction accuracy with an R² of 0.92 and RMSE of 0.11 kg/m². The prediction of proportional yield improved in the V8-V10, VT-R1 and R2-R3 models, with an R² of 0.91 and RMSE of 0.09 kg/m²; an R² of 0.92 and RMSE of 0.06 kg/m²; and an R² of 0.92 and RMSE of 0.07 kg/m², respectively. The optimal model for estimating proportional yield produced an R² of 0.95 and RMSE = 0.07 kg/m² (Figure 3.4 e). The most suitable predictor variables included NDVI and the green, NIR and red edge bands (Figure 3.5 e).
In comparing the performance of the grain yield and proportional yield variables in predicting yield across all growth stages, the results varied greatly (Figure 4). For example, when estimating yield at the V8-V10 growth stage, the proportional yield model exhibited the poorest prediction accuracy with an RRMSE of 30.43%, followed by the grain yield model with an RRMSE of 27.99%. Comparatively, the best model for estimating yield during the R3-R4 growth stage was the grain yield model with an RRMSE of 2.21% (Figure 2 e (i)). The most important variables were ENDVI, NIR, NDVI and the red edge band, in order of importance (Figure 5 a).
Similarly, at the V12-V14 growth stage, the proportional yield model was the poorest with an RRMSE of 39.91%, followed by the biomass model with an RRMSE of 15.37%. The grain yield model predicted maize yield best at the V12-V14 stage, with the lowest RRMSE of 5.44% (Figure 4 b). The most important variables for this prediction were the green, red edge, red and blue bands (Figure 5 b).
In predicting yield at the VT-R1 growth stage, the proportional yield model produced the highest RRMSE of 17.56%. The prediction accuracy improved with the biomass and grain yield models (RRMSE = 12.56% and 5.08%, respectively) (Figure 4 c). The variables that had the highest influence in the grain yield model were SAVI, NDVI, ENDVI and the green band, in order of importance (Figure 5 c).
When predicting yield in the R2-R3 growth stage, the highest RRMSE of 22.57% was obtained by the proportional yield model. The biomass model improved the prediction by 8.1 percentage points, i.e., RRMSE = 14.47%. As in the preceding stages, the grain yield model was the best model for estimating yield at the R2-R3 growth stage (Figure 4 d). The red edge band, NDVI, ENDVI and SR were the most influential variables for this model (Figure 5 d).
For the R3-R4 growth stage, the proportional yield model exhibited the lowest prediction accuracy with an RRMSE of 21.78%. The prediction of yield improved significantly with the biomass model (RRMSE = 12.97%) and even more with the grain yield model (RRMSE = 2.21%) (Figure 4

DISCUSSION
This study sought to test the capability of UAV-derived data in estimating maize yield across the growing season. Specifically, this study sought to predict maize grain and proportional yield using UAV images and the RF algorithm in smallholder farms.

Maize yield prediction models
The results of this study show that the early growth stages of the crop yielded lower overall accuracies for grain yield and proportional yield, followed by some improvements in the later stages of growth (Figure 4). Specifically, the V8-V10, V12-V14 and VT-R1 growth stages had lower overall accuracies when compared to the R2-R3 and R3-R4 growth stages. Several studies (Al-Gaadi et al., 2016; Chivasa et al., 2017; Guindin-Garcia, 2010; Son et al., 2013) have noted that in the early stages of crop development, vegetation reflectance is affected by the soil background, which explains the low performance of UAV data in predicting maize biomass, grain yield and proportional yield at the early (vegetative growth) stages of this study. At this stage, the maize leaves are not fully grown, exposing the surrounding soil, which then interferes with the plant's reflectance as the sensor also picks up the soil reflectance (Zhang et al., 2019a).
In contrast, the later growth stages of the crop yielded higher overall accuracies. Specifically, the R2-R3 and R3-R4 growth stages had higher accuracies when compared to the V8-V10, V12-V14 and VT-R1 stages. The high performance of the UAV data in predicting maize yield at the R2-R3 and R3-R4 stages of the growth cycle is consistent with existing literature, which has reported significantly high accuracies in the prediction of maize yield at the late (reproductive) stages of the crop (Guindin-Garcia, 2010; Mditshwa, 2017). Literature notes that at this stage, the maize leaves have grown to mid-density, covering the surrounding soil, and therefore crop reflectance is not impacted by the soil background (Mkhabela et al., 2005; Tumlisan, 2017; Tunca et al., 2018).
Regarding model variable importance, SAVI, OSAVI, and the blue and red bands were more important in the prediction at the early stages than in the late stages of the crop phenological cycle. The value of SAVI and OSAVI can be attributed to their ability to suppress the soil background, hence better prediction at minimal leaf coverage and the resulting soil exposure (Ren and Zhou, 2019; Zhang et al., 2019b). The importance of the blue and red bands for these models can be explained by soil being more dominant than vegetation in the early stages of the crop, resulting in high reflectance in the blue and red regions of the EMS (Ngie and Ahmed, 2018).
Comparatively, NDVI, ENDVI, and the green, red, red edge and NIR bands were of significant importance in the prediction models at the R2-R3 and R3-R4 crop growth stages. The importance of NDVI and ENDVI in these models could be because saturation of the plant canopy had not yet occurred when the reflectance measurements for the R2-R3 and R3-R4 growth stages were taken. The plant canopy had only accumulated to mid-density, and there is a good relationship between NDVI and ENDVI and biomass and yield at mid-density canopies, which characterize the R2-R3 and R3-R4 maize growth stages (Awad, 2019). The importance of the green, red, red edge and NIR bands in the models of the R2-R3 and R3-R4 growth stages for this study can be attributed to the dominance of vegetation, which reflects strongly in the green and NIR regions of the EMS and absorbs strongly in the red and red edge regions of the EMS (Khaliq et al., 2019; Marcial-Pablo et al., 2019).

Determining the most optimal growth stages and variables for yield prediction
The best-fit model for predicting maize grain yield was obtained at the R3-R4 growth stage, with ENDVI and the red edge band being the most important variables for the prediction of maize biomass and ENDVI and NDVI being the most important for the prediction of grain yield. The influence of ENDVI, NDVI and the red edge in the prediction at this stage could be explained by the good relationship between the two indices and yield at mid-density canopies before saturation (Mutanga et al., 2012; Tan et al., 2020). On the other hand, the literature notes that the red edge section of the EMS is related to chlorophyll and biomass, which directly relate to yield (Dube et al., 2017; Sibanda et al., 2017). Generally, mid-density canopies are characterized by a high amount of biomass, which is associated with high chlorophyll content and carbon assimilation, to which the red edge section of the EMS is sensitive (Sibanda et al., 2021). In addition, the best-fit model for predicting maize proportional yield was obtained in the VT-R1 growth stage, with NDVI and SAVI being the most important variables for the prediction of proportional yield. The significance of NDVI and SAVI in the prediction model of maize proportional yield at this stage can be attributed to the fact that this is the middle stage, where the canopy has not grown to mid-density, resulting in significant soil exposure (Mditshwa, 2017). This then results in SAVI being important in suppressing the soil background effect and allows NDVI to perform well, as it has a good relationship with the yield at this stage's canopy level because the canopy has not yet reached saturation, and canopy saturation hinders the performance of NDVI (Mutanga et al., 2012).
The best-fit models for maize biomass and grain yield were obtained at the R3-R4 reproductive development stage, and that for proportional yield at the VT-R1 vegetative development stage (78 and 62 days after emergence) of the growth cycle. Using the R3-R4 growth stage for biomass and grain yield prediction could be too late for the adoption of any effective measure before harvest. A significant relationship was found at the VT-R1 (62 days after emergence) growth stage for biomass as well as grain yield. Based on our findings, this is the optimal stage at which maize yield could be predicted before harvesting. The most significant variables for the optimal biomass, grain yield and proportional yield prediction models were the red edge band and ENDVI; SAVI and NDVI; and ENDVI and the red edge band, respectively. Furthermore, the grain yield produced higher prediction accuracies in estimating maize yield for most of the crop's growth stages (V12-V14, VT-R1, R2-R3 and R3-R4) when compared to the absolute plant biomass and the biomass of grain yield as a proportion of ultimate plant biomass. The absolute plant biomass was only optimal in the V8-V10 growth stage, and the proportional yield produced the poorest yield prediction accuracies in all of the growth stages. Therefore, the grain yield proved to be the best variable for estimating maize yield.
Limitations of the Proposed Methodology: Despite the flexibility of acquiring high spatial resolution data using UAVs and the associated models, the spectral resolution of on-board sensors regulates the types and number of spectral derivatives that can be generated to optimise the accuracy of models. Higher spectral resolutions could lead to improved model accuracies in predicting crop yield. Furthermore, in this study, the harvesting procedure was limited to the end of the growing season, which restricted the measurement of biomass accumulation and yield to that specified period. Future studies could consider multiple fields and different crop varieties to generate more robust models with sufficient biomass data collected throughout the growing season.

CONCLUSION
This study aimed to predict maize yield (grain yield and proportional yield) across the growing season in a smallholder farm based on UAV remotely sensed data. The following conclusions were drawn:
• UAV-derived data optimally predicted maize yield during the R3-R4 growth stage using ENDVI, NDVI and the red edge band.
• The VT-R1 stage was the optimal stage for the early prediction of maize yields using SAVI, NDVI, ENDVI and the red edge band.
• The grain yield models produced higher accuracies in estimating maize yield when compared to the absolute plant biomass and the biomass of grain yield as a proportion of absolute plant biomass models.
The characterised variations in field productivity can assist farmers and decision-makers in identifying low-yield areas within the field to adjust their management practices to maximize farm productivity. These findings highlight the utility of UAV systems in optimizing agricultural production through precision farming on smallholder farms, necessary for poverty alleviation and food and nutritional security.

Figure 1: a) Location of the experimental field plot in Swayimane, KwaZulu-Natal, South Africa, and b) & c) the maize field.
e). The most influential variables for this prediction were NDVI, NIR, ENDVI and the red edge band (Figure 5 e).

Figure 2: Relationship between observed and predicted ii) grain yield and iii) proportional yield based on the combination of bands and VIs using the RF model for the a) V8-V10, b) V12-V14, c) VT-R1, d) R2-R3 and e) R3-R4 maize growth stages.

Figure 4: The spatial distribution of modelled maize a) grain yield and c) proportional yield based on the most optimal RF models.
FUNDING: Water Research Commission of South Africa (WRC) through Project WRC K5/2971//4 and in part by the National Research Foundation of South Africa (NRF) Research Chair in Land Use Planning and Management (Grant Number: 84157).

Abdel-Rahman, E.M., Mutanga, O., Odindi, J., Adam, E., Odindo, A., Ismail, R., 2014. A comparison of partial least squares (PLS) and sparse PLS regressions for predicting yield of Swiss chard grown under different irrigation water sources using hyperspectral data. Computers and Electronics in Agriculture 106, 11-19.