Cross Domain Early Crop Mapping with Label Spaces Discrepancies using MultiCropGAN

Mapping target crops before the harvest season in regions lacking crop-specific ground truth is critical for global food security. Using multispectral remote sensing and domain adaptation methods, prior studies strive to produce precise crop maps in these regions (the target domain) with the help of crop-specific labelled remote sensing data from source regions (the source domain). However, existing approaches assume identical label spaces across the two domains, an assumption often unmet in practice, necessitating a more adaptable solution. This paper introduces the Multiple Crop Mapping Generative Adversarial Neural Network (MultiCropGAN) model, comprising a generator, a discriminator, and a classifier. The generator transforms target domain data into the source domain, employing identity losses to retain the characteristics of the target data. The discriminator aims to distinguish the transformed target data from the original source data and shares its structure and weights with the classifier, which locates crops in the target domain using the generator's output. The model's novel capability lies in locating target crops within the target domain despite differences in crop type label spaces between the target and source domains. In experiments, MultiCropGAN is benchmarked against various baseline methods. Notably, when facing differing label spaces, MultiCropGAN significantly outperforms the baselines, improving Overall Accuracy by about 10%.


Introduction
Mapping target crops in the early stages before the harvest season is crucial for a variety of agricultural applications such as agricultural planning, resource allocation, crop insurance, and risk management (Waldner et al., 2015, Singha and Swain, 2016). The primary method for locating target crops involves analyzing time-series multispectral images (MSI): MSI provides detailed spectral information, essential for understanding vegetation's spectral characteristics, which are influenced by its structural composition, leaf biochemistry, and phenological stages. To generate precise crop cultivation maps from these images, various supervised Deep Learning (DL) methods have been explored, including Convolutional Neural Networks (CNN) (Wang et al., 2021a, Hamidi et al., 2021), the Temporal Convolutional Neural Network (TempCNN) (Pelletier et al., 2019), and Long Short-Term Memory (LSTM) (He et al., 2019, Crisóstomo de Castro Filho et al., 2020). Notably, the findings indicate that DL approaches outperform conventional techniques like Support Vector Machines (SVM) (Mathur and Foody, 2008), Decision Trees (DTs) (Pittman et al., 2010, Tariq et al., 2023), and Random Forest (RF) (Duro et al., 2012, Wang et al., 2023b). These methods leverage publicly available datasets such as the United States Department of Agriculture (USDA)'s Cropland Data Layer (CDL) (Boryan et al., 2011) as the ground truth (GT).
Unfortunately, the collection of GT for crop types is expensive. In instances where GT data is absent, prevailing approaches adopt the "direct transfer strategy". This strategy involves training a classifier using labelled data from different regions (the source domain) and then applying this trained model to regions lacking GT (the target domain) (Hao et al., 2020, Ge et al., 2021). However, the trained model performs poorly due to differences in soil composition, climate conditions, and crop progress, which lead to discrepancies between the distributions of source and target data, commonly known as the cross-domain or domain shift issue (Konduri et al., 2020).
To address domain shift, Domain Adversarial Neural Networks (DANN) (Ajakan et al., 2014) and its variants, such as the Self-Training with Domain Adversarial Network (STDAN) (Kwak and Park, 2022), the Phenology Alignment Network (PAN) (Wang et al., 2021b), and the Deep Adaptation Crop Classification Network (DACCN) (Wang et al., 2023a), are employed. These models aim to extract invariant features from both target and source domain data, subsequently using these invariant features for crop mapping classification. Alternatively, Generative Adversarial Neural Networks (GAN) (Creswell et al., 2018), such as the Crop Generative Adversarial Network (CropGAN) (Wang et al., 2024), are utilized. CropGAN transforms time-series MSI data from the target domain to the source domain while preserving local structures. This transformation enables a crop mapping classifier pretrained on labelled source domain data to accurately locate the target crop using the transformed target domain data.
However, these methodologies are predicated on the assumption that the crop-type label spaces of the target and source domains are identical. While effective in addressing missing labels and mitigating the adverse effects of domain shift, these strategies are limited by disparities in label spaces between the domains. When such discrepancies occur, the extractor or generator still endeavors to align the data distributions of both domains. Consequently, some data from one domain must correspond to data in the other domain associated with labels absent in the first domain, leading to misclassification by the crop classifier. To address this label space discrepancies issue, the Multiple Crop Mapping Generative Adversarial Neural Network (MultiCropGAN) is presented in this paper. Our contributions can be summarized as follows:
• Propose the MultiCropGAN model, comprising a generator, a discriminator, and a classifier, as a solution to mitigate the domain shift issue with label space discrepancies encountered in early crop mapping tasks.
• Propose to incorporate identity losses into the generator's loss function to ensure that the generator refrains from making unnecessary alterations to the data, thereby preserving its essential characteristics.
• Conduct experiments based on study areas encompassing the USA and Canada. A comparative analysis was undertaken, pitting MultiCropGAN against various baseline methods, including CropGAN, STDAN, DACCN, TempCNN, and RF. MultiCropGAN demonstrates the highest classification metrics when handling divergent label spaces in the target and source domains.

Related Work
Within a confined area, specific crop types often exhibit minimal variability in the growth period, offering reliable priors for accurate sample inference. However, outside these labelled regions or domains, significant phenological disparities arise within the same crop type due to variations in environmental conditions. This presents a considerable challenge for cross-domain classification. Existing approaches address this challenge from two distinct perspectives. From a feature perspective, previous works introduced DANN variant methods to map samples from diverse regions into a shared feature subspace, thereby reducing differences in deep features. For instance, (Kwak and Park, 2022) presents STDAN, a novel unsupervised domain adaptation framework for crop type classification. Moreover, PAN (Wang et al., 2021b) and DACCN (Wang et al., 2023a) extended the loss function using the Maximum Mean Discrepancy (MMD) and the Multiple Kernel variant of Maximum Mean Discrepancy (MK-MMD), achieving improved accuracy compared to CNN and LSTM methods without domain adaptation. From a sample perspective, two distinct methods emerge. The first involves fine-tuning pre-trained models using a few high-quality samples from the target domain, enabling the adaptation of the original model to the new distribution. For instance, in (Tong et al., 2020), deep models were refined for nationwide land cover classification by incorporating pseudo-labels with high confidence. Similarly, in (Hamrouni et al., 2021), new samples from the target domains were annotated to adjust RF classifiers through active learning. However, this approach often requires labelling additional samples, making it impractical for extensive area research. The second method, exemplified by CropGAN (Wang et al., 2024), employs a GAN model to learn a mapping function that transforms time-series MSI sample data from the target domain to the source domain while preserving local structures. This transformation allows a crop mapping classifier pre-trained on source domain labelled data to accurately process the transformed data, thereby enabling high-accuracy crop mapping without labelled data in the target domain.
However, the approaches discussed above, regardless of their perspective, face limitations arising from differences in label spaces between the target and source domains. To the best of our knowledge, we are the first to tackle label space discrepancies from a sample perspective, which we achieve through a GAN model combined with specially designed identity losses.

Problem Statement
Let X denote the time-series remote sensing input data and Y denote the GT. Our objective is to identify target crops to get target labels Yt in the target domain during their early growth stages, utilizing labelled source domain data (Xs, Ys) and unlabelled target domain data (Xt). This approach addresses challenges related to the cross-domain issue and the label space discrepancies issue. The source and target domains correspond to the study areas detailed in Section 3.1, while the specific challenges are elaborated upon in Sections 3.2 and 3.3.

Cross Domain Issue
The Normalized Difference Vegetation Index (NDVI), calculated from the red (Red) and near-infrared (NIR) reflectance values as NDVI = (NIR - Red) / (NIR + Red), is commonly used to quantify vegetation greenness. Figure 2 displays the average NDVI value curves for the target crops and other crop types throughout their growth stages in the different study areas. Variations in the NDVI curves of a specific crop across regions can be attributed to different environmental conditions, as detailed in Table 1 (where "T" denotes annual average temperature, "P" average hourly precipitation, "E" average hourly evaporation rate, "R" surface net solar radiation, and "El" elevation), and to varying crop calendars, as illustrated in Figure 3. These factors contribute to distinct phenological characteristics in crops within these regions. Consequently, when a model trained on source domain data is directly applied to the target domain, it frequently underperforms, which is termed the cross-domain issue.
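As a quick sketch, the NDVI of a pixel can be computed directly from its red and NIR reflectances (a minimal NumPy illustration; the small epsilon guarding against division by zero is our own addition):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Dense green vegetation reflects strongly in NIR and absorbs red,
# so its NDVI approaches 1; bare soil sits near 0.
print(ndvi(0.45, 0.05))  # ~0.8
```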

Label Space Discrepancies Issue
Current cross-domain methods aim to align the data distribution between the source and target domains to mitigate the domain shift problem, assuming that the crop types in these domains are identical. However, in real-world applications, these domains often contain different sets of crop types. For instance, Table 2 illustrates the diverse crop types present in the study areas. The absence of certain crop types in each domain leads to a significant difference between the data distributions, which current cross-domain methods cannot resolve, known as the label space discrepancies issue.
Crop Types in Study Areas A, B, C, and D. "✓" denotes the presence of a crop, while "-" indicates its absence.

Methodology
To address the cross-domain issue and label space discrepancies, the MultiCropGAN model is proposed. It transforms the target domain data into the source domain while preserving the characteristics of the target domain data related to crop types absent in the source domain. Figure 4 illustrates the proposed MultiCropGAN model, consisting of a generator, a discriminator, and a classifier. The primary goal of the generator is to transform data from the target domain into the source domain. Meanwhile, the discriminator's objective is to differentiate between the transformed target data and the original source data. The classifier and the discriminator are structurally identical and share weights, with the exception of the output layer. Trained with source data and labels, the classifier's purpose is to categorize crop types in the target domain using the transformed target data produced by the generator.

Data Preprocessing
The preprocessing, shown in Figure 5, aims to provide complete time-series remote sensing data by filling gaps between MSI images caused by cloud cover, atmospheric interference, or sensor limitations. Four datasets are used in the preprocessing:
• The Sentinel-2 MSI images, widely recognized and employed in numerous agricultural applications within the scientific community (Wang et al., 2021a, Blickensdörfer et al., 2022), are utilized as remote sensing data. Sentinel-2 captures high-resolution MSI images at up to 10 meters with a 5-day revisit time, enabling frequent crop growth monitoring.
• The Dynamic World dataset (Brown et al., 2022) is utilized, providing detailed class probabilities and labels with a 10-meter resolution for nine distinct land categories, notably including cropland.
• The CDL (Boryan et al., 2011), a crop-specific land cover raster map covering the entire contiguous U.S. land area at a 30-meter resolution, is provided by the USDA. It serves as the training GT for the USA.
• Similarly, the Agriculture and Agri-Food Canada (AAFC) Annual Crop Inventory (Fisette et al., 2013), with a 30-meter spatial resolution, is employed as the GT for Canada in the test process.
The procedure, shown in Figure 5, starts by acquiring MSI images with six bands, including B2 (Blue), B3 (Green), B4 (Red), B8 (NIR), B11 (Shortwave Infrared 1), and B12 (Shortwave Infrared 2), encompassing the entire designated study areas. These images originate from the Sentinel-2 dataset and are grouped at regular 10-day intervals as an image sequence. Our primary goal is to identify target crops with label space discrepancies and cross-domain problems during an early growth stage. Consequently, the time-series remote sensing data should start after the planting period and end before the onset of crop harvest. As depicted in Figure 3, the earliest target crop harvesting season commences in August. As a result, nine image groups are compiled, spanning from May 1st to July 30th. Simultaneously, to eliminate non-agricultural lands in the MSI images, the process extracts the cropland mask during the crop growing season from the Dynamic World dataset and reprojects it to maintain a consistent 30-meter resolution. After excluding non-agricultural lands using this cropland mask, the MSI images in the nine image groups focus specifically on agricultural lands. Within each image group, Sentinel Hub's cloud detector (Skakun et al., 2022) is utilized to apply cloud masks to the Sentinel-2 MSI images. The merging process retains the mean values of the cloud-free MSI images, resulting in a composite MSI image at a clear 30-meter resolution. Consequently, a set of nine high-quality time-series MSI images is generated, characterized by the absence of clouds and gaps. These images encompass six bands, all with a spatial resolution of 30 meters.
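The per-group cloud masking and mean compositing step can be sketched in NumPy (a simplified illustration; the array layout and the function name `composite_group` are our own, not the paper's code):

```python
import numpy as np

def composite_group(images, cloud_masks):
    """Mean-composite one 10-day group of MSI images.

    images:      (n_images, H, W, bands) reflectance stack
    cloud_masks: (n_images, H, W) boolean, True where a pixel is cloudy
    Returns a single (H, W, bands) cloud-free composite.
    """
    stack = np.asarray(images, dtype=float)
    cloudy = np.asarray(cloud_masks, dtype=bool)
    stack[cloudy] = np.nan             # drop cloudy observations
    return np.nanmean(stack, axis=0)   # mean of the remaining clear pixels

# Two 1x1 single-band images: the second observation is cloudy,
# so the composite keeps only the clear value.
imgs = np.array([[[[0.2]]], [[[0.9]]]])
masks = np.array([[[False]], [[True]]])
print(composite_group(imgs, masks))  # [[[0.2]]]
```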

Model Structure
In the MultiCropGAN framework, the generator operates as an autoencoder. The encoder is composed of three residual blocks, each featuring two one-dimensional convolutional (Conv1D) layers followed by a MaxPooling layer. The Conv1D layers are characterized by 32 filters with a kernel size of 3, while the MaxPooling layers have a pool size of 3 and a stride of 1. The encoder is followed by a flatten layer and a sequence of fully connected layers with dimensions 96, 64, 64, 64, and 96, respectively. The decoder, mirroring the encoder, employs Conv1DTranspose layers, which reverse the transformation performed by the corresponding Conv1D layers in the encoder, thus aiding in reconstructing the original input features from their encoded state. The discriminator and classifier adopt the same encoder structure as the generator, followed by two fully connected layers with dimensions of 32 and 16, respectively. The discriminator's output layer is a fully connected layer with a dimension of 2, designed to distinguish between generated target data and original source data. Conversely, the classifier's output layer is a fully connected layer with a dimension of 4, tasked with identifying the target crops (corn, soybean, and spring wheat) and other crop types.
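The weight-sharing arrangement between the discriminator and the classifier can be illustrated with a toy stand-in (dense layers instead of the Conv1D residual encoder; the 54-dimensional input assumes the 9 timesteps x 6 bands flattened, and all names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    # Random weight matrix standing in for a trained layer.
    return rng.normal(scale=0.1, size=(in_dim, out_dim))

# Encoder weights shared by the discriminator and the classifier.
shared_encoder = [dense(54, 32), dense(32, 16)]
disc_head = dense(16, 2)  # output layer: generated vs. original source
clf_head = dense(16, 4)   # output layer: corn / soybean / spring wheat / other

def forward(x, head):
    h = x
    for w in shared_encoder:      # identical weights for both networks
        h = np.maximum(h @ w, 0)  # ReLU activation
    return h @ head               # only the output layer differs

x = rng.normal(size=(5, 54))  # 5 samples, 9 timesteps x 6 bands, flattened
print(forward(x, disc_head).shape)  # (5, 2)
print(forward(x, clf_head).shape)   # (5, 4)
```

Because every layer except the head is shared, gradients from the classification task shape the same features the discriminator uses, which is the mechanism the class loss section below relies on.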

Adversarial Loss
The adversarial loss drives the generator to produce transformed target domain data that closely resembles real data from the source domain. The discriminator's objective is to identify the generated data as fake, while the generator's goal is to craft realistic data that deceives the discriminator.
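A minimal sketch of this objective, assuming the standard GAN cross-entropy formulation (the paper's exact equation may differ in detail): the discriminator maximizes log D(xs) + log(1 - D(G(xt))), which we express here as a loss to be minimized.

```python
import numpy as np

def adversarial_loss(d_source, d_generated, eps=1e-9):
    """Standard GAN adversarial loss (an assumed formulation).

    d_source:    discriminator probabilities that real source samples are real
    d_generated: discriminator probabilities that G(x_t) outputs are real
    """
    d_source = np.asarray(d_source, dtype=float)
    d_generated = np.asarray(d_generated, dtype=float)
    return -(np.mean(np.log(d_source + eps))
             + np.mean(np.log(1.0 - d_generated + eps)))

# A fully fooled discriminator (both outputs near 0.5) gives ~2*log(2).
print(adversarial_loss([0.5, 0.5], [0.5, 0.5]))  # ~1.386
```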

Identity Loss
The identity losses encourage the generator to preserve the identity of the input data. They compute the difference between the generator output and the input data, for both target and source domain inputs. Minimizing this loss ensures that the generator does not make unnecessary alterations to the data and maintains its essential characteristics.
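A minimal sketch, assuming an L1 form of the identity loss (the exact norm used in the paper is not reproduced here):

```python
import numpy as np

def identity_loss(generated, original):
    """L1 distance between generator output and its input (assumed form)."""
    g = np.asarray(generated, dtype=float)
    x = np.asarray(original, dtype=float)
    return np.mean(np.abs(g - x))

# A generator that copies its input incurs zero identity loss;
# any alteration is penalized in proportion to its size.
x = np.array([[0.1, 0.4, 0.7]])
print(identity_loss(x, x))        # 0.0
print(identity_loss(x + 0.1, x))  # ~0.1
```

In MultiCropGAN this loss is computed twice, once on target domain data and once on source domain data, which is what discourages the generator from distorting crop types absent from the source label space.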

Class Loss
The class loss is the multi-class cross-entropy loss between the classifier's predicted probabilities and the one-hot GT labels of the source domain data. This loss function encourages the classifier to extract features that encompass class-related information. These features are shared with the discriminator, since the classifier and the discriminator have shared weights. The discriminator utilizes these features to discern whether the data originates from the original source domain or not.
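The multi-class cross-entropy over the four crop categories can be sketched as (predicted probabilities against one-hot GT labels):

```python
import numpy as np

def class_loss(probs, one_hot, eps=1e-9):
    """Multi-class cross-entropy over the four crop categories."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.asarray(one_hot, dtype=float)
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1))

# A confident, correct prediction of "corn" has near-zero loss;
# a uniform prediction over 4 classes costs log(4) ~ 1.386.
y = [[1, 0, 0, 0]]  # corn / soybean / spring wheat / other
print(class_loss([[0.97, 0.01, 0.01, 0.01]], y))  # small
print(class_loss([[0.25, 0.25, 0.25, 0.25]], y))  # ~1.386
```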

Total Loss
The total loss is the weighted sum of the adversarial, identity, and class losses, where α, β, σ are the weight parameters. The total loss can be expressed as a minimax objective in which the generator seeks to minimize the loss while the discriminator aims to maximize it.
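With the experimental weights α = 1, β = 20, σ = 1 (which weight multiplies which loss term is our assumption, inferred from the emphasis placed on the identity losses), the combination can be sketched as:

```python
def total_loss(l_adv, l_id, l_cls, alpha=1.0, beta=20.0, sigma=1.0):
    """Weighted sum of adversarial, identity, and class losses.

    The pairing of weights to terms is an assumption; defaults follow
    the alpha, beta, sigma values reported in the training setup.
    """
    return alpha * l_adv + beta * l_id + sigma * l_cls

# Even a small identity loss contributes heavily because beta = 20.
print(total_loss(1.386, 0.05, 0.7))  # 1.386 + 1.0 + 0.7 = ~3.086
```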

Experiments
In this section, our proposed MultiCropGAN is compared with other state-of-the-art (SOTA) algorithms, including CropGAN, STDAN, DACCN, TempCNN, and RF, on two experiments of cross-domain time-series early crop classification with discrepancies in label spaces. The classification encompasses four labels, comprising three types of target crops (corn, soybean, and spring wheat) and a category for other crop types.

Experimental Setup
Two experiments are set up in this paper:
• The first experiment employs study areas A and B as the source domain, while study area C is designated as the target domain. Despite the alignment of target crop types with those in the source domain, variations exist in the types of other crops.
• In the second experiment, study areas A and B function as the source domain, and study area D serves as the target domain, specifically lacking soybean. The variations in the types of other crops are retained.
In the experiments, our method and the SOTA methods are categorized into two groups: MultiCropGAN, CropGAN, STDAN, and DACCN fall under the cross-domain methods category, while TempCNN and RF are classified as direct methods.

Training Setup
The experiment utilizes data sampled from 2019. From the source domain, 5,000 data points are randomly selected for each of the nine existing crop types detailed in Table 2, amounting to a total of 45,000 balanced data points accompanied by source domain labels. In contrast, for each target domain, 150,000 data points were randomly sampled without label information, resulting in unbalanced data. For training and evaluation, the source data was split as follows: 80% for training and the remaining 20% for evaluation, specifically for the application of early stopping criteria. Notably, all target data were utilized in the training process of the cross-domain methods and in the testing process of all methods. All methods are trained solely on the training sets to acquire well-trained models for these two experiments. Every compared method was repeatedly trained on each subset from scratch 5 times with the same training configuration. Our MultiCropGAN was trained with α = 1, β = 20, σ = 1 in the loss function. The RF classifier was trained using specific parameters: tree num = 50 and leaf size = 15. All deep learning methods employed the Adam optimizer, initialized with a learning rate of 0.0005 and configured with β = (0.9, 0.998). The RF classifier was trained using the sklearn library, while all deep models were built with TensorFlow. Training persisted until the completion of 500 epochs or until convergence, as determined by an early stopping criterion with a patience of 50 epochs.
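The early stopping rule (halt training once the validation loss has not improved for 50 consecutive epochs) can be sketched as follows; the function name and exact stopping convention are our own illustration, not the paper's code:

```python
def early_stopping_epoch(val_losses, patience=50):
    """Epoch at which training stops: the end of the run, or `patience`
    epochs after the best validation loss seen so far."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return len(val_losses) - 1

# With patience=2, the best loss is at epoch 1 and no improvement
# follows, so training halts at epoch 3.
print(early_stopping_epoch([1.0, 0.8, 0.9, 0.85], patience=2))  # 3
```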

Experimental Results
Two metrics, the Overall Accuracy (OA) and the Weighted F1 Score, are utilized to assess the performance of the proposed MultiCropGAN model. Tables 3 and 4 present the OA and F1 score results of our method in comparison to the SOTA. Across the two experiments, our method demonstrates the highest average OA and F1 score.
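Both metrics can be computed without external libraries; the sketch below follows the usual definitions of OA (fraction of correct predictions) and support-weighted F1:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to class support."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    score = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
        score += (np.sum(y_true == c) / y_true.size) * f1
    return score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(overall_accuracy(y_true, y_pred))  # ~0.833
```

Weighting F1 by class support matters here because the target domain samples are unbalanced across crop types.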
In the first experiment, our approach achieved an average OA of 81.69% and an average F1 score of 81.66%. This marked a notable improvement, with a +9.28% increase in OA and a +9.59% boost in F1 score compared to the CropGAN model, which had an OA of 72.41% and an F1 score of 72.07%, securing the second position in the rankings. Among the cross-domain methods, DACCN claimed the third position, surpassing STDAN in performance. Within the category of direct methods, TempCNN exhibited slightly better results than STDAN but lagged behind DACCN. Notably, RF demonstrated the weakest performance among all the methods, with an OA of 67.31% and an F1 score of 66.08%.
In the second experiment, our approach achieved an OA of 79.69% and an F1 score of 80.54%. In contrast, the RF model secured the second position, attaining an OA of 77.01% and an F1 score of 77.09%. Noteworthy is the performance of the direct method TempCNN, which secured the third position with an OA of 73.53% and an F1 score of 74.32%, outperforming the other three cross-domain methods. Among these cross-domain methods, CropGAN demonstrated superior performance with an OA of 70.91% and an F1 score of 71.35% compared to STDAN, which achieved an OA of 68.75% and an F1 score of 69.09%. Furthermore, STDAN outperformed DACCN, which attained an OA of 67.63% and an F1 score of 67.94%.
The second experiment presents a more intricate scenario concerning discrepancies within the label space, in contrast to the first experiment. This complexity is characterized not only by a diversity of other crop types but also by the conspicuous absence of soybean among the target crop categories. As depicted in Figure 6(a), soybean (depicted in orange) is not present in the target domain, although it exists in the source domain. Figures 6 and 7 provide visual representations of the results obtained with both our methodology and the SOTA approaches in the second experiment. The results show that CropGAN and TempCNN misclassify parts of the corn, spring wheat, or other crop type pixels as soybean. Moreover, STDAN, DACCN, and RF tend to misclassify corn and spring wheat as other crops.

Discussion
When there are moderate differences in label spaces, such as those observed in the first experiment, cross-domain deep learning methods exhibit superior performance over TempCNN and RF. Notably, when DACCN incorporates the MMD loss to extract the invariant features more strictly, it demonstrates higher performance than STDAN. CropGAN outperforms both STDAN and DACCN, possibly due to its utilization of the identity loss calculated from the source domain data. Particularly noteworthy is the exceptional performance of MultiCropGAN, surpassing all the other methods, thereby establishing itself as the most effective method in this context.
However, in scenarios characterized by substantial disparities among target crop labels, as evident in the second experiment, TempCNN and RF demonstrated superior performance compared to several cross-domain methods (CropGAN, STDAN, and DACCN). This could be attributed to the attempts made by these cross-domain methods to align data distributions between the target and source domains, despite the inherent differences caused by the absence of soybean in the target domain. Despite DACCN's more rigorous attempts to extract invariant features by employing MMD loss compared to STDAN, it exhibited less effective performance. Notably, CropGAN consistently demonstrated superior performance compared to both STDAN and DACCN. As expected, MultiCropGAN maintained its position as the top-performing approach, surpassing even TempCNN and RF. One of the key contributing factors to MultiCropGAN's success is its use of two identity losses, which ensure that the generator retains crucial data characteristics without unnecessary alterations.

Conclusions
This paper introduces MultiCropGAN, an innovative generative adversarial neural network designed to tackle the issue of domain shift resulting from diverse regions and variations in label spaces during early crop mapping, leveraging multi-temporal multispectral input data. The MultiCropGAN model comprises three key components: a generator, a discriminator, and a classifier. Additionally, we introduce identity losses for both target and source domain data to ensure that the generator maintains essential data characteristics throughout the transformation process. The classifier is trained using source domain data and their respective labels, sharing model weights with the discriminator to enable the latter to leverage class-related features for distinguishing between generated target domain data and original source domain data.
Finally, our MultiCropGAN is evaluated against several SOTA methods in scenarios marked by variations in crop label spaces. Experiments conducted across the USA and Canada demonstrated that MultiCropGAN outperforms various SOTA methods, including CropGAN, STDAN, DACCN, TempCNN, and RF. The comparative analysis shows MultiCropGAN achieving the highest classification metrics, proving particularly effective in handling divergent label spaces in the target and source domains. The results show that MultiCropGAN notably enhances classification outcomes for the target domain without utilizing any label information specific to the target domain.

Acknowledgements
I would like to express my deepest gratitude to my advisors, Hui HUANG and Radu STATE, for their unwavering support, guidance, and mentorship throughout the duration of this research project. Their expertise, encouragement, and invaluable insights have been instrumental in shaping the course of my academic journey. I would also like to extend my heartfelt appreciation to my family for their endless love, encouragement, and sacrifices. Their unwavering belief in my abilities has been a constant source of motivation, and I am profoundly grateful for their unyielding support.
Additionally, I want to thank all my colleagues, friends, and fellow researchers who provided valuable assistance, feedback, and inspiration during this research endeavor. Your contributions have enriched the quality of this work and have been immensely valuable.

Figure 1.
Figure 1. The Study Areas in the USA and Canada. The locations of our study areas are denoted by red dots. The study areas are located in two countries: the USA and Canada. As depicted in Figure 1, Study Areas A (Traill County) and B (Cavalier County) are located in North Dakota, the USA, whereas Study Areas C and D are in Manitoba and Alberta, Canada, respectively. The geographic coordinates for Study Area C range from longitudes -97.97 to -96.59 and latitudes 49.22 to 49.59. Study Area D is defined by longitudes -112.96 to -112.53 and latitudes 49.79 to 50.06.

Figure 3.
Figure 3. The Crop Calendar delineates the planting and harvesting schedules for target crops in the USA (above) and Canada (below).

Figure 4.
Figure 4. The MultiCropGAN Structure with Training Dataflow. It comprises three essential components: the generator, the discriminator, and the classifier.

Figure 6. Figure 7.
Figure 6. Visualization of the Second Experiment Results Employing Cross-Domain Deep Learning Methods. In this visualization, yellow denotes corn, orange signifies soybean, green indicates spring wheat, and white represents other crops. (a) displays the GT for crop types. The crop mapping results are depicted in (b) for MultiCropGAN, (c) for CropGAN, (d) for STDAN, and (e) for DACCN. The corresponding error images are illustrated in panels (f) through (i) for MultiCropGAN, CropGAN, STDAN, and DACCN, respectively. Red highlights the misclassified pixels.
Each sample x can be expressed in a temporal form [x1, x2, ..., xt], where xi represents the input at time i. xi can be further expanded as [xi1, xi2, ..., xib], containing multispectral band information from band 1 to band b. Each GT label, denoted as y, is represented as a one-hot vector comprising four elements, which correspond to the categories of corn, soybean, spring wheat, and other crops. Let Xt denote the time-series input target data, Xs denote the time-series input source data, and Ys denote the GT for Xs. Each sample xs has a corresponding ys.
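In code, one such sample and its label might look like the following (illustrative random values; the shape (9, 6) assumes the nine 10-day composites and six Sentinel-2 bands described in the preprocessing):

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample: t = 9 composite dates, b = 6 bands (B2, B3, B4, B8, B11, B12),
# so x = [x_1, ..., x_9] with each x_i holding 6 band values.
x = rng.random((9, 6))

# The corresponding one-hot GT label y over the four categories,
# here encoding spring wheat.
y = np.array([0, 0, 1, 0])  # corn, soybean, spring wheat, other

print(x.shape)          # (9, 6)
print(int(y.argmax()))  # 2
```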

Table 3.
Experiment Metrics for the First Experiment: best metrics are indicated in bold, while the best metrics of the SOTA methods are underlined.

Table 4.
Experiment Metrics for the Second Experiment: best metrics are indicated in bold, while the best metrics of the SOTA methods are underlined.