Few-shot SAR vehicle target augmentation based on generative adversarial networks

The study of few-shot SAR image generation is an effective way to expand SAR datasets. It provides diversified data support for SAR target classification and also supplies high-fidelity false image templates for SAR deceptive jamming. In this paper, we construct a multi-frequency, multi-target SAR vehicle image dataset covering the C, Ka, P, and S bands, with three vehicle types: coaster, SUV, and cabin. We then apply various generative adversarial networks (GANs) to generate images from this dataset. The experimental results indicate that DCGAN and LSGAN generate images of superior quality. We also use different recognition networks to evaluate the classification accuracy of the generated images. Among all frequency bands, the Ka-band generated images achieve the highest recognition rate, with an accuracy of up to 99%. With a limited number of samples, LSGAN performs best, reaching a classification recognition rate of 71.48% with a dataset of only 20 samples. Finally, we use conditional generation networks to generate images conditioned on target category and frequency band, providing high-fidelity samples for SAR deception jamming.


Introduction
Compared with optical imaging, synthetic aperture radar (SAR) provides day-and-night, all-weather imaging that can penetrate cloud, and it is widely used in civil and military fields. SAR automatic target recognition (SAR-ATR) covers target detection, classification, and identification. It is a key topic in military SAR applications and plays an important role in military confrontation. With the development of artificial intelligence, deep-learning-based SAR-ATR has advanced rapidly and laid a foundation for future intelligent SAR confrontation. However, deep learning needs large amounts of training data, and SAR samples are scarce. The SAR datasets released in recent years mainly contain ship targets along complex coastlines, so SAR sample data for typical vehicle targets are lacking. Deep learning models can learn real data distributions to extract target information, but their robustness and generalization are poor and their training results depend closely on the dataset. It is therefore necessary to construct a SAR vehicle dataset.

There are three ways to obtain SAR images: field test acquisition, electromagnetic simulation, and artificial-intelligence generation. Field acquisition requires substantial manpower and funding, the quality of the acquired images is uncertain, and data processing and annotation are complex. Electromagnetic simulation based on target RCS characteristics easily produces simple target models. Complex targets, however, are hard to build and can only be approximated with common shapes, so the approach suffers from complex model construction and heavy computation. With the introduction of deep convolutional neural networks, GAN-based image generation has also developed rapidly in the SAR field and is now relatively mature. Applications of GANs to SAR images include data augmentation, SAR-optical image fusion, SAR super-resolution generation, and military deception jamming. SAR image processing focuses mainly on the target and its shadow, because the shadow reveals the shape and height of the object. Coherent speckle noise, however, hampers target feature extraction and affects SAR-ATR processing.

This paper addresses how to generate diverse, high-quality SAR images from a few-shot SAR vehicle dataset. Its main contributions are as follows. First, we construct a multi-band SAR slice dataset of civil vehicle targets covering four frequency bands (C, Ka, P, and S) and three vehicle types (SUV, coaster, and cabin). Second, we train different GAN models on the SAR vehicle dataset. This produces high-quality false image samples for SAR vehicle target recognition and image templates under different conditions for SAR deceptive jamming. Finally, we evaluate the samples generated by the different GANs with several metrics and identify the best SAR vehicle image generation model for each task requirement. The remainder of the paper is organized as follows. Section 2 reviews related work on SAR image generation. Section 3 presents the proposed framework, including the theory of GAN models (Section 3.1), the network used (Section 3.2), the SAR vehicle dataset (Section 3.3), and the evaluation indicators (Section 3.4). Section 4 presents the experimental results and performance analysis, and Section 5 concludes the paper.

Related Work
SAR image generation is mainly used in data enhancement, data augmentation, SAR-optical image translation, azimuth interpolation generation, and deceptive jamming.

1) SAR data enhancement
SAR data enhancement here refers to denoising, which improves SAR image quality by suppressing background clutter and speckle noise. The GANs used for data enhancement are usually SRGANs (super-resolution GANs). (Ai, Fan et al. 2022) proposed an improved SRGAN (ISRGAN) to suppress azimuth ambiguity in SAR ship targets. The ISRGAN generator embeds a dense residual network that fuses global and local image information, which effectively improves the integrity of the target feature information of SAR ships. Their results show that ISRGAN can suppress the azimuth ambiguity of SAR ship targets and retain target edge information without prior information, which makes it useful for marine surveillance.

2) SAR data augmentation
SAR datasets are usually expanded by rotation, cropping, flipping, and similar operations that increase the number of images. SAR data augmentation can now also be achieved by training classical GAN models, but training suffers from instability and vanishing gradients. To solve this problem, (Qin, Liu et al. 2022) proposed a generative adversarial network architecture based on convolution and deconvolution. Its discriminator incorporates a joint ResNet18 and Support Vector Machine (ResNet18-SVM) recognition method to improve the generalization of feature extraction, and its loss function uses the Wasserstein distance with a gradient penalty to stabilize training. Experiments on the MSTAR dataset compare generation quality and recognition rate against images generated by a classical GAN. The proposed method improves generation quality and raises the image recognition rate to about 95%.

3) SAR-optical image translation
SAR-optical image translation combines the advantages of both modalities: optical images contain rich spectral information, while SAR images can be acquired in all weather. (Wang, Ma et al. 2022) proposed a hybrid CGAN for SAR-optical image translation that couples local and global information. The work has two main innovations. First, CNN and ViT modules are added to the CGAN model, and the ViT module extracts global image information. Second, an attention-based residual module serves as the direct connection between the CNN and ViT modules, which fuses the local and global image information. The model was trained on SEN1-2, SEN12MS, WHU-SEN-City, and other datasets, and both visual inspection and quantitative metrics confirmed its effectiveness for SAR-optical translation. (Hu, Zhang et al. 2023) proposed converting SAR images into optical images with a ResNet-based Pix2Pix model. The experiment used 8669 pairs of SAR and optical images from before and after a fire, of which 7758 pairs were used for training and 1111 pairs for testing. The network takes a SAR image as input and outputs an optical image, and the discriminator estimates the probability that the output optical image is real. The standard Pix2Pix model uses a U-Net, but the authors replace it with a ResNet. This simplifies the model while increasing network depth and lets it extract deeper image features. The original GAN discriminator outputs a single real/fake judgment for the whole image, whereas this work uses a PatchGAN discriminator. The PatchGAN output is an n×n matrix, and its mean is taken as the real/fake output, which improves high-resolution detail in the style-domain translation. The experiments show that the ResNet-based Pix2Pix model converts SAR images into optical images well. The SSIM of the generated images improves from 0.541 to 0.593, and their burn ratio and relative burn ratio closely match those of real optical images. SAR-optical image translation is mainly applied to geological hazard detection, oil spill detection, glacier melting, and other weather-related disasters. Translating between SAR and optical styles overcomes the susceptibility of optical imaging to weather and other factors and gives SAR images rich spectral information, so it has important value and development prospects in the civil field. At present, SAR-optical translation faces two main difficulties. The first is the need for paired datasets, and the second is limited model robustness and generalization. Even so, SAR-optical image translation still has great development prospects.

4) SAR azimuth interpolation generation
(Sun, Wang et al. 2023) proposed SAR image generation based on an attribute-guided GAN (AGGAN) to address few-shot sample generation. AGGAN rests on two core ideas. The first uses conditional labels based on category and angle to control image generation. The second incorporates transfer learning into the network to improve the diversity of the generated images. The model was trained on BTR70, T72, and BMP2 data. The results show that it generates good images with only 5 samples per class and increases the image recognition rate by at least 4%. However, the paper interpolates only the azimuth angle of the SAR vehicle images, and much other SAR target feature information is not generated under attribute guidance. (Wang, Pei et al. 2022) proposed azimuth-based GAN image generation in which the generator input is two SAR images at different azimuths, separated by intervals of 5°, 10°, 15°, or 20°. Interpolating over the angle interval controls the azimuth of the generated SAR vehicle images. The discriminator both identifies generated false images and predicts their azimuths, and backpropagating its gradients to update the generator makes the generated images more realistic and controllable. The authors ran many interpolation and ablation experiments on the MSTAR dataset and generated high-fidelity SAR images with controllable azimuth. However, the method expands the generator's input samples, so generating high-fidelity false SAR images with controllable azimuth from few-shot samples remains an open problem.

5) SAR deceptive jamming
(Fan, Zhou et al. 2020) proposed cGAN-based generation of SAR deception jamming templates, together with a multi-level target feature extraction mechanism based on a deconvolutional neural network. The paper uses a CGAN to generate high-fidelity deception jamming image templates for different azimuth angles, pitch angles, target types, and image resolutions. The MSTAR dataset was used to generate high-fidelity deceptive jamming templates at different azimuths. The TOPSAR and SENTINEL-1A datasets were used to generate templates for different scenes. Together these provide a data basis for SAR deception jamming that can effectively improve the survivability of wartime equipment. However, the generated images have poor background noise, their overall quality needs improvement, and the target azimuth angles in the generated images are not accurate.

Experiment materials and methods
A GAN consists of a generator and a discriminator. The generator takes a low-dimensional noise vector as input and generates images by learning the latent feature distribution of the real data. The discriminator receives both real images and generated (false) images and outputs the probability that its input is real. The generator and discriminator compete with and improve each other until they reach a Nash equilibrium, at which point the generator produces high-quality false image samples. The original GAN objective is

$$\min_G \max_D V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_z(z)}\left[\log\left(1-D(G(z))\right)\right] \quad (1)$$

A conditional generative adversarial network (CGAN) adds extra information y as a constraint on the generator and the discriminator. This "extra information" can be a class label, a set of labels, or even a text description. The CGAN loss function is given in Eq. (3):

$$\min_G \max_D V(D,G)=\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\left[\log D(x\mid y)\right]+\mathbb{E}_{z\sim p_z(z)}\left[\log\left(1-D(G(z\mid y))\right)\right] \quad (3)$$

Unlike CGAN, ACGAN adds class-label prediction inside the discriminator, so the discriminator distinguishes not only real from fake but also the class. The discriminator can therefore pass a more informative loss back to the generator, which helps the generator find the noise distribution corresponding to each class. The structure is shown in Figure 2. In ACGAN, the primary task is generating false images and the secondary task is image classification.
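The ACGAN objective described above can be made concrete with a small sketch. Following the original ACGAN formulation, the source term L_S is the log-likelihood of the correct real/fake decision, and the class term L_C is the log-likelihood of the correct class. The discriminator maximizes L_S + L_C, and the generator maximizes L_C - L_S. The function below is a hypothetical plain-Python illustration that works on probabilities rather than logits. It is not the training code used in this paper.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def acgan_losses(p_real_src, p_fake_src, p_real_cls, p_fake_cls, eps=1e-12):
    """Return (d_loss, g_loss) for one batch, written as quantities to minimise.

    p_real_src / p_fake_src: discriminator probability that each real /
        generated image is real (the source head).
    p_real_cls / p_fake_cls: probability the classifier head assigns to the
        correct class label of each real / generated image.
    """
    # L_S: log-likelihood of the correct source (real for real images, fake for fakes)
    l_s = _mean([math.log(p + eps) for p in p_real_src]) + \
        _mean([math.log(1.0 - p + eps) for p in p_fake_src])
    # L_C: log-likelihood of the correct class for both real and generated images
    l_c = _mean([math.log(p + eps) for p in p_real_cls]) + \
        _mean([math.log(p + eps) for p in p_fake_cls])
    d_loss = -(l_s + l_c)  # D maximises L_S + L_C
    g_loss = -(l_c - l_s)  # G maximises L_C - L_S
    return d_loss, g_loss


# A maximally uncertain discriminator: d_loss = 4*log(2), g_loss = 0
print(acgan_losses([0.5], [0.5], [0.5], [0.5]))
```

Because the class term appears with the same sign in both players' objectives, the generator is rewarded for producing images that the discriminator classifies correctly. This is what ties each generated sample to its class label.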
InfoGAN introduces the concept of information entropy. It constrains the input noise vector so that better images are generated, by maximizing the mutual information between the latent variables and the generated data. The InfoGAN authors decompose the input noise into two parts. The first is the original noise. The second is a latent code whose components represent different dimensions of the generated data, so that the distribution of the generated data is controlled through these constraints.
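For reference, the InfoGAN objective can be written as follows in its original formulation. V(D,G) is the standard GAN value function, z is the noise, c is the latent code, Q is an auxiliary network that approximates the posterior of c given a generated image, and λ is a weighting hyperparameter. L_I is a variational lower bound on the mutual information I(c; G(z,c)). This is the standard form and is not necessarily the exact variant trained in this paper.

```latex
\min_{G,Q}\max_{D} V_{\mathrm{InfoGAN}}(D,G,Q) = V(D,G) - \lambda\, L_I(G,Q),
\qquad
L_I(G,Q) = \mathbb{E}_{c\sim P(c),\; x\sim G(z,c)}\left[\log Q(c\mid x)\right] + H(c) \;\le\; I\big(c;\, G(z,c)\big)
```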

Network
In the experiments, all networks share the same basic framework. The generator is a 5-layer deconvolution (transposed convolution) network with 4×4 kernels, a stride of 2, and a padding of 1. Its final layer is convolutional rather than fully connected, which reduces model complexity. The discriminator mirrors the structure of the generator. The Adam optimizer is used, and the loss functions are MSE and WGAN-GP.
The input is a 100-dimensional noise vector. The generator learning rate is 1e-4, the discriminator learning rate is 4e-4, beta1 is 0.5, and beta2 is 0.999.
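With 4×4 kernels, stride 2, and padding 1, each transposed-convolution layer exactly doubles the spatial size. Under PyTorch's ConvTranspose2d size formula (dilation 1, no output padding), out = (in - 1)·2 - 2·1 + 4 = 2·in. The text does not say how the 100-dimensional noise reaches the first layer, so the sketch below assumes it is first projected or reshaped to a 4×4 seed map. Under that assumption, five such layers reach the 128×128 slice size of the dataset. The CONFIG dictionary only collects the hyperparameters listed above and is illustrative.

```python
def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution
    (PyTorch ConvTranspose2d convention, dilation=1, output_padding=0)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_sizes(seed_size=4, n_layers=5, kernel=4, stride=2, padding=1):
    """Spatial size before the first layer and after each transposed-conv layer.

    seed_size=4 is an assumption: the text does not state how the noise
    vector is mapped to the first feature map.
    """
    sizes = [seed_size]
    for _ in range(n_layers):
        sizes.append(deconv_out(sizes[-1], kernel, stride, padding))
    return sizes


# Hyperparameters reported in the text, gathered in one place for illustration.
CONFIG = {
    "noise_dim": 100,
    "lr_generator": 1e-4,
    "lr_discriminator": 4e-4,
    "adam_betas": (0.5, 0.999),
}

print(generator_sizes())  # [4, 8, 16, 32, 64, 128]
```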

SAR vehicle dataset
The SAR vehicle image generation dataset was acquired with the Xinzhou or Jiangzhuang airborne SAR. The targets are stationary ground vehicles of three types: Coaster, SUV, and Cabin. The airborne SAR parameters are listed in Table 1. The data were acquired in stripmap imaging mode.
Each SAR image is a full scene image. Target positions were detected with YOLO, and the targets were cut into slices. The original SAR images are shown in Figure 5.

Evaluation indicators
In this paper, the generated images are evaluated both subjectively (visually) and with objective indicators. The objective indicators include the mean, variance, grayscale histogram, and peak signal-to-noise ratio. The similarity between the generated and real images is judged by structural similarity (SSIM) and FID, and the diversity of the generated images is judged by the image category recognition rate.
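(1) SSIM (Structural Similarity) The structural similarity index measures how similar a generated image is to a real image by comparing them in three respects: luminance, contrast, and structure. Let lumin denote the luminance term, con the contrast term, and str the structure term. SSIM is then expressed as

$$\mathrm{SSIM}(x,y)=\left[\mathrm{lumin}(x,y)\right]^{\alpha}\left[\mathrm{con}(x,y)\right]^{\beta}\left[\mathrm{str}(x,y)\right]^{\gamma}$$

where α, β, and γ are weighting parameters, each equal to 1. With α = β = γ = 1 and C_3 = C_2/2, this reduces to

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)},\qquad C_1=(cL)^2,\quad C_2=(dL)^2$$

where μ denotes the mean, σ² the variance, and σ_xy the covariance of the two images. Generally, c = 0.01, d = 0.03, and L = 255.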
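The following is a minimal plain-Python sketch of the reduced SSIM expression with α = β = γ = 1, using the constants given above (c = 0.01, d = 0.03, L = 255, named k1, k2, and data_range in the code). Standard SSIM is computed over local sliding windows, commonly an 11×11 Gaussian window, and then averaged. For brevity, this sketch uses one global window over flattened pixel lists, so its values will generally differ from library implementations. The function name ssim_global is hypothetical.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equal-length flattened images (alpha=beta=gamma=1).

    SSIM = ((2*mu_x*mu_y + C1) * (2*cov_xy + C2)) /
           ((mu_x^2 + mu_y^2 + C1) * (var_x + var_y + C2)),
    with C1 = (k1*L)^2 and C2 = (k2*L)^2.
    """
    if not x or len(x) != len(y):
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return num / den


img = [10, 50, 90, 130, 170, 210]
print(ssim_global(img, img))                   # identical images give 1.0
print(ssim_global(img, [v + 20 for v in img]))  # a brightness shift lowers SSIM
```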
(2) FID FID (Fréchet Inception Distance) measures the distance between two multivariate normal distributions fitted to the features of the real and generated images. The lower the FID, the better the quality of the generated images. FID uses the 2048-dimensional vector that precedes the fully connected layer of Inception-v3 as the feature vector of each image. It then computes the distance between the feature distributions of the two image sets.
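$$\mathrm{FID}=\lVert\mu_r-\mu_g\rVert_2^2+\mathrm{Tr}\left(\Sigma_r+\Sigma_g-2\left(\Sigma_r\Sigma_g\right)^{1/2}\right)$$

where μ_r is the mean of the real-image features, μ_g is the mean of the generated-image features, Σ_r is the covariance matrix of the real-image features, and Σ_g is the covariance matrix of the generated-image features.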
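Assuming the feature matrices (one row per image, for example the 2048-dimensional Inception-v3 features described above) have already been extracted, the Fréchet distance ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_rΣ_g)^{1/2}) can be evaluated as below. Feature extraction is not included. The trace of the matrix square root uses the identity Tr((Σ_rΣ_g)^{1/2}) = Tr((Σ_r^{1/2}Σ_gΣ_r^{1/2})^{1/2}). The right-hand matrix is symmetric positive semi-definite, so NumPy's eigh can stand in for a general matrix square root. The function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(a):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    a = 0.5 * (a + a.T)                # remove tiny round-off asymmetries
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)          # clamp small negative eigenvalues to zero
    return (v * np.sqrt(w)) @ v.T      # V diag(sqrt(w)) V^T


def frechet_distance(feat_real, feat_gen):
    """FID-style distance between two feature sets (rows = images, cols = features)."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    sigma_r = np.atleast_2d(np.cov(feat_real, rowvar=False))
    sigma_g = np.atleast_2d(np.cov(feat_gen, rowvar=False))
    root_r = _sqrtm_psd(sigma_r)
    # Tr((S_r S_g)^{1/2}) equals Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2})
    cross = _sqrtm_psd(root_r @ sigma_g @ root_r)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g)
                 - 2.0 * np.trace(cross))
```

Identical feature sets give a distance of about 0. Shifting every feature vector by a constant vector d leaves the covariances unchanged and adds exactly ‖d‖².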

(4) Classification Accuracy
Classification networks such as AlexNet, VGG19, ResNet34, and ConvNeXt are used to test the generated images, and the rate at which the images of each class are correctly recognized is reported.
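The per-class recognition rate is the fraction of images of each true class that the classifier predicts correctly, computed from true labels and classifier predictions. The classifiers themselves (AlexNet, VGG19, ResNet34, ConvNeXt) are not shown. The function name and the six example predictions below are hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in sorted(total)}


# Hypothetical predictions for six generated images
truth = ["cabin", "cabin", "coaster", "coaster", "suv", "suv"]
preds = ["cabin", "suv", "coaster", "coaster", "suv", "cabin"]
print(per_class_accuracy(truth, preds))  # {'cabin': 0.5, 'coaster': 1.0, 'suv': 0.5}
```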

SAR vehicle dataset augmentation
This section is divided into two parts. The first part discusses the performance of classic network models such as DCGAN, WGAN-GP, and LSGAN in SAR vehicle image generation. The second part discusses benchmark indicators for SAR vehicle image generation in few-shot scenarios. Together they indicate the data and model requirements of SAR image generation for different tasks in later engineering applications.

SAR vehicle dataset augmentation
This section uses DCGAN, WGAN-GP, and LSGAN to expand the SAR data samples. Each frequency band contains three target types, with 32 images per class. The network structure is roughly the same as that shown in Figure 3.

Visually, when the sample size is 5, the target contour begins to appear clearly only after 700 epochs, whereas with 15 or 20 samples the contour already appears after 70 epochs.
Figure 10 shows that at epoch 300 with 10 samples, the DCGAN-generated images contain little background noise and the target contour is relatively clear, but target detail features are missing. With 20 samples, the detail features of the generated images are relatively clear. According to the training results, the classification recognition rate of the generated images is around 45% with 5 samples, and the average rate reaches about 55% with 10 samples. Therefore, for few-shot SAR image augmentation, each class should contain at least 5 to 10 images to generate images of good quality. When the sample size is small, the quality of the generated images can also be improved by increasing the number of training iterations.
To sum up, the LSGAN performs the best in the few-shot SAR image augmentation.

SAR vehicle image generation based on conditional categories
This section uses CGAN, ACGAN, and InfoGAN to generate SAR vehicle samples under specific conditions. The main structures of the generator and discriminator networks are shown in Figures 3 and 4.

SAR vehicle images generated under frequency-band conditions can be interpolated between frequency bands. This provides high-fidelity deception templates for SAR deception jamming and improves wartime survivability. Figure 14 shows that CGAN can generate SAR images of different frequency bands, but the generated images are of poor quality and the target contours are blurred. ACGAN and InfoGAN generate better SAR vehicle images, although the S-band images generated by InfoGAN are blurrier. Overall, ACGAN is the best network for generating SAR vehicle images conditioned on frequency band.

Conclusion
Figure 1. The structure of a generative adversarial network.
DCGAN, WGAN-GP, and LSGAN were used to expand the number of SAR images. CGAN, ACGAN, and InfoGAN were used for generation conditioned on labels such as category and frequency band. DCGAN introduces deep convolutional neural networks and uses batch normalization, which normalizes the inputs of each layer and helps stabilize training. These architectural improvements largely solve the training instability of the GAN model, so DCGAN can generate high-definition images. WGAN-GP adds a gradient penalty to WGAN. Instead of clipping the discriminator weights to a fixed range, it penalizes deviations of the discriminator's gradient norm from 1. The penalty is evaluated at points sampled uniformly (ε ~ U[0, 1]) along straight lines between real and generated samples, which softly enforces the 1-Lipschitz constraint. The loss function is given in Eq. (2):

$$L=\mathbb{E}_{\tilde{x}\sim P_g}\left[D(\tilde{x})\right]-\mathbb{E}_{x\sim P_r}\left[D(x)\right]+\lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\left[\left(\lVert\nabla_{\hat{x}}D(\hat{x})\rVert_2-1\right)^2\right] \quad (2)$$

where $\hat{x}=\epsilon x+(1-\epsilon)\tilde{x}$. WGAN-GP improves training stability and alleviates vanishing and exploding gradients, but it does not fundamentally solve the instability of GAN training.
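The WGAN-GP gradient-penalty term can be sketched in NumPy as follows. The sketch samples ε ~ U[0, 1] per sample, forms x̂ = εx + (1 − ε)x̃, and returns λ·E[(‖∇D(x̂)‖₂ − 1)²]. It rests on two assumptions. First, the critic is any function that maps an (N, d) array to N scores. Second, the critic's gradient is estimated with central finite differences, purely so the example runs without a deep-learning framework; a real implementation would use automatic differentiation. λ = 10 is the default from the original WGAN-GP work, not a value reported in this paper.

```python
import numpy as np


def gradient_penalty(critic, real, fake, lam=10.0, rng=None, h=1e-5):
    """WGAN-GP penalty lam * E[(||grad D(x_hat)||_2 - 1)^2] on interpolated samples.

    critic: maps an (N, d) array to an (N,) array of scores.
    Gradients are estimated with central finite differences (illustration only).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = real.shape
    eps = rng.uniform(0.0, 1.0, size=(n, 1))   # one epsilon ~ U[0, 1] per sample
    x_hat = eps * real + (1.0 - eps) * fake    # points between real and fake samples
    grads = np.empty_like(x_hat)
    for j in range(d):
        step = np.zeros(d)
        step[j] = h
        grads[:, j] = (critic(x_hat + step) - critic(x_hat - step)) / (2.0 * h)
    norms = np.linalg.norm(grads, axis=1)
    return lam * float(np.mean((norms - 1.0) ** 2))
```

A linear critic whose weight vector has unit norm is exactly 1-Lipschitz, so its penalty is about 0. Doubling the weight norm makes every gradient norm 2 and gives a penalty of λ.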
The target slice images are shown in Figure 6. The background clutter in the target slices is complex and would interfere with model training, so the slice backgrounds were filled with black. This stabilizes training and improves the quality of the generated images. The experimental data include three vehicle target types, with 32 slice images of 128×128 pixels per type. The algorithms were run on Ubuntu 20.04 with an NVIDIA RTX 8000 GPU and a Xeon 8280 CPU.

Table 1. SAR parameters.

Figure 5. SAR vehicle scene images of different frequency bands. (a) C band. (b) Ka band. (c) P band. (d) S band.

Figure 6. Comparison of C-band SAR vehicle slices before and after processing. (a) Cabin. (b) Coaster. (c) SUV.
Visually, the generated images begin to show contours at epoch 100. By epoch 200, the background noise decreases, and the contours and target background gradually become clear. By epoch 300, the contours and detail features of the target are quite clear, and by epoch 1000 the quality and detail of the generated images are very good. Objectively, DCGAN trains relatively stably and produces good image quality, but the mean and variance of its generated images are larger than those of LSGAN. In terms of the classification recognition rate of the generated images, the Ka band reaches up to 99%, while the P band performs worst. This may be related to image quality: the Ka band has the highest resolution, so its synthesized SAR images are of relatively better quality.

4.1.2 Few-shot SAR vehicle dataset augmentation
This section discusses and evaluates the quality of generated images when the sample size per class is 5, 10, 15, and 20.

Figure 11. Generated results with 15 samples per class (epoch = 70 and epoch = 200).

Figure 12. Generated results with 20 samples per class (from left to right: DCGAN, WGAN-GP, LSGAN).

4.2.1 Image generation based on target categories
Figure 13. SAR image generation based on different category conditions; from left to right: cabin, coaster, and SUV. (a) CGAN. (b) ACGAN. (c) InfoGAN.

In terms of generation accuracy, all three networks generate images that match the category condition. In terms of image quality, the images generated by ACGAN and InfoGAN show clear target feature information. ACGAN has slightly poorer background quality, and InfoGAN has better overall image quality.

4.2.2 Image generation based on frequency categories
Here the target category is held fixed, generation is conditioned on the frequency band, and interpolation across frequency bands is examined. The number of condition classes is n_class = 4, corresponding to the C, Ka, P, and S bands, with 32 images per band. The experimental results are as follows.
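The text does not say how the band label is fed to the networks. A common CGAN choice, assumed here, is to concatenate a one-hot condition vector with the noise vector at the generator input. The sketch below builds such an input for the four band conditions (C, Ka, P, S) with the paper's 100-dimensional noise. It also includes a hypothetical helper that mixes two one-hot labels, which is one simple way to probe generation between bands. It is an illustration, not the authors' implementation.

```python
import random

BANDS = ["C", "Ka", "P", "S"]  # n_class = 4 frequency-band conditions


def one_hot(index, n_classes):
    vec = [0.0] * n_classes
    vec[index] = 1.0
    return vec


def conditional_input(band, noise_dim=100, rng=random):
    """Assumed CGAN generator input: Gaussian noise followed by a one-hot band label."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return z + one_hot(BANDS.index(band), len(BANDS))


def interpolated_label(band_a, band_b, t):
    """Hypothetical soft label mixing two bands (t=0 gives band_a, t=1 gives band_b)."""
    a = one_hot(BANDS.index(band_a), len(BANDS))
    b = one_hot(BANDS.index(band_b), len(BANDS))
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


vec = conditional_input("P", rng=random.Random(0))
print(len(vec), vec[100:])                   # 104 [0.0, 0.0, 1.0, 0.0]
print(interpolated_label("C", "Ka", 0.25))   # [0.75, 0.25, 0.0, 0.0]
```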

For ground vehicle targets, this paper collects multi-band, multi-type SAR data from actual measurements. It builds them into a dedicated SAR image generation dataset through data labelling, slicing, background noise processing, and related steps. Through GAN training, we expanded the set of SAR images and enriched the SAR vehicle dataset. The experimental results show that LSGAN is the most suitable network for SAR image augmentation: with a limited number of samples it performs best, reaching a classification recognition rate of 71.48% with a dataset of only 20 samples. Among the conditional generation networks, both ACGAN and InfoGAN generate SAR images under specific conditions well, and InfoGAN additionally offers interpretability. In conclusion, this paper combines measured data acquisition with intelligent image generation. This offers a new way to construct and expand SAR target datasets and can provide efficient, fast, high-fidelity false image templates for battlefield SAR deceptive jamming.

Table 7. C-band image classification accuracy.

Table 9. P-band image classification accuracy.

Table 10. S-band image classification accuracy.