ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume IV-3/W2-2020
29 Oct 2020


D. A. B. Oliveira

Keywords: Land Cover Segmentation, Image Synthesis, Latent Data Representation, Gaussian Mixture Models

Abstract. Convolutional neural networks have greatly improved data synthesis in recent years and have been widely used for data augmentation in scenarios with highly imbalanced data, such as land cover segmentation. Balancing the proportion of classes for training segmentation models can be very challenging, because samples in which all classes are reasonably represented may constitute only a small portion of a training set, and techniques for augmenting this small amount of data, such as rotation, scaling and translation, might not be sufficient for efficient training. In this context, this paper proposes a methodology to perform data augmentation from few samples to improve the performance of CNN-based land cover semantic segmentation. First, we estimate the latent data representation of selected training samples by means of a mixture of Gaussians, using an encoder-decoder CNN. Then, we randomly change the latent embedding used to generate the mixture parameters, at training time, to generate new mixture models slightly different from the original. Finally, we compute the displacement maps between the original and the modified mixture models, and use them to elastically deform the original images, creating new realistic samples out of the original ones. Our disentangled approach allows the displacement maps to be modified spatially so that objects whose geometry is highly discriminative, such as buildings and cars, are preserved from deformation. With this simple pipeline, we managed to augment samples at training time and improve the overall performance of two baseline CNN architectures for land cover semantic segmentation.
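The last two steps of the pipeline (deforming an image with a displacement map while leaving selected objects untouched) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name masked_warp, the nearest-neighbour backward sampling, the class IDs, and the random displacement field used here in place of the one derived from the Gaussian mixture models are all assumptions made for illustration.

```python
import numpy as np


def masked_warp(image, labels, displacement, protected_classes):
    """Warp an image and its label map, keeping protected classes in place.

    image: (H, W, C) array.
    labels: (H, W) integer class map.
    displacement: (2, H, W) array of (dy, dx) offsets in pixels.
    protected_classes: class IDs whose pixels must not be deformed
        (hypothetical IDs, e.g. buildings and cars).
    """
    h, w = labels.shape
    # Zero the displacement wherever a protected class is present,
    # so those output pixels sample their own original location.
    movable = ~np.isin(labels, protected_classes)
    disp = displacement * movable[None, :, :]
    # Backward mapping: output pixel (y, x) samples from (y + dy, x + dx),
    # rounded to the nearest pixel and clipped to the image bounds.
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(yy + disp[0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xx + disp[1]), 0, w - 1).astype(int)
    return image[src_y, src_x], labels[src_y, src_x]


# Illustrative usage with a random displacement field (an assumption;
# the paper derives the field from the original and modified mixtures).
rng = np.random.default_rng(0)
demo_image = rng.random((8, 8, 3))
demo_labels = rng.integers(0, 4, size=(8, 8))
demo_disp = rng.normal(scale=1.5, size=(2, 8, 8))
warped_image, warped_labels = masked_warp(
    demo_image, demo_labels, demo_disp, protected_classes=[3]
)
```

Applying the same sampling indices to the image and to the label map keeps the two aligned after deformation. Masking only guarantees that protected pixels keep their original values; neighbouring pixels may still sample from inside a protected object, so a real pipeline would likely also smooth or dilate the mask.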