INVESTIGATING 2D AND 3D CONVOLUTIONS FOR MULTITEMPORAL LAND COVER CLASSIFICATION USING REMOTE SENSING IMAGES
Keywords: land cover classification, remote sensing, FCN, multi-temporal images, 3D-CNN
Abstract. With the availability of large amounts of satellite image time series (SITS), the identification of different materials of the Earth’s surface is possible with a high temporal resolution. One of the basic tasks is the pixel-wise classification of land cover, i.e. the task of identifying the physical material of the Earth’s surface in an image. Fully convolutional neural networks (FCN) are successfully used for this task. In this paper, we investigate different FCN variants, using different methods for the computation of spatial, spectral, and temporal features. We investigate the impact of 3D convolutions in the spatial-temporal as well as in the spatial-spectral dimensions in comparison to 2D convolutions in the spatial dimensions only. Additionally, we introduce a new method to generate multitemporal input patches by using time intervals instead of fixed acquisition dates. We then choose the image that is closest in time to the middle of the corresponding time interval, which makes our approach more flexible with respect to the requirements for the acquisition of new data. Using these multi-temporal input patches, generated from Sentinel-2 images, we improve the classification of land cover by 4% in the mean F1-score and 1.3% in the overall accuracy compared to a classification using mono-temporal input patches. Furthermore, the usage of 3D convolutions instead of 2D convolutions improves the classification performance by a small amount of 0.4% in the mean F1-score and 1.2% in the overall accuracy.