ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume X-3/W4-2025
https://doi.org/10.5194/isprs-annals-X-3-W4-2025-21-2026
13 Mar 2026

Impact of Training Set Size on Representation Learning for Hyperspectral Image Classification

Victor Andres Ayma Quirita, Aramis Palacios, Victor Hugo Ayma Quirita, Walter Aliaga, and Gilson A. O. P. Costa

Keywords: Dimensionality Reduction, Hyperspectral Image Classification, Autoencoders, Orthogonal Autoencoders, Representation Learning, Limited Training Data

Abstract. The ever-increasing amount of information provided by hyperspectral sensors requires efficient solutions for facilitating subsequent data analysis. Dimensionality reduction plays a central role in this context, as it allows the extraction of meaningful and compact representations from high-dimensional hyperspectral data. Existing methodologies address data representation problems through dimensionality reduction techniques, predominantly employing Principal Component Analysis (PCA), Autoencoders (AE), and, more recently, Hyperspectral Orthogonal Autoencoders (HOAE). However, these approaches commonly rely on the entire image to build projection models, which may result in high computational costs. A pragmatic way to mitigate this computational burden is to use a subset of the image data to construct accurate data representation models. In this work, we investigate the extent to which using a reduced number of training samples affects the quality of the latent space generated by AE and HOAE models, and how this impacts classification performance. Experiments conducted on the Pavia University hyperspectral dataset demonstrate that the representation efficacy of the AE and HOAE models significantly exceeds that of traditional hyperspectral dimensionality reduction algorithms, such as PCA. We also show that competitive classification results can be obtained even when the representation models are trained with a small portion of the image, which opens the door to more computationally efficient pipelines.
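The core idea in the abstract, fitting a dimensionality reduction model on a small subset of pixels and then projecting the entire image, can be sketched as follows. This is a minimal illustration using PCA via SVD on synthetic data (the array sizes, the 5% subset ratio, and the random data are illustrative assumptions, not the paper's actual setup, which uses the Pavia University scene and AE/HOAE models):

```python
import numpy as np

# Synthetic stand-in for a hyperspectral cube flattened to (n_pixels, n_bands).
# Dimensions are illustrative only (Pavia University has 103 spectral bands).
rng = np.random.default_rng(0)
n_pixels, n_bands, n_components = 10_000, 103, 10
cube = rng.normal(size=(n_pixels, n_bands)) @ rng.normal(size=(n_bands, n_bands))

def fit_pca(samples, k):
    """Fit a k-component PCA projection from a (possibly small) pixel sample."""
    mean = samples.mean(axis=0)
    # SVD of the centered samples; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:k]

# Train the projection model on only 5% of the pixels...
subset_idx = rng.choice(n_pixels, size=n_pixels // 20, replace=False)
mean, axes = fit_pca(cube[subset_idx], n_components)

# ...then project the entire image into the compact latent space,
# which would feed a downstream classifier.
latent = (cube - mean) @ axes.T
print(latent.shape)
```

The same subset-then-project pattern applies to the autoencoder-based models studied in the paper: the encoder is trained on a fraction of the pixels and then applied to every pixel to produce the latent representation used for classification.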
