EFFICIENT TRAINING DATA GENERATION BY CLUSTERING-BASED CLASSIFICATION
Keywords: Hierarchical Clustering, Classification, Training Data, CNN, GUI, Labeling
Abstract. An insufficient amount or complete absence of reference data for training classifiers is a common problem. State-of-the-art deep learning approaches in particular depend on the availability or adaptation of such reference data to produce the reliable results they are designed for. This paper pursues several approaches to cope with the absence of training data for land cover classification from aerial images. First, we analyze the performance of traditional classification in the absence of reference data, using clustering techniques and salient features to assign semantic labels. Second, we transfer the results as training data to a DeepLabv3+ CNN with pre-trained weights to demonstrate the usability of the generated training data. Third, we extend the clustering approaches and combine them with a Random Forest classifier. Finally, for cases in which user interaction and manual annotation of training data are still necessary, we introduce our labeling GUI, which enables simple, fast, and comfortable training data generation with only a few clicks. To evaluate our procedure, we used two datasets, including the Vaihingen benchmark, for which ground truth is available. Without any interactive steps except setting a few algorithm parameters, we achieved an overall accuracy of 75% using the DeepLabv3+ method with image data only.
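The first step named in the abstract, clustering followed by label assignment from salient features, can be illustrated with a minimal sketch. It is not the paper's implementation: the two features (a vegetation index and a normalized height), their values, the 0.5 threshold, the class names, and the centroid-linkage merge rule are all assumptions chosen for illustration. Ground truth is used only to compute overall accuracy, never to assign labels.

```python
from itertools import combinations

# Hypothetical per-segment features: (vegetation index, normalized height).
# The values are invented for illustration, not taken from the paper's datasets.
features = [
    (0.10, 0.05), (0.15, 0.10),   # flat, non-vegetated
    (0.12, 0.90), (0.08, 0.85),   # elevated, non-vegetated
    (0.80, 0.10), (0.85, 0.05),   # flat, vegetated
    (0.78, 0.88), (0.82, 0.95),   # elevated, vegetated
]
# Ground truth is used for evaluation only, never for labeling.
ground_truth = ["impervious", "impervious", "building", "building",
                "low vegetation", "low vegetation", "tree", "tree"]


def centroid(indices):
    """Mean feature vector of the points with the given indices."""
    dims = len(features[0])
    return tuple(sum(features[i][d] for i in indices) / len(indices)
                 for d in range(dims))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(n_points, n_clusters):
    """Bottom-up clustering: repeatedly merge the two clusters
    whose centroids are closest, until n_clusters remain."""
    clusters = [[i] for i in range(n_points)]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: sq_dist(centroid(clusters[p[0]]),
                                         centroid(clusters[p[1]])))
        clusters[i] += clusters[j]   # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def semantic_label(center, threshold=0.5):
    """Assumed rule: map a cluster centroid to a class via two salient features."""
    vegetated, elevated = center[0] > threshold, center[1] > threshold
    if vegetated:
        return "tree" if elevated else "low vegetation"
    return "building" if elevated else "impervious"


# Every member of a cluster inherits the label derived from the cluster centroid.
pseudo_labels = [None] * len(features)
for cluster in agglomerate(len(features), n_clusters=4):
    label = semantic_label(centroid(cluster))
    for i in cluster:
        pseudo_labels[i] = label

# Overall accuracy: fraction of samples whose pseudo-label matches the reference.
overall_accuracy = sum(p == t for p, t in zip(pseudo_labels, ground_truth)) / len(ground_truth)
print(pseudo_labels, overall_accuracy)
```

In this toy setting the resulting pseudo-labels could then be passed as training data to a supervised classifier, which is the role the generated labels play for the CNN and the Random Forest in the abstract.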