ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Copernicus Publications
Volume IV-1/W1
https://doi.org/10.5194/isprs-annals-IV-1-W1-229-2017
30 May 2017

BOOSTED UNSUPERVISED MULTI-SOURCE SELECTION FOR DOMAIN ADAPTATION

K. Vogt, A. Paul, J. Ostermann, F. Rottensteiner, and C. Heipke

Keywords: Transfer Learning, Domain Adaptation, Negative Transfer, Source Selection, Machine Learning, Remote Sensing

Abstract. Supervised machine learning needs high-quality, densely sampled and labelled training data. Transfer learning (TL) techniques have been devised to reduce this dependency by adapting classifiers trained on different but related (source) training data to new (target) data sets. A key problem in TL is how to quantify the relatedness of a source quickly and robustly, because transferring knowledge from unrelated data can degrade the performance of a classifier (negative transfer). In this paper, we propose a method that can select a nearly optimal source from a large number of candidate sources. This selection depends only on the marginal probability distributions of the data, which allows the often abundant unlabelled data to be used. We extend this method to multi-source selection by optimizing a weighted combination of sources, where the source weights are computed with a very fast boosting-like optimization scheme. The run-time complexity of our method scales linearly with the number of candidate sources and the size of the training set, so it is applicable to very large data sets. We also propose a modification of an existing TL algorithm to handle multiple weighted training sets. Our method is evaluated on five survey regions. The experiments show that our source selection method is effective in discriminating between related and unrelated sources, almost always producing results within 3% overall accuracy of a classifier based on fully labelled training data. We also show that using the selected source as training data for a TL method yields a further improvement in performance.
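The abstract does not state which relatedness measure or which boosting-like scheme is used, so the following is only a hypothetical illustration of the general idea, not the authors' method. It assumes that relatedness is measured by an approximate maximum mean discrepancy (MMD) between the marginal (unlabelled) feature distributions, computed with random Fourier features so that the cost stays linear in the number of samples and the number of candidate sources. Single-source selection ranks candidates by that distance; multi-source weighting is swapped in as a greedy Frank-Wolfe update over the probability simplex, which moves weight towards one source per step in a boosting-like fashion. All function names and parameters (`gamma`, `n_features`, `n_iter`) are illustrative assumptions.

```python
import numpy as np


def random_fourier_features(dim, n_features=256, gamma=1.0, seed=0):
    """Hypothetical helper: feature map z(x) approximating the RBF kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the RBF kernel's spectral density N(0, 2 * gamma * I)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def feature_map(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return feature_map


def mean_embedding(X, feature_map):
    # One pass over the (unlabelled) samples: linear in len(X)
    return feature_map(X).mean(axis=0)


def rank_sources(sources, target, feature_map):
    """Rank candidate sources by approximate squared MMD to the target (smaller = more related)."""
    mu_t = mean_embedding(target, feature_map)
    dists = [float(np.sum((mean_embedding(S, feature_map) - mu_t) ** 2)) for S in sources]
    order = sorted(range(len(sources)), key=lambda i: dists[i])
    return order, dists


def weight_sources(sources, target, feature_map, n_iter=100):
    """Assumed Frank-Wolfe weighting: minimise ||sum_i w_i mu_i - mu_t||^2 over the simplex."""
    mus = np.stack([mean_embedding(S, feature_map) for S in sources])  # shape (k, D)
    mu_t = mean_embedding(target, feature_map)
    w = np.zeros(len(sources))
    # Start from the single closest source (the single-source selection result)
    w[int(np.argmin(np.sum((mus - mu_t) ** 2, axis=1)))] = 1.0
    for _ in range(n_iter):
        residual = w @ mus - mu_t
        grad = 2.0 * mus @ residual      # gradient of the objective w.r.t. w
        j = int(np.argmin(grad))         # simplex vertex (source) with steepest descent
        d = mus[j] - w @ mus             # direction in embedding space
        denom = float(d @ d)
        if denom < 1e-15:
            break                        # already at the chosen vertex
        # Exact line search for the quadratic objective, clipped to stay in the simplex
        step = float(np.clip(-(residual @ d) / denom, 0.0, 1.0))
        w = (1.0 - step) * w
        w[j] += step
    return w
```

For example, with a target drawn from N(0, I) and two candidate sources drawn from N(0, I) and N(3, I), `rank_sources` should place the first source ahead of the second, and `weight_sources` should assign it most of the weight. Because every update is a convex combination, the weights stay non-negative and sum to one.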