Fast SVM-based Multiclass Classification in Large Training Sets
Keywords: Multiclass classification, nonlinear SVM, large-scale problems, smart sampling, handwritten digit images
Abstract. This paper addresses the topical problem of multiclass classification on large training sets. The classical Support Vector Machine (SVM) is a popular, convenient and well-interpretable classification method, but in the nonlinear case its training stage has high computational complexity and low data parallelism. The aim of this paper is to improve the scalability of nonlinear multiclass SVM.
The basis of this paper is the Kernel-based Mean Decision Rule method with smart sampling (SS-KMDR), which we previously proposed for fast solving of large-scale binary SVM problems. In this paper we first extend SS-KMDR to the multiclass classification problem. Second, we propose a modified algorithm of smart sample construction that improves its characteristics and extends it to large-scale multiclass SVM problems. An experimental investigation of the proposed methods was carried out on three large handwritten digit image data sets of different sizes and one large intrusion detection data set. The experiments show that both proposed multiclass methods reach quality close to that of state-of-the-art SVC, while substantially outperforming it in training time. The proposed Dual-Layer Smart Sampling SVM (DLSS-SVM) method additionally reduces training and test times compared to the basic smart sampling technique.
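As a reference point for the scalability issue described above, the sketch below trains a standard nonlinear multiclass SVC (RBF kernel, one-vs-one decomposition) with scikit-learn on handwritten digit images; the data set, hyperparameters and subset size are illustrative assumptions, not the paper's actual experimental setup, and the code is not the SS-KMDR or DLSS-SVM method itself.

```python
# Minimal sketch (assumption, not the paper's code): a standard nonlinear
# multiclass SVC baseline on handwritten digit images, illustrating the
# training-cost bottleneck that smart-sampling methods are designed to avoid.
import time

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# MNIST-style digits: 70,000 samples, 784 pixel features.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel intensities to [0, 1]

# A 10,000-sample training subset keeps this example tractable; on the full
# training set the kernel SVC training time grows roughly quadratically with
# the number of samples, which is the bottleneck the paper addresses.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=10_000, test_size=10_000, random_state=0, stratify=y
)

# One-vs-one multiclass decomposition with an RBF kernel (scikit-learn default).
clf = SVC(kernel="rbf", C=10.0, gamma="scale")

t0 = time.time()
clf.fit(X_train, y_train)
print(f"training time: {time.time() - t0:.1f} s")
print(f"test accuracy: {clf.score(X_test, y_test):.4f}")
```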