Thesis GMelki – Knowledge Discovery and Intelligent Systems – KDIS

NOVEL SUPPORT VECTOR MACHINES FOR DIVERSE LEARNING PARADIGMS.

BASIC INFORMATION

Ph.D. Student: Gabriella Melki
Advisors: Alberto Cano, Sebastián Ventura
Defended on: September 2018
Keywords: support vector machines, machine learning, online learning

DESCRIPTION

Three multi-target support vector regression (SVR) models are first presented. The first involves building independent, single-target SVR models for each target. The second builds an ensemble of randomly chained models using the first single-target method as a base model. The third calculates the targets’ correlations and forms a maximum correlation chain, which is used to build a single chained SVR model, improving the model’s prediction performance, while reducing computational complexity.
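The third model's two steps, ordering the targets by correlation and then fitting a chain, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the greedy ordering rule in `max_correlation_chain`, the helper names, and the toy data are hypothetical, and an ordinary least-squares fit stands in for the SVR base learner.

```python
import numpy as np

def max_correlation_chain(Y):
    """Greedily order targets by absolute pairwise correlation (assumed heuristic)."""
    corr = np.abs(np.corrcoef(Y, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    # Start from the target most correlated with all the others.
    chain = [int(np.argmax(corr.sum(axis=0)))]
    remaining = set(range(Y.shape[1])) - set(chain)
    while remaining:
        last = chain[-1]
        # Append the unused target most correlated with the previous link.
        nxt = max(remaining, key=lambda j: corr[last, j])
        chain.append(nxt)
        remaining.remove(nxt)
    return chain

def fit_least_squares(X, y):
    """Stand-in base learner (NOT an SVR): linear least squares with a bias column."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_chain(X, Y, chain, fit_one=fit_least_squares):
    """Fit one model per target; each model also sees the earlier targets in the chain."""
    models, X_aug = [], X
    for t in chain:
        models.append(fit_one(X_aug, Y[:, t]))
        X_aug = np.column_stack([X_aug, Y[:, t]])
    return models

# Toy data: targets 0 and 2 are strongly correlated, target 1 is not.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
Y = np.array([[1.0, 5.0, 1.1],
              [2.0, 1.0, 2.0],
              [3.0, 4.0, 3.2],
              [4.0, 2.0, 3.9]])
chain = max_correlation_chain(Y)
models = fit_chain(X, Y, chain)
```

Because the chain is a single fixed ordering, only one model per target is trained, in contrast to an ensemble of randomly ordered chains.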

Under the multi-instance paradigm, a novel SVM multiple-instance formulation and an algorithm with a bag-representative selector, named Multi-Instance Representative SVM (MIRSVM), are presented. The contribution trains the SVM based on bag-level information and is able to identify instances that highly impact classification, i.e. bag-representatives, for both positive and negative bags, while finding the optimal class separation hyperplane. Unlike other multi-instance SVM methods, this approach eliminates possible class imbalance issues by allowing both positive and negative bags to have at most one representative each; these representatives are the instances that contribute most to the model.
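The idea of a bag-representative selector can be illustrated with a short sketch. It assumes a linear decision function and assumes that a bag's representative is the instance with the largest decision value; the function names, the weights `w` and `b`, and the toy bags are hypothetical and are not taken from the thesis.

```python
import numpy as np

def select_representatives(bags, w, b):
    """For each bag, return the index of the instance with the largest decision value."""
    return [int(np.argmax(bag @ w + b)) for bag in bags]

def predict_bags(bags, w, b):
    """Label each bag by the sign of its representative's decision value."""
    reps = select_representatives(bags, w, b)
    scores = [float(bag[i] @ w + b) for bag, i in zip(bags, reps)]
    return np.sign(scores)

# Two toy bags of 2-D instances and an assumed hyperplane.
bags = [np.array([[0.0, 0.0], [2.0, 2.0]]),
        np.array([[-1.0, -1.0], [-3.0, 0.0]])]
w, b = np.array([1.0, 1.0]), -1.0

reps = select_representatives(bags, w, b)
labels = predict_bags(bags, w, b)
```

Each bag contributes a single instance to the decision regardless of how many instances it holds, which is how one representative per bag keeps bag sizes from skewing the class balance.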

Due to the shortcomings of current popular SVM solvers, especially in the context of large-scale learning, the third contribution presents a novel stochastic, i.e. online, learning algorithm for solving the L1-SVM problem in the primal domain, dubbed OnLine Learning Algorithm using Worst-Violators (OLLAWV). Unlike other stochastic methods, this algorithm provides a novel stopping criterion and eliminates the need for a regularization term, relying on early stopping instead. Because of these characteristics, OLLAWV was shown to efficiently produce sparse models while maintaining competitive accuracy.
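A simplified sketch of worst-violator selection with early stopping is given here. It is a linear toy version under several assumptions (the update rule, the learning rate `lr`, the `margin` threshold, and the function name are all hypothetical) and should not be read as the OLLAWV algorithm itself.

```python
import numpy as np

def train_worst_violators(X, y, lr=0.5, margin=1.0):
    """Repeatedly update on the worst margin violator; stop when none remains."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    unused = np.ones(n, dtype=bool)   # each sample is used at most once
    support = []                      # indices of samples that caused an update
    while unused.any():
        scores = y * (X @ w + b)      # functional margins y_i * f(x_i)
        scores[~unused] = np.inf      # ignore samples that were already used
        wv = int(np.argmin(scores))   # worst violator = smallest margin
        if scores[wv] >= margin:
            break                     # early stop: no unused sample violates the margin
        w += lr * y[wv] * X[wv]
        b += lr * y[wv]
        unused[wv] = False
        support.append(wv)
    return w, b, support

# Linearly separable toy problem.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, support = train_worst_violators(X, y)
```

On this toy problem the loop stops after a single update, so the resulting model depends on one sample only, which illustrates how early stopping can yield a sparse model without an explicit regularization term.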

OLLAWV’s online nature and its success in traditional classification inspired its implementation, along with that of its predecessor, OnLine Learning Algorithm – List 2 (OLLA-L2), under the batch data stream classification setting. Unlike other existing methods, these two algorithms have properties that are a natural remedy for the time and memory constraints that arise in the data stream problem, which is why they were chosen. OLLA-L2’s low spatial complexity addresses the memory constraints imposed by the data stream setting, while OLLAWV’s fast run time, early self-stopping capability, and ability to produce sparse models address both memory and time constraints. In preliminary results, OLLAWV outperformed its predecessor, so it was chosen for the final set of experiments against current popular data stream methods.

Rigorous experimental studies and statistical analyses over various metrics and datasets were conducted in order to comprehensively compare the proposed solutions against modern, widely-used methods from all paradigms. The experimental studies and analyses confirm that the proposals achieve better performance and offer more scalable solutions than the methods compared, making them competitive in their respective fields.

PUBLICATIONS ASSOCIATED WITH THIS THESIS

INTERNATIONAL JOURNALS

G. Melki, A. Cano, V. Kecman and S. Ventura, “Multi-Target Support Vector Regression Via Correlation Regressor Chains”, Information Sciences, vol. 415, pp. 53-69. 2017. DOI: 10.1016/j.ins.2017.06.017.
G. Melki, A. Cano and S. Ventura, “MIRSVM: Multi-Instance Support Vector Machine with Bag Representatives”, Pattern Recognition, vol. 79, pp. 228-241. 2018. DOI: 10.1016/j.patcog.2018.02.007.
G. Melki, V. Kecman, S. Ventura and A. Cano, “OLLAWV: OnLine Learning using Worst-Violators”, Applied Soft Computing, vol. 66, pp. 384-393. 2018. DOI: 10.1016/j.asoc.2018.02.040.

INTERNATIONAL CONFERENCES

V. Kecman, L. Zigic and G. Melki, “Models and Algorithms for Support Vector Machines: Direct L2 SVM”, Seminar at Max Planck Institute for Intelligent Systems, Empirical Inference, Tübingen, Germany, 2015. DOI: 10.1109/INISTA.2014.6873654.
G. Melki and V. Kecman, “Speeding Up Online Training of L1 Support Vector Machines”, Proceedings of the IEEE SoutheastCon 2016, pp. 1-6. 2016. DOI: 10.1109/SECON.2016.7506732.
V. Kecman and G. Melki, “Fast Online Algorithm for SVMs”, Proceedings of the IEEE SoutheastCon 2016, pp. 1-6. 2016. DOI: 10.1109/SECON.2016.7506733.