Ph.D. Student: Oscar Gabriel Reyes
Advisor: Sebastián Ventura
Defended on: November 2016
Keywords: multi-label learning, feature weighting, feature selection, lazy learning, active learning, evolutionary algorithms
In the last decade, multi-label learning has become an important area of research due to the large number of real-world problems that contain multi-label data. This doctoral thesis focuses on the multi-label learning paradigm. Two problems were studied: first, improving the performance of learning algorithms on complex multi-label data, and second, improving performance by exploiting unlabeled data.
The first problem was addressed by means of feature estimation methods. Five feature estimation methods were proposed. Two of them apply evolutionary algorithms to estimate an adequate weight vector. The other three are extensions of the well-known ReliefF algorithm. The effectiveness of the proposed feature estimation methods was evaluated by measuring how much they improve the performance of multi-label lazy algorithms. Parametrizing the distance functions with a weight vector made it possible to retrieve examples whose label sets are relevant for classification. The effectiveness of the feature estimation methods in the feature selection task was also demonstrated. In addition, a lazy algorithm based on a data gravitation model was proposed. This algorithm offers a good trade-off between effectiveness and efficiency in multi-label lazy learning.
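The idea of parametrizing a distance function with a feature weight vector can be illustrated with a minimal sketch. It is not the thesis's implementation: the weighted Euclidean distance, the majority-vote rule, the function names and the toy data are all assumptions chosen for illustration, and the weights are set by hand rather than estimated by an evolutionary algorithm or a ReliefF extension.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_label_set(query, examples, weights, k=3):
    """Predict a label set by majority vote among the k nearest neighbours.

    `examples` is a list of (feature_vector, label_set) pairs.
    """
    neighbours = sorted(
        examples, key=lambda ex: weighted_distance(query, ex[0], weights)
    )[:k]
    votes = {}
    for _, labels in neighbours:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    # Keep the labels carried by more than half of the neighbours.
    return {label for label, count in votes.items() if count > k / 2}


# Hypothetical toy data: feature 0 is informative, feature 1 is noise.
train = [
    ([0.0, 9.0], {"a"}),
    ([0.1, 0.0], {"a", "b"}),
    ([0.2, 5.0], {"a"}),
    ([1.0, 1.1], {"c"}),
    ([0.9, 1.0], {"c"}),
    ([1.1, 0.9], {"c", "b"}),
]
query = [0.05, 1.0]

uniform = knn_label_set(query, train, weights=[1.0, 1.0])   # noise misleads the search
weighted = knn_label_set(query, train, weights=[1.0, 0.0])  # noise feature switched off
print(uniform, weighted)
```

With uniform weights the noisy feature pulls in neighbours from the wrong region, while the hand-set weight vector retrieves neighbours whose label sets agree on the query's region. A weight of zero also acts as feature selection, which is one way a weight vector can serve both tasks.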
The second problem was addressed by means of active learning techniques. The active learning methods made it possible to reduce the cost of the data labeling process while still training an accurate model. Two active learning strategies were proposed. The first strategy effectively solves the multi-label active learning problem. In this strategy, two measures that represent the utility of an unlabeled example were defined and combined. The second strategy addresses the batch-mode active learning problem, where the aim is to select a batch of unlabeled examples that are informative and have minimal information redundancy. Batch-mode active learning was formulated as a multi-objective problem in which three measures are optimized, and this problem was solved with an evolutionary algorithm.
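As a rough illustration of combining two utility measures to pick the next example to label, consider the sketch below. The abstract does not say which measures or which combination rule were used, so everything here is an assumption: the two measures (closeness of label probabilities to 0.5, and the gap between the predicted label count and the mean label cardinality of the labeled set), the Borda-style rank aggregation, and all function names are hypothetical stand-ins.

```python
def uncertainty(probabilities):
    """Mean closeness of per-label probabilities to 0.5 (1 = most uncertain)."""
    return sum(1 - abs(2 * p - 1) for p in probabilities) / len(probabilities)


def cardinality_gap(probabilities, mean_cardinality):
    """Gap between the predicted label count and the labeled set's mean cardinality."""
    predicted = sum(1 for p in probabilities if p >= 0.5)
    return abs(predicted - mean_cardinality)


def borda_points(scores):
    """Give each item as many points as the number of items it strictly beats."""
    return [sum(1 for other in scores if other < s) for s in scores]


def select_query(pool, mean_cardinality):
    """Return the index of the pool example with the highest combined utility."""
    u = borda_points([uncertainty(p) for p in pool])
    g = borda_points([cardinality_gap(p, mean_cardinality) for p in pool])
    combined = [a + b for a, b in zip(u, g)]
    return max(range(len(pool)), key=combined.__getitem__)


# Hypothetical per-label probabilities predicted for three unlabeled examples.
pool = [
    [0.95, 0.05, 0.02],  # confident, one predicted label
    [0.55, 0.45, 0.60],  # uncertain, two predicted labels
    [0.90, 0.85, 0.10],  # fairly confident, two predicted labels
]
chosen = select_query(pool, mean_cardinality=1.0)
print(chosen)
```

Aggregating ranks instead of raw scores avoids having to put the two measures on a common scale, which is one reason rank-based combination is a common design choice when measures have different units.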
This thesis also led to the creation of a computational framework for developing active learning methods and for facilitating experimentation in the active learning area. In addition, a methodology based on non-parametric tests was proposed that allows a more adequate evaluation of active learning performance.
All proposed methods were evaluated by means of extensive experimental studies. Several multi-label datasets from different domains were used, and the methods were compared with the most significant state-of-the-art algorithms. The results were validated using non-parametric statistical tests. The evidence showed the effectiveness of the proposed methods.
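To make the kind of validation concrete, the sketch below computes the Friedman statistic for several methods compared over several datasets. The abstract does not name the tests that were used, so the choice of the Friedman test is an assumption, the scores are made-up toy values and not results from the thesis, and the statistic is the basic form without a tie correction.

```python
def average_ranks(row):
    """Rank the scores in one row (1 = highest); tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[d][m] is the score of method m on dataset d.

    Returns (chi-square statistic, mean rank of each method).
    """
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[m] for r in rank_rows) / n for m in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Made-up toy scores: 4 datasets (rows) x 3 methods (columns).
scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.70, 0.75, 0.65],
    [0.95, 0.90, 0.85],
]
chi2, mean_ranks = friedman_statistic(scores)
print(chi2, mean_ranks)
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom; if the null hypothesis of equal performance is rejected, a post-hoc test is usually applied to find which methods differ.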
The development of this thesis was supported by:
- Spanish Ministry of Science and Technology, project TIN2011-22408.
PUBLICATIONS ASSOCIATED WITH THIS THESIS
- O. Reyes, C. Morell and S. Ventura. Evolutionary feature weighting to improve the performance of multi-label lazy algorithms. Integrated Computer-Aided Engineering, vol. 21(4), pp. 339-354. 2014.
- O. Reyes, C. Morell and S. Ventura. Scalable extensions of the ReliefF algorithm for weighting and selecting features on the multi-label learning context. Neurocomputing, vol. 161(4), pp. 168-182. 2015.
- O. Reyes, C. Morell and S. Ventura. Effective lazy learning algorithm based on a data gravitation model for multi-label learning. Information Sciences, vol. 340-341, pp. 159-174. 2016.
- O. Reyes, E. Pérez, M. C. Rodríguez Hernández, H. M. Fardoun and S. Ventura. JCLAL: A Java Framework for Active Learning. Journal of Machine Learning Research, vol. 17(95), pp. 1-5. 2016.
- O. Reyes, C. Morell and S. Ventura. Effective active learning strategy for multi-label learning. Neurocomputing, submitted October, 2015.
- O. Reyes and S. Ventura. Evolutionary Strategy to perform Batch-Mode Active Learning on Multi-label Data. ACM Transactions on Intelligent Systems and Technology, submitted September, 2016.
- O. Reyes, A. H. Altahi and S. Ventura. Statistical Comparisons of Active Learning Strategies over Multiple Datasets. Information Sciences, submitted September, 2016.
- O. Reyes, C. Morell and S. Ventura. Learning Similarity Metric to improve the performance of Lazy Multi-label Ranking Algorithms. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications (ISDA’2012), pp. 246-251. 2012.
- O. Reyes, C. Morell and S. Ventura. ReliefF-ML: an extension of ReliefF algorithm to multi-label learning. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, LNCS, Springer, vol. 8259, pp. 528-535. 2013.
- O. Reyes, C. Morell and S. Ventura. Feature weighting on multi-label data through quadratic loss minimization. Congreso Internacional de Matemática y Computación, COMPUMAT-2013, Habana, Cuba, 2013.