DISTRIBUTED MULTI-LABEL LEARNING ON APACHE SPARK.

BASIC INFORMATION

Ph.D. Student: Jorge Gonzalez Lopez
Advisors: Alberto Cano, Sebastián Ventura
Defended on: April 2019
Keywords: multi-label learning, distributed systems, spark
Digital version

DESCRIPTION

This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model.

Five approaches for determining the optimal architecture to speed up the multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method.

Three distributed multi-label k-nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes the instances across multiple nodes, and an approximate locality-sensitive hashing method that builds multiple hash tables to index the data. The results indicate that the predictions of the tree-based method are on par with those of an exact method while reducing the execution times in all the scenarios.
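The idea behind the exact pair-wise method can be illustrated with a minimal single-machine sketch. It is not the thesis implementation: it assumes Euclidean distance and a simple majority vote per label, it does not use Spark, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_labels(train_x, train_y, query, k=3):
    """Predict a binary label vector for `query` from its k nearest neighbors.

    Assumption for illustration: a label is predicted when more than half
    of the neighbors carry it. The thesis methods may use a different rule.
    """
    # Exact search: compute the distance from the query to every instance.
    ranked = sorted(range(len(train_x)),
                    key=lambda i: euclidean(train_x[i], query))
    neighbors = ranked[:k]
    n_labels = len(train_y[0])
    # Vote independently for each label.
    return [
        1 if 2 * sum(train_y[i][j] for i in neighbors) > len(neighbors) else 0
        for j in range(n_labels)
    ]


# Toy data: two clusters, two labels per instance.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]
print(predict_labels(train_x, train_y, (0.2, 0.2)))
print(predict_labels(train_x, train_y, (5.5, 5.5)))
```

The approximate methods replace the exhaustive distance computation with an index (a tree or hash tables), so only a subset of candidate instances is compared against each query.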

The aforementioned method is then used to evaluate the quality of a selected feature subset. The optimal adaptation for a multi-label feature selection criterion is discussed and two distributed feature selection methods for multi-label problems are proposed: a method that selects the feature subset that maximizes the Euclidean norm of the individual information measures, and a method that selects the subset of features that maximizes the geometric mean. The results indicate that each method excels in different scenarios depending on the type of features and the number of labels.
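The difference between the two aggregation criteria can be sketched as follows. This is an illustration under stated assumptions, not the thesis algorithm: features and labels are treated as discrete, the "individual information measures" are taken to be the mutual information between one feature and each label, and all function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = sum((c / n) * math.log((c * n) / (px[x] * py[y]))
             for (x, y), c in pxy.items())
    return max(mi, 0.0)  # guard against tiny negative rounding errors


def euclidean_norm_score(mis):
    """Aggregate per-label measures with the Euclidean norm."""
    return math.sqrt(sum(m * m for m in mis))


def geometric_mean_score(mis):
    """Aggregate per-label measures with the geometric mean."""
    return math.prod(mis) ** (1.0 / len(mis))


# Toy example: one feature scored against two labels.
feature = [0, 0, 1, 1]
label_a = [0, 0, 1, 1]   # fully determined by the feature
label_b = [0, 1, 0, 1]   # independent of the feature
mis = [mutual_information(feature, lab) for lab in (label_a, label_b)]
print(euclidean_norm_score(mis), geometric_mean_score(mis))
```

In this toy case the Euclidean norm still rewards the feature for being highly informative about one label, while the geometric mean drops to zero because the feature carries no information about the other label. This is one way to see why the two criteria can favor different feature subsets.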

Rigorous experimental studies and statistical analyses over many multi-label metrics and datasets confirm that the proposals achieve better performance and scale better to larger data than the state-of-the-art methods they are compared against.

PUBLICATIONS ASSOCIATED WITH THIS THESIS

INTERNATIONAL JOURNALS
  1. J. Gonzalez-Lopez, S. Ventura and A. Cano, “Distributed nearest neighbor classification for large-scale multi-label data on Spark”, Future Generation Computer Systems, vol. 87, pp. 66-82, 2018.
  2. J. Gonzalez-Lopez, S. Ventura and A. Cano, “Distributed selection of continuous features in multi-label classification using mutual information”, IEEE Transactions on Neural Networks and Learning Systems, under review, 2019.
  3. J. Gonzalez-Lopez, S. Ventura and A. Cano, “Distributed multi-label feature selection using individual mutual information measures”, IEEE Transactions on Knowledge and Data Engineering, under review, 2019.
INTERNATIONAL CONFERENCES
  1. J. Gonzalez-Lopez, A. Cano and S. Ventura, “Large-Scale Multi-label Ensemble Learning on Spark”, IEEE Trustcom/BigDataSE/ICESS Sydney, pp. 893-900, 2017.
  2. J. Gonzalez-Lopez, S. Ventura and A. Cano, “ARFF data source library for distributed single/multiple instance, single/multiple output learning on Apache Spark”, International Conference on Computational Science, 2019.
