NEW CHALLENGES ON ASSOCIATIVE CLASSIFICATION: BIG DATA AND APPLICATIONS

BASIC INFORMATION

Ph.D. Student: Francisco Solano Padillo Ruz
Advisors: Jose Maria Luna, Sebastián Ventura
Defended on: July 2020
Keywords: pattern mining, associative classification, big data, map reduce
Digital version: PDF

DESCRIPTION

Rapid technological innovation over the last decades has caused exponential growth in both the quantity and the complexity of the data being generated. Discovering high-level information and knowledge from these large, complex volumes of data has become significantly more ambitious and challenging. Additionally, as technology advances and hardware improves, more and more data can be stored, so the quantity of data to deal with has increased like never before. Big Data is the term increasingly used to cover the techniques focused on the problems derived from managing and analyzing very large quantities of data.

Many different techniques have been proposed over the years to extract hidden, interesting and previously unknown information from large quantities of data. Nevertheless, all of them can be categorized into two main groups: descriptive tasks, which depict intrinsic and important properties of data; and predictive tasks, which predict an output variable for unseen data. Classification based on association rule mining, generally known as Associative Classification (AC), integrates a descriptive task into the process of generating a classifier. Several studies have shown that AC algorithms are able to obtain accurate and interpretable results efficiently by leveraging association rule discovery methods in the training phase. This makes it possible to obtain all the possible hidden relationships among the attribute values, which may be missed by less exhaustive methodologies. Furthermore, AC allows a subset of rules to be updated and tuned without having to redraw the whole tree, as happens in decision tree approaches. Last but not least, the main advantage of AC over other techniques is the final model representation, which consists of simple, easy-to-interpret rules that enable the end user to understand and interpret the results.
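
To make the idea of a rule-based classifier concrete, the following is a minimal Python sketch of associative classification in general; it is not the algorithm developed in the thesis. It assumes categorical attributes, exhaustive enumeration of short antecedents, and a simple "first matching rule wins" strategy. The function names (`mine_class_rules`, `classify`), the thresholds, and the toy weather data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(rows, labels, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate rules of the form {attribute=value, ...} -> class label."""
    n = len(rows)
    antecedent_counts = Counter()  # how often each antecedent appears
    rule_counts = Counter()        # how often antecedent and class appear together
    for row, label in zip(rows, labels):
        items = sorted(row.items())  # canonical order for (attribute, value) pairs
        for size in range(1, max_len + 1):
            for antecedent in combinations(items, size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, label)] += 1

    rules = []
    for (antecedent, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))

    # Rank by confidence, then support, then shorter antecedents first.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, row, default):
    """Return the class of the first ranked rule whose antecedent matches the row."""
    for antecedent, label, _support, _confidence in rules:
        if all(row.get(attr) == value for attr, value in antecedent):
            return label
    return default


# Hypothetical toy data, used only to exercise the sketch.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]

rules = mine_class_rules(rows, labels)
prediction = classify(rules, {"outlook": "rainy", "windy": "yes"}, default="play")
```

Because the model is just a ranked list of rules, a single rule can be inspected, edited or removed without rebuilding the rest, which is the interpretability and maintainability advantage described above.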

This Doctoral Thesis addresses the challenging problem of AC and its application to very large datasets. The main contributions of this Ph.D. thesis are summarized in the following points:

  1. The AC state of the art has been studied and analyzed, and a new tool has been proposed that covers the whole taxonomy of algorithms and provides many different measures. The goal of this tool is two-fold: 1) to unify comparisons, since existing works compare with very different measures; 2) to provide a single tool that has at least one algorithm from each category of the taxonomy.
  2. AC has been analyzed on very large quantities of data. In this regard, many different platforms for distributed computing have been studied, and different proposals have been developed on them. These proposals make it possible to deal with very large data efficiently by spreading the load across different compute nodes.
  3. As one of the most important parts of AC is extracting high-quality rules, a novel grammar-guided genetic programming algorithm has been proposed that obtains interesting association rules with regard to different metrics and on different kinds of data, including truly Big Data datasets. This proposal has been shown to obtain very good results in terms of both quality and interpretability, while providing a very flexible way of representing the solutions and allowing subjective knowledge to be introduced into the search process. Then, a novel algorithm for AC has been proposed, using a non-trivial adaptation of the aforementioned algorithm to obtain the rules forming the classifier. This methodology is also based on grammar-guided genetic programming, enabling the user to constrain not only the form of the rules but also the final form of the classifier. Results have shown that this algorithm obtains very accurate classifiers while maintaining a good level of interpretability.
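
The distributed evaluation mentioned in the second contribution can be pictured with a small MapReduce-style sketch in plain Python. This is an illustrative assumption about how rule counts might be split across partitions and merged, not the thesis's implementation and not the API of any particular platform. The names `map_partition`, `reduce_counts` and `confidence` are hypothetical, antecedents are limited to single items for brevity, and each record is assumed to hold a set of distinct items plus a class label.

```python
from collections import Counter
from functools import reduce


def map_partition(partition):
    """Map phase: count items and (item, class) pairs within one data partition."""
    counts = Counter()
    for items, label in partition:
        for item in items:
            counts[(item, None)] += 1   # occurrences of the antecedent item
            counts[(item, label)] += 1  # occurrences of item together with the class
    return counts


def reduce_counts(left, right):
    """Reduce phase: merge partial counts coming from two partitions."""
    return left + right


def confidence(total, item, label):
    """Confidence of the rule item -> label, computed from the merged counts."""
    return total[(item, label)] / total[(item, None)]


# Hypothetical partitions, standing in for data held on separate compute nodes.
partitions = [
    [({"a", "b"}, "x"), ({"a"}, "y")],
    [({"a", "b"}, "x")],
]

total = reduce(reduce_counts, map(map_partition, partitions), Counter())
```

Each partition is counted independently, so the map phase can run in parallel; only the compact count tables need to be combined, which is what lets the load be spread across nodes.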

FUNDS

The development of this thesis has been supported by:

  • Spanish Ministry of Science and Competitiveness, project TIN-2014-55252-P.
  • Spanish Ministry of Science and Competitiveness, project TIN-2017-83445-P.

PUBLICATIONS ASSOCIATED WITH THIS THESIS

INTERNATIONAL JOURNALS
  1. F. Padillo, J.M. Luna and S. Ventura. LAC: Library for associative classification. Knowledge-Based Systems, 193: 105432 (2020). DOI: 10.1016/j.knosys.2019.105432.
  2. F. Padillo, J.M. Luna and S. Ventura. A Grammar-Guided Genetic Programing Algorithm for Associative Classification in Big Data. Cognitive Computation, 11(3): 331-346 (2019). DOI: 10.1007/s12559-018-9617-2.
  3. F. Padillo, J.M. Luna, F. Herrera and S. Ventura. Mining association rules on Big Data through MapReduce genetic programming. Integrated Computer-Aided Engineering, 25(1): 31-48 (2018). DOI: 10.3233/ICA-170555.
INTERNATIONAL CONFERENCES
  1. F. Padillo, J.M. Luna and S. Ventura. Associative Classification in Big Data through a G3P Approach. In 4th International Conference on Internet of Things, Big Data and Security (IoTBDS), 94-102 (2019). DOI: 10.5220/0007688400940102.
  2. F. Padillo, J.M. Luna and S. Ventura. An evolutionary algorithm for mining rare association rules: A Big Data approach. In 2017 IEEE Congress on Evolutionary Computation (CEC), 2007-2014 (2017). DOI: 10.1109/CEC.2017.7969547.

 
