Multi-Objective G3P: Association Rule Mining

From KDIS Research Group - Wiki

Jump to: navigation, search

Contents

[edit] Abstract

This paper presents two evolutionary multi-objective approaches for the extraction of association rules based on grammar guided genetic programming (G3P) models. G3P allows of enforcing syntactic constraints to genetic programming by building individuals conforming to a context-free grammar. In such a situation, each mined rule is a derivation tree that represents a solution using the language defined by the grammar. The proposals presented in this paper combine the advantages of grammar-based genetic programming models with the benefits of multi-objective approaches. Both approaches follow the philosophy of two well-known multi-objective algorithms, frequently used in the literature: the non dominated sort genetic algorithm (NSGA-2) and the strength Pareto evolutionary algorithm (SPEA-2).

We compare both multi-objective algorithms to the original G3P proposal for mining association rules and perform an analysis of the mined rules. The results obtained show that multi-objective proposals obtain very frequent (with support values above 95% in most cases) and reliable (with confidence values close to 100%) rules when attaining the optimal tradeoff between support and confidence. Furthermore, for the tradeoff between support and lift, the multi-objective proposals obtain very interesting and representative rules as well.

[edit] Paper versions

[edit] Draft

J.M. Luna, J.R. Romero and S. Ventura. Grammar-Based Multi-Objective Algorithms for Mining Association Rules. Data & Knowledge Engineering. 2013

[edit] Paper published online

J.M. Luna, J.R. Romero and S. Ventura. Grammar-Based Multi-Objective Algorithms for Mining Association Rules. Data & Knowledge Engineering. 2013 Accepted

[edit] Additional material

[edit] Experimental Study

Since evolutionary proposals have a number of parameters that should be fixed prior to their execution, a series of previous experiments were performed on the G3PARM algorithm in order to obtain the best found combination of the parameters, i.e., those determined when checking a reasonable number of parameter combinations and that enable to get the best results. To do this, different parameter values were combined and tested (e.g., population size, number of generations, crossover probability, etc.) using different datasets. It is worth mentioning that no single combination of parameter values performed better for all datasets, as was to be expected. Also, since both multi-objective algorithms are based on G3PARM, and in order to make a fair comparison, the same parameters set-up is used for the three algorithms.

The best results were obtained using a population size of 50 individuals, 100 generations, 90% crossover probability, 16% mutation probability, and a maximum derivation size of 24. For the G3PARM algorithm, the external population size is 20 and the thresholds of support and confidence are set to 70% and 90%, respectively. For the SPEA-G3PARM, the Cartesian distance is calculated with the fifth nearest neighbour and the maximum Pareto front size is 20 to perform a fair comparison against G3PARM. The results shown in this experimental section correspond with the average values calculated after running each algorithm 30 times with different seeds. Ten datasets with different numbers of instances and attributes were used. All the experiments used in this study were performed on a 12Gb main memory Intel Core i7 machine, running CentOS 5.4. The algorithms were written in Java using JCLEC, a Java library for evolutionary computation.


Experimental results are available for download in Excel format (300 KB) download

[edit] Datasets

Dataset # Instances # Attributes Type of Attributes
Automobile Performance 392 8 Numerical
Credit German 1000 21 Categorical and Numerical
House 16H 22784 17 Numerical
Minorities at Risk 852 22 Categorical and Numerical
Minorities at Risk Organization Behaviour 1789 50 Categorical and Numerical
Mushroom 8124 23 Categorical
Soybean 683 36 Categorical
Wisconsin Breast Cancer 683 11 Categorical and Numerical
Wisconsin Diagnostic Breast Cancer 569 31 Categorical and Numerical
Wisconsin Prognostic Breast Cancer 194 34 Categorical and Numerical

[edit] Software

Runnable NSGAG3PARM jar file

NSGAG3PARM Configuration file

How to execute the NSGA-G3PARM algorithm

java -jar NSGAG3PARM.jar NSGAG3PARM.cfg

Runnable SPEAG3PARM jar file

SPEAG3PARM Configuration file

How to execute the SPEA-G3PARM algorithm

java -jar SPEAG3PARM.jar SPEAG3PARM.cfg

The configuration file must be modified with the desired dataset.

[edit] WEKA plugin

The G3PARM algorithm can also be runned within the WEKA software tool. Download the SPEAG3PARM package or the NSGAG3PARM package and import using the WEKA package manager (requires WEKA 3.7.5).


[edit] Results

[edit] Critical differences obtained with the Bonferroni-Dunn test for different measures when using support and confidence as objectives to be optimized

  • Support measure

  • Confidence measure

  • Coverage measure

[edit] Critical differences obtained with the Bonferroni-Dunn test for different measures when using support and lift as objectives to be optimized

  • Support measure

  • Lift measure

  • Confidence measure

  • Coverage measure

Personal tools