From KDIS Research Group - Wiki

Jump to: navigation, search

J.L. Olmo, J.R. Romero and S. Ventura. Using Ant Programming Guided by Grammar for Building Rule-Based Classifiers IEEE Transactions on Systems, Man & Cybernetics - Part B: Cybernetics'. 2011


[edit] Abstract

The extraction of comprehensible knowledge is one of the major challenges in many domains. In this paper an ant programming (AP) framework, capable of mining classification rules easily comprehensible by humans and, therefore, capable of supporting expert-domain decisions, is presented. The algorithm proposed, called GBAP (Grammar Based Ant Programming), is the first AP algorithm developed for the extraction of classification rules, and it is guided by a context-free grammar that ensures the creation of new valid individuals. To compute the transition probability of each available movement, this new model introduces the use of two complementary heuristic functions, instead of just one, as typical ant-based algorithms do. The selection of a consequent for each rule mined and the selection of the rules that make up the classifier is based on the use of a niching approach. The performance of GBAP is compared against other classification techniques on 18 varied data sets. Experimental results show that our approach produces comprehensible rules and competitive or better accuracy values than those achieved by the other classification algorithms

[edit] Tests

The performance of GBAP was tested on 18 publicly available data sets, both artificial and real-world, selected from the machine learning repository of the University of California at Irvine (UCI). We have selected problems with a wide range of dimensionality with respect to the number of classes and attributes.

[edit] Results

The performance and the understandability of the model proposed is compared to other classification algorithms. The aim of this section is to analyze statistically and interpret the experimental results obtained. Recall that in DM there is no classification algorithm that performs better than all others for every data set, as stated by the no free lunch theorem.

A. Predictive accuracy analysis

A first evaluation criterion for the comparison is the predictive accuracy. Table IV shows average values for predictive accuracy with standard deviation. The best classification accuracies for each data set are highlighted in bold typeface. Analyzing the table, it is possible to realize that GBAP is competitive with respect to all the other algorithms considered, and also that it obtains the best results on 50% of the data sets used in the experimentation. In those data sets where GBAP does not reach the best results, its classification results are quite competitive. With regard to the standard deviation values, we can also observe that GBAP globally yields middling values in terms of stability.

Though GBAP obtains the best average accuracy values, we performed the Friedman test with the aim of comparing the results obtained and analyzing if there are significant differences between the classifiers. The Friedman test compares the average rankings of k algorithms over N datasets. Average rankings of all the algorithms considered are summarized at the bottom of Table IV. Looking at these ranking values, it can be noticed that the lowest ranking value, i.e., the best global position, is obtained by our proposal. The computed value for the Friedman statistic of average rankings distributed according to the F-distribution with k-1 and (k-1)(N-1) degrees of freedom is 8.7404, which is greater than the tabled critical value at the alpha = 0.1 significance level, C0 = [0; (FF)0.1,6,102 = 1.8327]. Thus, we reject the null-hypothesis that all algorithms perform equally well when alpha = 0.1.

Because of the rejection of the null-hypothesis by the Friedman test, we proceed with a post-hoc test to reveal the performance differences. Since all classifiers are compared with respect to a control classifier, we can apply the Bonferroni–Dunn test focusing on all possible pairwise comparisons involving the GBAP algorithm. At the same significance level (alpha = 0.1) the Bonferroni–Dunn critical value is 1.7239, which means that in order to be significant, the difference between any pair of means must be at least 1.7239 units. Thus, the performance of GBAP is statistically better than those of the PSO/ACO2, Ant-Miner+, Ant-Miner and Bojarczuk-GP algorithms, because the difference between their mean rank value and the mean rank of GBAP is greater than the mentioned critical value. These results are captured in Figure 3, where one can also see that GBAP achieves competitive or even better accuracy results than PART and JRIP.

Note that both at a significance level of alpha = 0.05 and alpha = 0.01, the Friedman test also rejects the null-hypothesis. In the first case, the Bonferroni–Dunn critical value is 1.8996, so that GBAP is significantly more accurate than Ant-Miner+, Ant-Miner and GP. At the alpha = 0.01 significance level, the Bonferroni–Dunn critical value is equal to 2.2639 and, therefore, GBAP is significantly more accurate than Ant-Miner and GP. In both cases, GBAP is the control algorithm and its results are quite competitive or better than the results obtained by the other algorithms.

In order to contrast the results obtained after the application of the Bonferroni–Dunn’s procedure, we can use the Holm test, which is more powerful than the first one and makes no additional assumptions about the hypotheses tested. The advantage of the Bonferroni–Dunn test lies in the fact that it is easier to describe and visualize because it uses the same critical difference for all comparisons. In turn, the Holm test is a step-down post-hoc procedure that tests the hypotheses ordered by significance, comparing each pi with alpha=(k/i) from the most significant p value. Table VI shows all the possible hypotheses of comparison between the control algorithm and the others, ordered by their p value and associated with their level of significance alpha. To contrast the results obtained by the Bonferroni–Dunn method, we applied the Holm test, which rejects those hypotheses that have a p value less or equal to 0.025. Thus, at a significance level of alpha = 0.05, according to the Holm test and regarding to the predictive accuracy results, GBAP is statistically better than PSO/ACO2, Ant- Miner+, Ant-Miner and Bojarczuk-GP algorithms.

B. Comprehensibility analysis

A second evaluation criterion is the comprehensibility of the knowledge acquired. In contrast to predictive accuracy, comprehensibility is a subjective concept, and it is frequently associated to the syntactical simplicity of the classifier. Thus, the smaller the number of rules and the number of conditions appearing in them, the smaller the complexity of the classifier.

The following table summarizes both the classifier’s rule set complexity, by the average number of rules found per data set, and the complexity of the rules, by the average number of conditions per rule. The last but one row of the table shows the average ranking value of each algorithm using the Friedman test with respect to the number of rules in the classifier, and the last row does the same for the number of conditions per rule. In both cases the control algorithm found is GP, as it has the lowest ranking value.

Before analyzing the results obtained, it is important to mention that all algorithms except GP extract rules in the same form, as a conjunction of conditions. However, GP employs the OR operator, and due to the tree-based enconding of individuals in GP, to compute fairly the number of rules and the number of conditions per rule, for each OR operator it is necessary to split the rule into two separate rules, without considering OR nodes as conditions.

The first statistical analysis is carried out considering the average number of rules in the output classifier. At a significance level of alpha = 0.05 the application of the Friedman test rejects the null-hypothesis, because the value of the statistic, 23.4734, does not belong to the critical interval C0 = [0; (FF)0.05,6,102 = 2.1888]. To show the significant differences we applied the post-hoc Bonferroni–Dunn test. The Bonferroni–Dunn’s critical value is 1.8995 when alpha=0.05, which means that GP, JRIP and Ant-Miner+ are statistically better than GBAP. In turn, GBAP does not perform significantly worse than Ant-Miner, PSO/ACO2 and PART.

Regarding the number of rules in the output classifier, the best possible result would be to mine one rule per class, but this may not lead to good results when the distribution of instances per class is not located in a definite space region. This can be observed in the behavior of the GP algorithm, because it nearly extracts one rule per class and, therefore, it obtains the best results in this respect. Notice that in this algorithm, although OR nodes are not considered to be conditions but a way of joining two different rules predicting the same class, the algorithm tends to minimize this kind of operator, as it decreases substantially the simplicity component of the fitness function and, therefore, decreases the quality of the rules mined. In addition, this number of rules may not be enough for obtaining accurate results in many data sets, as it can be deduced looking at the accuracy results obtained by the GP algorithm. In contrast, the niching algorithm of GBAP ensures the selection of the number of rules that are necessary to cover the examples of each class, also achieving very good classification results.

The second statistical analysis involved the average number of conditions per rule measured. To check whether the algorithms present differences, we applied the Friedman test at the same significance level considered in the previous study, alpha = 0.05. The F-distribution’s statistic value is 22.0539, which neither belongs to the critical interval C0 = [0; (FF)0.05,6,102 = 2.1888]. Therefore, there are significant differences between the algorithms. The subsequent application of the Bonferroni– Dunn test revealed that GBAP performs significantly better than Ant-Miner+ and PSO/ACO2 in this aspect. Another conclusion of this test is that GBAP is not significantly better than GP, Ant-Miner, JRIP and PART, neither significantly worse than these algorithms, which is more important.

Regarding this measure, it should be pointed out that the use of a grammar in GBAP has a benefit because we can restrict the complexity of each rule by the number of derivations allowed for such grammar. Thus, we can arrange a tradeoff between rule complexity and performance, reaching a compromise (longer rules may report better rules as they can discover more complex relationships between attributes). As seen in the table that sums up the rule complexity results, the GBAP algorithm is the third-best algorithm in obtaining a small number of conditions per rule, only beaten by GP and Ant-Miner. The reason why the GP algorithm obtains the lowest values of conditions per rule may lie in the fact that this algorithm considers a simplicity component in the fitness function, and so the algorithm tries to minimize this factor. GBAP also takes into account the complexity of the rules in the reinforcement.

The trade-off between comprehensibility and accuracy is perfectly illustrated in the results obtained by the GP algorithm, as it is the most comprehensible algorithm; however it obtains the poorest accurate results. Despite, we can conclude by saying that the GBAP algorithm presents a good comprehensibility-accuracy trade-off, since it is the algorithm that presents the best ranking in accuracy, though it does not give rise to bad comprehensibility results, reaching quite competitive results in this sense, as shown before.

Personal tools