From KDIS Research Group - Wiki

Jump to: navigation, search


[edit] Abstract

This paper proposes a multi-objective ant programming algorithm for classification rule mining called MOGAPC, which focuses on optimizing sensitivity, specificity and comprehensibility objectives. It defines a context-free grammar that restricts the search space and ensures the creation of valid individuals. Its most important features are also discussed, such as the heuristic function with two complementary components, or the novel multi-objective approach that follows, specifically suited for classification. First, a comparative analysis of MOGAP-C using two and three objectives is performed. Then, its performance is experimentally evaluated throughout 15 varied benchmark data sets and compared to those obtained using other 7 relevant rule extraction algorithms. Results prove that MOGAP-C outperforms the other algorithms regarding the predictive accuracy, also achieving a good trade-off between accuracy and comprehensibility.

[edit] Tests

The simulation was performed using fifteen public-domain UCI standard classification problems. This data set selection comprises problems with a broad range of dimensionality. The particular characteristics of these data sets can be observed in Table 1.

[edit] Results

In order to study the performance of MOGAP-C, its effectiveness was analyzed when considering two and three objectives. The 2-objectives MOGAP-C looks for the maximization of both sensitivity and specificity, whereas the 3-objective version, furthermore, optimizes comprehensibility.

Table 3 captures several results obtained by each version of our algorithm. In both cases, the predictive accuracy results obtained in training are better than those obtained in test. Average sensitivity and specificity values are greater in the 2-objective version, which is an expected result since they are the only objectives that this version focuses on optimizing. The 3-objective MOGAP-C obtains better average values in predictive accuracy (for test), average number of rules in the classifier and average number of conditions per rule.

However, this analysis is not very meaningful, since it is based on the average of absolute values of several measures across data sets that have different characteristics. Therefore, we performed the Wilcoxon signed rank test with the continuity correction over the test accuracy results of both MOGAP-C versions, although the results obtained indicated that at a significance level of alpha = 0.05, there are no significant differences between them. We also analyzed the statistical significance with respect to the number of conditions per rule applying the Wilcoxon pair-test. The p-value obtained was equal to 0.001363, which is lower than 0.01. This proves that the 3-objective version, which also considers the comprehensibility metric, is significantly more comprehensible than the 2-objective one with a 99% probability.

Thus, the adopted version of MOGAP-C is the one which aims to simultaneously optimize the three objectives of sensitivity, specificity and comprehensibility.

The performance of MOGAP-C is compared to several rule-based algorithms regarding the predictive accuracy criterion. Table 4 reports on the average predictive accuracy values with standard deviation obtained for each algorithm over each data set. The bold results indicate the algorithm that yields the maximum average classification rate in each data set.

In order to evaluate the differences between the algorithms statistically, the Friedman test is performed. This is a non-parametric test that compares the mean ranks of k algorithms over N data sets. These ranks indicate which algorithm obtains the best results considering all data sets. To compute them, a rank of 1 is assigned to the algorithm with the highest accuracy in one data set, the algorithm with the next highest accuracy is given a rank of 2, and so on. Finally, the average ranks of each algorithm for all data sets are calculated; they are shown in the last row of Table 4.

According to the Friedman test, if the null-hypothesis is accepted, we state that all the algorithms are equivalent. In contrast, if the null-hypothesis is rejected, we state that there are differences between the algorithms. Considering a significance level of alpha=0.05, the Friedman statistic of average rankings distributed according to the F-distributionwith k - 1 and (k - 1) (N - 1) degrees of freedom is 7.0, which does not belong to the critical interval, which is equal to C0 = [0,2.1044]. Thus, the Friedman test rejects the null-hypothesis that all algorithms perform equally well when alpha=0.05.

To reveal whether MOGAP-C—which corresponds to the control algorithm—is significantly better than the other classifiers, a post-hoc test is performed. The Bonferroni-Dunn test can be applied since all algorithms are compared with respect to a control one, focusing on all possible pairwise comparisons that involve the MOGAP-C algorithm. The critical value found by this test at the same significance level of alpha=0.05 is 2.4060 and, therefore, the performance of MOGAP-C is statistically better regarding accuracy than those of PSO/ACO2, Ant-Miner+, Ant-Miner and GP. These results are captured in Figure 6, where it is also possible to see that MOGAP-C achieves competitive or even better accuracy results than GBAP, PART and JRIP.

As mentioned previously, all algorithms included in the comparison are rule-based algorithms. Hence, the complexity of the rule set obtained by each classifier and the complexity of the rules can be compared appropriately. The average number of rules and the average number of conditions per rule of all algorithms over the fifteen data sets are presented in Table 5.

It should be pointed out that apart from the GP algorithm, which also makes use of the OR operator, all algorithms mine rules as a conjunction of conditions. In the latter case, the number of rules in the classifier can be computed directly by counting the number of rules. However, in the case of the GP algorithm, to compute correctly the number of rules in the classifier it is necessary to consider each disjunction as the connection of two different rules, without considering OR nodes as conditions.

Table 6 summarizes the mean results of all algorithms over the fifteen data sets with respect to predictive accuracy, the number of rules in the classifier, and the number of conditions per rule. However, these results are not significant, and the average rankings of the algorithms across data sets was computed again. The last but one row of Table 5 represents the average ranking obtained by each algorithm with regard to the rule set length, and the last row specifies the average rankings regarding the number of conditions per rule. In both cases the algorithm with the lowest ranking value corresponds to GP-Bojarczuk.

A first statistical analysis is performed considering the average number of rules in the classifier (the lower it is, the more comprehensible the classifier). Assuming a significance level of alpha=0.05, the value of the Friedman statistic according to the F-distribution is equal to 22.3511, and the critical interval is C0 = [0,2.1044]. Hence, the Friedman test rejects the null-hypothesis, which means that there are significance differences among the algorithms considering this measure. At the same significance level, the difference between GP, JRIP and Ant-Miner+ ranks, and the rank of MOGAP-C is greater than the Bonferroni-Dunn critical value, 2.4060. Therefore, these algorithms are significantly better than MOGAP-C regarding rule set complexity.

Nevertheless, there are no differences between MOGAP-C and GBAP, Ant-Miner, PSO/ACO2 and PART. Ideally, the best result regarding the classifier’s rule set complexity would be to mine a single rule for predicting each class in the data set. However, this could be detrimental to the accuracy obtained by the algorithm. This statement is captured in the results of the GP algorithm, which extracts nearly one rule per class in the classifier but, conversely, has the worse predictive accuracy results. In contrast, MOGAPC has a good trade-off, obtaining the best accuracy results and at the same time behaving competitively when the rule set length is considered.

A second statistical analysis is performed applying the Friedman test to the results obtained for the average number of conditions per rule. The F-distribution’s statistical value is 17.6205, which does not belong to the critical interval, thus existing significant differences. The post-hoc Bonferroni- Dunn test leads to two conclusions. On the one hand, MOGAP-C is significantly better than Ant-Miner+ regarding this metric. On the other hand, MOGAP-C is not significantly better than GBAP, Ant-Miner, PSO/ACO2, GP, JRIP and PART, nor significantly worse than these algorithms, which is more important.

It should be mentioned that the length of the rules mined by MOGAP-C directly depends from the maxDerivations parameter. While more derivations are allowed for the grammar, the algorithm is capable of finding more complex relationships, normally entailing the extraction of rules which have more conditions.

After performing both tests, it is possible to conclude that MOGAP-C obtains competitive results in terms of comprehensibility, regarding both the rule set length and the number of conditions per rule. Actually, it presents a good accuracy-comprehensibility trade-off, since it is the algorithm that presents the best ranking values in predictive accuracy, also obtaining competitive comprehensibility results.

Personal tools