NEW PROBLEMS IN KNOWLEDGE DISCOVERY: A GENETIC PROGRAMMING APPROACH

Status:
Finished
PI:
Sebastián Ventura
Reference:
TIN2011-22408
Members:

José Luis Ávila
Alberto Cano
Eva L. Gibaja
José María Luna
Juan Luis Olmo
Mykola Pechenizkiy
Cristóbal Romero
José Raúl Romero
Amelia Zafra
Duration:
January 2012 – December 2014
Budget:
63,526 €
Project:

SUMMARY

The iNsPIrED project (New Problems In knowlEdge Discovery) aims to develop new knowledge discovery methodologies using genetic programming (GP) and other evolutionary computation (EC) approaches, and to apply them to several real-world problems.

This main objective can be split into the following secondary objectives:

  • Development of GP models for solving different problems in knowledge discovery: multiple instance learning, relational learning, multi-label classification and mining association rules.
  • Adaptation of the models developed to deal with new challenges associated with high-dimensional problems, large datasets and imbalanced data.
  • Application of the developed models to real problems in the context of educational data mining (new representation models for predicting students’ performance in learning management systems, modelling student drop-out, categorization of learning objects) and web mining (intrusion detection and web categorization problems).
  • Development of data repositories enabling the scientific community to compare our findings with other existing proposals, and integration of the models developed in the KEEL and WEKA software platforms, in order to facilitate their promotion.

The aim of designing knowledge extraction methods is to obtain useful models, not only in terms of high performance (understood as accuracy in classification problems), but also in terms of other relevant characteristics such as robustness, versatility, interpretability, ease of updating, and coherence with previous knowledge. Furthermore, such systems should be able to model and manage all the information within their reach, including incomplete and imprecise information, imbalanced data in classification problems, etc.

Our starting hypothesis is that EC techniques, and particularly those based on GP, should allow us to design knowledge extraction models with the aforementioned features. GP allows a much more flexible individual representation than other EC paradigms: decision trees, rule bases or complex mathematical expressions can be represented directly. It can also express restrictions on the representation space, which increases the efficiency of the evolutionary process, and it gives more control over the size of the resulting expressions, which improves the interpretability of the results.
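The representation argument can be illustrated with a minimal sketch. It is not the project's implementation: the toy grammar, the names `GRAMMAR`, `derive` and `matches`, and the bound on the number of conditions are all assumptions made for illustration. A context-free grammar restricts which rule antecedents can be generated, and a depth bound caps their size.

```python
import random

# Hypothetical toy grammar (an assumption, not taken from the project):
# a rule antecedent is a conjunction of attribute comparisons.
GRAMMAR = {
    "<rule>": [["<cond>"], ["<cond>", "AND", "<rule>"]],
    "<cond>": [["<attr>", "<op>", "<value>"]],
    "<attr>": [["x"], ["y"]],
    "<op>": [["<"], [">="]],
    "<value>": [["0"], ["1"], ["2"]],
}


def derive(symbol, depth, max_conditions, rng):
    """Expand `symbol` into a list of terminal tokens.

    The grammar restricts the search space to syntactically valid rules;
    `max_conditions` bounds the recursion of <rule>, and so the rule size.
    """
    if symbol not in GRAMMAR:  # terminal symbol
        return [symbol]
    productions = GRAMMAR[symbol]
    if symbol == "<rule>" and depth >= max_conditions:
        productions = productions[:1]  # only the non-recursive production
    chosen = rng.choice(productions)
    next_depth = depth + 1 if symbol == "<rule>" else depth
    tokens = []
    for child in chosen:
        tokens.extend(derive(child, next_depth, max_conditions, rng))
    return tokens


def matches(tokens, record):
    """Return True if `record` satisfies every condition in the rule."""
    # Conditions occupy 3 tokens each and are separated by one "AND" token.
    for i in range(0, len(tokens), 4):
        attr, op, value = tokens[i:i + 3]
        threshold = int(value)
        if op == "<" and not record[attr] < threshold:
            return False
        if op == ">=" and not record[attr] >= threshold:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    rule = derive("<rule>", 1, 3, rng)  # at most 3 conditions
    print(" ".join(rule), "->", matches(rule, {"x": 1, "y": 2}))
```

In this sketch every generated individual is a readable rule by construction, so no repair step is needed after random generation, and the size bound keeps rules short enough to interpret.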

Finally, methods based on massively parallel computers open up possibilities for tackling large-scale data mining problems (high-dimensional problems and large datasets) by significantly speeding up data mining algorithms. Although significant progress has been made in the last few years, much work remains to be done in this line. Some of these open issues are addressed in the four objectives proposed in this document. We firmly believe that the team can achieve these objectives, given its experience and previous results.

RESEARCH RESULTS

Software

Dataset repositories

Journal articles

  • S. Ventura, C. Romero, A. Abraham. Foreword: Intelligent data analysis. Journal of Computer and System Sciences, 80(1), pp 1-2, 2014.
  • B. Strack, J.P. Deshazo, C. Gennings, J.L. Olmo, S. Ventura, K.J. Cios, J.N. Clore. Impact of HbA1c measurement on hospital readmission rates: Analysis of 70,000 clinical database patient records. BioMed Research International, 2014, 2014.
  • C. Romero, J.R. Romero, S. Ventura. A survey on pre-processing educational data. Studies in Computational Intelligence, 524, pp 29-64, 2014.
  • O. Reyes, C. Morell, S. Ventura. Evolutionary feature weighting to improve the performance of multi-label lazy algorithms. Integrated Computer-Aided Engineering, 21(4), pp 339-354, 2014.
  • J.L. Olmo, J.R. Romero, S. Ventura. Swarm-based metaheuristics in automatic programming: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), pp 445-469, 2014.
  • J. M. Luna, J. R. Romero, C. Romero, S. Ventura. On the Use of Genetic Programming for Mining Comprehensible Rules in Subgroup Discovery. IEEE Transactions on Cybernetics, 44(12), pp 2329-2341, 2014.
  • J. M. Luna, J. R. Romero, S. Ventura. On the adaptability of G3PARM to the extraction of rare association rules. Knowledge and Information Systems, 38(2), pp 391-418, 2014.
  • J. M. Luna, J.R. Romero, C. Romero, S. Ventura. Reducing gaps in quantitative association rules: a genetic programming free-parameter algorithm. Integrated Computer-Aided Engineering, 21(4), pp 321-337, 2014.
  • E. Gibaja, S. Ventura. Multi-label learning: A review of the state of the art and ongoing research. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(6), pp 411-444, 2014.
  • E. Corchado, A. Abraham, P.A. Gutiérrez, J. M. Benítez, S. Ventura. Special issue: Advances in learning schemes for function approximation. Neurocomputing, 135(1-2), 2014.
  • A. Cano, A. Zafra, S. Ventura. Parallel evaluation of Pittsburgh rule-based classifiers on GPUs. Neurocomputing, 126, pp 45-57, 2014.
  • A. Cano, S. Ventura, K.J. Cios. Scalable CAIM discretization on multiple GPUs using concurrent kernels. Journal of Supercomputing, 69(1), pp 273-292, 2014.
  • A. Cano, E. Yeguas-Bolivar, R. Muñoz-Salinas, R. Medina-Carnicer, S. Ventura. Parallelization strategies for markerless human motion capture. Journal of Real-Time Image Processing, in press, 2014.
  • A. Zapata, V.H. Menéndez, M.E. Prieto, C. Romero. A framework for recommendation in learning object repositories: An example of application in civil engineering. Advances in Engineering Software, 56, pp 1-14, 2013.
  • A. Zafra, C. Romero, S. Ventura. DRAL: A tool for discovering relevant e-activities for learners. Knowledge and Information Systems, 36(1), pp 211-250, 2013.
  • A. Zafra, M. Pechenizkiy, S. Ventura. HyDR-MI: A hybrid algorithm to reduce dimensionality in multiple instance learning. Information Sciences, 222, pp 282-301, 2013.
  • C. Romero, S. Ventura. Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 3(1), pp 12-27, 2013.
  • C. Romero, A. Zafra, J. M. Luna, S. Ventura. Association rule mining using genetic programming to provide feedback to instructors from multiple-choice quiz data. Expert Systems, 30(2), pp 162-172, 2013.
  • J.R. Romero, J.I. Jaén, A. Vallecillo. A tool for the model-based specification of open distributed systems. Computer Journal, 56(7), pp 793-818, 2013.
  • C. Romero, P.G. Espejo, A. Zafra, J.R. Romero, S. Ventura. Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education, 21(1), pp 135-146, 2013.
  • C. Romero, M. I. López, J. M. Luna, S. Ventura. Predicting students’ final performance from participation in on-line discussion forums. Computers and Education, 68, pp 458-472, 2013.
  • J. L. Olmo, J. M. Luna, J. R. Romero, S. Ventura. Mining association rules with single and multi-objective grammar guided ant programming. Integrated Computer-Aided Engineering, 20(3), pp 217-234, 2013.
  • C. Márquez-Vera, C. Romero, S. Ventura. Predicting school failure and dropout by using data mining techniques. Revista Iberoamericana de Tecnologías del Aprendizaje, 8(1), pp 7-14, 2013.
  • C. Márquez-Vera, A. Cano, C. Romero, S. Ventura. Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Applied Intelligence, 38(3), pp 315-330, 2013.
  • J. M. Luna, J. R. Romero, S. Ventura. Grammar-based multi-objective algorithms for mining association rules. Data and Knowledge Engineering, 86, pp 19-37, 2013.
  • H. Kilov, P.F. Linington, J.R. Romero, A. Tanaka, A. Vallecillo. The reference model of open distributed processing: Foundations, experience and applications. Computer Standards and Interfaces, 35(3), pp 247-256, 2013.
  • A. Cano, J. M. Luna, S. Ventura. High performance evaluation of evolutionary-mined association rules on GPUs. The Journal of Supercomputing, Springer US, 66(3), pp 1438-1461, 2013.
  • A. Cano, J.L. Olmo, S. Ventura. Parallel multi-objective Ant Programming for classification using GPUs. Journal of Parallel and Distributed Computing, 73(6), pp 713-728, 2013.
  • A. Cano, A. Zafra, S. Ventura. An interpretable classification rule mining algorithm. Information Sciences, 240, pp 1-20, 2013.
  • A. Cano, A. Zafra, S. Ventura. Weighted data gravitation classification for standard and imbalanced data. IEEE Transactions on Cybernetics, 43(6), pp 1672-1687, 2013.
  • A. Zafra, S. Ventura. Multi-objective approach based on grammar-guided genetic programming for solving multiple instance problems. Soft Computing, 16(6), pp 955-977, 2012.
  • A. Zafra, M. Pechenizkiy, S. Ventura. ReliefF-MI: An extension of ReliefF to multiple instance learning. Neurocomputing, 75, pp 210-218, 2012.
  • A. Zafra, S. Ventura. Multi-instance genetic programming for predicting student performance in web based educational environments. Applied Soft Computing Journal, 12(8), pp 2693-2706, 2012.
  • J.L. Olmo, J.R. Romero, S. Ventura. Classification rule mining using ant programming guided by grammar with multiple Pareto fronts. Soft Computing, 16(12), pp 2143-2163, 2012.
  • J. M. Luna, J. R. Romero, S. Ventura. Design and behavior study of a grammar-guided genetic programming algorithm for mining association rules. Knowledge and Information Systems, 32(1), pp 53-76, 2012.
  • A. Cano, A. Zafra, S. Ventura. Speeding up the evaluation phase of GP classification algorithms on GPUs. Soft Computing, 16(2), pp 187-202, 2012.

 

International conferences

  • J. Fuentes-Alventosa, C. Romero, C. García-Martínez, S. Ventura. Accepting or Rejecting Students’ Self-grading in Their Final Marks by using Data Mining. International Conference on Educational Data Mining (EDM’14), pp 327-328, 2014.
  • A. Ramírez, J.R. Romero, S. Ventura. On the Performance of Multiple Objective Evolutionary Algorithms for Software Architecture Discovery. Proceedings of the Conference on Genetic and Evolutionary Computation, GECCO’14, pp 1287-1294, 2014.
  • J. Pedraza, C. García-Martínez, A. Cano, S. Ventura. Proceedings of the 9th International Conference on Hybrid Artificial Intelligent Systems, HAIS’14, volume 8480 LNAI, pp 585-596, 2014.
  • A. Bogarín, C. Romero, R. Cerezo, M. Sánchez-Santillán. Clustering for improving educational process mining. Learning Analytics and Knowledge Conference 2014, LAK ’14, Indianapolis, IN, USA, March 24-28, 2014, pp 11-15, 2014.
  • A. Cano, S. Ventura. GPU-parallel SubTree interpreter for genetic programming. Proceedings of the Conference on Genetic and Evolutionary Computation, GECCO’14, pp 887-893, 2014.
  • A. Bogarín, C. Romero, R. Cerezo, M. Sánchez-Santillán. Clustering for improving Educational process mining. ACM International Conference Proceeding Series, pp 11-15, 2014.
  • O. Reyes, C. Morell, S. Ventura. Feature weighting on multi-label data through quadratic loss minimization. Congreso Internacional de Matemática y Computación (COMPUMAT-2013), Habana, Cuba, 2013.
  • A. Ramírez, J.R. Romero, S. Ventura. A Novel Component Identification Approach Using Evolutionary Programming. Proceedings of the 15th Genetic and Evolutionary Computation Conference, GECCO’13 Companion, pp 209-210, 2013.
  • O. Reyes, C. Morell, S. Ventura. ReliefF-ML: An Extension of ReliefF Algorithm to Multi-label Learning. Springer Berlin Heidelberg, LNCS, pp 528-535, 2013.
  • J.L. Olmo, J.R. Romero, S. Ventura. On the use of ant programming for mining rare association rules. Proceedings of the World Congress on Nature and Biologically Inspired Computing, NaBIC’13, pp 220-225, 2013.
  • J. M. Luna, J. R. Romero, C. Romero, S. Ventura. Discovering Subgroups by means of Genetic Programming. Proceedings of the 16th European Conference on Genetic Programming, pp 121-132, 2013.
  • A. Cano, A. Zafra, E. Gibaja, S. Ventura. A grammar-guided genetic programming algorithm for multi-label classification, volume 7831 LNCS, pp 217-228, 2013.
  • M. M. Molina, J. M. Luna, C. Romero, S. Ventura. Meta-learning approach for automatic parameter tuning: A case study with educational datasets. Proceedings of the 5th International Conference on Educational Data Mining, pp 180-183, 2012.
  • M. I. López, J. M. Luna, C. Romero, S. Ventura. Classification via clustering for predicting final marks based on student participation in forums. Proceedings of the 5th International Conference on Educational Data Mining, pp 148-151, 2012.
  • O. Reyes, C. Morell, S. Ventura. Learning similarity metric to improve the performance of lazy multi-label ranking algorithms. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications, ISDA’12, pp 246-251, 2012.
  • J.L. Olmo, A. Cano, J.R. Romero, S. Ventura. Binary and multiclass imbalanced classification using multi-objective ant programming. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications, ISDA’12, pp 70-76, 2012.
  • J.L. Olmo, J.R. Romero, S. Ventura. Multi-objective ant programming for mining classification rules. Proceedings of the 15th European Conference on Genetic Programming, EuroGP’12, volume 7244 LNCS, pp 146-157, 2012.
  • J. M. Luna, J. R. Romero, C. Romero, S. Ventura. A genetic programming free-parameter algorithm for mining association rules. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications, pp 64-69, 2012.
  • J.I. Jaén, J.R. Romero, S. Ventura. VisualJCLEC: A visual framework for evolutionary computation. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications, ISDA’12, pp 119-125, 2012.

National conferences

  • A. Ramírez, J.R. Romero, S. Ventura. Análisis de la aplicabilidad de medidas software para el diseño semi-automático de arquitecturas. XIX Jornadas en Ingeniería del Software y Bases de Datos, JISBD’14, pp 307-320, 2014.
  • A. Ramírez, J.R. Romero, S. Ventura. Algoritmo de programación evolutiva para identificación de arquitecturas software. IX Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB’13, pp 892-901, 2013.
  • A. Ramírez, J.R. Romero, S. Ventura. Identificación de Componentes en Arquitecturas Software Mediante Programación Evolutiva. XVIII Jornadas en Ingeniería del Software y Bases de Datos, JISBD’13, pp 413-426, 2013.
  • M. Jiménez, J. M. Luna, S. Ventura. EDM para la detección precoz del fracaso escolar en secundaria. VI Simposio de Teoría y Aplicaciones de Minería de Datos, pp 1353-1362, 2013.
  • J. M. Luna, C. Romero, J. R. Romero, S. Ventura. Extracción de reglas de asociación frecuentes en bases de datos relacionales. IX Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, pp 753-762, 2013.
  • A. Cano, J.L. Olmo, S. Ventura. Programación Automática con Colonias de Hormigas Multi-Objetivo en GPUs. IX Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB’13, pp 288-297, 2013.
  • A. Ramírez, J. M. Luna, J. R. Romero, S. Ventura. Detección de intrusos con reglas de asociación: un estudio preliminar. VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, pp 235-242, 2012.
  • J.L. Olmo, A. Cano, J.R. Romero, S. Ventura. Programación con Hormigas Multi-Objetivo para la Extracción de Reglas de Clasificación. VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB’12, pp 219-226, 2012.
  • J. M. Luna, J. L. Olmo, J. R. Romero, S. Ventura. Minería de reglas de asociación poco frecuentes con programación genética. VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, pp 181-188, 2012.
  • A. Cano, J. M. Luna, A. Zafra, S. Ventura. Modelo gravitacional para clasificación. VIII Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, pp 63-70, 2012.
