Mining data with more flexible representations

Status
In progress
Principal Investigator
     
Sebastián Ventura
Reference
TIN2014-55252-P
Members
Alberto Cano
Krzysztof Cios
Carlos García
Eva L. Gibaja
Alain Guerrero
José María Luna
Carmen Luque
María Luque
José María Moyano
Francisco Padillo
Mykola Pechenizkiy
Aurora Ramírez
Oscar G. Reyes
Hermes Robles
Cristóbal Romero
José Raúl Romero
Amelia Zafra
Duration
January 2015 - December 2018
Budget
69,900 €
Project

Summary

Project MARFIL (Mining data with more flexible representations) aims to develop novel approaches for knowledge extraction in contexts that demand additional flexibility in data representation:

  • Multi-instance and relational learning models that enable a more flexible representation of the input space.
  • Learning models with multiple outputs, especially multi-label learning, that allow representing the output space with more flexibility.
  • Multi-source and multi-view learning models, which make it possible to combine several data sets describing the same problem, using a model chosen individually for each data source.

Building on the approaches above, we will develop new models for classification, clustering, association and subgroup discovery. We will also provide mechanisms to adapt these models to problems with special characteristics, such as a large number of variables or very large data sets. Some of these problems fall under the so-called big data umbrella, so our proposals will be adapted to this landscape with scalable implementations able to provide appropriate solutions in these contexts.

In addition to the theoretical dimension described above, this project has an applied orientation: we expect to solve several real-life problems using the developed models. More specifically, we will address issues in educational data mining (predicting students' academic performance, modelling self-assessment and peer-assessment plans, and developing resource and activity recommendation models for students) and biomedicine (early diagnosis from electronic health records, and predicting the risk of insulin metabolism diseases and related pathologies). Both application fields currently attract considerable interest in our society, and even a small step forward would have a significant impact on the health and educational communities. In fact, in addition to our close cooperation with the universities involved in this project and the Maimónides health research institute, several companies in both sectors have already shown interest in the results of this proposal. Therefore, in a first stage, we will analyse whether these representation models are a real improvement over traditional approaches for solving these problems. In a second stage, existing state-of-the-art methods will be compared with our own proposals, which we expect to achieve significantly improved outcomes.

Finally, to promote the research conducted, we plan to build test data repositories alongside each of the resulting models so that the scientific community can replicate our experiments and thoroughly compare results. Furthermore, we will integrate the developed models into today's most relevant software platforms to facilitate their dissemination.

Research results

Software

Dataset repositories

Books
  • S. Ventura, J. M. Luna. Pattern Mining with Evolutionary Algorithms. Springer, 2016.
  • F. Herrera, S. Ventura, R. Bello, C. Cornelis, A. Zafra, D. Sanchez-Tarragó, S. Vluymans. Multiple Instance Learning. Foundations and Algorithms. Springer, 2016.
Journal articles
  • A. Cano, J. M. Luna, E. L. Gibaja, S. Ventura. LAIM discretization for multi-label data. Information Sciences, 330, pp 370-384, 2016.
  • O. G. Reyes, C. Morell, S. Ventura. Effective lazy learning algorithm based on data gravitation model for multi-label learning. Information Sciences, 340-341, pp 159-174, 2016.
  • A. Cano, D. T. Nguyen, S. Ventura, K. Cios. ur-CAIM: Improved CAIM Discretization for Unbalanced and Balanced Data. Soft Computing, 20(1), pp 173-188, 2016.
  • A. Guerrero, C. Morell, A. Y. Noaman, S. Ventura. An algorithm evaluation for classification rules discovering with gene expression programming. International Journal of Computational Intelligence Systems, 9(2), pp 263-280, 2016.
  • J. M. Luna, M. Pechenizkiy, S. Ventura. Mining Exceptional Relationships with Grammar-Guided Genetic Programming. Knowledge and Information Systems, 47(3), pp 571-594, 2016.
  • C. Romero, C. Márquez-Vera, A. Cano, A. Y. Noaman, H. M. Fardoun, S. Ventura. Early Dropout Prediction using Data Mining: A Case Study with High School Students. Expert Systems, 33(1), pp 107-124, 2016.
  • J.M. Luna, A. Cano, M. Pechenizkiy, S. Ventura. Speeding-up Association Rule Mining with Inverted Index Compression. IEEE Transactions on Cybernetics, PP(99), pp. 1-14, 2016.
  • J.M. Luna, A. Cano, V. Sakalauskas, S. Ventura. Discovering useful patterns from multiple instance data. Information Sciences, 357, pp 23-38, 2016.
  • E. L. Gibaja, J.M. Moyano, S. Ventura. An ensemble-based approach for multi-view multi-label classification. Progress in Artificial Intelligence, 5, pp 251-259, 2016.
  • J. M. Luna, A. Y. Noaman, A. H. M. Ragab, S. Ventura. Recommending degree studies according to student's attitudes in high school by means of subgroup discovery. International Journal of Computational Intelligence Systems, 2016. (accepted)
  • O. Reyes, E. Pérez, María del Carmen Rodríguez-Hernández, Habib M. Fardoun, S. Ventura. JCLAL: a Java framework for active learning. Journal of Machine Learning Research, 17(95), pp 1-5, 2016.
  • J. L. Olmo, C. Romero, E. Gibaja, S. Ventura. Improving meta-learning for algorithm selection by using multi-label classification: a case of study with educational data sets. International Journal of Computational Intelligence Systems, 8(6), pp 1144-1164, 2015.
  • A. Cano, S. Ventura, K. Cios. Multi-Objective Genetic Programming for Feature Extraction and Data Visualization. Soft Computing, pp 1-21, 2015.
  • J. M. Luna, C. Romero, J. R. Romero, S. Ventura. An Evolutionary Algorithm for the Discovery of Rare Class Association Rules in Learning Management Systems. Applied Intelligence, 42(3), pp 501-513, 2015.
  • A. Cano, A. Zafra, S. Ventura. Speeding up Multiple Instance Learning Classification Rules on GPUs. Knowledge and Information Systems, 44(1), pp 127-145, 2015.
  • O. G. Reyes, C. Morell, S. Ventura. Scalable extensions of the ReliefF algorithm for weighting and selecting features on the multi-label learning context. Neurocomputing, 161, pp 168-182, 2015.
  • A. Cano, J. M. Luna, A. Zafra, S. Ventura. A classification module for Genetic Programming Algorithms in JCLEC. Journal of Machine Learning Research, 16, pp 491-494, 2015.
  • E. Gibaja, S. Ventura. A Tutorial on Multi-Label Learning. ACM Computing Surveys, 47(3), pp 1-38, 2015.
International conferences
  • C. Romero, R. Cerezo, J.A. Espino, M. Bermudez. Using Android Wear for Avoiding Procrastination Behaviours in MOOCs. Learning at Scale (L@S), Edinburgh, Scotland, UK, pp 193-196. 2016.
  • A. Zapata, V. H. Menéndez, C. Romero, M.E. Prieto. Meta-learning for predicting the best vote aggregation method: Case study in collaborative searching of LOs. Proceedings of the 9th International Conference on Educational Data Mining, EDM 2016, Raleigh, North Carolina, USA, pp 656-657, 2016.
  • F. Padillo, J. M. Luna, A. Cano, S. Ventura. A data structure to speed-up machine learning algorithms on massive datasets. Proceedings of the 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, pp 365-376, 2016.
  • F. Padillo, J. M. Luna, S. Ventura. Subgroup discovery on Big Data: exhaustive methodologies using Map-Reduce. IEEE Big Data Science and Engineering, 2016.
  • M.A. Jiménez-Gómez, J. M. Luna, C. Romero, S. Ventura. Discovering Clues to Avoid Middle School Failure as Early as Possible. Learning Analytics and Knowledge (LAK), NY, USA. pp 300-305. 2015.
  • A. Bogarin, C. Romero, R. Cerezo. Discovering student's navigation path in Moodle. International Conference on Educational Data Mining, Madrid, Spain, pp 556-557. 2015.
  • E. Pérez, O. Reyes, S. Ventura. Aplicación del aprendizaje activo en el diagnóstico médico. IV Encontro Regional de Computação e Sistemas de Informação (ENCOSIS-2015), 2015.
National conferences
  • J. Fuentes-Alventosa, C. Romero, C. García-Martínez. Predicción de la aceptación o rechazo de las calificaciones propuestas por el alumnado usando técnicas de minería de datos. JENUI. Almería. pp 203-210. 2016.
  • J. M. Luna, F. Padillo, S. Ventura. Minería de reglas de asociación extraidas con algoritmos evolutivos. XI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 127-136, 2016.
  • F. Padillo, J. M. Luna, S. Ventura, F. Herrera. Algoritmo de programación genética gramatical para la extracción de reglas de asociación en Big Data usando el paradigma MapReduce. XI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 137-148, 2016.
  • F. Padillo, J. M. Luna, S. Ventura. Búsquedas exhaustivas de subgrupos con MapReduce en Big Data. XVII Conferencia de la Asociación Española para la Inteligencia Artificial, pp 779-789, 2016.
  • F. Padillo, J. M. Luna, S. Ventura. Minería de patrones en BigData. XVII Conferencia de la Asociación Española para la Inteligencia Artificial, pp 769-778, 2016.
  • O. G. Reyes, S. Ventura. Estrategia efectiva para aprendizaje activo multi-etiqueta. XVII Conferencia de la Asociación Española para la Inteligencia Artificial, pp 835-844, 2016.
  • F. Ibáñez, A. Cano, and S. Ventura. Evaluación distribuida transparente para algoritmos evolutivos en JCLEC. II Jornadas de Algoritmos Evolutivos y Metaheurísticas (XVI CAEPIA), pp 231-240, 2015.
  • J.M. Moyano, E.L. Gibaja, A. Cano, J.M. Luna, and S. Ventura. Diseño Automático de Multi-Clasificadores Basados en Proyecciones de Etiquetas. II Jornadas de Fusión de la Información y ensembles (XVI CAEPIA), pp 355-366, 2015.
  • J.M. Moyano, E.L. Gibaja, A. Cano, J.M. Luna, and S. Ventura. Algoritmo evolutivo para optimizar ensembles de clasificadores multi-etiqueta. X Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 219-225, 2015.