EXTRACTING USEFUL KNOWLEDGE IN BIOMEDICINE THROUGH PATTERN MINING TECHNIQUES
Ph.D. Student: Antonio Manuel Trasierras
Advisors: José María Luna, Sebastián Ventura Soto
Started on: January 2019
Keywords: pattern mining, cancer, biomedicine
Recent technological advances have made it possible to study large volumes of genomic data. However, the methodology commonly used in this type of analysis has several drawbacks: it relies on prior hypotheses, on a priori knowledge, and on statistical tests based solely on correlations between pairs of variables. Because this framework biases the search from the outset, useful knowledge can be lost. Techniques that avoid these biases are therefore needed. To this end, this thesis proposes the use of data science, an interdisciplinary field that seeks to extract useful and novel (previously unknown) knowledge from data drawn from various sources. Within this field, pattern mining comprises a set of descriptive techniques that reveal both simple and complex relationships between the elements of a dataset. Because they do not require a baseline hypothesis, pattern mining techniques avoid the bias that such a hypothesis introduces. The specific relationships between the items (genes) in the data are studied without any prior knowledge of the function or functional category of each genomic variable. A clear advantage of these techniques is that they make it possible to infer or delimit functional relationships for genes that have not yet been studied.
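As an illustration of how pattern mining finds relationships between items without a starting hypothesis, the following minimal sketch enumerates frequent itemsets by brute force. It is not the method developed in the thesis: the gene names (G1, G2, G3), the toy samples, the function name frequent_itemsets and the support threshold are all hypothetical, and exhaustive enumeration is used here in place of a scalable algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the genes flagged (for example,
# as over-expressed) in one sample. Gene names are placeholders.
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G3"},
    {"G1", "G3"},
]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset: support} for every itemset of up to max_size items
    whose relative support is at least min_support.

    Brute-force enumeration: suitable only for tiny illustrative datasets.
    """
    n = len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        for k in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


patterns = frequent_itemsets(samples, min_support=0.6)
for itemset, sup in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), round(sup, 2))
```

No gene is singled out in advance: every combination is counted, and only the support threshold decides which relationships are reported.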
The main goal of this thesis is to address these limitations by means of new pattern mining techniques that require neither prior hypotheses nor a priori knowledge, and thereby to extract useful knowledge from genomic data.
The partial objectives are the following:
- Implementation of classic pattern mining techniques, together with newer ones such as supervised descriptive pattern mining, and their application to cancer data in order to find molecular biomarkers that describe the different functional modules affected in the types of cancer studied.
- Development of an open-source software tool that integrates the different methods of analysis and procedures used.
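The first objective mentions supervised descriptive pattern mining. One family of techniques commonly grouped under that name is emerging pattern mining, which looks for itemsets whose support grows markedly from one class to another. The sketch below is only an assumption about what such an analysis could look like, not the thesis's implementation: the tumor and normal samples, the gene names and the growth threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: genes flagged in each sample, split by class.
tumor = [{"G1", "G2"}, {"G1", "G2", "G3"}, {"G1", "G2"}, {"G2", "G3"}]
normal = [{"G3"}, {"G1", "G3"}, {"G2", "G3"}, {"G1", "G2"}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def emerging_patterns(target, background, min_growth, max_size=2):
    """Return {itemset: growth rate} for itemsets whose support in target
    is at least min_growth times their support in background."""
    items = sorted(set().union(*target))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup_target = support(itemset, target)
            if sup_target == 0:
                continue  # absent from the target class, nothing to report
            sup_background = support(itemset, background)
            # An itemset absent from the background has infinite growth.
            growth = float("inf") if sup_background == 0 else sup_target / sup_background
            if growth >= min_growth:
                found[itemset] = growth
    return found


result = emerging_patterns(tumor, normal, min_growth=2.0)
for itemset, growth in sorted(result.items(), key=lambda p: -p[1]):
    print(sorted(itemset), growth)
```

Unlike the unsupervised case, the class label guides the search, so the reported itemsets are candidate descriptors of one condition relative to the other.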
The development of this thesis is being supported by:
- Spanish Ministry of Science and Competitiveness, project TIN-2017-83445-P.