IMPROVING THE ARCHITECTURE AND INTERPRETABILITY OF DEEP NEURAL NETWORKS: AN EVOLUTIONARY COMPUTING APPROACH
Ph.D. Student: Antonio R. Moya Martín-Castaño
Advisor: Sebastián Ventura
Started on: October 2019
Keywords: Evolutionary Algorithms, Neural Networks
Many advances have been made in artificial intelligence in recent years. Among them, the growing paradigm of deep learning stands out: models inspired by the structure of the neural networks of the human nervous system are trained to perform computationally costly tasks such as image recognition and speech synthesis.
For these deep models to learn properly, their architecture and hyper-parameters must be adjusted carefully, a task that is computationally infeasible by brute force. Numerous techniques have already been developed for architecture and hyper-parameter optimisation, including random search, grid search and Bayesian optimisation. Our proposal focuses on another family of techniques also applied to this and other areas: evolutionary algorithms. The literature contains numerous cases in which these algorithms achieve great results on optimisation problems; among their strengths, we highlight their capacity to balance exploitation and exploration during the search process.
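As a concrete illustration of this exploitation/exploration balance, the selection-and-mutation loop of a simple evolutionary hyper-parameter search can be sketched as follows. This is a minimal sketch, not the algorithm proposed in the thesis: the search space, the mutation scheme and especially the fitness function (a toy surrogate standing in for an expensive train-and-validate run) are assumptions for illustration only.

```python
import random

# Toy search space for two hyper-parameters (illustrative only).
SPACE = {
    "lr": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden": [32, 64, 128, 256],
}

def fitness(ind):
    """Toy surrogate for validation accuracy, peaking at lr=1e-2, hidden=128.
    A real implementation would train and evaluate a network here."""
    return -abs(ind["lr"] - 1e-2) * 10 - abs(ind["hidden"] - 128) / 256

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def mutate(ind, rng):
    """Resample one randomly chosen hyper-parameter of a copy of `ind`."""
    child = dict(ind)
    key = rng.choice(list(SPACE))
    child[key] = rng.choice(SPACE[key])
    return child

def evolve(generations=30, pop_size=10, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]                  # exploitation: keep the best half
        offspring = [mutate(rng.choice(survivors), rng)   # exploration: mutate survivors
                     for _ in range(pop_size - len(survivors))]
        pop = survivors + offspring
    return max(pop, key=fitness)

best = evolve()
```

Because the best half always survives, the best fitness never worsens across generations, while mutation keeps injecting new candidates into the population.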
Although they reduce the time required by a brute-force approach, optimisation techniques (and evolutionary algorithms in particular) generally have to deal with very high computing times. We therefore propose an evolutionary algorithm in which the search for the best solution is guided by partial evaluations of the deep learning models: each of these evaluations yields an approximate result indicating how good each model could be, so that the overall time spent by the evolutionary algorithm is notably reduced.
For this task of predicting final performance from partial evaluations, we use probabilistic models defined by a combination of functions that explain the learning curve.
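The underlying idea can be illustrated with a single parametric family. The sketch below fits one commonly used learning-curve shape (a "pow3" curve approaching an asymptote) to the first epochs of a synthetic, noiseless curve and extrapolates it forward; the probabilistic models of the thesis combine several such families and quantify uncertainty, which this toy least-squares fit does not. The function names and the data are illustrative assumptions.

```python
def pow3(t, c, a, alpha):
    """One parametric family often used for learning curves:
    performance approaches an asymptote c as epoch t grows."""
    return c - a * t ** (-alpha)

def fit_pow3(ts, ys):
    """For each alpha on a small grid, the model is linear in (c, a):
    with x_i = t_i^(-alpha), we fit y = c + b*x (b = -a) by ordinary
    least squares and keep the alpha with the smallest squared error."""
    best = None
    n = len(ts)
    for alpha in [0.1 * k for k in range(1, 31)]:
        xs = [t ** (-alpha) for t in ts]
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, ys))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue
        b = (n * sxy - sx * sy) / denom   # normal equations for y = c + b*x
        c = (sy - b * sx) / n
        err = sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, c, -b, alpha)
    _, c, a, alpha = best
    return c, a, alpha

# Synthetic partial learning curve: first 10 epochs of a run whose
# true asymptote is 0.9 (assumed for illustration).
ts = list(range(1, 11))
ys = [pow3(t, 0.9, 0.3, 0.7) for t in ts]

c, a, alpha = fit_pow3(ts, ys)
predicted_final = pow3(100, c, a, alpha)  # extrapolate to epoch 100
```

A candidate whose extrapolated curve plateaus far below the current best can then be discarded after only a few epochs, which is where the time savings in the evolutionary search come from.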
The partial objectives are the following:
- Define an evolutionary algorithm that allows the optimisation of architecture and hyper-parameters of deep learning models used for different tasks, such as image classification, natural language processing or human activity recognition.
- Define probabilistic models that allow adequate prediction of how a model will continue to learn from a partial evaluation of that model.
- Use these probabilistic models to reduce evaluation times during the evolutionary algorithm.
The development of this thesis is being supported by:
- Spanish Ministry of Education, Culture and Sports under the FPU program (FPU18/06307).