SMILESENG


Intl. Summer School on Search- and Machine Learning-based Software Engineering
Third-party Library Recommendations for Python Developers using Software Analytics Techniques
Pedro P. García-Pozo, Dept. Computer Science, University of Córdoba, i82gapop@uco.es
Aurora Ramírez, Dept. Computer Science, University of Córdoba, aramirez@uco.es
José Raúl Romero, Dept. Computer Science, University of Córdoba, jrromero@uco.es
Abstract—This talk provides an overview of our ongoing research into the design of intelligent assistants to support Python developers. In the context of a bachelor thesis, we are taking the first steps towards this long-term research objective. To this end, this short paper presents the motivation and research objectives of our work, as well as our first results, which focus on the analysis of the Python library ecosystem using software analytics techniques.
I. INTRODUCTION
Python has become one of the most widely used programming languages among developers due to its gentle learning curve, its portability and the large amount of resources available within its community. The availability of third-party libraries is clearly one of Python's key features, as it promotes code reuse while reducing development effort. The Python Package Index (PyPI), the official repository of third-party Python packages, currently hosts more than 375,000 projects.1 As early as 2016, the Python ecosystem was already recognized as one of the most extensive, and one with the highest growth prospects [1].
In such a vast and dynamic ecosystem, selecting the most suitable third-party library becomes a hard task. Several libraries might meet the functional requirements, so programmers need to consider other factors, such as development support, dependencies and compatibility with other libraries. Furthermore, they must decide how best to integrate the library's functionality into their current program, and whether it actually performs better than their own code.
The problem of library recommendation has been studied in the recent literature for Java systems [2], [3], [4]. However, these recommendations are mostly based on finding projects similar to the one under evaluation and then suggesting the libraries that appear in those related projects but are not yet used in the new one. These recommender systems explore a large set of code repositories, using collaborative filtering, pattern mining or clustering to discover similarities among them. More recently, some authors have focused on supporting library migration, adopting a temporal perspective on the problem by means of deep learning [5].
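The "find similar projects, then suggest what they use" idea described above can be illustrated with a minimal user-based collaborative filtering sketch. It assumes that each project is represented only by the set of library names it depends on and that project similarity is measured with the Jaccard index; the function names (`jaccard`, `recommend_libraries`) and the toy corpus are hypothetical and do not reproduce any of the cited recommenders [2], [3], [4].

```python
from __future__ import annotations

from collections import defaultdict


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two sets of library names."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def recommend_libraries(
    target: set[str],
    corpus: dict[str, set[str]],
    k: int = 3,
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Suggest libraries used by the k projects most similar to `target`.

    A candidate library's score is the summed similarity of the neighbouring
    projects that use it; libraries already in `target` are never suggested.
    """
    # Rank all corpus projects by similarity to the target and keep the top k.
    neighbours = sorted(
        ((jaccard(target, libs), name) for name, libs in corpus.items()),
        reverse=True,
    )[:k]

    scores: dict[str, float] = defaultdict(float)
    for similarity, name in neighbours:
        if similarity == 0.0:
            continue  # projects sharing no library with the target are ignored
        for lib in corpus[name] - target:
            scores[lib] += similarity

    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy corpus (hypothetical data): project name -> set of libraries it uses.
    corpus = {
        "proj_a": {"numpy", "pandas", "matplotlib"},
        "proj_b": {"numpy", "pandas", "scikit-learn"},
        "proj_c": {"flask", "sqlalchemy"},
    }
    print(recommend_libraries({"numpy", "pandas"}, corpus))
```

With this toy corpus, a project using `numpy` and `pandas` receives `matplotlib` and `scikit-learn` as suggestions (each scored 2/3), while the unrelated `proj_c` contributes nothing. Real systems would replace the Jaccard index and the simple sum with more elaborate similarity and weighting schemes.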
In this bachelor thesis, we want to take the first steps towards providing intelligent recommendations about third-party libraries aimed at Python developers. Our research vision is that current recommender systems do not exploit all the potential knowledge hidden in software repositories, and still require additional steps to support developers in effectively integrating the recommended libraries.

1 https://pypi.org/ (Accessed: 31/05/2022)
II. RESEARCH OBJECTIVES
In the long term, we have identified three research objectives towards the design of more effective intelligent development assistants specifically aimed at Python:
1) Analyze the use of Python libraries in repositories to extract hidden knowledge on which to base recommendations.
2) Enhance the code knowledge base so that recommendations can be adapted to the project context.
3) Assist developers in deciding how their code should be combined with, or replaced by, API calls to the library.
III. OVERVIEW OF THE APPROACH
Figure 1 shows a high-level view of the proposed approach to developing an intelligent assistant. We first need to analyze the Python library ecosystem, adopting mining software repositories (MSR) best practices to extract data about library usage across a large number of repositories. At this stage, the collected dataset must be filtered and statistically studied to ensure that the information is representative and useful for the subsequent steps. In the second phase, we will enhance the knowledge base with additional information about the selected set of libraries (popularity, version compatibility or updates). Library usage trends, e.g., whether some libraries are replaced in favor of others or become obsolete, will be analyzed by applying temporal pattern mining techniques to the commit histories. We will also study dependencies among libraries to discover patterns of libraries frequently used together, or groups of libraries with related functionalities; rule mining and clustering techniques will be applied for this purpose. As a final step, we will make use of all the available information to help developers integrate the library into their project. This implies studying the project to detect pieces of code likely to be replaced by API calls to the library. Understanding how other repositories use specific libraries, and whether the current code has similar dependencies, is necessary to choose a library that satisfies the developer's needs and expertise. For this step, we hypothesize that code structural analysis can be combined with classification rules learned from other repositories as done