Software analytics – Knowledge Discovery and Intelligent Systems – KDIS

Software analytics (SA) provides software practitioners with methods and tools to extract knowledge from software artifacts, development processes and organisational practices. Typical sources are code repositories, forums, technical documents and internal reports, from which diverse topics can be analysed (API usage, bug prediction, program repair, etc.). Although predictive models built with black-box techniques have gained importance because of their good performance [Bie20], the target audience of SA models (usually managers and engineers in tech companies) demands simple techniques and comprehensible outcomes from which to draw actionable insights and make quick decisions [Ran16].

Because this field of application is clearly aimed at professional data consumers, the techniques being explored seek to make these methods easier to apply. For example, in defect prediction and effort estimation, hyperparameter optimisation of simple learning techniques has proven more effective than computationally expensive black-box techniques [Agr20]. In addition, recently proposed interactive approaches [Min16] let humans build rule-based classifiers for code review [Bau20] or detect inaccurate effort estimations [Con19]. Finally, the trade-off between accuracy and explainability has been analysed, pointing to the need to adapt explainable artificial intelligence (XAI) methods to SA and discussing how humans could participate in the process [Dam18]. Despite these conclusions, very few specific proposals have been made so far [Hum19].

With this project, we seek to advance the SA field in several ways:

Automatic design of workflows to extract and combine unstructured data from software repositories. Helping software managers to build data ingestion and preprocessing procedures can significantly increase the likelihood of adopting SA.
Explaining predictive models for software testing with interactive methods. Current ML proposals do not inform testers about the reasons why test cases fail, and they assume that data features (testing information) are available. Contextualised and simplified predictive models will increase their applicability.

Supporting the software development process with learning and optimisation techniques has been a recurring research topic for the project team [Ram17, Bar20], with special emphasis on interactive approaches [Ram18, Ram19].

[Agr20] A. Agrawal, T. Menzies, L.L. Minku, M. Wagner, Z. Yu. “Better software analytics via DUO: Data mining algorithms using/used-by optimizers”. Empirical Software Engineering, vol. 25(3), pp. 2099-2136. 2020.
[Bar20] R. Barbudo, A. Ramírez, F. Servant, J.R. Romero. “GEML: A Grammar-based Evolutionary Machine Learning Approach for Design-Pattern Detection”. Submitted to Journal of Systems and Software. 2020 (currently under minor revision).
[Bau20] T. Baum, S. Herbold, K. Schneider. “GIMO: A multi-objective anytime rule mining system to ease iterative feedback from domain experts”. Expert Systems with Applications: X, vol. 8. Article no. 100040. 2020.
[Bie20] K. Biesialska, X. Franch, V. Muntés-Mulero. “Big Data analytics in Agile software development: a systematic mapping study”. Information and Software Technology. 2020.
[Con19] M. Conoscenti, V. Bresner, A. Vetrò, D. Méndez Fernández. “Combining data analytics and developers feedback for identifying reasons of inaccurate estimations in agile software development”. Journal of Systems and Software, vol. 156, pp. 126-135. 2019.
[Dam18] H.K. Dam, T. Tran, A. Ghose. “Explainable software analytics”. ACM/IEEE 40th Conference on Software Engineering: New Ideas and Emerging Results track, pp. 53-56. 2018.
[Hum19] J. Humphreys, H.K. Dam. “An explainable deep model for defect prediction”. IEEE/ACM 7th Int. Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), pp. 49-55. 2019.
[Min16] L.L. Minku, E. Mendes, B. Turhan. “Data mining for software engineering and human in the loop”. Progress in Artificial Intelligence, vol. 5(4), pp. 307-314. 2016.
[Ran16] V. Ranganath. “While models are good, simple explanations are better”. Perspectives on Data Science for Software Engineering. 2016.
[Ram17] A. Ramírez, J.A. Parejo, J.R. Romero, S. Segura, A. Ruiz-Cortés. “Evolutionary composition of QoS-aware web services: a many-objective perspective”. Expert Systems with Applications, vol. 72, pp. 357-370. 2017.
[Ram18] A. Ramírez, J.R. Romero, S. Ventura. “Interactive Multi-Objective Evolutionary Optimization of Software Architectures”. Information Sciences, vol. 463-464, pp. 92-109. 2018.
[Ram19] A. Ramírez, J.R. Romero, C.L. Simons. “A Systematic Review of Interaction in Search-Based Software Engineering”. IEEE Transactions on Software Engineering, vol. 45(8), pp. 760-781. 2019.