Machine Learning for Software Fault Detection: Issues and Possible Solutions

Francesco Lomio

Research output: Book/ReportDoctoral thesisCollection of Articles

Abstract

Over the past years, thanks to the availability of new technologies and advanced hardware, the research on artificial intelligence, more specifically machine and deep learning, has flourished. This newly found interest has led many researchers to start applying machine and deep learning techniques also in the field of software engineering, including in the domain of software quality.

In this thesis, we investigate the performance of machine learning models for the detection of software faults with a threefold purpose. First of all, we aim at establishing which are the most suitable models to use, secondly we aim at finding the common issues which prevent commonly used models from performing well in the detection of software faults. Finally, we propose possible solutions to these issues. The analysis of the performance of the machine learning models highlighted two main issues: the unbalanced data, and the time dependency within the data. To address these issues, we tested multiple techniques: treating the faults as anomalies and artificially generating more samples for solving the unbalanced data problem; the use of deep learning models that take into account the history of each data sample to solve the time dependency issue.

We found that using oversampling techniques to balance the data, and using deep learning models specific for time series classification substantially improve the detection of software faults.

The results shed some light on the issues related to machine learning for the prediction of software faults. These results indicate a need to consider the time dependency of the data used in software quality, which needs more attention from researchers. Also, improving the detection performance of software faults could help the practitioners to improve the quality of their software.

In the future, more advanced deep learning models can be investigated. This includes the use of other metrics as predictors and the use of more advanced time series analysis tools for better taking into account the time dependency of the data.
Original languageEnglish
Place of PublicationTampere
PublisherTampere University
ISBN (Electronic)978-952-03-2431-5
ISBN (Print)978-952-03-2430-8
Publication statusPublished - 2022
Publication typeG5 Doctoral dissertation (articles)

Publication series

NameTampere University Dissertations - Tampereen yliopiston väitöskirjat
Volume613
ISSN (Print)2489-9860
ISSN (Electronic)2490-0028

Fingerprint

Dive into the research topics of 'Machine Learning for Software Fault Detection: Issues and Possible Solutions'. Together they form a unique fingerprint.

Cite this