Abstract
In recent years, thanks to the availability of new technologies and
advanced hardware, research on artificial intelligence, and more
specifically on machine and deep learning, has flourished. This newfound
interest has led many researchers to apply machine and deep learning
techniques to the field of software engineering, including the domain of
software quality.
In this thesis, we investigate the performance of machine learning models for the detection of software faults with a threefold purpose. First, we aim to establish which models are the most suitable to use. Second, we aim to identify the common issues that prevent commonly used models from performing well in the detection of software faults. Finally, we propose possible solutions to these issues. The analysis of the performance of the machine learning models highlighted two main issues: unbalanced data and time dependency within the data. To address these issues, we tested multiple techniques: treating the faults as anomalies and artificially generating more samples to address the unbalanced data problem, and using deep learning models that take into account the history of each data sample to address the time dependency issue.
We found that using oversampling techniques to balance the data and using deep learning models designed for time series classification substantially improve the detection of software faults.
The results shed some light on the issues related to machine learning for the prediction of software faults. They indicate a need to consider the time dependency of the data used in software quality, an aspect that needs more attention from researchers. In addition, improving the detection performance of software faults could help practitioners improve the quality of their software.
In the future, more advanced deep learning models can be investigated. This includes the use of other metrics as predictors and the use of more advanced time series analysis tools to better take into account the time dependency of the data.
Original language | English |
---|---|
Place of Publication | Tampere |
Publisher | Tampere University |
ISBN (Electronic) | 978-952-03-2431-5 |
ISBN (Print) | 978-952-03-2430-8 |
Publication status | Published - 2022 |
Publication type | G5 Doctoral dissertation (articles) |
Publication series
Name | Tampere University Dissertations - Tampereen yliopiston väitöskirjat |
---|---|
Volume | 613 |
ISSN (Print) | 2489-9860 |
ISSN (Electronic) | 2490-0028 |