A decision analysis approach for selecting software defect prediction method in the early phases


SOFTWARE QUALITY JOURNAL, vol.31, no.1, pp.121-177, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 31 Issue: 1
  • Publication Date: 2023
  • Doi Number: 10.1007/s11219-022-09595-0
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC
  • Page Numbers: pp.121-177
  • Keywords: Early, Defect prediction, Software defect, Software quality, Prediction method, Multi-criteria, Decision analysis, Decision tree, Fuzzy TOPSIS, MACHINE LEARNING TECHNIQUES, FAULT PREDICTION, ATTRIBUTES, TOPSIS
  • Hacettepe University Affiliated: Yes


One of the most important quality indicators of a software product is its defect rate. For this reason, and with the proliferation of methods and tools that support prediction in software engineering, interest in software defect prediction (SDP) is increasing. It is therefore important for stakeholders to build the desired SDP model as early as possible and to use it throughout the software development lifecycle. We present a two-phase decision analysis approach, structured using a decision tree and multi-criteria decision analysis (MCDA), for selecting the best-fit SDP method. To do this, we specify criteria and use them to evaluate SDP methods according to the dataset characteristics and the stakeholder needs elicited via a questionnaire in the early phases of the development lifecycle. We systematically determine the alternatives to be evaluated in the decision analysis and the criteria that may affect the decision, and we conduct two expert opinion studies to formulate the decision analysis. We also present case studies that apply selected SDP methods to public datasets, and we investigate the trustworthiness of the proposed approach. For the case studies, the methods that the decision analysis identifies as most suitable are naive Bayes (NB), decision tree (DT), and fuzzy logic. The results of the decision analysis are consistent with the empirical evidence that we present. The approach could help software practitioners decide which SDP method is advantageous by revealing their specific requirements for the software projects and the associated defect data. While our results provide guidance for future research on early software defect prediction (ESDP), further studies on real software projects are needed to expand knowledge before more reliable decisions can be made.
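
To make the MCDA ranking step concrete, the following is a minimal sketch of classic (crisp) TOPSIS in Python. It is an illustration only and not the authors' implementation: the keywords name Fuzzy TOPSIS, which works with fuzzy numbers, whereas this sketch substitutes plain numeric scores for simplicity. The function name `topsis`, the alternatives `method_A` to `method_C`, the three criteria, and all scores and weights are hypothetical and are not taken from the article.

```python
import math


def topsis(scores, weights, benefit):
    """Rank alternatives with classic (crisp) TOPSIS.

    scores:  dict mapping alternative name -> list of criterion scores
    weights: list of criterion weights (assumed to sum to 1)
    benefit: list of bools, True where a higher score is better
    Returns a dict mapping alternative name -> closeness coefficient in [0, 1].
    Assumes every criterion column has at least one non-zero score.
    """
    names = list(scores)
    n_criteria = len(weights)

    # Step 1: vector-normalise each criterion column, then apply the weights.
    norms = [
        math.sqrt(sum(scores[a][j] ** 2 for a in names))
        for j in range(n_criteria)
    ]
    weighted = {
        a: [weights[j] * scores[a][j] / norms[j] for j in range(n_criteria)]
        for a in names
    }

    # Step 2: build the ideal and anti-ideal points, criterion by criterion.
    ideal = [
        (max if benefit[j] else min)(weighted[a][j] for a in names)
        for j in range(n_criteria)
    ]
    anti_ideal = [
        (min if benefit[j] else max)(weighted[a][j] for a in names)
        for j in range(n_criteria)
    ]

    # Step 3: closeness = distance to anti-ideal / sum of both distances.
    closeness = {}
    for a in names:
        d_pos = math.dist(weighted[a], ideal)
        d_neg = math.dist(weighted[a], anti_ideal)
        total = d_pos + d_neg
        # If all alternatives are identical, both distances are zero.
        closeness[a] = d_neg / total if total > 0 else 0.5
    return closeness


# Hypothetical example: three candidate methods scored on three made-up
# criteria (two benefit criteria and one cost criterion).
example_scores = {
    "method_A": [9.0, 8.0, 2.0],
    "method_B": [5.0, 5.0, 5.0],
    "method_C": [1.0, 2.0, 9.0],
}
example_weights = [0.5, 0.3, 0.2]
example_benefit = [True, True, False]

result = topsis(example_scores, example_weights, example_benefit)
ranking = sorted(result, key=result.get, reverse=True)
print(ranking)
```

The alternative with the highest closeness coefficient is the one nearest the ideal point and farthest from the anti-ideal point. In a two-phase setup such as the one the abstract describes, a decision tree could first narrow the candidate set before a ranking step of this kind is applied; how the article combines the two phases is not specified in this record.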