Empirical Analysis of Temporal and Spatial Fault Characteristics in Multi-Fault Bug Repositories

Dylan Callaghan, Alexandra van der Spuy, Bernd Fischer

2025 · DOI: 10.48550/arXiv.2508.08872
arXiv.org

TLDR

An empirical analysis of the temporal and spatial characteristics of faults existing in 16 open-source Java and Python projects, which form part of the Defects4J and BugsInPy datasets, respectively, shows that many faults in these software systems are long-lived, leading to the majority of software versions having multiple coexisting faults.

Abstract

Fixing software faults contributes significantly to the cost of software maintenance and evolution. Techniques for reducing these costs require datasets of software faults, as well as an understanding of the faults, for optimal testing and evaluation. In this paper, we present an empirical analysis of the temporal and spatial characteristics of faults existing in 16 open-source Java and Python projects, which form part of the Defects4J and BugsInPy datasets, respectively. Our findings show that many faults in these software systems are long-lived, leading to the majority of software versions having multiple coexisting faults. This is in contrast to the assumptions of the original datasets, where the majority of versions only identify a single fault. In addition, we show that although the faults are found in only a small subset of the systems, these faults are often evenly distributed amongst this subset, leading to relatively few bug hotspots.
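The link between long fault lifetimes and multi-fault versions can be illustrated with a minimal sketch. It is not the authors' method: the function name, the representation of a lifetime as a half-open range of version indices, and the toy data are all assumptions made for illustration only.

```python
def coexisting_fault_counts(lifetimes, num_versions):
    """Count how many faults are present in each version.

    lifetimes maps a fault id to (introduced, fixed) version indices.
    Assumption: a fault is present in every version v with
    introduced <= v < fixed.
    """
    counts = [0] * num_versions
    for introduced, fixed in lifetimes.values():
        # Clamp the lifetime to the range of known versions.
        for v in range(max(introduced, 0), min(fixed, num_versions)):
            counts[v] += 1
    return counts


# Hypothetical toy data: three faults over seven versions.
lifetimes = {"F1": (0, 5), "F2": (2, 6), "F3": (4, 5)}
counts = coexisting_fault_counts(lifetimes, 7)

# Share of versions that contain more than one fault.
multi_fault_share = sum(c > 1 for c in counts) / len(counts)
print(counts, multi_fault_share)
```

In this toy example, the two long-lived faults overlap for several versions, so multi-fault versions appear even though each fault was recorded separately.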