The ABLoTS Approach for Bug Localization: is it replicable and generalizable?

Niu, Feifei; Mayr-Dorn, Christoph; Assunção, Wesley K. G.; Huang, LiGuo; Ge, Jidong; Luo, Bin; Egyed, Alexander

doi:10.1109/MSR59073.2023.00083

Abstract:

Bug localization is the task of recommending source code locations (typically files) that probably contain the cause of a bug and hence need to be changed to fix the bug. Along these lines, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files from the source code space. In current practice, a series of state-of-the-art IRBL techniques leverage the combination of different components, e.g., similar reports, version history, code structure, to achieve better performance. ABLoTS is a recently proposed approach with the core component, TraceScore, that utilizes requirements and traceability information between different issue reports, i.e., feature requests and bug reports, to identify buggy source code snippets with promising results. To evaluate the accuracy of these results and obtain additional insights into the practical applicability of ABLoTS, supporting of future more efficient and rapid replication and comparison, we conducted a replication study of this approach with the original data set and also on an extended data set. The extended data set includes 16 more projects comprising 25,893 bug reports and corresponding source code commits. While we find that the TraceScore component as the core of ABLoTS produces comparable results with the extended data set, we also find that the ABLoTS approach no longer achieves promising results, due to an overlooked side effect of incorrectly choosing a cut-off date that led to training data leaking into test data with significant effects on performance.

Reference:

Niu, Feifei, Mayr-Dorn, Christoph, Assunção, Wesley K. G., Huang, LiGuo, Ge, Jidong, Luo, Bin and Egyed, Alexander: The ABLoTS Approach for Bug Localization: is it replicable and generalizable?, in IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), 2023.

Bibtex Entry:

@Conference{Niu2023a,
  author    = {Niu, Feifei and Mayr-Dorn, Christoph and Assunção, Wesley K. G. and Huang, LiGuo and Ge, Jidong and Luo, Bin and Egyed, Alexander},
  booktitle = {IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)},
  title     = {The {ABLoTS} Approach for Bug Localization: is it replicable and generalizable?},
  year      = {2023},
  pages     = {576--587},
  abstract  = {Bug localization is the task of recommending source code locations (typically files) that probably contain the cause of a bug and hence need to be changed to fix the bug. Along these lines, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files from the source code space. In current practice, a series of state-of-the-art IRBL techniques leverage the combination of different components, e.g., similar reports, version history, code structure, to achieve better performance. ABLoTS is a recently proposed approach with the core component, TraceScore, that utilizes requirements and traceability information between different issue reports, i.e., feature requests and bug reports, to identify buggy source code snippets with promising results. To evaluate the accuracy of these results and obtain additional insights into the practical applicability of ABLoTS, supporting of future more efficient and rapid replication and comparison, we conducted a replication study of this approach with the original data set and also on an extended data set. The extended data set includes 16 more projects comprising 25,893 bug reports and corresponding source code commits. While we find that the TraceScore component as the core of ABLoTS produces comparable results with the extended data set, we also find that the ABLoTS approach no longer achieves promising results, due to an overlooked side effect of incorrectly choosing a cut-off date that led to training data leaking into test data with significant effects on performance.},
  doi       = {10.1109/MSR59073.2023.00083},
  keywords  = {LIT Secure and Correct Systems Lab, FWF P31989, FWF P34805},
  url       = {https://ieeexplore.ieee.org/document/10173939},
}