TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts

Gao, Hui; Kuang, Hongyu; Assunção, Wesley K. G.; Mayr-Dorn, Christoph; Rong, Guoping; Zhang, He; Ma, Xiaoxing; Egyed, Alexander

doi:10.48550/ARXIV.2312.16854

by Hui Gao, Hongyu Kuang, Wesley K. G. Assunção, Christoph Mayr-Dorn, Guoping Rong, He Zhang, Xiaoxing Ma, Alexander Egyed

Abstract:

Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, to provide significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, plenty of approaches for automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts in different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have a small textual similarity to a Java class). In this work, we leverage the consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery. We first extract and filter biterms from all source, intermediate, and target artifacts. We then use the consensual biterms from the intermediate artifacts to extend the biterms of both source and target artifacts, and finally deduce outer and inner-transitive links to adjust text similarities between source and target artifacts. We conducted a comprehensive empirical evaluation based on five systems widely used in other literature to show that our approach can outperform four state-of-the-art approaches, and how its performance is affected by different conditions of source, intermediate, and target artifacts. The results indicate that our approach can outperform baseline approaches in AP over 15% and MAP over 10% on average.

View PDF

Reference:

Hui Gao, Hongyu Kuang, Wesley K. G. Assunção, Christoph Mayr-Dorn, Guoping Rong, He Zhang, Xiaoxing Ma, Alexander Egyed, "TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts", In: CoRR, vol. abs/2312.16854, 2023.

Bibtex Entry:

@Article{Gao2023,
  author        = {Hui Gao and Hongyu Kuang and Wesley K. G. Assunção and Christoph Mayr-Dorn and Guoping Rong and He Zhang and Xiaoxing Ma and Alexander Egyed},
  journal       = {CoRR},
  title         = {{TRIAD:} Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts},
  year          = {2023},
  volume        = {abs/2312.16854},
  abstract      = {Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, to provide significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, plenty of approaches for automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts in different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have a small textual similarity to a Java class). In this work, we leverage the consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery. We first extract and filter biterms from all source, intermediate, and target artifacts. We then use the consensual biterms from the intermediate artifacts to extend the biterms of both source and target artifacts, and finally deduce outer and inner-transitive links to adjust text similarities between source and target artifacts. We conducted a comprehensive empirical evaluation based on five systems widely used in other literature to show that our approach can outperform four state-of-the-art approaches, and how its performance is affected by different conditions of source, intermediate, and target artifacts. The results indicate that our approach can outperform baseline approaches in AP over 15% and MAP over 10% on average.},
  archiveprefix = {arXiv},
  bibsource     = {dblp computer science bibliography, https://dblp.org},
  biburl        = {https://dblp.org/rec/journals/corr/abs-2312-16854.bib},
  doi           = {10.48550/ARXIV.2312.16854},
  eprint        = {2312.16854},
  timestamp     = {Fri, 19 Jan 2024 13:27:20 +0100},
  url           = {https://doi.org/10.48550/arXiv.2312.16854},
}