Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models

Preda, Anamaria-Roberta; Mayr-Dorn, Christoph; Mashkoor, Atif; Egyed, Alexander

Abstract:

Refining high-level requirements into low-level ones is a common task, especially in safety-critical systems engineering. The objective is to describe every important aspect of the high-level requirement in a low-level requirement, ensuring a complete and correct implementation of the system’s features. To this end, standards and regulations for safety-critical systems require reviewing the coverage of high-level requirements by all its low-level requirements to ensure no missing aspects.

The challenge of supporting automatic reviews for requirements coverage originates from the distinct levels of abstraction between high-level and low-level requirements, their reliance on natural language, and the often different vocabulary used. The rise of Large Language Models (LLMs), trained on extensive text corpora and capable of contextualizing both high-level and low-level requirements, opens new avenues for addressing this challenge.

This paper presents an initial study to explore the performance of LLMs in assessing requirements coverage. We employed GPT-3.5 and GPT-4 to analyze requirements from five publicly accessible data sets, determining their ability to detect if low-level requirements sufficiently address the corresponding high-level requirement. Our findings reveal that GPT-3.5, utilizing a zero-shot prompting strategy augmented with the prompt of explaining, correctly identifies complete coverage in four out of five evaluation data sets. Additionally, it exhibits an impressive 99.7% recall rate in accurately identifying instances where coverage is incomplete due to removing a single low-level requirement across our entire set of evaluation data.

CCS Concepts: Software and its engineering → Software creation and management; Designing software; Requirements analysis.
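
The evaluation idea mentioned in the abstract, removing a single low-level requirement and checking whether the resulting incomplete coverage is detected, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the data layout, and the `toy_judge` keyword heuristic are all assumptions made for illustration. In the study the judge would be an LLM such as GPT-3.5; the paper's prompts are not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a judge receives a high-level requirement and a list
# of low-level requirements and returns True if it deems coverage complete.
CoverageJudge = Callable[[str, List[str]], bool]


def leave_one_out_cases(low_levels: List[str]) -> List[List[str]]:
    """Build incomplete variants by removing one low-level requirement at a time."""
    return [low_levels[:i] + low_levels[i + 1:] for i in range(len(low_levels))]


def incomplete_coverage_recall(dataset: Dict[str, List[str]],
                               judge: CoverageJudge) -> float:
    """Fraction of leave-one-out cases that the judge flags as incomplete."""
    detected = 0
    total = 0
    for high_level, low_levels in dataset.items():
        for reduced in leave_one_out_cases(low_levels):
            total += 1
            if not judge(high_level, reduced):
                detected += 1
    return detected / total if total else 0.0


def toy_judge(high_level: str, low_levels: List[str]) -> bool:
    """Stand-in for an LLM (assumption): coverage counts as complete when every
    keyword of the high-level requirement occurs somewhere in the low-level text."""
    keywords = {w for w in high_level.lower().split() if w != "and"}
    text = " ".join(low_levels).lower()
    return all(keyword in text for keyword in keywords)


# Made-up example data, not taken from the paper's data sets.
dataset = {
    "log and encrypt data": [
        "The system shall log events",
        "The system shall encrypt data at rest",
    ],
    "store data": [
        "The system shall store data in a database",
        "Stored data shall be backed up",
    ],
}

recall = incomplete_coverage_recall(dataset, toy_judge)
print(f"Recall on incomplete-coverage cases: {recall:.2f}")
```

In this made-up example the keyword heuristic misses both removals for the second requirement, because each remaining low-level requirement still mentions the same words. Vocabulary overlap of this kind is one reason the abstract points to LLMs rather than simple lexical matching.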

Reference:

Anamaria-Roberta Preda, Christoph Mayr-Dorn, Atif Mashkoor, Alexander Egyed, "Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models", In: 21st IEEE/ACM International Conference on Mining Software Repositories, MSR 2024, Lisbon, Portugal, April 15-16, 2024, IEEE, pp. 242-253, 2024.

Bibtex Entry:

@InProceedings{Preda2024a,
  author    = {Anamaria-Roberta Preda and Christoph Mayr-Dorn and Atif Mashkoor and Alexander Egyed},
  booktitle = {21st {IEEE/ACM} International Conference on Mining Software Repositories, {MSR} 2024, Lisbon, Portugal, April 15-16, 2024},
  title     = {Supporting High-Level to Low-Level Requirements Coverage Reviewing with Large Language Models},
  year      = {2024},
  pages     = {242--253},
  publisher = {{IEEE}},
  abstract  = {Refining high-level requirements into low-level ones is a common task, especially in safety-critical systems engineering. The objective is to describe every important aspect of the high-level requirement in a low-level requirement, ensuring a complete and correct implementation of the system’s features. To this end, standards and regulations for safety-critical systems require reviewing the coverage of high-level requirements by all its low-level requirements to ensure no missing aspects. The challenge of supporting automatic reviews for requirements coverage originates from the distinct levels of abstraction between high-level and low-level requirements, their reliance on natural language, and the often different vocabulary used. The rise of Large Language Models (LLMs), trained on extensive text corpora and capable of contextualizing both high-level and low-level requirements, opens new avenues for addressing this challenge. This paper presents an initial study to explore the performance of LLMs in assessing requirements coverage. We employed GPT-3.5 and GPT-4 to analyze requirements from five publicly accessible data sets, determining their ability to detect if low-level requirements sufficiently address the corresponding high-level requirement. Our findings reveal that GPT-3.5, utilizing a zero-shot prompting strategy augmented with the prompt of explaining, correctly identifies complete coverage in four out of five evaluation data sets. Additionally, it exhibits an impressive 99.7% recall rate in accurately identifying instances where coverage is incomplete due to removing a single low-level requirement across our entire set of evaluation data. CCS Concepts: Software and its engineering → Software creation and management; Designing software; Requirements analysis.},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl    = {https://dblp.org/rec/conf/msr/PredaMME24.bib},
  timestamp = {Wed, 26 Jun 2024 21:58:45 +0200},
  url       = {https://ieeexplore.ieee.org/document/10555592},
}