Automating Invariant Filtering: Leveraging LLMs to Streamline Test Oracle Generation
S. Fischer and C. Klammer. Balancing Software Innovation and Regulatory Compliance, pages 51--71. Cham: Springer Nature Switzerland, 2025.
Abstract
Automated generation of test oracles is a critical area of research in software quality assurance. One effective technique is the detection of invariants by analyzing dynamic execution data. However, a common challenge of these approaches is the detection of false-positive invariants. This paper investigates the potential of Large Language Models (LLMs) to assist in filtering these dynamically detected invariants, aiming to reduce the manual effort involved in discarding incorrect ones. We conducted experiments using various GPT models from OpenAI, leveraging a dataset of invariants detected from the dynamic execution of REST APIs. By employing a Zero-shot Chain-of-Thought prompting methodology, we guided the LLMs to articulate the reasoning behind their decisions. Our findings indicate that classification performance improves with detailed instructions and strategic prompt design (the best model achieving 80.7% accuracy on average), with some performance differences between invariant types.
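As context for the methodology the abstract describes, the sketch below shows what a zero-shot chain-of-thought classification of a single dynamically detected invariant could look like against the OpenAI chat API. The prompt wording, the example invariant, the endpoint, and the model name are illustrative assumptions; the paper's actual prompts and dataset are not reproduced here.

# Minimal sketch of zero-shot chain-of-thought invariant filtering,
# assuming the OpenAI Python SDK (openai>=1.0). The invariant and
# prompt text below are hypothetical, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical Daikon-style invariant mined from dynamic REST API runs.
invariant = "response.body.price >= 0.0"
endpoint = "GET /products/{id}"

prompt = (
    "You are reviewing invariants mined from dynamic executions of a REST API.\n"
    f"Endpoint: {endpoint}\n"
    f"Candidate invariant: {invariant}\n\n"
    "Decide whether this invariant is a true property of the API or a "
    "false positive that only holds in the observed executions.\n"
    "Let's think step by step, then end with a single line: "
    "VERDICT: TRUE or VERDICT: FALSE_POSITIVE."
)

response = client.chat.completions.create(
    model="gpt-4o",   # stand-in for the GPT models evaluated in the paper
    temperature=0,    # deterministic output for classification
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
# The final VERDICT line can be parsed to keep or discard the invariant.

The "Let's think step by step" suffix is the standard zero-shot chain-of-thought trigger: it makes the model articulate its reasoning before committing to a verdict, mirroring the abstract's description of guiding the LLMs to explain their decisions.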
%0 Conference Paper
%1 10.1007/978-3-031-89277-6_4
%A Fischer, Stefan
%A Klammer, Claus
%B Balancing Software Innovation and Regulatory Compliance
%C Cham
%D 2025
%E Fischbach, Jannik
%E Ramler, Rudolf
%E Winkler, Dietmar
%E Bergsmann, Johannes
%I Springer Nature Switzerland
%K Automating Filtering Invariant
%P 51--71
%T Automating Invariant Filtering: Leveraging LLMs to Streamline Test Oracle Generation
%U https://link.springer.com/chapter/10.1007/978-3-031-89277-6_4
%X Automated generation of test oracles is a critical area of research in software quality assurance. One effective technique is the detection of invariants by analyzing dynamic execution data. However, a common challenge of these approaches is the detection of false-positive invariants. This paper investigates the potential of Large Language Models (LLMs) to assist in filtering these dynamically detected invariants, aiming to reduce the manual effort involved in discarding incorrect ones. We conducted experiments using various GPT models from OpenAI, leveraging a dataset of invariants detected from the dynamic execution of REST APIs. By employing a Zero-shot Chain-of-Thought prompting methodology, we guided the LLMs to articulate the reasoning behind their decisions. Our findings indicate that classification performance improves with detailed instructions and strategic prompt design (the best model achieving 80.7% accuracy on average), with some performance differences between invariant types.
%@ 978-3-031-89277-6
@inproceedings{10.1007/978-3-031-89277-6_4,
abstract = {Automated generation of test oracles is a critical area of research in software quality assurance. One effective technique is the detection of invariants by analyzing dynamic execution data. However, a common challenge of these approaches is the detection of false-positive invariants. This paper investigates the potential of Large Language Models (LLMs) to assist in filtering these dynamically detected invariants, aiming to reduce the manual effort involved in discarding incorrect ones. We conducted experiments using various GPT models from OpenAI, leveraging a dataset of invariants detected from the dynamic execution of REST APIs. By employing a Zero-shot Chain-of-Thought prompting methodology, we guided the LLMs to articulate the reasoning behind their decisions. Our findings indicate that classification performance improves with detailed instructions and strategic prompt design (the best model achieving 80.7{\%} accuracy on average), with some performance differences between invariant types.},
address = {Cham},
author = {Fischer, Stefan and Klammer, Claus},
booktitle = {Balancing Software Innovation and Regulatory Compliance},
editor = {Fischbach, Jannik and Ramler, Rudolf and Winkler, Dietmar and Bergsmann, Johannes},
isbn = {978-3-031-89277-6},
keywords = {Automating Filtering Invariant},
pages = {51--71},
publisher = {Springer Nature Switzerland},
title = {Automating Invariant Filtering: Leveraging LLMs to Streamline Test Oracle Generation},
url = {https://link.springer.com/chapter/10.1007/978-3-031-89277-6_4},
year = 2025
}