Abstract
Unit tests play a key role in ensuring the correctness of software. However,
manually creating unit tests is a laborious task, motivating the need for
automation. This paper presents TestPilot, an adaptive test generation
technique that leverages Large Language Models (LLMs). TestPilot uses Codex, an
off-the-shelf LLM, to automatically generate unit tests for a given program
without requiring additional training or few-shot learning on examples of
existing tests. In our approach, Codex is provided with prompts that include
the signature and implementation of a function under test, along with usage
examples extracted from documentation. If a generated test fails, TestPilot's
adaptive component attempts to generate a new test that fixes the problem by
re-prompting the model with the failing test and error message. We created an
implementation of TestPilot for JavaScript and evaluated it on 25 npm packages,
generating tests for a total of 1,684 API functions. Our results show
that the generated tests achieve up to 93.1% statement coverage (median 68.2%).
Moreover, on average, 58.5% of the generated tests contain at least one
assertion that exercises functionality from the package under test. Our
ablation experiments, in which we exclude individual parts of the information
included in the prompts, show that all components contribute to the generation of effective test
suites. Finally, we find that TestPilot does not generate memorized tests:
92.7% of our generated tests have $\leq$ 50% similarity with existing tests (as
measured by normalized edit distance), with none of them being exact copies.
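To make the adaptive scheme described above concrete, the following TypeScript sketch illustrates one way the refinement loop could be structured. It is a minimal sketch, not TestPilot's actual implementation: the `complete` (LLM completion) and `runTest` (test executor) helpers, the `FunctionInfo` shape, and the exact prompt format are all hypothetical placeholders.

```typescript
// Hypothetical inputs: the function under test and its documentation examples.
interface FunctionInfo {
  signature: string;      // e.g. "zip(a: unknown[], b: unknown[]): unknown[]"
  body: string;           // implementation source of the function under test
  docExamples: string[];  // usage snippets mined from documentation
}

interface TestResult {
  passed: boolean;
  errorMessage?: string;  // runtime/assertion error if the test failed
}

// Injected dependencies so the sketch stays self-contained: an LLM
// completion call and a sandboxed test runner (both assumptions).
type Complete = (prompt: string) => Promise<string>;
type RunTest = (testSource: string) => Promise<TestResult>;

// Assemble the initial prompt from the signature, implementation, and
// documentation examples (the prompt components evaluated in the paper).
function buildPrompt(fn: FunctionInfo): string {
  return [
    `// Function under test: ${fn.signature}`,
    fn.body,
    ...fn.docExamples.map((ex) => `// Usage example:\n// ${ex}`),
    "// Write a unit test for the function above.",
  ].join("\n");
}

// Adaptive loop: if a generated test fails, re-prompt the model with the
// failing test and the error message, up to a fixed attempt budget.
async function generateTest(
  fn: FunctionInfo,
  complete: Complete,
  runTest: RunTest,
  maxAttempts = 3,
): Promise<string | undefined> {
  let prompt = buildPrompt(fn);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const test = await complete(prompt);
    const result = await runTest(test);
    if (result.passed) return test;
    // Refinement prompt: original context plus the failing test and error.
    prompt = [
      buildPrompt(fn),
      "// The following test failed:",
      test,
      `// Error: ${result.errorMessage ?? "unknown"}`,
      "// Write a corrected unit test.",
    ].join("\n");
  }
  return undefined; // no passing test within the attempt budget
}
```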