An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation
Max Schäfer, Sarah Nadi, A. Eghbali, F. Tip
TLDR
A large-scale empirical evaluation of the effectiveness of Large Language Models for automated unit test generation, requiring neither additional training nor manual effort, is presented.
Abstract
Unit tests play a key role in ensuring the correctness of software. However, writing unit tests manually is laborious, which motivates the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including proposals to use them for automated unit test generation, but these proposals require additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without additional training or manual effort. Concretely, we consider an approach in which the LLM is given prompts that include the signature and implementation of a function under test, along with usage examples extracted from documentation. Furthermore, if a generated test fails, our approach attempts to generate a new test that fixes the problem by re-prompting the model with the failing test and its error message. We implement our approach in
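The prompt-then-repair loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `FunctionUnderTest`, `build_prompt`, `build_repair_prompt` and `generate_test`, the prompt wording, and the `max_attempts` limit are all hypothetical. The LLM call (`complete`) and the test runner (`run_test`) are passed in as plain functions and stubbed in the demo, so the snippet shows only the control flow: build a prompt from the signature, implementation and documentation examples, and on failure re-prompt with the failing test and its error message.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical container for what the prompt needs to know about the function under test.
@dataclass
class FunctionUnderTest:
    signature: str
    implementation: str
    usage_examples: List[str] = field(default_factory=list)


def build_prompt(fn: FunctionUnderTest) -> str:
    """Combine signature, implementation and documentation usage examples into one prompt."""
    lines = [f"Function signature: {fn.signature}", "Implementation:", fn.implementation]
    for example in fn.usage_examples:
        lines += ["Usage example from documentation:", example]
    lines.append("Write a unit test for this function:")
    return "\n".join(lines)


def build_repair_prompt(fn: FunctionUnderTest, failing_test: str, error: str) -> str:
    """Re-prompt with the original context plus the failing test and its error message."""
    return "\n".join([
        build_prompt(fn),
        failing_test,
        f"The test above failed with: {error}",
        "Write a corrected unit test:",
    ])


def generate_test(
    fn: FunctionUnderTest,
    complete: Callable[[str], str],            # hypothetical LLM call: prompt -> test code
    run_test: Callable[[str], Optional[str]],  # hypothetical runner: None on pass, else error text
    max_attempts: int = 3,                     # hypothetical retry budget
) -> Optional[str]:
    """Return the first passing test, re-prompting after each failure; None if all attempts fail."""
    prompt = build_prompt(fn)
    for _ in range(max_attempts):
        test = complete(prompt)
        error = run_test(test)
        if error is None:
            return test
        prompt = build_repair_prompt(fn, test, error)
    return None


# Demo with a stubbed "model" that is wrong on its first attempt and right on its second.
fn = FunctionUnderTest(
    signature="add(a, b)",
    implementation="def add(a, b):\n    return a + b",
    usage_examples=["add(2, 3)  # 5"],
)
prompts: List[str] = []


def fake_complete(prompt: str) -> str:
    prompts.append(prompt)
    return "assert add(2, 3) == 6" if len(prompts) == 1 else "assert add(2, 3) == 5"


def fake_run(test: str) -> Optional[str]:
    # Execute the generated test against a real add(); report an error message on failure.
    try:
        exec(test, {"add": lambda a, b: a + b})
        return None
    except AssertionError:
        return "AssertionError"


result = generate_test(fn, fake_complete, fake_run)
print(result)
```

Passing the model and the runner in as functions keeps the loop independent of any particular LLM API or test framework; a real setup would replace the two stubs.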
