AI Powered Software Testing The Impact of Large Language Models on Testing Methodologies
AI Powered Software Testing The Impact of Large Language Models on Testing Methodologies
KoçSistem KoçSistem
Istanbul, Türkiye Istanbul, Türkiye
[email protected] [email protected]
Abstract—Software testing is a crucial aspect of the software TABLE I. P ERFORMANCE OF LLM S IN U NIT T ESTING
development lifecycle, ensuring the delivery of high-quality, reli- Dataset Correctness Coverage LLM
5 Java projects from Defects4J 16.21% 5%-13% (line coverage) Bart
able, and secure software systems. With the advancements in Ar- 10 Jave projects 40% 89% (line coverage), 90% (branch coverage) ChatGPT
tificial Intelligence (AI) and Natural Language Processing (NLP), CodeSearchNet 41% N/A ChatGPT
HumanEval 78% 87% (line coverage), 92% (branch coverage) Codex
Large Language Models (LLMs) have emerged as powerful SF110 2% 2% (line coverage), 1% (branch coverage) Codex
tools capable of understanding and processing natural language
texts easly. This article investigates the application of AI-based
software testing, with a specific focus on the impact of LLMs
in traditional testing methodologies. Through a comprehensive II. A RTIFICIAL I NTELLIGENCE IN S OFTWARE T ESTING
review of relevant literature and SeturDigital’s 25 year testing Applications in the software testing life cycle (STLC)
experience, this article explores the potential benefits, challenges,
can be used to identify the areas in which AI approaches
and prospects of integrating LLMs into software testing.
have shown to be helpful in software testing research and
Keywords—artificial intelligence, large language model, software
testing life cycle, software testing practice. In both the middle and final stages of the software
testing lifecycle (STLC), LLMs have been utilized efficiently
[2].Various LLMs have been utilized in this context, and each
is having a different impact. For reference, Table 1 provides
I. I NTRODUCTION
a list of these LLMs along with the corresponding results.
With the advancement of algorithms and methodologies, A. Automatic Test Case Generation
artificial intelligence (AI) has proven useful in a variety of Every software system has a set of test cases, and as the
fields. The rapid development of technology has made AI tools system becomes more complicated, more test cases become
from theoretical possibilities into tangible realities. Artificial necessary, increasing the time and effort needed for effective
intelligence has several advantages in many software-related testing. Artificial intelligence (AI) technologies allow for the
areas. Software testing is the procedure and method used to analysis of large volumes of data related to use patterns and
make sure that the program is error-free and to determine applied data features during interactions to predict and develop
whether the actual results of the software match the expected a variety of dynamic test scenarios required for the highest
outcomes by the requirements and specifications [1] . level of confidence. An innovative solution that addresses
Software testing is essential to identify defects, vulnera- these issues by automating the development of test cases is
bilities, and inconsistencies in software systems before they automatic test case generation.
are deployed to end users. Traditionally, testing methodologies One of the key areas where LLMs excel is in generating
have relied on manual efforts by human testers or static rule- test cases automatically. Automating the generation of test
based programs, which can be time-consuming and resource cases could eliminate manual work, reducing test times and the
intensive. The rise of AI and LLMs has opened new possi- cost of developing and maintaining software [3]. By analyzing
bilities for automating various aspects of the testing process, software requirements and specifications, LLMs can construct
enhancing efficiency, and improving overall software quality. test cases that explore various code paths and edge cases. This
One of these ground-breaking inventions is ChatGPT which capability enhances test coverage and assists in identifying
was introduced by OpenAI in late 2022, a cutting-edge previously unnoticed bugs.
language model that highlights the enormous advancements
made in artificial intelligence. Our goal is to make use of B. Test Suit Optimization
ChatGPT’s features for software testing. We seek to improve Traditional software testing often involves the creation of
the efficiency of our software development by including this extensive test suites, leading to increased testing time and
innovative model in the testing procedure, ensuring consistent resource consumption. LLMs can optimize these test suites by
and secure performance across individual code units. removing redundant or less impactful test cases while ensuring
Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on May 11,2024 at 15:02:48 UTC from IEEE Xplore. Restrictions apply.
the same level of coverage. This approach streamlines the
testing process and reduces overall testing efforts.
Organizations can dramatically improve their test suits,
making them more productive, cost-effective, and time-saving,
by utilizing the power of LLMs. This enables quality assurance
teams and software developers to concentrate on crucial areas
for development, ensuring the delivery of the best software
solutions to end customers. An exciting new era of smarter,
more efficient testing procedures is about to arise with the
incorporation of LLM-driven optimization into the software
testing workflow.
Fig. 1: Usage of ChatGPT for Testing
Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on May 11,2024 at 15:02:48 UTC from IEEE Xplore. Restrictions apply.
The seamless communication of test scenarios and require-
ments is one of this study’s most important impacts. The
natural language features of ChatGPT make it possible to clar-
ify testing requirements clearly and concisely, promoting pro-
Fig. 2: Prompt for Unit Test Generation
ductive cooperation between developers and testers. Because
of the faster communication process, test suites are created
with higher precision, relevancy, and comprehensiveness. As
During the process, it was discovered that certain parts of a consequence, the software’s quality is improved, lowering
the.NET package would benefit from additional development. the possibility that the finished output would contain faults
As a result, it was decided to focus on utilizing the HTTP and mistakes.
client to perform the actual implementation because it provides Furthermore, the validation stage of our projects has been
greater flexibility for future development and improvements. completely transformed by AI-powered test creation. ChatGPT
We integrated ChatGPT API into the project with a helper enables our team to concentrate on more difficult and crucial
class. The provided code in Fig. 1 is a C# method that makes parts of software development, such as design, architecture,
an asynchronous call to the ChatGPT API in order to interact and user experience, by automating the production of unit
with the language model and obtain an answer based on tests. This increases our developers’ efficiency while also
the specified query. The only variable the method accepts as enabling us to provide higher-quality software in less time.
an argument is the input text or prompt that will be used The use of ChatGPT to perform AI-driven test generation
to communicate with the ChatGPT model. A string variable has produced noticeable advantages, removing testing bottle-
named is initialized inside the procedure to hold the eventual necks and enhancing the consistency, effectiveness, and quality
response obtained from the ChatGPT API. An API connection of software. We consider by implementing this strategy, the
is made by using the HttpClient class. The method provides field of software development will be shaped in the future,
the relevant HTTP request headers, such as the authorization opening opportunities for creative and effective solutions.
token needed to access the API.
V. C HALLENGES AND L IMITATIONS
Prompts are created to get useful and correct working
responses according to the details of the structures such as Despite the promising potential of LLMs in software testing,
libraries, frameworks used, and maintaining code standards. some challenges and limitations must be addressed. Ethical
As shown in Fig. 2, by modifying the input to our project’s considerations, data privacy, and bias in training data are
specific structures, libraries, frameworks, rules, and coding critical concerns that require thorough attention. Additionally,
standards, we set the foundation for meaningful and accurate LLMs can face difficulties in comprehending code-specific
outputs. This methodical approach improves the project’s over- nuances, hindering their accuracy in certain scenarios.
all quality while also streamlining the development process. Due to the need for access to large volumes of data for
Another important point for accurate responses is the tem- training, there is a chance that LLMs can unintentionally
perature value. the temperature value plays a key role in getting expose sensitive data or violate privacy laws. Strong data
the desired result. The temperature accepts values between anonymization and access control techniques must be used by
0 and 1. Higher temperatures (closer to 1) provide more developers and researchers to reduce this risk. Also, they might
innovative and diverse outcomes when text is generated using have any biases in the data, which would result in unfair or bi-
a language model, whereas lower temperatures (closer to 0) ased results. To address this, procedures for bias identification
produce more reliable results. We set the temperature to 0.5 for and data pretreatment should be used to guarantee fairness and
unit test production to get more precise and effective replies. objectivity in the process. On the other hand, LLMs can have
To find the ideal balance between originality and consistency difficulty understanding the nuances of a given code, which
depending on particular use cases, it is crucial to experiment may reduce their accuracy in some circumstances. When
with various temperature settings. working with complicated codebases or highly specialized
fields, where particulars are vital to the functionality of the
B. Understanding the Impacts software, this constraint becomes more obvious. To better the
The use of ChatGPT for AI-driven test generation has models’ comprehension of code semantics and context, this
proven to be a significant endeavor with major impacts on problem requires continual research and improvement.
software quality and the advancement of the project. Our main A multidisciplinary strategy is required to overcome these
goal has been to construct tests for projects with efficiency and obstacles and understand the potential advantages of LLMs in
effectiveness by using the strength of this language model. We software testing. An appropriate compromise between innova-
have been able to build unit tests quickly with the integration tion and ethical LLM usage can be achieved by collaboration
of ChatGPT into our test generation process, considerably between software developers, ethicists, and subject matter
speeding up the entire development cycle. By providing thor- experts.
ough test coverage, this acceleration not only saves critical As LLM technology continues to evolve, future research
time but also improves the reliability and adaptability of our in AI-based software testing should focus on mitigating the
projects. identified challenges. Developing LLMs specialized in code
Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on May 11,2024 at 15:02:48 UTC from IEEE Xplore. Restrictions apply.
comprehension, addressing bias issues, and enhancing their
interpretability will further strengthen their applicability in
testing methodologies.
To avoid the models from reproducing or reinforcing unfair
or discriminatory outcomes, this can be accomplished by using
diverse and inclusive training data as well as bias detection and
correction approaches. Additionally, improving LLMs’ inter-
pretability is essential. It can be difficult to comprehend these
models’ decision-making processes since they are black boxes.
Researchers should concentrate on creating comprehensible
AI methods for LLMs to remedy this. It will be simpler for
developers and testers to trust and successfully comprehend
the outcome if methodologies are developed to give insights
into how LLMs arrive at particular test results.
VI. C ONCLUSION
In conclusion, recent technologies like ChatGPT and the
development of artificial intelligence (AI) have created new
opportunities in a variety of industries, including software
development and testing. ChatGPT serves as a reference to
the major developments made in this field as a result of the
quick growth of AI. As KoçDigital we used LLM for software
testing procedure.
AI-powered software testing, specifically leveraging LLMs,
holds tremendous potential in revolutionizing traditional test-
ing methodologies. The integration of LLMs in test case gen-
eration, test suite optimization, and bug localization can lead
to more efficient and reliable software development practices.
However, careful attention to ethical concerns and ongoing
research is vital to fully harness the capabilities of LLMs in
software testing and to realize their true benefits.
R EFERENCES
[1] Hussam Hourani, Ahmad Hammad, and Mohammad Lafi. The impact
of artificial intelligence on software testing. In 2019 IEEE Jordan
International Joint Conference on Electrical Engineering and Information
Technology (JEEIT), pages 565–570. IEEE, 2019.
[2] Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang,
and Qing Wang. Software testing with large language model: Survey,
landscape, and vision. arXiv preprint arXiv:2307.07221, 2023.
[3] Arjinder Singh and Sumit Sharma. Automated generation of functional
test cases and use case diagram using srs analysis. International Journal
of Computer Applications, 124(5), 2015.
[4] OpenAI. Models. [Online]. Available: https://platform.openai.com/docs/
models/gpt-3-5/.
[5] CodiumAI. ChatGPT for Automated Testing: Examples and
Best Practices. [Online]. Available: https://www.codium.ai/blog/
chatgpt-for-automated-testing-examples-and-best-practices/.
[6] Yutian Tang, Zhijie Liu, Zhichao Zhou, and Xiapu Luo. Chatgpt vs sbst:
A comparative assessment of unit test suite generation. arXiv preprint
arXiv:2307.00588, 2023.
[7] Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang,
Yixuan Chen, and Xin Peng. No more manual tests? evaluating and im-
proving chatgpt for unit test generation. arXiv preprint arXiv:2305.04207,
2023.
Authorized licensed use limited to: UNIVERSIDADE DE SAO PAULO. Downloaded on May 11,2024 at 15:02:48 UTC from IEEE Xplore. Restrictions apply.