The Future of Software Engineering in an AI-Driven World

Valerio Terragni, Partha Roop, and Kelly Blincoe
[email protected] · [email protected] · [email protected]
University of Auckland, Auckland, New Zealand
ABSTRACT

A paradigm shift is underway in Software Engineering, with AI systems such as LLMs gaining increasing importance for improving software development productivity. This trend is anticipated to persist. In the next five years, we will likely see an increasing symbiotic partnership between human developers and AI. The Software Engineering research community cannot afford to overlook this trend; we must address the key research challenges posed by the integration of AI into the software development process. In this paper, we present our vision of the future of software development in an AI-driven world and explore the key challenges that our research community should address to realize this vision.

CCS CONCEPTS

• Software and its engineering → Software testing and debugging; Designing software; Software design engineering.

KEYWORDS

Software Engineering, Artificial Intelligence, Machine Learning, Large Language Models, APIs, Libraries, Software Testing, Requirements Engineering

ACM Reference Format:
Valerio Terragni, Partha Roop, and Kelly Blincoe. 2018. The Future of Software Engineering in an AI-Driven World. In Proceedings of International Workshop on Software Engineering in 2030 (SE 2030). ACM, New York, NY, USA, 6 pages. https://doi.org/XXXXXXX.XXXXXXX

arXiv:2406.07737v1 [cs.SE] 11 Jun 2024

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SE 2030, November 2024, Puerto Galinàs (Brazil)
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-XXXX-X/18/06
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION

In the dawn of computing (1940s), programmers wrote machine code, consisting of binary instructions to directly program the computer's hardware. It was quickly understood that programming needed a higher level of abstraction from the hardware [4]. This allowed programmers to write code that is more readable, understandable, and portable across different hardware. From assembly language (a more human-readable representation of machine code) to scripting languages (e.g., Python and JavaScript), the past 70 years of programming languages and practices have witnessed a continuous pursuit of ever higher levels of abstraction [15]. This is to increase developers' efficiency and, at the same time, cope with the demand for increasingly complex software systems.

While the introduction of high-level programming languages played a major role in allowing developers to write concise and expressive code, a paradigm shift occurred in the early 2000s with the widespread use of APIs (Application Programming Interfaces) and libraries. Before that, programmers had to write extensive amounts of code to perform even basic tasks. The shift towards using APIs and libraries had a profound impact on the efficiency and capabilities of software development [28, 63]. Programming can now be informally summarised as chaining the inputs and outputs of API calls, allowing an even higher level of abstraction.

The intuitive, informative, and concise nature of variable and API names is bringing our programs closer to resembling human language. Additionally, the ongoing evolution of higher-level programming languages unmistakably demonstrates a trend towards making language constructs more closely aligned with human speech [15]. Can this trend continue until programming eventually reaches the pinnacle of abstraction: natural language? This is very unlikely. Human speech lacks the basic criteria of programming languages (e.g., lack of ambiguity). However, this does not mean that software engineers cannot write programs specifying their intent in natural language. Developers have long used StackOverflow.com (SO) to search for solutions to programming tasks using natural language queries. Indeed, SO and similar Q&A websites for developers [48] have become crucial tools to boost developer productivity [35, 39, 42, 46, 47, 51].

The recent rise of Large Language Models (LLMs) [62], especially following the global launch of GPT-3.5 and GPT-4.0 by OpenAI, has brought another revolution in programming, rapidly overshadowing platforms like SO [10]. While program synthesis from natural language queries has been a subject of research for many years [18], the performance of recent LLMs has shown results that were unthinkable just a few years ago [6, 11, 19]. Now, developers no longer need to search on SO for code snippets; instead, they can directly ask GPT (or other LLMs), and even have conversational interactions to better understand and improve the generated code. Recently, SO removed statistics on its daily visit counts and officially addressed concerns about declining website traffic in a blog post¹. The post acknowledges the decline in visits and attributes the trend to the release of GPT-4. We are witnessing a paradigm shift in software development where software engineers use LLMs or other AI systems to boost their productivity [12, 40]. We can confidently say that LLMs, alongside high-level programming languages, libraries, and developer Q&A websites, have become essential tools for modern software development [12].

¹ https://stackoverflow.blog/2023/08/08/insights-into-stack-overflows-traffic/

LLMs are here to stay. Indeed, their capabilities and performance in source code generation are set to improve in the future. This is due to the increasing availability of open-source code for training

[Figure 1 depicts the envisioned workflow: a Software Engineer and an AI system jointly act on the artifacts (Requirements, Design, Source Code, Test Cases) across the phases Requirements Engineering, Software Design, Development and Testing, and Maintenance. The AI generates and verifies artifacts, drawing on bug reports and forums; the engineer instructs and improves the AI, which in turn explains and answers enquiries.]

Figure 1: Logical architecture of the envisioned future symbiosis of Software Engineers and AI

purposes, alongside the ongoing efforts of the AI community to enhance LLM performance. As such, over the next five years, we anticipate that software engineers will continue to use LLMs (or similar AI systems) in code development.

Our research community must acknowledge and address the opportunities and challenges that arise from the use of AI in software development. Concerns persist regarding the quality of AI-generated code [30], with notable issues regarding security and privacy [58]. Yet, there are numerous opportunities presented by the versatile capabilities of LLMs, especially when fine-tuned for specific tasks, code bases, or company practices. Indeed, LLMs have proven highly effective in various software engineering tasks beyond code generation, including documentation generation [16, 33], testing [43, 59], and program repair [26, 55]. Our research community stands at the forefront of this revolution; we need to promptly address the challenges of the symbiotic partnership between human developers and AI.

In this paper, we present our vision of the potential future of AI-driven software engineering, alongside the key research challenges and opportunities associated with the increasing integration of AI into the software development process.

2 AI-DRIVEN SOFTWARE DEVELOPMENT

Figure 1 overviews our envisioned AI-driven software development framework. While certain aspects of this framework may appear overly optimistic about the capabilities of future AI systems, it presents an interesting thought process for understanding the potential symbiotic synergy between AI and software developers. Moreover, it sheds light on the research challenges that our community must address to realize this vision someday. Indeed, such a vision is not completely unrealistic. We know that current AI systems can accomplish most of the specified tasks, albeit with limited quality [16, 20, 26, 33, 43, 55, 59].

The framework touches all main phases of the Software Development Life Cycle: Requirements Engineering, Software Design, Implementation, Testing, and Maintenance. Note that we are not assuming a waterfall model; the cycles may overlap, especially in agile development methodologies where development cycles are shorter and more flexible.

The Actors in our framework are software engineers (e.g., developers, architects, and testers) and a generic AI system (e.g., an LLM). It is important to mention that we believe we are still very far from completely replacing software engineers with prompt engineers. Capable software engineers (with prompt engineering training) will remain indispensable for understanding, reviewing, improving, combining, validating, and maintaining the source code generated by AI. In the short and medium term, AI is merely a tool to enhance developers' productivity. While it can automate certain tasks, we assume the presence of humans in the loop.

With our proposed framework, engineers can either directly create or update the artifacts (i.e., requirements, design, production and test code) or instruct the AI (e.g., through prompt engineering) on how to do that. We envision bi-directional communication between humans and AI, where humans can ask questions or provide instructions, and the AI can notify engineers of any detected issues or opportunities for improvement. Software engineers will communicate with AI through conversational interactions facilitated by the conversational capabilities of LLMs. This interface empowers engineers to seek clarifications and explanations about the artifacts as well as the AI system's output.

Another important clarification is that, for simplicity, Figure 1 represents a single AI system. Clearly, the AI system would not be the same for every task. It is reasonable to assume that a dedicated AI system, fine-tuned for the specific task, will be in place.

AI System: The primary research challenge in integrating AI into the software development process will be orchestrating the various AI subsystems that focus on specific development tasks and seamlessly integrating them using a single human-AI interface.

In particular, the AI subsystems must effectively communicate with each other and with various software analysis tools responsible for gathering information on the software artifacts in development. As the number of available AI systems continues to grow, to prevent information overload, humans will interact with a single unified interface. Similar to mediator bots [41], an orchestrator of AIs can efficiently manage all interactions with the AI subsystems behind the scenes. We envision that the AI orchestrator will constantly monitor changes in the artifacts (after every update from engineers) and invoke the dedicated AI subsystem to check for consistency and integrity of the artifacts.
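The orchestration idea above can be illustrated with a minimal sketch. All names here (`Orchestrator`, `check_requirements`, the artifact kinds) are hypothetical, invented purely for illustration, not an existing API: on every artifact update, an orchestrator routes the change to the dedicated task-specific subsystem and reports any detected issues through one interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: each "subsystem" stands in for a task-specific,
# fine-tuned AI model (requirements, design, code, tests).

@dataclass
class Orchestrator:
    # Maps an artifact kind to the dedicated subsystem that checks it.
    subsystems: Dict[str, Callable[[str], List[str]]] = field(default_factory=dict)

    def register(self, artifact_kind: str, check: Callable[[str], List[str]]) -> None:
        self.subsystems[artifact_kind] = check

    def on_update(self, artifact_kind: str, content: str) -> List[str]:
        """Called after every engineer update: route the changed artifact
        to its dedicated subsystem and collect any reported issues."""
        check = self.subsystems.get(artifact_kind)
        return check(content) if check else []

# Toy stand-in for a requirements-checking subsystem.
def check_requirements(text: str) -> List[str]:
    return [] if "shall" in text else ["requirement lacks a 'shall' clause"]

orch = Orchestrator()
orch.register("requirements", check_requirements)
print(orch.on_update("requirements", "Log every request"))
```

The point of the single entry point (`on_update`) is the one-interface property argued above: engineers see one conversation, while routing to specialized subsystems happens behind the scenes.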

This paper was accepted at the "International Workshop on Software Engineering in 2030," co-located with FSE 2024. It was also invited to the special issue of ACM TOSEM.

2.1 Requirements Engineering

Requirement Engineering: The main research challenge will be to enable AI agents that can understand user needs.

Understanding stakeholder needs is a complex activity due to, for example, ambiguities in natural language, stakeholders not always knowing what they truly need, and changing needs. Yet AI, and LLMs in particular, can still assist in requirements engineering activities. They are capable of analyzing, organizing, and summarizing large amounts of data. Thus, they can play a crucial role in the preliminary phase of requirements elicitation. Stakeholders can provide any form of documentation, and LLMs can summarise large documents or translate them into formal requirement specifications. Additionally, chatbots powered by LLMs can also aid in the elicitation of requirements by engaging in conversations with stakeholders. They can generate questions and suggestions to help stakeholders articulate their needs more clearly. Moreover, they can propose relevant examples or scenarios to facilitate discussions and clarify ambiguities. For example, AI agents could produce mock-ups of interfaces or rapid prototypes to confirm understanding of user needs. Stakeholders often describe their envisioned solutions to a problem, rather than the problem itself. The AI systems will need to ensure stakeholders' proposed solutions do not limit the possibilities of innovative designs.

AI will also check for inconsistencies, conflicts, and missing requirements. Figure 1 illustrates the interaction solely between the Software Engineer and the AI. However, the AI could also engage in conversations with stakeholders (e.g., clients, product owners) to elicit, analyse, specify, and validate requirements. Nonetheless, humans will remain in the loop. Software Engineers should oversee these conversations, refine and validate the requirements, and intervene if issues arise.

We will also need to define a new prompt-friendly requirement language that can enhance collaboration between humans and AI systems in transitioning from requirements engineering tasks to development tasks. We call this language "prompt-friendly" in the sense that it should be easily understood by LLMs so that they can generate the associated source code. For example, the language might need to unambiguously separate functional and non-functional requirements to help the LLM generate code. More research on fine-tuning and prompt engineering is needed to understand what good prompts are for specifying requirements and, at the same time, for generating the corresponding source code.

2.2 Software Design

Starting from the requirements, AI will work alongside the software engineer to automatically propose initial design suggestions. These suggestions can serve as starting points for further refinement and validation by the engineers. LLMs should be fine-tuned with best practices, design patterns, and knowledge from previous similar projects. We believe that human input will be needed for this step. In particular, the AI should explain to developers the specific trade-offs that alternative design solutions entail, aiding them in decision-making. Explainable AI is an important and active research topic in the AI community [57]. More research is needed to leverage explainability techniques in the context of software design.

Software Design: An important research challenge will be to understand how software engineers can effectively integrate AI into their design workflows, communicate with them, and interpret their suggestions. In particular, AI must provide explanations for their design suggestions to increase trust and facilitate human understanding.

2.3 Software Development and Testing

We envision that software development and testing will be intertwined, as automated testing should be conducted to verify the correctness of the components generated by AI, as well as their seamless integration into the code base. Given a set of unimplemented requirements, AI will automatically generate and test the production code, after which humans and AI will collaborate to improve and verify it.

Software Development: The key research challenge will be to understand how effective prompt engineering can guide code generation, particularly when aiming for seamless integration into the code base while matching the design and technologies. Indeed, requirements might be too high level, and it remains a challenge how to decompose high-level requirements into low-level implementation details.

An important opportunity arises from the potential sharing of low-level implementations generated by AI within the open-source community. Low-level implementations could be generated as stateless and immutable APIs. The advantage is that these APIs undergo human and automated verification and testing. This enables other software projects to reuse them rather than attempting to re-generate them from scratch. By accessing existing databases of AI-generated APIs, AI systems can explore alternatives before resorting to generating code from scratch. This concept parallels the notion of "APIzation" recently explored for Stack Overflow code snippets [51, 52].

Testing will play a crucial role, as we need to ensure the correctness of the LLM-generated code and its integration with the codebase. Test cases can, of course, be created by developers, but they can also be generated automatically. The latter type of test cases will be crucial for verifying AI-generated code. While LLMs can generate test cases, we envision that automated test generators (e.g., Randoop [37], EvoSuite [14], and Pynguin [32]) will work in combination with LLMs to improve the quality and fault detection effectiveness of the generated tests. We are already witnessing the first attempts at this combination, yielding promising results [29]. While LLMs can be somewhat effective in generating test cases [43, 59], current LLMs do not guarantee compilable or runnable test cases [59]. Therefore, an integration with traditional test generators that compile and run test cases is necessary. Additionally, the feedback from compiling and running test cases is known to be extremely useful in improving LLM-generated tests [43, 59], or in automatically generating test cases in general (e.g.,


the feedback-directed approach [38]). More research is needed to better exploit the synergy and complementarity of LLMs and traditional test case generators [29].

Software Testing: The key research challenge will be to automatically generate test cases with effective oracles to verify AI-generated code.

Indeed, generating effective oracles that correctly distinguish between correct and incorrect executions is crucial. We cannot expect humans to write oracles for (many) AI-generated test cases; we need automatically generated oracles. Unit test generators (e.g., Randoop [37] and EvoSuite [14]) generate (regression) oracles based on the implemented behavior, not the intended one. They capture the implemented behavior of the program with assertions that predicate on the values returned by method calls and fail if a future version leads to behavioral differences. Thus, they are only useful in a regression testing scenario, and their effectiveness is usually evaluated in such a scenario [23, 45]. Regarding AI-generated code, the regression scenario is not useful, as we want to expose faults in the current version of AI-generated code.

Metamorphic Testing (MT) [7] could be the key to addressing this challenge. MT alleviates the oracle problem by using relations among the expected outputs of related inputs as oracles [8]. Research shows that such relations, called Metamorphic Relations (MRs), exist in virtually any software system [44]. MT proves highly beneficial when integrated into automated test generation, as a single MR can be applied to all automatically generated test inputs that satisfy the input relation. However, MT's automation and effectiveness depend on the availability of MRs. The automated generation or discovery of MRs presents a challenging and largely understudied problem [1, 8, 9, 44]. Only recently has the research community begun addressing metamorphic relation generation from different angles [2, 3, 5, 56, 60, 61]. More research is needed on MR generation [2, 3, 56] and oracle/generation improvement [21, 22, 36, 49, 50] to facilitate effective testing of AI-generated code.

2.4 Software Maintenance

We envision an AI-powered maintenance phase that remains constantly active in the background. The AI monitors external information about the software product and its ecosystem to gather potential issues or opportunities for improvement.

Software Maintenance: The primary research challenge will be to enable AI to autonomously process and utilize a vast amount of external information effectively to identify potential issues or opportunities for improvement. The AI should achieve this while ensuring fairness in its decision-making process and adherence to strategic direction.

Indeed, issues or maintenance opportunities are often buried in a large number of sources, such as bug reports, discussions on developer forums, and feedback from app stores [53, 54]. The AI must be capable of extracting relevant insights, identifying potential issues or opportunities for improvement, and proposing appropriate fixes or changes to the software artifacts. In particular, there are ethical considerations when new product improvements and feature requests can be gathered from the crowd. The AI system should not solely focus on the most popular feature requests and issues but also on those that are less popular but might target minority and disability groups [13, 34]. Further, the AI cannot simply add every feature users suggest; alignment with the product strategy must also be considered [27].

Additionally, software exists within an ecosystem of external libraries. The libraries upon which the project depends may release new versions to fix vulnerabilities or bugs; thus, it is important to upgrade the project dependencies. However, in certain situations upgrading a library might not be beneficial (e.g., if the software system does not utilise any of the methods that have been updated), so the AI has to automatically recognise the important upgrades. In particular, most library developers follow the semantic versioning scheme, where major, minor, and patch releases are specified by the release number. While for minor and patch releases the AI should attempt to automatically update them, for major releases the AI system should discern whether updating the library is necessary for the given software project. Major releases are not backward compatible, and a new library version might offer different functionalities, which could entail a non-trivial task of adapting the client to the new library version. More research effort is needed to help developers in making this choice while, at the same time, automatically detecting and proposing fixes for resolving any static [25] or behavioral [24] breaking changes. This future research can be informed by existing studies on automated program repair [17, 31].

3 CONCLUSIONS

This paper presented a vision of a symbiotic partnership between AI and software developers, motivated and inspired by recent advances in AI. This paper also discussed some key research challenges that need to be addressed by the software engineering community. While this paper focuses on specific software engineering challenges, it is essential to acknowledge broader AI-related concerns such as security, safety, bias, and privacy. Although not covered here, these issues are crucial but fall more within the domain of the AI community, and hopefully will be addressed soon.

We cannot ignore the opportunities that lie ahead. Nor should we disregard the concerns associated with them. Specifically, we must exercise caution against over-reliance on AI. While the next generations of software engineers should be trained in prompt engineering and AI, this should not overshadow the necessity of core software engineering knowledge. Human judgment remains indispensable for critically assessing AI-generated artifacts. It is crucial to emphasize again that AI serves as a tool to enhance developers' productivity and cannot (in the near future) replace humans. Putting too much trust in the software artifacts generated by AI can have serious repercussions on the quality and safety of our software systems.

This paper also serves as a call to arms for our community. We need multi-disciplinary collaborations across our community to address the key challenges and achieve the envisioned symbiotic partnership between human developers and AI. While our vision is ambitious, we believe that a five-year time frame is reasonable for realizing it.


ACKNOWLEDGMENTS

This work was supported by the Marsden Fund Council from Government funding, administered by the Royal Society Te Apārangi, New Zealand.

REFERENCES

[1] John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Maria Lomeli, Erik Meijer, et al. 2021. Testing Web Enabled Simulation at Scale Using Metamorphic Testing. In Proceedings of the 43rd International Conference on Software Engineering. https://doi.org/10.1109/ICSE-SEIP52600.2021.00023
[2] Jon Ayerdi, Valerio Terragni, Aitor Arrieta, Paolo Tonella, Goiuria Sagardui, and Maite Arratibel. 2021. Generating Metamorphic Relations for Cyber-Physical Systems with Genetic Programming: An Industrial Case Study. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1264–1274.
[3] Jon Ayerdi, Valerio Terragni, Gunel Jahangirova, Aitor Arrieta, and Paolo Tonella. 2024. GenMorph: Automatically Generating Metamorphic Relations via Genetic Programming. IEEE Transactions on Software Engineering (2024), 1–12. https://doi.org/10.1109/TSE.2024.3407840
[4] G Octo Barnett and Robert A Greenes. 1970. High-level programming languages. Computers and Biomedical Research 3, 5 (1970), 488–494.
[5] Arianna Blasi, Alessandra Gorla, Michael D. Ernst, Mauro Pezzè, and Antonio Carzaniga. 2021. MeMo: Automatically identifying metamorphic relations in Javadoc comments for test automation. J. Syst. Softw. 181 (2021), 111041.
[6] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).
[7] T. Y. Chen, S. C. Cheung, and S. M. Yiu. 1998. Metamorphic Testing: A New Approach for Generating Next Test Cases. Technical Report HKUST-CS98-01, Department of Computer Science, The Hong Kong University of Science and Technology.
[8] Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. Comput. Surveys 51, 1, Article 4 (Jan. 2018), 27 pages. https://doi.org/10.1145/3143561
[9] Tsong Yueh Chen and TH Tse. 2021. New visions on metamorphic testing after a quarter of a century of inception. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1487–1490.
[10] Leuson Da Silva, Jordan Samhi, and Foutse Khomh. 2024. ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions. arXiv preprint arXiv:2402.08801 (2024).
[11] Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. 2024. Evaluating Large Language Models in Class-Level Code Generation. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE). IEEE Computer Society, 865–865.
[12] Christof Ebert and Panos Louridas. 2023. Generative AI for software practitioners.
[21] Gunel Jahangirova, David Clark, Mark Harman, and Paolo Tonella. 2016. Test oracle assessment and improvement. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016. ACM, 247–258. https://doi.org/10.1145/2931037.2931062
[22] Gunel Jahangirova, David Clark, Mark Harman, and Paolo Tonella. 2019. An Empirical Validation of Oracle Improvement. IEEE Transactions on Software Engineering (2019). https://doi.org/10.1109/TSE.2019.2934409
[23] Gunel Jahangirova and Valerio Terragni. 2023. SBFT Tool Competition 2023 - Java Test Case Generation Track. 61–64. https://doi.org/10.1109/sbft59156.2023.00025
[24] Dhanushka Jayasuriya, Valerio Terragni, Jens Dietrich, and Kelly Blincoe. 2024. Understanding the Impact of APIs Behavioral Breaking Changes on Client Applications. Proceedings of the ACM on Software Engineering (PACMSE) (2024), In press. Issue FSE 2024.
[25] Dhanushka Jayasuriya, Valerio Terragni, Jens Dietrich, Samuel Ou, and Kelly Blincoe. 2023. Understanding Breaking Changes in the Wild. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 1433–1444.
[26] Matthew Jin, Syed Shahriar, Michele Tufano, Xin Shi, Shuai Lu, Neel Sundaresan, and Alexey Svyatkovskiy. 2023. InferFix: End-to-end program repair with LLMs. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1646–1656.
[27] Eric Knauss, Daniela Damian, Alessia Knauss, and Arber Borici. 2014. Openness and requirements: Opportunities and tradeoffs in software ecosystems. In 2014 IEEE 22nd International Requirements Engineering Conference (RE). IEEE, 213–222.
[28] Maxime Lamothe, Yann-Gaël Guéhéneuc, and Weiyi Shang. 2021. A systematic review of API evolution literature. ACM Computing Surveys (CSUR) 54, 8 (2021), 1–36.
[29] Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. CodaMosa: Escaping coverage plateaus in test generation with pre-trained large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 919–931.
[30] Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. 2024. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. Advances in Neural Information Processing Systems 36 (2024).
[31] Kui Liu, Li Li, Anil Koyuncu, Dongsun Kim, Zhe Liu, Jacques Klein, and Tegawendé F Bissyandé. 2021. A critical review on the evaluation of automated program repair systems. Journal of Systems and Software 171 (2021), 110817.
[32] Stephan Lukasczyk and Gordon Fraser. 2022. Pynguin: Automated unit test generation for Python. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 168–172.
[33] Qinyu Luo, Yining Ye, Shihao Liang, Zhong Zhang, Yujia Qin, Yaxi Lu, Yesai Wu, Xin Cong, Yankai Lin, Yingli Zhang, et al. 2024. RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation. arXiv preprint arXiv:2402.16667 (2024).
[34] Saurabh Malgaonkar, Sherlock A Licorish, and Bastin Tony Roy Savarimuthu. 2022. Prioritizing user concerns in app reviews – A study of requests for new features, enhancements and bug fixes. Information and Software Technology 144 (2022), 106798.
[35] Ke Mao, Licia Capra, Mark Harman, and Yue Jia. 2015. A Survey of the Use of Crowdsourcing in Software Engineering. RN 15 (2015), 01.
[36] Facundo Molina, Pablo Ponzio, Nazareno Aguirre, and Marcelo Frias. 2021. EvoSpex: An Evolutionary Algorithm for Learning Postconditions. In 2021
IEEE Software 40, 4 (2023), 30–38. IEEE/ACM 43rd International Conference on Software Engineering (ICSE). 1223–
[13] Marcelo Medeiros Eler, Leandro Orlandin, and Alberto Dumont Alves Oliveira. 1235. https://doi.org/10.1109/ICSE43902.2021.00112
2019. Do Android app users care about accessibility? an analysis of user reviews [37] Carlos Pacheco, Shuvendu K Lahiri, Michael D Ernst, and Thomas Ball. 2007.
on the Google play store. In Proceedings of the 18th Brazilian symposium on human Feedback-directed random test generation. In 29th International Conference on
factors in computing systems. 1–11. Software Engineering (ICSE’07). IEEE, 75–84.
[14] Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: Automatic Test Suite Gen- [38] Carlos Pacheco, Shuvendu K Lahiri, Michael D Ernst, and Thomas Ball. 2007.
eration for Object-Oriented Software. In Proceedings of the 19th ACM SIGSOFT Feedback-directed random test generation. In 29th International Conference on
symposium and the 13th European conference on Foundations of software engineer- Software Engineering (ICSE’07). IEEE, 75–84.
ing. 416–419. [39] Kavita Philip, Medha Umarji, Megha Agarwala, Susan Elliott Sim, Rosalva
[15] Maurizio Gabbrielli and Simone Martini. 2023. Programming languages: principles Gallardo-Valencia, Cristina V Lopes, and Sukanya Ratanotayanon. 2012. Soft-
and paradigms. Springer Nature. ware Reuse Through Methodical Component Reuse and Amethodical Snippet
[16] Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Remixing. In CSCW. 1361–1370.
Jin, Xiaoguang Mao, and Xiangke Liao. 2024. Large language models are few- [40] Asha Rajbhoj, Akanksha Somase, Piyush Kulkarni, and Vinay Kulkarni. 2024.
shot summarizers: Multi-intent comment generation via in-context learning. In Accelerating Software Development Using Generative AI: ChatGPT Case Study.
Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. In Proceedings of the 17th Innovations in Software Engineering Conference. 1–11.
1–13. [41] Eric Ribeiro, Ronan Nascimento, Igor Steinmacher, Laerte Xavier, Marco Gerosa,
[17] Claire Le Goues, Michael Pradel, and Abhik Roychoudhury. 2019. Automated Hugo de Paula, and Mairieli Wessel. 2022. Together or Apart? Investigating
program repair. Commun. ACM 62, 12 (2019), 56–65. a mediator bot to aggregate bot’s comments on pull requests. In 2022 IEEE In-
[18] Sumit Gulwani, Oleksandr Polozov, Rishabh Singh, et al. 2017. Program synthesis. ternational Conference on Software Maintenance and Evolution (ICSME). IEEE,
Foundations and Trends® in Programming Languages 4, 1-2 (2017), 1–119. 434–438.
[19] Wenpin Hou and Zhicheng Ji. 2024. A systematic evaluation of large language [42] Caitlin Sadowski, Kathryn T Stolee, and Sebastian Elbaum. 2015. How Developers
models for generating programming code. arXiv preprint arXiv:2403.00894 (2024). Search for Code: A Case Study. In FSE. 191–201.
[20] Yuan Huang, Yinan Chen, Xiangping Chen, Junqi Chen, Rui Peng, Zhicao Tang, [43] Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. Adaptive test
Jinbo Huang, Furen Xu, and Zibin Zheng. 2024. Generative Software Engineering. generation using a large language model. arXiv e-prints (2023), arXiv–2302.
arXiv preprint arXiv:2403.02583 (2024).

This paper was accepted at the "International Workshop on Software Engineering in 2030," co-located with FSE 2024. It was also invited to
the special issue of ACM TOSEM.
SE 2030, November 2024, Porto de Galinhas (Brazil) — Valerio Terragni, Partha Roop, and Kelly Blincoe

[44] S. Segura, G. Fraser, A. Sanchez, and A. Ruiz-Cortés. 2016. A Survey on Metamorphic Testing. IEEE Transactions on Software Engineering 42, 9 (Sept. 2016), 805–824. https://doi.org/10.1109/TSE.2016.2532875
[45] Sina Shamshiri, René Just, José Miguel Rojas, Gordon Fraser, Phil McMinn, and Andrea Arcuri. 2015. Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges. In Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (Lincoln, Nebraska) (ASE '15). IEEE Press, 201–211. https://doi.org/10.1109/ASE.2015.86
[46] Susan Elliott Sim, Medha Umarji, Sukanya Ratanotayanon, and Cristina V. Lopes. 2011. How Well Do Search Engines Support Code Retrieval on the Web? ACM TOSEM 21, 1 (2011), 4.
[47] Kathryn T. Stolee, Sebastian Elbaum, and Daniel Dobos. 2014. Solving the Search for Source Code. ACM TOSEM 23, 3 (2014), 26.
[48] Margaret-Anne Storey, Leif Singer, Brendan Cleary, Fernando Figueira Filho, and Alexey Zagalsky. 2014. The (R)Evolution of Social Media in Software Engineering. In FOSE. 100–116.
[49] Valerio Terragni, Gunel Jahangirova, Paolo Tonella, and Mauro Pezzè. 2020. Evolutionary Improvement of Assertion Oracles. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 1178–1189.
[50] Valerio Terragni, Gunel Jahangirova, Paolo Tonella, and Mauro Pezzè. 2021. GAssert: A Fully Automated Tool to Improve Assertion Oracles. In Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering Companion (ICSE 2021). 85–88.
[51] Valerio Terragni, Yepang Liu, and Shing-Chi Cheung. 2016. CSNIPPEX: Automated Synthesis of Compilable Code Snippets from Q&A Sites. In Proceedings of the 25th International Symposium on Software Testing and Analysis. 118–129.
[52] Valerio Terragni and Pasquale Salza. 2021. APIzation: Generating Reusable APIs from StackOverflow Code Snippets. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering. 542–554.
[53] James Tizard, Peter Devine, Hechen Wang, and Kelly Blincoe. 2022. A Software Requirements Ecosystem: Linking Forum, Issue Tracker, and FAQs for Requirements Management. IEEE Transactions on Software Engineering 49, 4 (2022), 2381–2393. https://doi.org/10.1109/TSE.2022.3219458
[54] Simon Van Oordt and Emitza Guzman. 2021. On the role of user feedback in software evolution: A practitioners' perspective. In 2021 IEEE 29th International Requirements Engineering Conference (RE). IEEE, 221–232.
[55] Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023. Automated program repair in the era of large pre-trained language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 1482–1494.
[56] Congying Xu, Valerio Terragni, Hengcheng Zhu, Jiarong Wu, and Shing-Chi Cheung. 2024. MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases. ACM Trans. Softw. Eng. Methodol. (Apr. 2024). https://doi.org/10.1145/3656340
[57] Feiyu Xu, Hans Uszkoreit, Yangzhou Du, Wei Fan, Dongyan Zhao, and Jun Zhu. 2019. Explainable AI: A brief survey on history, research areas, approaches and challenges. In Natural Language Processing and Chinese Computing: 8th CCF International Conference, NLPCC 2019, Dunhuang, China, October 9–14, 2019, Proceedings, Part II. Springer, 563–574.
[58] Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, and Yue Zhang. 2024. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing (2024), 100211.
[59] Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, and Xin Peng. 2023. No more manual tests? Evaluating and improving ChatGPT for unit test generation. arXiv preprint arXiv:2305.04207 (2023).
[60] Bo Zhang, Hongyu Zhang, Junjie Chen, Dan Hao, and Pablo Moscato. 2019. Automatic Discovery and Cleansing of Numerical Metamorphic Relations. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). 235–245. https://doi.org/10.1109/ICSME.2019.00035
[61] J. Zhang, J. Chen, D. Hao, Y. Xiong, B. Xie, L. Zhang, and H. Mei. 2014. Search-based Inference of Polynomial Metamorphic Relations. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (Vasteras, Sweden) (ASE '14). ACM, New York, NY, USA, 701–712. https://doi.org/10.1145/2642937.2642994
[62] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).
[63] Hao Zhong and Hong Mei. 2017. An empirical study on API usages. IEEE Transactions on Software Engineering 45, 4 (2017), 319–334.
