Leveraging LLMs for Legacy Code Modernization: Challenges and Opportunities for LLM-Generated Documentation

Diggs, Colin; Doyle, Michael; Madan, Amit; Scott, Siggy; Escamilla, Emily; Zimmer, Jacob; Nekoo, Naveed; Ursino, Paul; Bartholf, Michael; Robin, Zachary; Patel, Anand; Glasz, Chris; Macke, William; Kirk, Paul; Phillips, Jasper; Sridharan, Arun; Wendt, Doug; Rosen, Scott; Naik, Nitin; Brunelle, Justin F.; Thaker, Samruddhi

arXiv:2411.14971 (cs)

[Submitted on 22 Nov 2024]

Abstract: Legacy software systems, written in outdated languages like MUMPS and mainframe assembly, pose challenges in efficiency, maintenance, staffing, and security. While LLMs offer promise for modernizing these systems, their ability to understand legacy languages is largely unknown. This paper investigates the use of LLMs to generate documentation for legacy code using two datasets: an electronic health records (EHR) system in MUMPS and open-source applications in IBM mainframe Assembly Language Code (ALC). We propose a prompting strategy for generating line-wise code comments and a rubric to evaluate their completeness, readability, usefulness, and hallucination. Our study assesses the correlation between human evaluations and automated metrics, such as code complexity and reference-based metrics. We find that LLM-generated comments for MUMPS and ALC are generally hallucination-free, complete, readable, and useful compared to ground-truth comments, though ALC poses challenges. However, no automated metric correlates strongly enough with comment quality to predict or measure LLM performance. Our findings highlight the limitations of current automated measures and the need for better evaluation metrics for LLM-generated documentation in legacy systems.

Comments:	Abbreviated version submitted to LLM4Code 2025 (a workshop co-located with ICSE 2025), 13 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Software Engineering (cs.SE)
Cite as:	arXiv:2411.14971 [cs.LG]
	(or arXiv:2411.14971v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.14971

Submission history

From: Emily Escamilla
[v1] Fri, 22 Nov 2024 14:27:27 UTC (3,596 KB)