UPDF AI

Enhancing Commit Message Categorization in Open-Source Repositories Using Structured Taxonomy and Large Language Models

Muna Al-razgan,Manal Alaqil,Ruba Almuwayshir,Zamzam Alhijji

2024 · DOI: 10.54364/aaiml.2024.44171
Advances in Artificial Intelligence and Machine Learning · 0 Citations

TLDR

This study contributes to the field by providing a structured taxonomy of CMs and exploring how tools like ChatGPT-4 can be used to analyze them, indicating that the quality of CMs varies greatly, which has a clear impact on how efficiently they can be categorized.

Abstract

Version Control Systems (VCS) manage source code changes by storing modifications in a database. A key feature of VCS is the commit function, which saves the project’s current state and summarizes changes through Commit Message (CM). These messages are vital for collaboration, particularly in open-source artificial intelligence (AI) projects on platforms, where contributors work on rapidly evolving codebases. This paper presents an empirical analysis of CM within open-source AI repositories on GitHub, focusing on their content, the effectiveness of categorization by Large Language Models (LLMs), and the impact of message quality on categorization accuracy. A sample of 384 CMs from 34 repositories was manually categorized to establish a taxonomy. Python was then used for automated keyword extraction, refined with regex patterns. Also, an experiment involved assessing the performance of ChatGPT-4 in categorizing CMs, first without guidance and later using our developed taxonomy. Our findings indicate that the quality of CMs varies greatly, which has a clear impact on how efficiently they can be categorized. This study contributes to the field by providing a structured taxonomy of CMs and exploring how tools like ChatGPT-4 can be used to analyze them. The insights from this research are intended to benefit both academic studies and real-world software development, particularly by helping teams better understand and automate the handling of CM in AI projects.

Cited Papers
Citing Papers