TY - JOUR
T1 - SevPredict
T2 - Exploring the Potential of Large Language Models in Software Maintenance
AU - Arshad, Muhammad Ali
AU - Riaz, Adnan
AU - Fatima, Rubia
AU - Yasin, Affan
N1 - Publisher Copyright:
© 2024 by the authors.
PY - 2024/12
Y1 - 2024/12
AB - The prioritization of bug reports based on severity is a crucial aspect of bug triaging, enabling a focus on more critical issues. Traditional methods for assessing bug severity range from manual inspection to the application of machine and deep learning techniques. However, manual evaluation tends to be resource-intensive and inefficient, while conventional learning models often lack contextual understanding. This study explores the effectiveness of large language models (LLMs) in predicting bug report severity. We propose a novel approach called SevPredict using GPT-2, an advanced LLM, and compare it against state-of-the-art models. The comparative analysis between the proposed approach and state-of-the-art approaches suggests that the proposed approach outperforms the state-of-the-art approaches in terms of performance evaluation metrics. SevPredict shows improvements over the best-performing state-of-the-art approach (BERT-SBR) with 1.72% higher accuracy, 2.18% higher precision, and 4.94% higher MCC. The improvements are even more substantial when compared to the approach by Ramay et al., with SevPredict demonstrating 10.66% higher accuracy, 10.39% higher precision, 3.29% higher recall, 7.19% higher F1-score, and a remarkable 41.27% higher MCC. These findings not only demonstrate the superiority of our GPT-2-based approach in predicting the severity of bug reports but also highlight its potential to significantly advance automated bug triaging and software maintenance. This research introduces a severity prediction tool named SevPredict.
KW - large language models
KW - mining software repository
KW - severity prediction
UR - http://www.scopus.com/inward/record.url?scp=85213446464&partnerID=8YFLogxK
U2 - 10.3390/ai5040132
DO - 10.3390/ai5040132
M3 - Article
AN - SCOPUS:85213446464
SN - 2673-2688
VL - 5
SP - 2739
EP - 2760
JO - AI (Switzerland)
JF - AI (Switzerland)
IS - 4
ER -