SevPredict: Exploring the Potential of Large Language Models in Software Maintenance

Muhammad Ali Arshad; Adnan Riaz; Rubia Fatima; Affan Yasin

doi:10.3390/ai5040132

Muhammad Ali Arshad, Adnan Riaz*, Rubia Fatima, Affan Yasin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

The prioritization of bug reports based on severity is a crucial aspect of bug triaging, enabling a focus on more critical issues. Traditional methods for assessing bug severity range from manual inspection to the application of machine and deep learning techniques. However, manual evaluation tends to be resource-intensive and inefficient, while conventional learning models often lack contextual understanding. This study explores the effectiveness of large language models (LLMs) in predicting bug report severity. We propose a novel approach called SevPredict using GPT-2, an advanced LLM, and compare it against state-of-the-art models. The comparative analysis between the proposed approach and state-of-the-art approaches suggests that the proposed approach outperforms the state-of-the-art approaches in terms of performance evaluation metrics. SevPredict shows improvements over the best-performing state-of-the-art approach (BERT-SBR) with 1.72% higher accuracy, 2.18% higher precision, and 4.94% higher MCC. The improvements are even more substantial when compared to the approach by Ramay et al., with SevPredict demonstrating 10.66% higher accuracy, 10.39% higher precision, 3.29% higher recall, 7.19% higher F1-score, and a remarkable 41.27% higher MCC. These findings not only demonstrate the superiority of our GPT-2-based approach in predicting the severity of bug reports but also highlight its potential to significantly advance automated bug triaging and software maintenance. This research introduces a severity prediction tool named SevPredict.

Original language	English
Pages (from-to)	2739-2760
Number of pages	22
Journal	AI (Switzerland)
Volume	5
Issue number	4
DOIs	https://doi.org/10.3390/ai5040132
Publication status	Published - Dec 2024

Keywords

large language models
mining software repository
severity prediction

Cite this

@article{1a87c5520cba4051984db5588dd86dc0,

title = "SevPredict: Exploring the Potential of Large Language Models in Software Maintenance",

abstract = "The prioritization of bug reports based on severity is a crucial aspect of bug triaging, enabling a focus on more critical issues. Traditional methods for assessing bug severity range from manual inspection to the application of machine and deep learning techniques. However, manual evaluation tends to be resource-intensive and inefficient, while conventional learning models often lack contextual understanding. This study explores the effectiveness of large language models (LLMs) in predicting bug report severity. We propose a novel approach called SevPredict using GPT-2, an advanced LLM, and compare it against state-of-the-art models. The comparative analysis between the proposed approach and state-of-the-art approaches suggests that the proposed approach outperforms the state-of-the-art approaches in terms of performance evaluation metrics. SevPredict shows improvements over the best-performing state-of-the-art approach (BERT-SBR) with 1.72% higher accuracy, 2.18% higher precision, and 4.94% higher MCC. The improvements are even more substantial when compared to the approach by Ramay et al., with SevPredict demonstrating 10.66% higher accuracy, 10.39% higher precision, 3.29% higher recall, 7.19% higher F1-score, and a remarkable 41.27% higher MCC. These findings not only demonstrate the superiority of our GPT-2-based approach in predicting the severity of bug reports but also highlight its potential to significantly advance automated bug triaging and software maintenance. This research introduces a severity prediction tool named SevPredict.",

keywords = "large language models, mining software repository, severity prediction",

author = "Arshad, {Muhammad Ali} and Adnan Riaz and Rubia Fatima and Affan Yasin",

note = "Publisher Copyright: {\textcopyright} 2024 by the authors.",

year = "2024",

month = dec,

doi = "10.3390/ai5040132",

language = "English",

volume = "5",

pages = "2739--2760",

journal = "AI (Switzerland)",

issn = "2673-2688",

number = "4",

}

TY - JOUR

T1 - SevPredict

T2 - Exploring the Potential of Large Language Models in Software Maintenance

AU - Arshad, Muhammad Ali

AU - Riaz, Adnan

AU - Fatima, Rubia

AU - Yasin, Affan

PY - 2024/12

Y1 - 2024/12

N2 - The prioritization of bug reports based on severity is a crucial aspect of bug triaging, enabling a focus on more critical issues. Traditional methods for assessing bug severity range from manual inspection to the application of machine and deep learning techniques. However, manual evaluation tends to be resource-intensive and inefficient, while conventional learning models often lack contextual understanding. This study explores the effectiveness of large language models (LLMs) in predicting bug report severity. We propose a novel approach called SevPredict using GPT-2, an advanced LLM, and compare it against state-of-the-art models. The comparative analysis between the proposed approach and state-of-the-art approaches suggests that the proposed approach outperforms the state-of-the-art approaches in terms of performance evaluation metrics. SevPredict shows improvements over the best-performing state-of-the-art approach (BERT-SBR) with 1.72% higher accuracy, 2.18% higher precision, and 4.94% higher MCC. The improvements are even more substantial when compared to the approach by Ramay et al., with SevPredict demonstrating 10.66% higher accuracy, 10.39% higher precision, 3.29% higher recall, 7.19% higher F1-score, and a remarkable 41.27% higher MCC. These findings not only demonstrate the superiority of our GPT-2-based approach in predicting the severity of bug reports but also highlight its potential to significantly advance automated bug triaging and software maintenance. This research introduces a severity prediction tool named SevPredict.

AB - The prioritization of bug reports based on severity is a crucial aspect of bug triaging, enabling a focus on more critical issues. Traditional methods for assessing bug severity range from manual inspection to the application of machine and deep learning techniques. However, manual evaluation tends to be resource-intensive and inefficient, while conventional learning models often lack contextual understanding. This study explores the effectiveness of large language models (LLMs) in predicting bug report severity. We propose a novel approach called SevPredict using GPT-2, an advanced LLM, and compare it against state-of-the-art models. The comparative analysis between the proposed approach and state-of-the-art approaches suggests that the proposed approach outperforms the state-of-the-art approaches in terms of performance evaluation metrics. SevPredict shows improvements over the best-performing state-of-the-art approach (BERT-SBR) with 1.72% higher accuracy, 2.18% higher precision, and 4.94% higher MCC. The improvements are even more substantial when compared to the approach by Ramay et al., with SevPredict demonstrating 10.66% higher accuracy, 10.39% higher precision, 3.29% higher recall, 7.19% higher F1-score, and a remarkable 41.27% higher MCC. These findings not only demonstrate the superiority of our GPT-2-based approach in predicting the severity of bug reports but also highlight its potential to significantly advance automated bug triaging and software maintenance. This research introduces a severity prediction tool named SevPredict.

KW - large language models

KW - mining software repository

KW - severity prediction

UR - http://www.scopus.com/inward/record.url?scp=85213446464&partnerID=8YFLogxK

U2 - 10.3390/ai5040132

DO - 10.3390/ai5040132

M3 - Article

AN - SCOPUS:85213446464

SN - 2673-2688

VL - 5

SP - 2739

EP - 2760

JO - AI (Switzerland)

JF - AI (Switzerland)

IS - 4

ER -
