TY - GEN
T1 - DKE-Research at SemEval-2024 Task 2
T2 - 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
AU - Wang, Yuqi
AU - Wang, Zeqiang
AU - Wang, Wei
AU - Chen, Qi
AU - Huang, Kaizhu
AU - Nguyen, Anh
AU - De, Suparna
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - Safe and reliable natural language inference is critical for extracting insights from clinical trial reports, but it poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method to improved robustness. Our best-performing model ranked 12th on faithfulness and 8th on consistency out of the 32 participants.
UR - http://www.scopus.com/inward/record.url?scp=85191198434&partnerID=8YFLogxK
M3 - Conference Proceeding
AN - SCOPUS:85191198434
T3 - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
SP - 88
EP - 94
BT - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
A2 - Ojha, Atul Kr.
A2 - Doğruöz, A. Seza
A2 - Tayyar Madabushi, Harish
A2 - Da San Martino, Giovanni
A2 - Rosenthal, Sara
A2 - Rosá, Aiala
PB - Association for Computational Linguistics (ACL)
Y2 - 20 June 2024 through 21 June 2024
ER -