DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Yuqi Wang; Zeqiang Wang; Wei Wang; Qi Chen; Kaizhu Huang; Anh Nguyen; Suparna De

Xi'an Jiaotong-Liverpool University

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

Abstract

Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared with the original language models. Ablation studies validate the contribution of each augmentation method to improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of 32 participants.
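The abstract names domain-specific vocabulary replacement as one augmentation method but does not spell out the procedure. The following is a minimal sketch of that general idea, not the authors' implementation: the synonym map `SYNONYMS`, the helper names `replace_domain_terms`, `_pick` and `augment`, and the example statements are all hypothetical. The key assumption is that swapping a clinical term for a true synonym leaves the entailment label unchanged.

```python
import random
import re

# Hypothetical map of biomedical terms to label-preserving synonyms.
# Illustrative only; the paper's actual vocabulary source is not
# described in the abstract.
SYNONYMS: dict[str, list[str]] = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
    "adverse events": ["side effects"],
}


def _pick(match: re.Match, choices: list[str], rng: random.Random) -> str:
    """Choose a synonym and keep the original term's leading capital."""
    replacement = rng.choice(choices)
    if match.group(0)[0].isupper():
        replacement = replacement[0].upper() + replacement[1:]
    return replacement


def replace_domain_terms(text: str, rng: random.Random) -> str:
    """Replace every known term (case-insensitive, whole words) with a synonym."""
    # Longer terms go first so multi-word terms are not split by shorter keys.
    for term in sorted(SYNONYMS, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        choices = SYNONYMS[term]
        text = pattern.sub(lambda m: _pick(m, choices, rng), text)
    return text


def augment(statement: str, label: str, seed: int = 0) -> tuple[str, str]:
    """Return a perturbed copy of an NLI statement with its label unchanged."""
    rng = random.Random(seed)
    return replace_domain_terms(statement, rng), label


if __name__ == "__main__":
    print(augment("Hypertension was more frequent in cohort 1.", "Entailment"))
```

Run as a script, this prints `('High blood pressure was more frequent in cohort 1.', 'Entailment')`. Because each hypothetical term maps to a single synonym here, the seed only matters once a term has several alternatives; whether a given swap truly preserves the label still has to be checked term by term.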

Original language	English
Title of host publication	SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
Editors	Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosa
Publisher	Association for Computational Linguistics (ACL)
Pages	88-94
Number of pages	7
ISBN (Electronic)	9798891761070
Publication status	Published - 2024
Event	18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration	20 Jun 2024 → 21 Jun 2024

Publication series

Name	SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop

Conference

Conference	18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
Country/Territory	Mexico
City	Hybrid, Mexico City
Period	20/06/24 → 21/06/24

Cite this

Wang, Y., Wang, Z., Wang, W., Chen, Q., Huang, K., Nguyen, A., & De, S. (2024). DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. In A. K. Ojha, A. S. Doğruöz, H. T. Madabushi, G. Da San Martino, S. Rosenthal, & A. Rosa (Eds.), SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 88-94). (SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop). Association for Computational Linguistics (ACL).

Wang, Yuqi ; Wang, Zeqiang ; Wang, Wei et al. / DKE-Research at SemEval-2024 Task 2 : Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop. editor / Atul Kr. Ojha ; A. Seza Doğruöz ; Harish Tayyar Madabushi ; Giovanni Da San Martino ; Sara Rosenthal ; Aiala Rosa. Association for Computational Linguistics (ACL), 2024. pp. 88-94 (SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop).

@inproceedings{378b272042b548f49130a98d83be1db4,
title = "{DKE}-Research at {SemEval}-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness",
abstract = "Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared with the original language models. Ablation studies validate the contribution of each augmentation method to improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of 32 participants.",
author = "Yuqi Wang and Zeqiang Wang and Wei Wang and Qi Chen and Kaizhu Huang and Anh Nguyen and Suparna De",
note = "Publisher Copyright: {\textcopyright} 2024 Association for Computational Linguistics.; 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024 ; Conference date: 20-06-2024 Through 21-06-2024",
year = "2024",
language = "English",
series = "SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop",
publisher = "Association for Computational Linguistics (ACL)",
pages = "88--94",
editor = "Ojha, {Atul Kr.} and Do{\u{g}}ru{\"o}z, {A. Seza} and Madabushi, {Harish Tayyar} and {Da San Martino}, Giovanni and Sara Rosenthal and Aiala Rosa",
booktitle = "SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop",
}

Wang, Y, Wang, Z, Wang, W, Chen, Q, Huang, K, Nguyen, A & De, S 2024, DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. in AK Ojha, AS Doğruöz, HT Madabushi, G Da San Martino, S Rosenthal & A Rosa (eds), SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop. SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop, Association for Computational Linguistics (ACL), pp. 88-94, 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024, Hybrid, Mexico City, Mexico, 20/06/24.

DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. / Wang, Yuqi; Wang, Zeqiang; Wang, Wei et al.
SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop. ed. / Atul Kr. Ojha; A. Seza Doğruöz; Harish Tayyar Madabushi; Giovanni Da San Martino; Sara Rosenthal; Aiala Rosa. Association for Computational Linguistics (ACL), 2024. p. 88-94 (SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop).

TY  - GEN
T1  - DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness
T2  - 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
AU  - Wang, Yuqi
AU  - Wang, Zeqiang
AU  - Wang, Wei
AU  - Chen, Qi
AU  - Huang, Kaizhu
AU  - Nguyen, Anh
AU  - De, Suparna
PY  - 2024
Y1  - 2024
N2  - Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared with the original language models. Ablation studies validate the contribution of each augmentation method to improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of 32 participants.
AB  - Safe and reliable natural language inference is critical for extracting insights from clinical trial reports but poses challenges due to biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared with the original language models. Ablation studies validate the contribution of each augmentation method to improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of 32 participants.
UR  - http://www.scopus.com/inward/record.url?scp=85191198434&partnerID=8YFLogxK
M3  - Conference Proceeding
AN  - SCOPUS:85191198434
T3  - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
SP  - 88
EP  - 94
BT  - SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
A2  - Ojha, Atul Kr.
A2  - Doğruöz, A. Seza
A2  - Madabushi, Harish Tayyar
A2  - Da San Martino, Giovanni
A2  - Rosenthal, Sara
A2  - Rosa, Aiala
PB  - Association for Computational Linguistics (ACL)
Y2  - 20 June 2024 through 21 June 2024
ER  -

Wang Y, Wang Z, Wang W, Chen Q, Huang K, Nguyen A et al. DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness. In Ojha AK, Doğruöz AS, Madabushi HT, Da San Martino G, Rosenthal S, Rosa A, editors, SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop. Association for Computational Linguistics (ACL). 2024. p. 88-94. (SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop).
