DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Safe and reliable natural language inference is critical for extracting insights from clinical trial reports, but biases in large pre-trained language models make it challenging. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multitask learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method to improved robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of the 32 participants.
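To make the idea of domain-specific vocabulary replacement concrete, here is a minimal Python sketch. It is not the authors' implementation: the synonym table, the function names `replace_domain_terms` and `augment`, and the example record layout (`statement`, `label`) are all hypothetical, and the sketch assumes that swapping a term for a true synonym leaves the inference label unchanged.

```python
import re

# Hypothetical synonym table (keys in lowercase). The vocabulary source used
# in the paper is not described in this record, so these pairs are
# illustrative only.
BIOMEDICAL_SYNONYMS = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
    "adverse events": "adverse reactions",
}


def replace_domain_terms(statement: str, synonyms: dict[str, str]) -> str:
    """Return a copy of `statement` with each known term swapped for its synonym."""
    # Try longer terms first so multi-word phrases win over shorter overlaps.
    terms = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: synonyms[m.group(0).lower()], statement)


def augment(example: dict, synonyms: dict[str, str]) -> list[dict]:
    """Keep the original example and, if any term matched, add a perturbed copy.

    Assumption: synonym replacement preserves meaning, so the label is copied.
    """
    perturbed = replace_domain_terms(example["statement"], synonyms)
    if perturbed == example["statement"]:
        return [example]
    return [example, {**example, "statement": perturbed}]
```

Calling `augment` on a record whose statement mentions "high blood pressure" would return the original record plus a copy that says "hypertension" with the same label; records with no known terms come back unchanged. A robust model should give the same prediction for both copies, which is the kind of behaviour such perturbations are meant to encourage.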

Original language: English
Title of host publication: SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop
Editors: Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosa
Publisher: Association for Computational Linguistics (ACL)
Pages: 88-94
Number of pages: 7
ISBN (Electronic): 9798891761070
Publication status: Published - 2024
Event: 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 20 Jun 2024 to 21 Jun 2024

Publication series

Name: SemEval 2024 - 18th International Workshop on Semantic Evaluation, Proceedings of the Workshop

Conference

Conference: 18th International Workshop on Semantic Evaluation, SemEval 2024, co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL 2024
Country/Territory: Mexico
City: Hybrid, Mexico City
Period: 20/06/24 to 21/06/24
