ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations

Zeqiang Wang, Yuqi Wang, Haiyang Zhang, Wei Wang, Jun Qi, Jianjun Chen, Nishanth Sastry, Jon Johnson, Suparna De*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Accurately assigning standardized diagnosis and procedure codes to clinical text is crucial for healthcare applications, but it remains challenging because of the complexity of medical language. This paper proposes a novel model that draws on extreme multi-label classification to enhance International Classification of Diseases (ICD) coding. The model uses deformable convolutional neural networks to fuse hidden-layer outputs of pre-trained language models with external medical knowledge embeddings, combined through a multimodal approach, to provide rich semantic encodings for each code. A probabilistic label tree is built from the hierarchical structure of ICD labels to incorporate ontological relationships between codes and enable structured output prediction. Experiments on medical code prediction with the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.
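
The general idea of a probabilistic label tree can be illustrated with a minimal sketch. It is not the authors' implementation: the two-level tree (three-character category, then full code), the function names build_label_tree and plt_predict, the beam width, the threshold, and the hard-coded conditional probabilities are all assumptions made for illustration. In a real system the conditional probabilities would come from a trained model.

```python
from collections import defaultdict


def build_label_tree(codes):
    """Group full ICD codes under the part before the dot (assumed category level)."""
    tree = defaultdict(list)
    for code in sorted(codes):
        tree[code.split(".")[0]].append(code)
    return dict(tree)


def plt_predict(tree, cond_prob, beam_width=2, threshold=0.5):
    """Score each leaf as the product of conditional probabilities on its path.

    cond_prob maps a node name to P(node relevant | parent relevant);
    missing nodes are treated as probability 0.
    """
    # Beam search: expand only the most probable categories.
    categories = sorted(tree, key=lambda c: cond_prob.get(c, 0.0), reverse=True)
    scores = {}
    for category in categories[:beam_width]:
        p_category = cond_prob.get(category, 0.0)
        for code in tree[category]:
            p_code = p_category * cond_prob.get(code, 0.0)
            if p_code >= threshold:
                scores[code] = p_code
    return scores


# Hypothetical ICD-9 codes and made-up probabilities, for illustration only.
tree = build_label_tree(["250.00", "250.01", "401.9", "428.0"])
cond_prob = {
    "250": 0.9, "250.00": 0.8, "250.01": 0.1,
    "401": 0.6, "401.9": 0.9,
    "428": 0.2, "428.0": 0.9,
}
scores = plt_predict(tree, cond_prob)
print(scores)
```

Because a leaf's score is multiplied by its parent's probability, pruning an unlikely category removes all of its codes at once, which is what keeps prediction tractable when the label space is very large.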

Original language: English
Article number: 18319
Journal: Scientific Reports
Volume: 14
Issue number: 1
Publication status: Published - Dec 2024

Keywords

  • Extreme multi-label classification
  • Few-shot learning
  • ICD coding
  • Medical knowledge representation
  • Natural language processing
