ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations

Zeqiang Wang; Yuqi Wang; Haiyang Zhang; Wei Wang; Jun Qi; Jianjun Chen; Nishanth Sastry; Jon Johnson; Suparna De

doi:10.1038/s41598-024-69214-9

Corresponding author: Suparna De

Department of Computing

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.
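The label-tree idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-level hierarchy (three-character category, then full code), and it uses a hypothetical `score` function that returns the conditional probability of a node given its parent, standing in for a trained classifier. The names `Node`, `build_tree`, and `beam_search`, as well as the toy probabilities, are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Node:
    name: str
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_tree(codes: List[str]) -> Node:
    """Group each code under its category, e.g. '250.00' under '250'.

    Assumption: every code contains a '.', so categories are never leaves.
    """
    root = Node("root")
    for code in codes:
        category = code.split(".")[0]
        parent = root.children.setdefault(category, Node(category))
        parent.children.setdefault(code, Node(code))
    return root


def beam_search(
    root: Node, score: Callable[[str], float], beam_width: int = 2
) -> List[Tuple[float, str]]:
    """Rank leaf codes by the product of conditional probabilities on their path.

    Only the `beam_width` most probable internal nodes are expanded per level,
    so most of the label space is never scored.
    """
    frontier = [(1.0, root)]
    leaves: List[Tuple[float, str]] = []
    while frontier:
        candidates = []
        for prob, node in frontier:
            for child in node.children.values():
                path_prob = prob * score(child.name)
                if child.children:
                    candidates.append((path_prob, child))
                else:
                    leaves.append((path_prob, child.name))
        candidates.sort(key=lambda item: item[0], reverse=True)
        frontier = candidates[:beam_width]
    return sorted(leaves, key=lambda item: item[0], reverse=True)


# Toy conditional probabilities (hypothetical, not model output).
toy_scores = {
    "250": 0.9, "250.00": 0.8, "250.01": 0.1,
    "401": 0.5, "401.9": 0.6,
    "428": 0.2, "428.0": 0.9,
}
tree = build_tree(["250.00", "250.01", "401.9", "428.0"])
ranked = beam_search(tree, toy_scores.__getitem__, beam_width=2)
for prob, code in ranked:
    print(f"{code}: {prob:.2f}")
```

With a beam width of 2, the category `428` is pruned at the first level, so `428.0` is never scored even though its own conditional probability is high. This is the usual trade-off of tree-based extreme multi-label prediction: fewer classifier evaluations in exchange for possible misses under low-probability parents.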

Original language	English
Article number	18319
Journal	Scientific Reports
Volume	14
Issue number	1
DOIs	https://doi.org/10.1038/s41598-024-69214-9
Publication status	Published - Dec 2024

Keywords

Extreme multi-label classification
Few-shot learning
ICD coding
Medical knowledge representation
Natural language processing

Cite this

@article{0e86812a7b0b4510b92d4030fe514e40,

title = "ICDXML: enhancing ICD coding with probabilistic label trees and dynamic semantic representations",

abstract = "Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.",

keywords = "Extreme multi-label classification, Few-shot learning, ICD coding, Medical knowledge representation, Natural language processing",

author = "Zeqiang Wang and Yuqi Wang and Haiyang Zhang and Wei Wang and Jun Qi and Jianjun Chen and Nishanth Sastry and Jon Johnson and Suparna De",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = dec,

doi = "10.1038/s41598-024-69214-9",

language = "English",

volume = "14",

journal = "Scientific Reports",

issn = "2045-2322",

number = "1",

}

TY - JOUR

T1 - ICDXML

T2 - enhancing ICD coding with probabilistic label trees and dynamic semantic representations

AU - Wang, Zeqiang

AU - Wang, Yuqi

AU - Zhang, Haiyang

AU - Wang, Wei

AU - Qi, Jun

AU - Chen, Jianjun

AU - Sastry, Nishanth

AU - Johnson, Jon

AU - De, Suparna

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/12

Y1 - 2024/12

N2 - Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.

AB - Accurately assigning standardized diagnosis and procedure codes from clinical text is crucial for healthcare applications. However, this remains challenging due to the complexity of medical language. This paper proposes a novel model that incorporates extreme multi-label classification tasks to enhance International Classification of Diseases (ICD) coding. The model utilizes deformable convolutional neural networks to fuse representations from hidden layer outputs of pre-trained language models and external medical knowledge embeddings fused using a multimodal approach to provide rich semantic encodings for each code. A probabilistic label tree is constructed based on the hierarchical structure existing in ICD labels to incorporate ontological relationships between ICD codes and enable structured output prediction. Experiments on medical code prediction on the MIMIC-III database demonstrate competitive performance, highlighting the benefits of this technique for robust clinical code assignment.

KW - Extreme multi-label classification

KW - Few-shot learning

KW - ICD coding

KW - Medical knowledge representation

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85200536681&partnerID=8YFLogxK

U2 - 10.1038/s41598-024-69214-9

DO - 10.1038/s41598-024-69214-9

M3 - Article

AN - SCOPUS:85200536681

SN - 2045-2322

VL - 14

JO - Scientific Reports

JF - Scientific Reports

IS - 1

M1 - 18319

ER -
