Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding

M. Falis, H. Dong, A. Birch, and B. Alex. Proceedings of the 21st Workshop on Biomedical Language Processing, pages 389–401. Dublin, Ireland: Association for Computational Linguistics, May 2022.

Abstract

Medical document coding is the process of assigning labels from a structured label space (an ontology, e.g., ICD-9) to medical documents. This process is laborious, costly, and error-prone. In recent years, efforts have been made to automate this process with neural models. The label spaces are large (on the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios. Previous efforts tried to address these scenarios within the model, leading to improvements on rare labels but worse results on frequent ones. We propose data augmentation and synthesis techniques to address these scenarios. We further introduce an analysis technique for this setting inspired by confusion matrices. This analysis technique points to the positive impact of data augmentation and synthesis, but also highlights more general issues of confusion within families of codes, and underprediction.

Description

Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding - ACL Anthology

Links and resources

BibTeX key: falis-etal-2022-horses
entry type: inproceedings
address: Dublin, Ireland
booktitle: Proceedings of the 21st Workshop on Biomedical Language Processing
year: 2022
month: may
pages: 389--401
publisher: Association for Computational Linguistics
url: https://aclanthology.org/2022.bionlp-1.39


Cite this publication

@inproceedings{falis-etal-2022-horses,
  abstract = {Medical document coding is the process of assigning labels from a structured label space (ontology {--} e.g., ICD-9) to medical documents. This process is laborious, costly, and error-prone. In recent years, efforts have been made to automate this process with neural models. The label spaces are large (in the order of thousands of labels) and follow a big-head long-tail label distribution, giving rise to few-shot and zero-shot scenarios. Previous efforts tried to address these scenarios within the model, leading to improvements on rare labels, but worse results on frequent ones. We propose data augmentation and synthesis techniques in order to address these scenarios. We further introduce an analysis technique for this setting inspired by confusion matrices. This analysis technique points to the positive impact of data augmentation and synthesis, but also highlights more general issues of confusion within families of codes, and underprediction.},
  added-at = {2022-05-18T21:18:35.000+0200},
  address = {Dublin, Ireland},
  author = {Falis, Mat{\'u}{\v{s}} and Dong, Hang and Birch, Alexandra and Alex, Beatrice},
  biburl = {https://www.bibsonomy.org/bibtex/2e07aaa826b42890aad83ed46baf05d96/hangdong},
  booktitle = {Proceedings of the 21st Workshop on Biomedical Language Processing},
  description = {Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for ICD-9 Coding - ACL Anthology},
  interhash = {00d2e2104f11197f1f17462cb47dbf32},
  intrahash = {e07aaa826b42890aad83ed46baf05d96},
  keywords = {clinical_coding data_augmentation evaluation icd icd-9 multi-label-classification myown ontologies zero-shot zsl},
  month = may,
  pages = {389--401},
  publisher = {Association for Computational Linguistics},
  timestamp = {2022-05-18T21:18:35.000+0200},
  title = {Horses to Zebras: Ontology-Guided Data Augmentation and Synthesis for {ICD}-9 Coding},
  url = {https://aclanthology.org/2022.bionlp-1.39},
  year = 2022
}

BibSonomy
