Article,

Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

S. Zheng, J. Lu, N. Ghasemzadeh, S. Hayek, A. Quyyumi, and F. Wang.
JMIR Med Inform, (May 2017)
DOI: 10.2196/medinform.7235

Abstract

Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedbacks for improving the extraction algorithm in real time.Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results.A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedbacks to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports-each combines history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.

BibTeX key: zheng2017effective
entry type: article
year: 2017
month: may
journal: JMIR Med Inform
number: 2
volume: 5
pmid: 28487265
DOI: 10.2196/medinform.7235
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5442348/

Users

Comments and Reviewsshow / hide

Please log in to take part in the discussion (add own reviews or comments).

Cite this publication

%0 Journal Article %1 zheng2017effective %A Zheng, S %A Lu, J J %A Ghasemzadeh, N %A Hayek, S S %A Quyyumi, A A %A Wang, F %D 2017 %J JMIR Med Inform %K active learning proposal:bmbf2019 %N 2 %R 10.2196/medinform.7235 %T Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies %U https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5442348/ %V 5 %X Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedbacks for improving the extraction algorithm in real time.Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results.A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedbacks to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports-each combines history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.

@article{zheng2017effective, abstract = {Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedbacks for improving the extraction algorithm in real time.Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results.A clinical information extraction system IDEAL-X has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedbacks to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction.Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports-each combines history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%.IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.}, added-at = {2019-06-24T09:46:01.000+0200}, author = {Zheng, S and Lu, J J and Ghasemzadeh, N and Hayek, S S and Quyyumi, A A and Wang, F}, biburl = {https://www.bibsonomy.org/bibtex/2b4f7acdb1e75c42cfe09661d0ae72061/nosebrain}, description = {Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies}, doi = {10.2196/medinform.7235}, interhash = {e339bbc81095adf0a5a534e1ded0d9ab}, intrahash = {b4f7acdb1e75c42cfe09661d0ae72061}, journal = {JMIR Med Inform}, keywords = {active learning proposal:bmbf2019}, month = may, number = 2, pmid = {28487265}, timestamp = {2019-07-01T14:40:53.000+0200}, title = {Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies}, url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5442348/}, volume = 5, year = 2017 }

BibSonomy

Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies

Abstract

Tags

Users

Comments and Reviewsshow / hide

Cite this publication

More citation styles

search on