@misc{beam2018clinical,
abstract = {Word embeddings are a popular approach to unsupervised learning of word
relationships that are widely used in natural language processing. In this
article, we present a new set of embeddings for medical concepts learned using
an extremely large collection of multimodal medical data. Leaning on recent
theoretical insights, we demonstrate how an insurance claims database of 60
million members, a collection of 20 million clinical notes, and 1.7 million
full text biomedical journal articles can be combined to embed concepts into a
common space, resulting in the largest ever set of embeddings for 108,477
medical concepts. To evaluate our approach, we present a new benchmark
methodology based on statistical power specifically designed to test embeddings
of medical concepts. Our approach, called cui2vec, attains state-of-the-art
performance relative to previous methods in most instances. Finally, we provide
a downloadable set of pre-trained embeddings for other researchers to use, as
well as an online tool for interactive exploration of the cui2vec embeddings.},
author = {Beam, Andrew L. and Kompa, Benjamin and Schmaltz, Allen and Fried, Inbar and Weber, Griffin and Palmer, Nathan P. and Shi, Xu and Cai, Tianxi and Kohane, Isaac S.},
keywords = {clinic data embeddings medical multimodal},
note = {arXiv:1804.01486},
title = {Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data},
url = {http://arxiv.org/abs/1804.01486},
year = 2018
}