Abstract
MOTIVATION: Duplicate publication impacts the quality of the scientific
corpus, has been difficult to detect, and studies thus far have been
limited in scope and size. Using text similarity searches, we were
able to identify signatures of duplicate citations among a body of
abstracts. RESULTS: A sample of 62,213 Medline citations was examined
and a database of manually verified duplicate citations was created
to study author publication behavior. We found that 0.04% of the
citations with no shared authors were highly similar and are thus
potential cases of plagiarism. 1.35% with shared authors were sufficiently
similar to be considered duplicates. Extrapolating, this would correspond
to 3,500 and 117,500 duplicate citations in total, respectively. AVAILABILITY:
eTBLAST, an automated citation matching tool, and Déjà vu,
the duplicate citation database, are freely available at http://invention.swmed.edu/
and http://spore.swmed.edu/dejavu
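
The abstract does not state the corpus size used for the extrapolation. As a rough consistency check, the sketch below backs out the total implied by each quoted rate and its extrapolated count. The function name `implied_corpus_size` is hypothetical, and the result is an inference from the quoted figures, not a number reported by the authors.

```python
# Figures quoted in the abstract.
RATE_NO_SHARED_AUTHORS = 0.0004   # 0.04% of citations (potential plagiarism)
RATE_SHARED_AUTHORS = 0.0135      # 1.35% of citations (duplicates)
EXTRAPOLATED_NO_SHARED = 3_500
EXTRAPOLATED_SHARED = 117_500


def implied_corpus_size(extrapolated_count: float, rate: float) -> float:
    """Return the corpus size N such that rate * N equals the extrapolated count.

    Hypothetical helper: the abstract gives only the rates and the totals,
    so N is inferred here rather than taken from the source.
    """
    return extrapolated_count / rate


base_no_shared = implied_corpus_size(EXTRAPOLATED_NO_SHARED, RATE_NO_SHARED_AUTHORS)
base_shared = implied_corpus_size(EXTRAPOLATED_SHARED, RATE_SHARED_AUTHORS)

print(f"Implied corpus size (no shared authors): {base_no_shared:,.0f}")
print(f"Implied corpus size (shared authors):    {base_shared:,.0f}")
```

Both rates point to a base of roughly 8.7 to 8.75 million citations, so the two extrapolated totals are consistent with each other.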
- bibliometrics
- medline
- medline: statistics & numerical data
- medical subject headings
- natural language processing
- periodicals as topic
- periodicals as topic: statistics & numerical data
- plagiarism
- semantics
- vocabulary, controlled