Article

Sherlock N-overlap: Invasive Normalization and Overlap Coefficient for the Similarity Analysis Between Source Code

F. Allyson, M. Danilo, S. Jose, and B. Giovanni.
IEEE TRANSACTIONS ON COMPUTERS, 68 (5): 740-751 (2019)
DOI: 10.1109/TC.2018.2881449

Abstract

Some similarity-detection tools, such as Sherlock, compare textual documents of any kind but have limitations when comparing source code files. The presence or absence of blank spaces between structural elements, the choice of variable names, and other such variations interfere with the resulting similarity index. This paper shows that preprocessing the source code improves Sherlock's performance. The results are based on experiments with 66 previously plagiarized source code files and a base of 2160 programs written by engineering students in programming classes. For this second set, the similarity status was not known in advance, so a method was devised to compute precision and recall in relative terms, using a set of reference tools as a kind of oracle. Our approach, called Sherlock N-overlap, obtained, in most of the tested cases, similarity indexes superior to those of other complex tools such as MOSS, JPlag, and SIM.
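The abstract names two ingredients: normalizing the source code before comparison, and the overlap coefficient. A minimal sketch of how the two can be combined is given here in Python. It is an illustration under stated assumptions, not the paper's implementation. The keyword list, the `ID`/`NUM` placeholders, the trigram size `n=3`, and the function names `normalize`, `ngrams`, and `overlap_coefficient` are all hypothetical choices. Only the overlap coefficient itself, |A ∩ B| / min(|A|, |B|), is a standard definition.

```python
import re

# Hypothetical keyword list for a small C-like language; the paper's exact
# normalization rules are not stated in this abstract.
KEYWORDS = {
    "int", "float", "double", "char", "void", "if", "else", "for",
    "while", "do", "switch", "case", "break", "continue", "return",
}

# Identifiers/keywords, numeric literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def normalize(source: str) -> list[str]:
    """Normalization sketch (assumed rules): drop comments and whitespace,
    then replace identifiers and numeric literals with placeholders."""
    # Naive comment removal (does not handle comment markers inside strings).
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    tokens = []
    # Tokenizing discards all whitespace, so "x+y" and "x + y" match.
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif IDENT_RE.fullmatch(tok):
            tokens.append("ID")  # variable/function names no longer matter
        elif NUMBER_RE.fullmatch(tok):
            tokens.append("NUM")
        else:
            tokens.append(tok)  # operators and punctuation are kept
    return tokens


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of contiguous token n-grams (n=3 is an arbitrary choice here)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_coefficient(a: set, b: set) -> float:
    """Szymkiewicz-Simpson overlap: |A & B| / min(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    original = "int sum(int a, int b) { return a + b; }"
    disguised = "int add(int x,int y){return x+y; } // renamed"
    score = overlap_coefficient(ngrams(normalize(original)),
                                ngrams(normalize(disguised)))
    print(score)  # 1.0
```

After normalization, the renamed, re-spaced, and commented copy yields exactly the same token sequence as the original, so the score is 1.0. Without normalization, the identifiers and spacing would break many shared n-grams. Because the denominator is the smaller set, a file whose n-grams are entirely contained in a larger one also scores 1.0 (for sets {1, 2} and {1, 2, 3, 4} the overlap coefficient is 1.0, whereas the Jaccard index is 0.5).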

BibTeX key: WOS:000464129300009
entry type: article
address: 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1314 USA
year: 2019
journal: IEEE TRANSACTIONS ON COMPUTERS
number: 5
pages: 740-751
publisher: IEEE COMPUTER SOC
volume: 68
issn: 0018-9340
pubstate: published
tppubtype: article
DOI: 10.1109/TC.2018.2881449
