Abstract
Some similarity detection tools, such as Sherlock, compare textual
documents of any kind but are limited when comparing source code
files. The presence or absence of blank spaces between structural
elements, differences in variable names, and similar changes interfere
with the similarity index obtained. This paper shows that preprocessing
the source code improves Sherlock's performance. The results are based on
experiments conducted with 66 previously plagiarized source codes and a
base of 2160 codes written by engineering students in programming
classes. For this second set, the similarity status was not known in
advance, so a method was created to calculate precision and recall in
relative terms, using a set of reference tools as a kind of oracle. Our
approach, called Sherlock N-overlap, obtained, in most of the cases
tested, similarity indexes superior to those of other complex tools
such as MOSS, JPlag and SIM.
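
The idea of relative precision and recall against an oracle of reference tools can be illustrated with a minimal sketch. The abstract does not describe how the oracle is built, so the rule used here (a pair of files counts as similar when at least `min_votes` reference tools flag it) is an assumption, and the names `oracle_pairs`, `relative_precision_recall`, `tool_a`, `tool_b` and `tool_c`, as well as the file names, are hypothetical.

```python
from collections import Counter


def oracle_pairs(tool_results, min_votes=2):
    """Build an oracle from several reference tools.

    tool_results maps a tool name to the file pairs it flagged.
    Assumption (not from the paper): a pair enters the oracle when
    at least min_votes tools flagged it.
    """
    votes = Counter()
    for pairs in tool_results.values():
        # frozenset makes (a, b) and (b, a) the same pair; the set
        # ensures each tool votes at most once per pair.
        for pair in {frozenset(p) for p in pairs}:
            votes[pair] += 1
    return {pair for pair, n in votes.items() if n >= min_votes}


def relative_precision_recall(candidate_pairs, oracle):
    """Precision and recall of one tool, relative to the oracle."""
    candidate = {frozenset(p) for p in candidate_pairs}
    hits = candidate & oracle
    precision = len(hits) / len(candidate) if candidate else 0.0
    recall = len(hits) / len(oracle) if oracle else 0.0
    return precision, recall


# Made-up data for illustration only.
reference = {
    "tool_a": [("a.c", "b.c"), ("c.c", "d.c")],
    "tool_b": [("b.c", "a.c"), ("c.c", "d.c")],
    "tool_c": [("a.c", "b.c"), ("e.c", "f.c")],
}
oracle = oracle_pairs(reference, min_votes=2)
evaluated = [("a.c", "b.c"), ("e.c", "f.c")]
precision, recall = relative_precision_recall(evaluated, oracle)
```

Both values are relative in the sense that they measure agreement with the reference tools, not with a verified ground truth, so a pair missed by every reference tool would count against precision even if it were a real case of plagiarism.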