Automated Plagiarism Detection for Computer Programming Exercises Based on Patterns of Resubmission
N. Tahaei and D. C. Noelle. Proceedings of the 2018 ACM Conference on International Computing Education Research (ICER '18), pages 178--186. New York, NY, USA, ACM, 2018.
DOI: 10.1145/3230977.3231006
Abstract
Plagiarism detection for computer programming exercises is a difficult problem. A traditional strategy has been to compare the submissions from all of the students in a class, searching for similarities between submissions suggestive of copying. Automated tools exist that compare submissions in order to help with this search. Increasingly, however, instructors have allowed students to submit multiple solutions, receiving formative feedback between submissions, with feedback often generated by automated assessment systems. Allowing multiple submissions allows for a fundamentally new way to detect plagiarism. Specifically, students may struggle with an exercise until frustration leads them to submit work that is not their own. We present a method for detecting plagiarism from the sequence of submissions made by an individual student. We have explored a variety of measures of program change over submissions, and we have found a set of features that can be transformed, using logistic regression, into a score capturing the likelihood of plagiarism. We have applied this method to data from four exercises from an undergraduate programming class. We show that our automatically generated scores are strongly correlated with the assessments of plagiarism made by an expert instructor. Thus, the scores can act as a powerful tool for searching for cases of academic dishonesty.
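The abstract describes transforming features of a student's submission sequence into a plagiarism likelihood via logistic regression. As a minimal sketch of that idea (the feature set, weights, and scale below are hypothetical illustrations, not the paper's actual model), a trained logistic model maps per-student change features to a score in [0, 1]:

```python
import math

def plagiarism_score(features, weights, bias):
    """Logistic-regression score in [0, 1]: higher = more suspicious.

    `features` might summarize a student's submission sequence, e.g.
    [largest single-step code change, mean change per step, number of
    submissions] -- hypothetical choices for illustration only.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up weights: a large sudden jump between consecutive submissions
# (e.g. struggling, then submitting a complete foreign solution) pushes
# the score up; many small incremental edits pull it down.
weights = [2.5, 0.8, -0.3]   # [max_step_change, mean_change, n_submissions]
bias = -4.0

steady = plagiarism_score([0.2, 0.1, 10], weights, bias)  # gradual progress
sudden = plagiarism_score([3.0, 1.5, 4], weights, bias)   # one huge jump
print(round(steady, 3), round(sudden, 3))
```

Under these invented weights, the abrupt-jump sequence scores far higher than the incremental one, which mirrors the intuition in the abstract: frustration followed by a submission of work that is not the student's own shows up as an anomalous change between consecutive submissions.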
@inproceedings{Tahaei:2018:APD:3230977.3231006,
abstract = {Plagiarism detection for computer programming exercises is a difficult problem. A traditional strategy has been to compare the submissions from all of the students in a class, searching for similarities between submissions suggestive of copying. Automated tools exist that compare submissions in order to help with this search. Increasingly, however, instructors have allowed students to submit multiple solutions, receiving formative feedback between submissions, with feedback often generated by automated assessment systems. Allowing multiple submissions allows for a fundamentally new way to detect plagiarism. Specifically, students may struggle with an exercise until frustration leads them to submit work that is not their own. We present a method for detecting plagiarism from the sequence of submissions made by an individual student. We have explored a variety of measures of program change over submissions, and we have found a set of features that can be transformed, using logistic regression, into a score capturing the likelihood of plagiarism. We have applied this method to data from four exercises from an undergraduate programming class. We show that our automatically generated scores are strongly correlated with the assessments of plagiarism made by an expert instructor. Thus, the scores can act as a powerful tool for searching for cases of academic dishonesty.},
acmid = {3231006},
added-at = {2018-08-15T08:26:03.000+0200},
address = {New York, NY, USA},
author = {Tahaei, Narjes and Noelle, David C.},
biburl = {https://www.bibsonomy.org/bibtex/276022fa85efaccd72d8ba82abc3b9ee5/brusilovsky},
booktitle = {Proceedings of the 2018 ACM Conference on International Computing Education Research},
description = {Automated Plagiarism Detection for Computer Programming Exercises Based on Patterns of Resubmission},
doi = {10.1145/3230977.3231006},
interhash = {a822af12ca5fac4572027ec70760e102},
intrahash = {76022fa85efaccd72d8ba82abc3b9ee5},
isbn = {978-1-4503-5628-2},
keywords = {automatic-assessment plagiarism programming},
location = {Espoo, Finland},
numpages = {9},
pages = {178--186},
publisher = {ACM},
series = {ICER '18},
timestamp = {2018-08-15T08:26:03.000+0200},
title = {Automated Plagiarism Detection for Computer Programming Exercises Based on Patterns of Resubmission},
url = {http://doi.acm.org/10.1145/3230977.3231006},
  year = {2018}
}