The Computer Journal Advance Access originally published online on June 24, 2005
The Computer Journal 2005 48(6):651-661; doi:10.1093/comjnl/bxh119
PDetect: A Clustering Approach for Detecting Plagiarism in Source Code Datasets
1 Division of Computing Systems, Department of Industrial Informatics, Technological Educational Institute of Kavala, GR-65404 Kavala, Greece
2 Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece
lmous{at}teikav.edu.gr, avakali{at}csd.auth.gr
Efficient detection of plagiarism in student programming assignments is of great importance to the educational process. This paper presents a clustering-oriented approach to the problem of source code plagiarism. The implemented software, called PDetect, accepts as input a set of program sources and extracts subsets (the clusters of plagiarism) such that every program within a particular subset has been derived from the same original. PDetect proposes an appropriate measure for evaluating plagiarism detection performance and supports combining different plagiarism detection schemes. Furthermore, a cluster analysis is performed to provide information beneficial to the plagiarism detection process. PDetect is designed so that it can be easily adapted to any keyword-based programming language, and it compares favourably with earlier (state-of-the-art) plagiarism detection approaches.
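The general idea of grouping submissions into clusters of plagiarism can be illustrated with a minimal sketch. It is not PDetect's actual algorithm, which the abstract does not specify. The sketch assumes a small hypothetical keyword list (`KEYWORDS`), a multiset Jaccard similarity over keyword counts, a hypothetical `threshold` parameter, and clusters formed as connected components of the "similar enough" relation. All function names are illustrative.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword list for illustration only; a real tool would
# use the full keyword set of the target language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}


def keyword_profile(source):
    """Count how often each keyword occurs in a program's source text."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(t for t in tokens if t in KEYWORDS)


def similarity(a, b):
    """Multiset Jaccard similarity of two keyword profiles, in [0, 1]."""
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)
    total = sum(max(a[k], b[k]) for k in keys)
    return shared / total if total else 0.0


def plagiarism_clusters(sources, threshold=0.9):
    """Group program names whose profiles are at least `threshold` similar.

    `sources` maps a program name to its source text. Clusters are the
    connected components of the pairwise "similar" relation (union-find);
    programs with no similar partner are not reported.
    """
    names = list(sources)
    profiles = {n: keyword_profile(sources[n]) for n in names}
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]
```

Because identifiers are ignored and only keyword counts are compared, two submissions that differ only in variable or function names fall into the same cluster under this sketch. The same property makes it prone to false positives on short programs, so the threshold would need tuning on real data.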