Raphael A. Finkel, Arkady Zaslavsky, Krisztian Monostori, and Heinz Schmidt
raphael@cs.uky.edu; University of Kentucky, Lexington, KY, USA
A.Zaslavsky@monash.edu.au; Monash University, Melbourne, Australia
Krisztian.Monostori@csse.monash.edu.au; Monash University, Melbourne, Australia
Heinz.Schmidt@csse.monash.edu.au; Monash University, Melbourne, Australia
Abstract
Easy access to the Web has increased the potential for students to cheat
on assignments by plagiarising others' work. By the same token,
Web-based tools offer the potential for instructors to check submitted assignments
for signs of plagiarism. Overlap-detection tools are easy to use and accurate
in plagiarism detection, so they can be an excellent deterrent to plagiarism.
Documents can overlap for other reasons, too: Old documents are superseded,
and authors summarize previous work identically in several papers. Overlap-detection
tools can pinpoint interconnections in a corpus of documents and could be used
in search engines.
We describe a web-accessible text registry based on signature extraction. We extract a small but diagnostic signature from each registered text for permanent storage and comparison against other stored signatures. This comparison allows us to estimate the amount of overlap between pairs of documents, while the total time required remains linear in the total size of the documents. We compare our algorithm with several alternatives and present both efficiency and accuracy results.
Keywords: plagiarism, document overlap, culling, digest
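As a rough illustration of the signature idea summarized in the abstract, the sketch below hashes overlapping word chunks of a text, keeps only a subset of the resulting digests, and estimates overlap from the digests two signatures share. Everything specific here is an assumption made for illustration and is not taken from the text above: the function names `signature` and `overlap_estimate`, the choice of k-word chunks, the use of MD5 as a convenient stable hash, the modulus-based culling rule, and the normalization by the smaller signature.

```python
import hashlib


def signature(text, k=3, keep_mod=4):
    """Return a culled set of digests of overlapping k-word chunks.

    Hypothetical helper: k, keep_mod, and the MD5-based digest are
    illustrative assumptions, not parameters taken from the paper.
    """
    words = text.lower().split()
    digests = set()
    for i in range(len(words) - k + 1):
        chunk = " ".join(words[i:i + k])
        raw = hashlib.md5(chunk.encode("utf-8")).digest()
        h = int.from_bytes(raw[:8], "big")  # first 64 bits of the digest
        if h % keep_mod == 0:  # culling: keep only a fraction of digests
            digests.add(h)
    return digests


def overlap_estimate(sig_a, sig_b):
    """Fraction of the smaller signature that also occurs in the other."""
    if not sig_a or not sig_b:
        return 0.0
    return len(sig_a & sig_b) / min(len(sig_a), len(sig_b))


if __name__ == "__main__":
    # keep_mod=1 disables culling so the small example stays predictable.
    a = signature("the quick brown fox jumps", k=3, keep_mod=1)
    b = signature("quick brown fox sleeps here", k=3, keep_mod=1)
    print(overlap_estimate(a, b))
```

Building one signature takes a single pass over the text, so extraction is linear in document size. With culling enabled (`keep_mod` greater than 1), the stored signature is a subset of the full digest set, which trades some accuracy for less storage.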