Using the MatchDetectReveal System for
Comparative Analysis of Texts
Abstract
In this paper we introduce the MatchDetectReveal system, which identifies
similarities between documents. We discuss several applications of the
system, including cross-referencing multiple editions of literary works,
plagiarism detection, organizing collections of documents, and comparative
analysis of texts. The system compares documents using suffix trees and
suffix vectors, data structures that allow fast comparison of documents. The
front-end of the system is fully Web-based, so users need only a Web browser
to access it. The results are also presented as HTML files, making use of
HTML hyperlinks.
Keywords: document databases, document management, digital libraries.
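
As a rough illustration of the suffix-based comparison mentioned in the abstract, the sketch below finds the longest passage shared by two texts. It is not the MatchDetectReveal implementation: it substitutes a naively sorted suffix array for the suffix tree and suffix vector, and the function name `longest_common_substring` and the NUL separator are assumptions made for this example only.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring shared by a and b (hypothetical helper)."""
    sep = "\x00"  # assumption: this character occurs in neither text
    text = a + sep + b
    boundary = len(a)  # positions below this index belong to document a

    # Naive suffix array: sort suffix start positions by the suffix itself.
    # This is quadratic or worse; a suffix tree does the same job much faster.
    suffixes = sorted(range(len(text)), key=lambda i: text[i:])

    best = ""
    for i, j in zip(suffixes, suffixes[1:]):
        # Only neighbouring suffixes that start in different documents
        # can witness a passage shared by both documents.
        if (i < boundary) == (j < boundary):
            continue
        # Measure the common prefix of the two suffixes.
        k = 0
        while (i + k < len(text) and j + k < len(text)
               and text[i + k] == text[j + k] and text[i + k] != sep):
            k += 1
        if k > len(best):
            best = text[i:i + k]
    return best


if __name__ == "__main__":
    print(repr(longest_common_substring("the quick brown fox",
                                        "a quick brown dog")))
```

The sketch relies on the fact that, once all suffixes of both texts are sorted together, the longest shared passage always appears as the common prefix of two adjacent suffixes that come from different documents. A suffix tree exposes the same information without sorting whole suffixes, which is what makes it suitable for large collections.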