Using the MatchDetectReveal System for Comparative Analysis of Texts

Krisztián Monostori, Arkady Zaslavsky
School of Computer Science and Software Engineering
Monash University, Melbourne, Australia
900 Dandenong Rd, Caulfield East, VIC 3145, AUSTRALIA
{Krisztian.Monostori, Arkady.Zaslavsky}@csse.monash.edu.au

Alejandro Bia
Miguel de Cervantes Digital Library
University of Alicante, Alicante, Spain
Ap. Correus 99, E-03080 Alacant, SPAIN
abia@dlsi.ua.es

 

 

Abstract

 

This paper introduces the MatchDetectReveal system, which identifies similarity between documents. We discuss several applications of the system, including cross-referencing multiple editions of literary works, plagiarism detection, organizing collections of documents, and comparative analysis of texts. The system compares documents using suffix trees and suffix vectors, data structures that support fast string matching and therefore fast comparison of documents. The front end of the system is fully Web-based, so users need only a Web browser to access it. The results are also presented as HTML files, making use of the hyperlink capabilities of HTML documents.

 

Keywords: document databases, document management, digital libraries.