Parallel Document Overlap Detection in Digital Libraries

Overview

Introduction

General Copy-Detection Architecture

Parsing

Chunking Units

Fingerprints

Problems

MatchDetectReveal

Visualizer

String-Matching Algorithms

Suffix Trees

Example of a Suffix Tree

Linear-Time Suffix Tree Construction Algorithms

Matching Statistics

Calculating Matching Statistics Using a Suffix Tree

Example of a Modified Suffix Tree

Sequential Algorithm

Running Time of the Sequential Algorithm

Cluster Computing

Existing Clusters

Monash Parallel Parametric Modelling Engine

Clustor

STCD Approach with Clustor

STOD Approach with Clustor

Clustor Tool

Plan File for Copying Only Necessary Files

Copying Only the Necessary Files

Plan File for Copying All Files at Startup

Copying All Files When Nodes Start

Speed-up vs Number of Processors

Using the MPI Interface

WMPI and Myrinet

Parallel Architecture Using MPI

Performance Results of MPI

Summary

Future Work