Published In	International Journal of Advance Computational Engineering and Networking (IJACEN)
Issue	Volume 5, Issue 10 (Oct 2017)
Paper Title	Modified N-Gram based Model for Identifying and Filtering Near-Duplicate Documents Detection
Author Name	Farheen Naaz, Farheen Siddiqui
Affiliation	School of Engineering Sciences and Technology (SEST), Jamia Hamdard, Hamdard University, New Delhi, India
Pages	55-59
Abstract	Over the last three decades, the World Wide Web (WWW) has expanded exponentially, and a great deal of it consists of duplicate or near-duplicate content. Documents served on the web come in different formats, such as PDF, HTML, Excel, and plain text. Our proposed solution is built on a publicly available dataset whose files are tagged as duplicates. The work in this paper addresses duplicate and near-duplicate document detection using an n-gram based, low-dimensional representation (LSI-SVD) approach, implemented in C#.NET.
Keywords	Duplicate document, N-gram, SVD (Singular Value Decomposition), LSI (Latent Semantic Indexing), Cosine similarity
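The abstract names n-grams and cosine similarity but does not give details of the model. As a rough illustration only, the following Python sketch compares two documents by their word n-gram counts. It is not the authors' C#.NET implementation: the function names, the choice of word trigrams, and the 0.8 threshold are all assumptions made for this example, and the LSI-SVD dimensionality-reduction step is left out.

```python
import math
import re
from collections import Counter


def word_ngrams(text, n=3):
    """Count word n-grams in lowercased text (n=3 is an assumed default)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # A text shorter than n words has no n-grams; treat as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(doc_a, doc_b, n=3, threshold=0.8):
    """Flag a pair as near-duplicate; the threshold is hypothetical."""
    score = cosine_similarity(word_ngrams(doc_a, n), word_ngrams(doc_b, n))
    return score >= threshold
```

Calling `is_near_duplicate(text_a, text_b)` on two extracted document texts returns `True` when their n-gram profiles are close. In an LSI-based pipeline, the count vectors would first be projected into a lower-dimensional space via SVD before the cosine comparison.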