DOIONLINE

DOIONLINE NO - IJAECS-IRAJ-DOIONLINE-5450

Publish In
International Journal of Advances in Electronics and Computer Science-IJAECS
Volume Issue
Volume-3,Issue-8  ( Aug, 2016 )
Paper Title
Performance Comparison of Similarity Functions For Document Retrieval System
Author Name
Su Mon Phyo, Lai Lai Win Kyi
Affiliation
[email protected], [email protected]
Pages
98-102
Abstract
Measuring the similarity of documents plays an important role in text-related research and applications such as document clustering, plagiarism detection, information retrieval, machine translation and automatic essay scoring. Many approaches have been proposed to solve this problem. They can be grouped into three main categories: string-based, corpus-based and knowledge-based similarity. The string-based approach is further divided into the character-based approach and the term-based approach. Some existing similarity measures cannot properly decide the similarity of a document pair in some circumstances. This paper therefore proposes a new similarity approach, called KSD (Keyword Similarity Distance), based on a term-based similarity function, to properly decide the similarity score of each document pair. The KSD function takes the keyword similarity distance between each pair of documents and then computes average similarity scores for all documents. In the paper, the proposed function gives a more accurate list of related documents than the existing similarity functions. Three similarity functions (cosine, overlap and the proposed KSD) are applied to evaluate the performance of the similarity scores. The keyword extraction process and the similarity calculation are done in C#. According to the experimental results, the proposed function outperforms the other similarity functions.
Keywords: Similarity function, KSD, Cosine, Overlap.
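For orientation, the two baseline term-based measures named in the abstract can be sketched as follows. This is a minimal illustration of the standard cosine similarity and overlap coefficient, written in Python rather than the C# used by the authors. The whitespace tokenizer and the function names are assumptions made for this sketch, not the paper's keyword extraction process. The proposed KSD function is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def term_counts(text):
    """Term frequencies from a lowercase, whitespace-split text.

    Assumption: this simple tokenizer stands in for the paper's
    keyword extraction step, which the abstract does not describe.
    """
    return Counter(text.lower().split())


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the two term-frequency vectors."""
    ca, cb = term_counts(doc_a), term_counts(doc_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with any other
    return dot / (norm_a * norm_b)


def overlap_coefficient(doc_a, doc_b):
    """Shared distinct terms divided by the size of the smaller term set."""
    sa, sb = set(term_counts(doc_a)), set(term_counts(doc_b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))
```

Note that the overlap coefficient reaches 1.0 whenever one document's term set is contained in the other's, regardless of how much extra text the longer document carries, whereas cosine similarity also accounts for term frequencies and document length.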