Keywords are useful for a variety of purposes, including summarization, indexing, labeling, categorization, clustering, and search. The objective of the proposed system is automatic keyword extraction. The system addresses this problem through several statistical and linguistic approaches, together with a novel use of data mining. First, the input document is pre-processed: noisy data are removed, and words are tagged and stemmed. Second, three extraction approaches are applied: an N-gram approach; a part-of-speech (POS) approach that extracts phrases matching a set of patterns; and NP-chunking, which extracts noun phrases. A scoring scheme then assigns each candidate keyword a weight based on several features. A document-classification subsystem assigns the document to a class so that meaningful keywords that are infrequent within that class can be recognized. The system also introduces a new approach that integrates data mining with keyword extraction: rules mined from a database of previously extracted keywords are used to improve extraction accuracy. Compared with manually extracted keywords, our algorithm achieved an accuracy of 74%.
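The abstract does not specify the paper's features, weights, classifier, or rule-mining algorithm, so the following is only a minimal Python sketch of how such a pipeline could fit together, under our own assumptions. It shows the N-gram route only (no POS patterns, NP-chunking, or stemming). The stopword list, the scoring formula (frequency times an early-position bonus times rarity within the document's class), the class statistics `class_doc_freq`/`class_size`, the support/confidence thresholds, and the helper names are all hypothetical. Simple pairwise association rules stand in for whatever mining algorithm the authors actually used.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "from",
             "in", "is", "of", "on", "that", "the", "this", "to", "with"}


def preprocess(text):
    """Lowercase and keep alphabetic tokens only (crude noise removal; no tagging or stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_candidates(tokens, max_n=3):
    """Return (phrase, start index) for every 1..max_n-gram that does not start or end with a stopword."""
    cands = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            cands.append((" ".join(gram), i))
    return cands


def score_candidates(text, class_doc_freq, class_size, top_k=5):
    """Rank candidates by frequency x early-position bonus x rarity within the document's class.

    Assumption: class_doc_freq maps a phrase to the number of documents in the
    document's class that contain it, and class_size is the number of documents
    in that class (both supplied by a separate classification step).
    """
    tokens = preprocess(text)
    cands = ngram_candidates(tokens)
    freq = Counter(phrase for phrase, _ in cands)
    first_pos = {}
    for phrase, pos in cands:
        first_pos.setdefault(phrase, pos)  # earliest start index of each phrase
    n_tokens = max(len(tokens), 1)
    scores = {}
    for phrase, tf in freq.items():
        position_weight = 1.0 - first_pos[phrase] / n_tokens  # earlier occurrence -> closer to 1
        # Phrases found in every document of the class get rarity 0.
        rarity = math.log((class_size + 1) / (class_doc_freq.get(phrase, 0) + 1))
        scores[phrase] = tf * (1.0 + position_weight) * rarity
    # Highest score first; ties broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


def pair_rules(keyword_sets, min_support=0.5, min_conf=0.8):
    """Mine pairwise rules X -> Y from past documents' keyword sets (support/confidence thresholds)."""
    n = len(keyword_sets)
    item_count = Counter(k for s in keyword_sets for k in set(s))
    pair_count = Counter(p for s in keyword_sets for p in combinations(sorted(set(s)), 2))
    rules = []
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            conf = count / item_count[x]
            if conf >= min_conf:
                rules.append((x, y, conf))
    return sorted(rules)


def apply_rules(keywords, candidates, rules):
    """Add a rule's consequent when its antecedent was selected and the consequent
    actually occurs among this document's candidate phrases."""
    result = list(keywords)
    for antecedent, consequent, _conf in rules:
        if antecedent in result and consequent in candidates and consequent not in result:
            result.append(consequent)
    return result
```

For example, if every past document tagged "keyword extraction" was also tagged "data mining", `pair_rules` yields the rule keyword extraction → data mining, and `apply_rules` then adds "data mining" to a new document's keywords whenever that phrase appears among its candidates. Restricting rule consequents to phrases already present in the document keeps the rules from injecting keywords the text never mentions.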
4th Mosharaka International Conference on Communications, Computers and Applications (MIC-CCA 2011)
Congress
2011 Global Congress on Communications, Computers and Applications (GC-CCA 2011), 22-24 July 2011, Istanbul, Turkey
Pages
37-42
Topics
Design of Computing Algorithms; Programming Languages
ISSN
2227-331X
DOI
BibTeX
@inproceedings{145CCA2011,
title={Data Mining Implementation for Keywords Extraction},
author={Rafeeq A. Al-Hashemi and Hilal Saleh and Ahmed Alobaidi},
booktitle={2011 Global Congress on Communications, Computers and Applications (GC-CCA 2011)},
year={2011},
pages={37-42},
doi={},
organization={Mosharaka for Research and Studies}
}