The Computer Journal Advance Access originally published online on October 14, 2008
The Computer Journal 2009 52(8):890-901; doi:10.1093/comjnl/bxn049

© The Author 2008. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

This article appears in the following issue of The Computer Journal: Incorporating Systems, communications and services in smart homes and Software engineering for e-business Special Issues

GA-Based Keyword Selection for the Design of an Intelligent Web Document Search System

Chih-Hsun Chou*, Chang-Hsing Lee and Ya-Hui Chen

Department of Computer Science and Information Engineering, Chung Hua University, No. 707, Section 2, WuFu Road, Hsinchu, 300 Taiwan, Republic of China

* Corresponding author: chc{at}chu.edu.tw

Received 31 January 2008; revised 31 July 2008

The main steps in designing an automatic document classification system are feature extraction and classification. This article proposes a method to improve feature extraction. In this method, a genetic algorithm is applied to determine the threshold values of four criteria used to extract the representative keywords for each class. The purpose of these four threshold values is to extract as few representative keywords as possible. The keyword extraction method was combined with two classification algorithms, the vector space model and the support vector machine, to examine the performance of the proposed classification system under various extraction conditions.

Key Words: web document classification • keyword extraction • genetic algorithm • vector space method • support vector machine
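
The idea in the abstract, a genetic algorithm searching for four threshold values that keep the keyword set per class as small as possible, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the four criteria, so the per-term scores in TERM_SCORES, the class names, the fitness function and the GA settings are all hypothetical and are not the authors' method. In this toy version a term is kept as a keyword only if it meets all four thresholds, and fitness rewards classes that retain at least one keyword while penalizing the total number of keywords.

```python
import random

# Hypothetical per-term scores on four unnamed criteria, per class.
# These values are illustrative assumptions, not data from the paper.
TERM_SCORES = {
    "sports": {
        "goal": (0.9, 0.8, 0.7, 0.9),
        "match": (0.6, 0.7, 0.5, 0.6),
        "the": (0.1, 0.9, 0.1, 0.2),
    },
    "finance": {
        "stock": (0.8, 0.9, 0.8, 0.7),
        "bank": (0.7, 0.6, 0.6, 0.8),
        "the": (0.1, 0.9, 0.1, 0.1),
    },
}


def select_keywords(thresholds):
    """Keep a term for a class only if it meets all four thresholds."""
    return {
        cls: [
            term
            for term, scores in terms.items()
            if all(s >= t for s, t in zip(scores, thresholds))
        ]
        for cls, terms in TERM_SCORES.items()
    }


def fitness(thresholds):
    """Assumed fitness: reward covered classes, penalize keyword count."""
    selected = select_keywords(thresholds)
    covered = sum(1 for kws in selected.values() if kws)
    total = sum(len(kws) for kws in selected.values())
    return 10 * covered - total


def evolve(pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Simple elitist GA over chromosomes of four thresholds in [0, 1]."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with children.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, 3)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped to the valid threshold range.
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate
                else g
                for g in child
            ]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("thresholds:", [round(g, 2) for g in best])
    print("keywords:", select_keywords(best))
```

In a real system the fitness would more plausibly be tied to the accuracy of the downstream classifier (vector space model or support vector machine) on held-out documents; the keyword-count penalty used here is only a stand-in for the stated goal of extracting as few representative keywords as possible.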

