VECPAR'06 - Seventh International Meeting on High Performance Computing for Computational Science
Text Classification from Positive and Unlabeled Documents Based on GA
Tao Peng (College of Computer Science and Technology, Jilin University)
Wanli Zuo (College of Computer Science and Technology, Jilin University)
Fengling He (College of Computer Science and Technology, Jilin University)
Automatic text classification is one of the most important tools in information retrieval. Because traditional text classification methods cannot reliably find the best feature set, a genetic algorithm (GA) is applied to feature selection, since it can search for a globally optimal solution. This paper presents a novel GA-based text classifier learned from positive and unlabeled documents. First, we identify reliable negative documents using an improved 1-DNF algorithm. Second, we build a set of classifiers by iteratively applying the SVM algorithm to the training example sets. Third, rather than choosing one of these classifiers as the final classifier, we construct the final classifier as a weighted vote of all classifiers generated during the iteration steps, using the GA's evolutionary process to discover the best combination of weights. Experimental results on the Reuters data set show that the approach performs well.
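The third step, combining the classifiers through a GA-tuned weighted vote, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`weighted_vote`, `f1_score`, `evolve_weights`), the F1 fitness function, the one-point crossover, the Gaussian mutation, and all parameter defaults are assumptions chosen for illustration. Each classifier is assumed to have already produced a +1/-1 vote per validation document.

```python
import random

def weighted_vote(weights, votes):
    """Combine per-classifier votes (+1/-1) for one document into one label."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= 0 else -1

def f1_score(weights, vote_matrix, labels):
    """F1 of the positive class for the weighted ensemble (assumed fitness)."""
    tp = fp = fn = 0
    for votes, y in zip(vote_matrix, labels):
        pred = weighted_vote(weights, votes)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == -1:
            fp += 1
        elif pred == -1 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def evolve_weights(vote_matrix, labels, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Search for a weight vector in [0, 1]^n with a simple elitist GA.

    Hypothetical settings: the top half survives each generation, children
    come from one-point crossover plus clamped Gaussian mutation.
    """
    rng = random.Random(seed)
    n = len(vote_matrix[0])  # number of classifiers in the ensemble

    def fitness(w):
        return f1_score(w, vote_matrix, labels)

    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

# Toy validation data (made up): rows are documents, columns are classifiers.
votes = [[1, 1, -1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1], [1, 1, 1], [-1, 1, 1]]
labels = [1, 1, -1, -1, 1, -1]
best = evolve_weights(votes, labels)
```

In this sketch the fitness is measured on labeled validation documents, which is a simplification; in a positive-and-unlabeled setting the evaluation set would have to be built from the positive documents and the reliable negatives identified in the first step.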
Data Processing
Universidade Federal do Rio de Janeiro - Coordenação dos Programas de Pós-graduação de Engenharia; Instituto Nacional de Matemática Pura e Aplicada. Rio de Janeiro, Brazil, July 10-13, 2006