VECPAR'06 - List of Papers

Text Classification from Positive and Unlabeled Documents Based on GA

Tao Peng (College of Computer Science and Technology, Jilin University)
Wanli Zuo (College of Computer Science and Technology, Jilin University)
Fengling He (College of Computer Science and Technology, Jilin University)

Abstract:

Automatic text classification is one of the most important tools in information retrieval. Because traditional text classification methods cannot find the best feature set, a genetic algorithm (GA) is applied to feature selection, since it can find the globally optimal solution. This paper presents a novel GA-based text classifier that learns from positive and unlabeled documents. First, we identify reliable negative documents with an improved 1-DNF algorithm. Second, we build a set of classifiers by iteratively applying the SVM algorithm to the training example sets. Third, instead of choosing one of these classifiers as the final classifier, we use the GA to evaluate a weighted vote of all classifiers generated during the iterations and construct the final classifier from it. The GA's evolutionary process can discover the best combination of weights. Experimental results on the Reuters data set show that the performance is encouraging.
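
The third step, in which a GA searches for the vote weights, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators. Real-valued weights in [0, 1], accuracy on a small labeled validation set as fitness, truncation selection, one-point crossover, and Gaussian mutation are all assumed here. The function names and the toy vote matrix (standing in for the outputs of the iteratively trained SVM classifiers) are hypothetical.

```python
import random


def weighted_vote(weights, votes):
    """Combine per-classifier votes (+1 or -1) into one label via a weighted sum."""
    score = sum(w * v for w, v in zip(weights, votes))
    return 1 if score >= 0 else -1


def fitness(weights, vote_matrix, labels):
    """Fraction of validation documents that the weighted vote labels correctly."""
    correct = sum(
        1
        for votes, y in zip(vote_matrix, labels)
        if weighted_vote(weights, votes) == y
    )
    return correct / len(labels)


def evolve_weights(vote_matrix, labels, pop_size=30, generations=40,
                   mutation_rate=0.2, seed=0):
    """Search for vote weights with a simple generational GA (assumed settings)."""
    rng = random.Random(seed)
    n = len(vote_matrix[0])  # number of classifiers

    def score(w):
        return fitness(w, vote_matrix, labels)

    # Each individual is one candidate weight vector, one weight per classifier.
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < mutation_rate:
                i = rng.randrange(n)  # Gaussian mutation of one weight, clipped to [0, 1]
                child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        population = elite + children
    return max(population, key=score)


# Toy data: one row per validation document, one column per classifier.
labels = [1, 1, -1, -1, 1, -1]
vote_matrix = [
    [1, -1, 1],
    [1, 1, -1],
    [-1, -1, -1],
    [-1, 1, -1],
    [1, 1, -1],
    [-1, -1, -1],
]

best = evolve_weights(vote_matrix, labels)
print("weights:", [round(w, 3) for w in best])
print("validation accuracy:", fitness(best, vote_matrix, labels))
```

In this sketch the final classifier is `weighted_vote(best, votes)` applied to the votes of all classifiers for a new document. A positive-and-unlabeled setting has no fully labeled validation set by construction, so in practice the fitness would have to be estimated from the positive set and the reliable negatives, which is another assumption left open here.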

Keywords:

Data Processing


vecpar.fe.up.pt/2006 | vecpar2006@fe.up.pt

Rio de Janeiro | Brazil | July 10-13, 2006