Text categorization with diversity random forests

Chun Yang, Xu Cheng Yin*, Kaizhu Huang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding › Conference Proceeding › peer-review

1 Citation (Scopus)

Abstract

Text categorization (TC) has many typical traits, such as large and difficult category taxonomies and noisy, incremental data. Random Forests, one of the most important yet simple state-of-the-art ensemble methods, have been used to solve such problems with good performance. Most current Random Forests approaches that address diversity focus on maximizing tree diversity while producing and training component trees. In TC, component trees trained on noisy data with huge numbers of categories and features exhibit highly diverse characteristics. Consequently, given numerous component trees from the original Random Forests, we propose a novel method, Diversity Random Forests, which diversely and adaptively selects and combines tree classifiers with diversity learning and sample weighting. Diversity Random Forests has two key components. First, by designing a matrix for the data distribution, we formulate a unified optimization model for learning and selecting diverse trees, where tree weights are learned through a convex quadratic programming problem with given sample weights. Second, we propose a new self-training algorithm that iteratively runs the convex optimization and automatically learns the sample weights. Extensive experiments on a variety of text categorization benchmark data sets show that the proposed approach consistently outperforms state-of-the-art methods.
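
The abstract does not give the optimization model, so the following is only an illustrative sketch of the general pattern it describes: learn tree weights by a convex quadratic program for fixed sample weights, then re-estimate the sample weights and repeat. Everything specific here is an assumption made for illustration and is not taken from the paper: the ±1 `margins` matrix (rows are samples, columns are trees, +1 where a tree classifies the sample correctly), the squared-error objective, the simplex constraint on tree weights, the exponential sample re-weighting rule, and all function names.

```python
import math

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:          # condition holds for a prefix of the sorted values
            theta = t
    return [max(x - theta, 0.0) for x in v]

def learn_tree_weights(margins, sample_weights, steps=500):
    """Assumed convex QP: minimize sum_i s_i * (1 - m_i . w)^2 over the simplex,
    solved here by projected gradient descent (hypothetical objective)."""
    n_trees = len(margins[0])
    total = sum(sample_weights)
    s = [x / total for x in sample_weights]
    lr = 1.0 / (2.0 * n_trees)   # 1 / (upper bound on the gradient's Lipschitz constant)
    w = [1.0 / n_trees] * n_trees
    for _ in range(steps):
        grad = [0.0] * n_trees
        for m_i, s_i in zip(margins, s):
            residual = 1.0 - sum(m * x for m, x in zip(m_i, w))
            for t in range(n_trees):
                grad[t] -= 2.0 * s_i * residual * m_i[t]
        w = project_to_simplex([x - lr * g for x, g in zip(w, grad)])
    return w

def update_sample_weights(margins, w):
    """Assumed re-weighting rule: samples with a low ensemble margin get more weight."""
    raw = [math.exp(-sum(m * x for m, x in zip(m_i, w))) for m_i in margins]
    total = sum(raw)
    return [r / total for r in raw]

def fit_tree_weights(margins, rounds=5):
    """Alternate between the QP for tree weights and the sample-weight update."""
    s = [1.0 / len(margins)] * len(margins)
    w = None
    for _ in range(rounds):
        w = learn_tree_weights(margins, s)
        s = update_sample_weights(margins, w)
    return w, s

# Toy data: tree 0 is always right, tree 1 is right half the time, tree 2 is always wrong.
margins = [[1, 1, -1], [1, 1, -1], [1, -1, -1], [1, -1, -1]]
tree_weights, sample_weights = fit_tree_weights(margins)
```

On this toy matrix the learned weights concentrate on the tree that is always correct, which is the selection behaviour the abstract alludes to. In practice the margins would come from the predictions of the component trees of a trained forest on held-out documents.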

Original language: English
Title of host publication: Neural Information Processing - 21st International Conference, ICONIP 2014, Proceedings
Editors: Chu Kiong Loo, Keem Siah Yap, Kok Wai Wong, Andrew Teoh, Kaizhu Huang
Publisher: Springer Verlag
Pages: 317-324
Number of pages: 8
ISBN (Electronic): 9783319126425
Publication status: Published - 2014
Event: 21st International Conference on Neural Information Processing, ICONIP 2014 - Kuching, Malaysia
Duration: 3 Nov 2014 → 6 Nov 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8836
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Conference on Neural Information Processing, ICONIP 2014
Country/Territory: Malaysia
City: Kuching
Period: 3/11/14 → 6/11/14
