Text categorization with diversity random forests

Chun Yang, Xu Cheng Yin*, Kaizhu Huang

*Corresponding author for this work

Research output: Chapter in Book or Report/Conference proceeding, Conference Proceeding, peer-review

1 Citation (Scopus)


Text categorization (TC) has several characteristic traits, such as large and difficult category taxonomies and noisy, incremental data. Random Forests, one of the most important yet simple state-of-the-art ensemble methods, have been used to solve this type of problem with good performance. Most current Random Forests approaches that address diversity focus on maximizing tree diversity while producing and training the component trees. In TC, component trees trained on noisy data with huge numbers of categories and features exhibit highly diverse characteristics. Consequently, given numerous component trees from the original Random Forests, we propose a novel method, Diversity Random Forests, which diversely and adaptively selects and combines tree classifiers with diversity learning and sample weighting. Diversity Random Forests addresses two key issues. First, by designing a matrix for the data distribution, we formulate a unified optimization model for learning and selecting diverse trees, where tree weights are learned through a convex quadratic programming problem with given sample weights. Second, we propose a new self-training algorithm that iteratively runs the convex optimization and automatically learns the sample weights. Extensive experiments on a variety of text categorization benchmark data sets show that the proposed approach consistently outperforms state-of-the-art methods.
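The alternation described in the abstract (solve a convex quadratic program for tree weights with sample weights fixed, then re-estimate the sample weights) can be illustrated with a minimal sketch. The abstract does not give the objective, the form of the data-distribution matrix, or the sample-weight update, so everything below is an assumption made for illustration: the matrix P (entry +1 if a tree classifies a sample correctly, -1 otherwise), the weighted squared-margin objective, the simplex constraint, the exponential re-weighting rule, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def learn_tree_weights(P, s, n_steps=500):
    """Assumed QP: minimize sum_i s_i * (1 - (P w)_i)^2 over the simplex.

    Solved here by projected gradient descent, not a dedicated QP solver.
    """
    n_trees = P.shape[1]
    Q = P.T @ (s[:, None] * P)   # quadratic term (positive semidefinite)
    c = P.T @ s                  # linear term
    # Step size 1/L, where L = 2 * lambda_max(Q) bounds the gradient's slope.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1] + 1e-12)
    w = np.full(n_trees, 1.0 / n_trees)
    for _ in range(n_steps):
        grad = 2.0 * (Q @ w - c)
        w = project_to_simplex(w - step * grad)
    return w

def diversity_forest_sketch(P, n_rounds=5):
    """Alternate between tree-weight learning and sample re-weighting."""
    n_samples = P.shape[0]
    s = np.full(n_samples, 1.0 / n_samples)
    for _ in range(n_rounds):
        w = learn_tree_weights(P, s)
        margin = P @ w           # weighted agreement of the ensemble per sample
        s = np.exp(-margin)      # assumed rule: emphasize low-margin samples
        s /= s.sum()
    return w, s

# Synthetic stand-in for per-tree correctness on 200 samples and 20 trees.
rng = np.random.default_rng(0)
P = np.where(rng.random((200, 20)) < 0.7, 1.0, -1.0)
w, s = diversity_forest_sketch(P)
```

In this sketch, trees whose weight is driven to zero are effectively deselected, which is one simple way a weighting step can double as tree selection; whether the paper's model behaves this way is not stated in the abstract.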

Original language: English
Title of host publication: Neural Information Processing - 21st International Conference, ICONIP 2014, Proceedings
Editors: Chu Kiong Loo, Keem Siah Yap, Kok Wai Wong, Andrew Teoh, Kaizhu Huang
Publisher: Springer Verlag
Number of pages: 8
ISBN (Electronic): 9783319126425
Publication status: Published - 2014
Event: 21st International Conference on Neural Information Processing, ICONIP 2014 - Kuching, Malaysia
Duration: 3 Nov 2014 to 6 Nov 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 21st International Conference on Neural Information Processing, ICONIP 2014

