Technical report

Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification

B. Zhang, M. Goncalves, W. Fan, Y. Chen, E. Fox, P. Calado, and M. Cristo.
TR-04-16. Computer Science, Virginia Tech (2004).

Abstract

This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve the classification of text documents into predefined categories. We evaluate eight similarity measures, five derived from the citation structure of the collection and three derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments, using documents from the ACM Digital Library and the ACM classification scheme, show that we can discover similarity functions that work better than any single source of evidence in isolation, and whose combined performance through simple majority voting is comparable to that of Support Vector Machine classifiers.
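The abstract describes two levels of combination: evidence-specific similarity measures are fused into new similarity functions (discovered by GP), and the classifiers built on those functions are then combined by majority voting. The Python sketch below illustrates only the shape of that pipeline and is not the authors' implementation. Everything in it is a hypothetical assumption: the document fields (`title`, `cites`), the two example measures (title-term Jaccard overlap and bibliographic coupling, one common citation-based measure), the hand-written `fused_sim` expression standing in for a GP-evolved function, and the use of a 1-nearest-neighbour classifier per similarity function.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical document representation: a set of title terms and a set of
# cited-reference IDs. The report's actual features are not reproduced here.
Doc = Dict[str, Set[str]]
Similarity = Callable[[Doc, Doc], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def title_sim(x: Doc, y: Doc) -> float:
    # Structural-content evidence (assumed): overlap of title terms.
    return jaccard(x["title"], y["title"])


def coupling_sim(x: Doc, y: Doc) -> float:
    # Citation evidence (assumed): bibliographic coupling, i.e. shared references.
    return jaccard(x["cites"], y["cites"])


def fused_sim(x: Doc, y: Doc) -> float:
    # Stand-in for a GP-discovered combination. This expression is arbitrary
    # and only illustrates a function built from the individual measures.
    t, c = title_sim(x, y), coupling_sim(x, y)
    return t * c + t


def knn_predict(doc: Doc, train: Sequence[Doc], labels: Sequence[str],
                sim: Similarity, k: int = 1) -> str:
    # Rank training documents by one similarity function; the stable sort
    # keeps earlier training documents first when scores tie.
    ranked = sorted(range(len(train)), key=lambda i: sim(doc, train[i]), reverse=True)
    return Counter(labels[i] for i in ranked[:k]).most_common(1)[0][0]


def majority_vote(predictions: List[str]) -> str:
    # most_common(1) returns the first-seen label among equally frequent ones.
    return Counter(predictions).most_common(1)[0][0]


def fused_classify(doc: Doc, train: Sequence[Doc], labels: Sequence[str],
                   sims: Sequence[Similarity], k: int = 1) -> str:
    # One kNN classifier per similarity function, combined by majority vote.
    return majority_vote([knn_predict(doc, train, labels, s, k) for s in sims])


# Toy data (invented for illustration only).
train = [
    {"title": {"genetic", "programming"}, "cites": {"a", "b"}},
    {"title": {"genetic", "algorithms"}, "cites": {"a", "c"}},
    {"title": {"query", "retrieval"}, "cites": {"x", "y"}},
    {"title": {"ranking", "retrieval"}, "cites": {"x", "z"}},
]
labels = ["AI", "AI", "IR", "IR"]

# Title terms point to "AI", citations point to "IR"; the vote settles it.
query = {"title": {"genetic", "programming"}, "cites": {"x", "y"}}
print(fused_classify(query, train, labels, [title_sim, coupling_sim, fused_sim]))  # prints AI
```

In this toy case the title-based and fused classifiers vote "AI" while the citation-based one votes "IR", so the majority is "AI". Voting over several evidence-specific classifiers is one simple way to let conflicting evidence cancel out; the report compares such a vote against SVM classifiers.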

BibTeX key: Zhang05cTR
entry type: techreport
year: 2004
institution: Computer Science, Virginia Tech
number: TR-04-16
notes: See also Zhang05c. ID code: 693. Deposited by: Administrator, Eprints. Deposited on: 09 September 2005. Site administrator: eprints@cs.vt.edu
size: 9 pages
Document: http://eprints.cs.vt.edu/archive/00000693/01/GP5.pdf

Cite this publication

@techreport{Zhang05cTR,
  abstract = {This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity, five derived from the citation structure of the collection, and three measures derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our empirical experiments using documents from the ACM digital library and the ACM classification scheme show that we can discover similarity functions that work better than any evidence in isolation and whose combined performance through a simple majority voting is comparable to that of Support Vector Machine classifiers.},
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Zhang, Baoping and Goncalves, Marcos Andre and Fan, Weiguo and Chen, Yuxin and Fox, Edward A. and Calado, Pavel and Cristo, Marco},
  biburl = {https://www.bibsonomy.org/bibtex/2f82476bb97eaa644fd4a513fdab213d0/brazovayeye},
  institution = {Computer Science, Virginia Tech},
  interhash = {22310dbb1df275d83dcdd7e99af4fae6},
  intrahash = {f82476bb97eaa644fd4a513fdab213d0},
  keywords = {Classification, Computer Digital Information Libraries Retrieval, Science, algorithms, analysis, citation document genetic programming, similarity,},
  notes = {See also \cite{Zhang05c} ID Code: 693 Deposited By: Administrator, Eprints Deposited On: 09 September 2005 Site Administrator: eprints@cs.vt.edu},
  number = {TR-04-16},
  size = {9 pages},
  timestamp = {2008-06-19T17:55:12.000+0200},
  title = {Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification},
  url = {http://eprints.cs.vt.edu/archive/00000693/01/GP5.pdf},
  year = 2004
}