<?xml version="1.0" ?>
<!-- This file was exported from BibSonomy, http://www.bibsonomy.org -->

<bibliography>

<biblioentry xreflabel="rennie2001naive" id="rennie2001naive">
   <authorgroup>
       <author><firstname>Jason</firstname><othername role="mi">D. M.</othername><surname>Rennie</surname></author> 
   </authorgroup>
<citetitle pubwork="article">Improving Multi&#45;class Text Classification with Naive Bayes</citetitle>
   <pubdate>2001</pubdate>  
   <abstract>
      <para>Numerous text documents are available in electronic form, and more become available every day. Together they represent a massive amount of easily accessible information. Extracting value from this huge collection requires organization, and much of the work of organizing documents can be automated through text classification. The accuracy of such systems, and our understanding of them, greatly influence their usefulness. In this paper, we seek 1) to advance the understanding of commonly used text classification techniques, and 2) through that understanding, to improve the tools available for text classification. We begin by clarifying the assumptions made in the derivation of Naive Bayes, noting its basic properties and proposing ways to extend and improve it. Next, we investigate the quality of Naive Bayes parameter estimates and their impact on classification. Our analysis leads to a theorem that explains the improvements found in multiclass classification with Naive Bayes using Error-Correcting Output Codes. We use experimental evidence on two commonly used data sets to exhibit an application of the theorem. Finally, we show fundamental flaws in a commonly used feature selection algorithm and develop a statistics-based framework for text feature selection. A greater understanding of Naive Bayes and of the properties of text allows us to make better use of Naive Bayes in text classification.
      </para>
   </abstract>
</biblioentry>
</bibliography>
