BibliographyType,ISBN,Identifier,Author,Title,Journal,Volume,Number,Month,Pages,Year,Address,Note,URL,Booktitle,Chapter,Edition,Series,Editor,Publisher,ReportType,Howpublished,Institution,Organizations,School,Annote,Custom1,Custom2,Custom3,Custom4,Custom5
7,"","barjoseph2003","Bar-Joseph, Z.; Gerber, G. K.; Lee, T. I.; Rinaldi, N. J.; Yoo, J. Y.; Robert, F.; Gordon, D. B.; Fraenkel, E.; Jaakkola, T. S.; Young, R. A. & Gifford, D. K.","Computational discovery of gene modules and regulatory networks.","Nat Biotechnol",21,11,"November","1337--1342",2003,"","","","","","","","","","","","","","","","","","biclustering classification dataset gene microarray network plasticity thesis ","",""
7,"","bekkerman2004ace","Bekkerman, R.; McCallum, A. & Huang, G.","Automatic Categorization of Email into Folders: Benchmark Experiments on Enron and SRI Corpora","Center for Intelligent Information Retrieval, Technical Report IR",418,,"","",2004,"","","","","","","","","","","","","","","","Office workers everywhere are drowning in email---not only spam, but also large quantities of legitimate email to be read and organized for browsing. Although there have been extensive investigations of automatic document categorization, email gives rise to a number of unique challenges, and there has been relatively little study of classifying email into folders. This paper presents an extensive benchmark study of email foldering using two large corpora of real-world email messages and foldering schemes: one from former Enron employees, another from participants in an SRI research project. We discuss the challenges that arise from differences between email foldering and traditional document classification. We show experimental results from an array of automated classification methods and evaluation methodologies, including a new evaluation method of foldering results based on the email timeline, and including enhancements to the exponential gradient method Winnow, providing top-tier accuracy with a fraction the training time of alternative methods. We also establish that classification accuracy in many cases is relatively low, confirming the challenges of email data, and pointing toward email foldering as an important area for further research.","","algorithms automatic bayes benchmark categorization classification email enron folders information ir paper read:2008 retrieval sri svm winnow ","",""
6,"","ByWC07","Byde, Andrew; Wan, Hui & Cayzer, Steve","Personalized Tag Recommendations via Tagging and Content-based Similarity Metrics","",,,"March","",2007,"","","http://www.icwsm.org/papers/paper47.html","Proceedings of the International Conference on Weblogs and Social Media","","","","","","","","","","","","This short paper describes a novel technique for generating personalized tag recommendations for users of social bookmarking sites such as del.icio.us. Existing techniques recommend tags on the basis of their popularity among the group of all users; on the basis of recent use; or on the basis of simple heuristics to extract keywords from the url being tagged. Our method is designed to complement these approaches, and is based on recommending tags from urls that are similar to the one in question, according to two distinct similarity metrics, whose principal utility covers complementary cases.","","bookmarking classification content kde projekt recommender seminar tagging tagging_convergence tagging_proposal ws07 ","",""
7,"","paper:cohen:2004","Cohen, Ira; Cozman, Fabio G.; Sebe, Nicu; Cirelo, Marcelo C. & Huang, Thomas S.","Semisupervised learning of classifiers: theory, algorithms, and their application to human-computer interaction","",26,,"","1553--1566",2004,"","","http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1343843","Pattern Analysis and Machine Intelligence, IEEE Transactions on","","","","","","","","","","","","Automatic classification is one of the basic tasks required in any pattern recognition and human computer interaction application. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that, if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis to a specific type of probabilistic classifiers, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection.","","algorithm classification machine-learning reading-group semisupervised ","",""
10,"","Grobelnik98","Grobelnik, Marko & Mladeni\'c, Dunja","Efficient text categorization","",,,"","",1998,"","","http://citeseer.ist.psu.edu/grobelnik98efficient.html","","","","","","","","","","","","","We present an approach to text categorization using machine learning techniques. The approach is developed and tested on large text hierarchy named Yahoo that is available on the Web. We handle the large number of features and training examples by taking into account hierarchical structure of examples and using feature subset selection for large text data. The large number of categories is handled separately for each testing example by pruning unpromising categories. In this way, the number of...","","TextMining classification text ","",""
6,"1-59593-180-5","IfrimTW-ICML2005","Ifrim, Georgiana; Theobald, Martin & Weikum, Gerhard","Learning Word-to-Concept Mappings for Automatic Text Classification","",,,"","18--26",2005,"Bonn, Germany","","http://www.mpi-inf.mpg.de/~ifrim/publications/icml-lws05.pdf","Proceedings of the 22nd International Conference on Machine Learning - Learning in Web Search (LWS 2005)","","","","Raedt, Luc De & Wrobel, Stefan","","","","","","","","","","classification concept model tc text topic wordnet ","",""
7,"","orengo1997","Orengo, CA; Michie, AD; Jones, S; Jones, DT; Swindells, MB & Thornton, JM","CATH - a hierarchic classification of protein domain structures","Structure",5,8,"August","1093--1108",1997,"","","http://www.sciencedirect.com/science/article/B6VSR-4CP0VB1-3/1/5c587435799d19f9d1a3d04f8810f644","","","","","","","","","","","","","Background: Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures and in optimal cases function can also be assigned. The ever increasing number of known protein structures is too large to classify all proteins manually, therefore, automatic methods are needed for fast evaluation of protein structures. Results: We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. Conclusions: Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.","","classification evolution families fold protein structure ","",""
7,"","soergel1999roo","Soergel, D.","The rise of ontologies or the reinvention of classification","Journal of the American Society for Information Science",50,12,"","1119--1120",1999,"","","","","","","","","","","","","","","","","","classification knowledge ontology organization ","",""
7,"","vanderstraeten2008","Vanderstraeten, J & Matthyssens, P","Country classification and the cultural dimension: a review and evaluation","International Marketing Review",25,2,"","230--251",2008,"","","","","","","","","","","","","","","","","","classification culture geography nation ","",""
7,"","DeAlwis2007","","Unsupervised classification of saturated areas using a time series of remotely sensed images","Hydrology and Earth System Sciences",4,,"","1663--1696",2007,"","","http://www.hydrol-earth-syst-sci-discuss.net/4/1663/2007/","","","","","","","","","","","","","The spatial distribution of saturated areas is an important consideration in numerous applications, such as water resource planning or siting of management practices. However, in humid well vegetated climates where runoff is produced by saturation excess processes on hydrologically active areas (HAA) the delineation of these areas can be difficult and time consuming. Much of the non-point source pollution in these watersheds originates from these HAAs. Thus, a technique that can simply and reliably predict these areas would be a powerful tool for scientists and watershed managers tasked with implementing practices to improve water quality. Remotely sensed data is a source of spatial information and could be used to identify HAAs, should a proper technique be developed. The objective of this study is to develop a methodology to determine the spatial variability of saturated areas using a temporal sequence of remotely sensed images. The Normalized Difference Water Index (NDWI) was derived from medium resolution LANDSAT 7 ETM+ imagery collected over seven months in the Town Brook watershed in the Catskill Mountains of New York State and used to characterize the areas that were susceptible to saturation. We found that within a single landcover type, saturated areas were characterized by the soil surface water content when the vegetation was dormant and leaf water content of vegetation during the growing season. The resulting HAA map agreed well with both observed and spatially distributed computer simulated saturated areas. This methodology appears promising for delineating saturated areas in the landscape.","","classification drought moisture reflectance remotesensing satellite vegetation ","",""
