@schneeschmelze

German Typographers vs. German Grammar: Decomposition of Wikipedia Category Labels into Attribute-Value Pairs

. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, page 315--324. New York, NY, USA, ACM, (2017)
DOI: 10.1145/3018661.3018662

Abstract

Given an instance (Julieta Pinto), most methods for open-domain information extraction focus on acquiring knowledge in the form of either class labels (Costa Rican short story writers, Women novelists) referring to concepts to which the instance belongs; or facts (nationality: Costa Rica) connecting the instance (Julieta Pinto) to other instances or concepts (Costa Rica), where the fact and the other instance often take the form of an attribute (nationality) and a value (Costa Rica) respectively. From extraction through internal representation and storage, class labels and facts are treated as if they carved out disconnected slices within the larger space of factual knowledge. This paper argues that class labels and facts pertaining to an instance exist in symbiosis rather than as a dichotomy. A constituent (Costa Rican) within a class label (Costa Rican short story writers) of an instance may be indicative of a fact (nationality: Costa Rica) applicable to the instance and vice-versa. As an illustration of the relationship between class labels and facts, the paper introduces an open-domain method for the better understanding of the semantics of class labels in one of the larger and most widely-used repositories of knowledge, namely the categories in the Wikipedia category network. The method exploits the category network to associate constituents (Costa Rican) within names of Wikipedia categories, with attributes (nationality) that explain their role.

Description

German Typographers vs. German Grammar

Links and resources

Tags