Toward Domain Specific Thesaurus Construction: Divide-and-Conquer Method

Relation, 10 (1.129): 7396 (2009)

Abstract

This paper describes a new thesaurus construction method in which small, class-based thesauruses are constructed and then merged into a whole according to a domain classification system. The method has three advantages: 1) the complexity of taxonomy construction is reduced, 2) each class-based thesaurus can be reused in other domain thesauruses, and 3) the distribution of terms across classes in the target domain is easily identified. The method consists of three steps: term extraction, term classification, and taxonomy construction. Each step balances automatic processing with manual verification. We constructed a Korean IT domain thesaurus using the proposed method. Because the terms are extracted from Korean newspaper and patent corpora in the IT domain, the thesaurus includes many Korean neologisms. The thesaurus consists of 81 upper-level classes and over 1,000 IT terms.
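
The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the data layout, and the sample terms and classes are all hypothetical, and the manual verification and the taxonomy built inside each class are not modeled. The sketch only shows terms being grouped into small per-class sets, attached to a classification hierarchy, and counted per class.

```python
from collections import defaultdict


def build_class_thesauri(classified_terms):
    """Group (term, class_label) pairs into one small term set per class."""
    by_class = defaultdict(set)
    for term, label in classified_terms:
        by_class[label].add(term)
    return dict(by_class)


def merge_thesauri(class_hierarchy, class_thesauri):
    """Attach each class's terms to its node in the classification system.

    class_hierarchy maps a class label to its parent label (None for a root).
    """
    merged = {}
    for label, parent in class_hierarchy.items():
        merged[label] = {
            "parent": parent,
            "terms": sorted(class_thesauri.get(label, set())),
        }
    return merged


def term_distribution(merged):
    """Count the terms attached to each class."""
    return {label: len(node["terms"]) for label, node in merged.items()}


# Hypothetical sample data, not taken from the paper.
classified = [
    ("router", "Networking"),
    ("firewall", "Security"),
    ("switch", "Networking"),
]
hierarchy = {"IT": None, "Networking": "IT", "Security": "IT"}

thesaurus = merge_thesauri(hierarchy, build_class_thesauri(classified))
distribution = term_distribution(thesaurus)
```

Because each class is built independently, a per-class term set could in principle be reused under a different hierarchy by passing another `class_hierarchy` mapping, which mirrors the reuse advantage the abstract claims.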
