Abstract
This paper presents a Part-of-Speech (POS) tagger for the Kadazan language that implements
Brill's approach, also known as Transformation-Based Error-Driven Learning. Kadazan was
chosen because no POS tagger has yet been developed for this language. This study was
therefore carried out to develop a POS tagger for Kadazan that can tag a Kadazan corpus
systematically, help reduce the problem of tag ambiguity, and also serve as a
language-learning tool. The main objective of this study is thus to automate the tagging
process for Kadazan. Brill's approach is an enhanced version of the original rule-based
approach: it uses a set of predefined rules to transform wrong tags into correct tags in the
corpus. To achieve this goal, two specific objectives were set: to create lexical and
contextual rules specific to Kadazan by applying Brill's rule-based approach, and to evaluate
the effectiveness of the resulting Kadazan POS tagger. The tagging process is divided into
four main phases. In the first phase, a new untagged text is input into the system. In the
second phase, the initial-state annotator tags every word in the input with its most likely
tag, producing a temporary corpus. In the third phase, the temporary corpus is compared with
the goal corpus to detect any tagging errors. In the last phase, the rules are applied to
correct these errors and fix the temporary corpus.
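The four phases described above can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: it assumes a single hypothetical contextual rule template ("change tag X to tag Y when the previous word is tagged Z") and learns rules with the greedy loop commonly used in transformation-based learning (choose the candidate rule that removes the most errors, apply it, repeat). The Kadazan lexical and contextual rules from the study are not reproduced; the data is a toy English example with Penn Treebank-style tags, and all function names (`train_lexicon`, `initial_state`, `find_errors`, `apply_rule`, `learn_rules`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Tagged = List[Tuple[str, str]]  # list of (word, tag) pairs
Rule = Tuple[str, str, str]     # hypothetical template: (from_tag, to_tag, previous_tag)


def train_lexicon(training: Tagged) -> Dict[str, str]:
    """Map each word to the tag it carries most often in a tagged training corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, tag in training:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def initial_state(words: List[str], lexicon: Dict[str, str], default: str = "NN") -> Tagged:
    """Phase 2: tag every word with its most likely tag; unknown words get `default`."""
    return [(w, lexicon.get(w, default)) for w in words]


def find_errors(temp: Tagged, goal: Tagged) -> List[int]:
    """Phase 3: positions where the temporary corpus disagrees with the goal corpus."""
    return [i for i, ((_, t), (_, g)) in enumerate(zip(temp, goal)) if t != g]


def apply_rule(tagged: Tagged, rule: Rule) -> Tagged:
    """Change from_tag to to_tag wherever the previous word is tagged previous_tag.

    Conditions are checked on the input tags, so one change cannot trigger another.
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tagged)
    for i in range(1, len(tagged)):
        if tagged[i][1] == from_tag and tagged[i - 1][1] == prev_tag:
            out[i] = (tagged[i][0], to_tag)
    return out


def learn_rules(temp: Tagged, goal: Tagged, max_rules: int = 10) -> Tuple[List[Rule], Tagged]:
    """Phase 4 (greedy loop): repeatedly apply the rule that removes the most errors."""
    rules: List[Rule] = []
    for _ in range(max_rules):
        errors = find_errors(temp, goal)
        # Instantiate candidate rules from the errors that remain.
        candidates = {(temp[i][1], goal[i][1], temp[i - 1][1]) for i in errors if i > 0}
        best: Optional[Rule] = None
        best_gain = 0
        for rule in sorted(candidates):
            gain = len(errors) - len(find_errors(apply_rule(temp, rule), goal))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no candidate reduces the error count
            break
        temp = apply_rule(temp, best)
        rules.append(best)
    return rules, temp


# Toy English data (not from the Kadazan corpus): "race" is most often a noun.
TRAINING = [("the", "DT"), ("race", "NN"), ("was", "VBD"), ("long", "JJ"),
            ("the", "DT"), ("race", "NN"), ("ended", "VBD"),
            ("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")]
GOAL = [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")]

lexicon = train_lexicon(TRAINING)
temp = initial_state([w for w, _ in GOAL], lexicon)
print(find_errors(temp, GOAL))  # [3]: "race" was tagged NN
rules, fixed = learn_rules(temp, GOAL)
print(rules)                    # [('NN', 'VB', 'TO')]
```

Checking each rule's condition against the tags as they were before the rule is applied is one design choice; letting earlier changes trigger later ones is another, and this sketch does not claim which one the study uses.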
The tagger was trained on two Kadazan children's storybooks containing a total of 2069
words. It was evaluated by comparing the tags assigned by Brill's approach with manual
tagging, and achieved an accuracy of around 93%. This study shows how Brill's tagging
approach can be used to identify tags for the Kadazan language.
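The evaluation step can be illustrated with a small sketch. It assumes that the tagger output and the manual tagging are token-aligned lists of (word, tag) pairs, and it computes accuracy as the share of tokens whose tags match, which is the usual definition of POS-tagging accuracy; the abstract does not state the exact formula used. The function names `tagging_accuracy` and `confusions` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # list of (word, tag) pairs


def tagging_accuracy(predicted: Tagged, manual: Tagged) -> float:
    """Share of tokens whose predicted tag equals the manually assigned tag."""
    if not manual or len(predicted) != len(manual):
        raise ValueError("corpora must be non-empty and token-aligned")
    if any(pw != mw for (pw, _), (mw, _) in zip(predicted, manual)):
        raise ValueError("corpora contain different tokens")
    correct = sum(p == m for (_, p), (_, m) in zip(predicted, manual))
    return correct / len(manual)


def confusions(predicted: Tagged, manual: Tagged) -> Counter:
    """Count (manual_tag, predicted_tag) pairs for every mis-tagged token."""
    return Counter((m, p) for (_, p), (_, m) in zip(predicted, manual) if p != m)


# Toy example (not the study's data): one of four tokens is mis-tagged.
manual = [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")]
predicted = [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "NN")]
print(tagging_accuracy(predicted, manual))  # 0.75
print(confusions(predicted, manual))        # Counter({('VB', 'NN'): 1})
```

A confusion count like this one shows which tag pairs account for the remaining errors, which can guide which new rules are worth adding.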