Abstract
This thesis describes a knowledge-based method of automatic
phrase alignment, with the aim of annotating a multilingual
treebank for linguistic studies. Most current phrase alignment
methods are based on extracting many-to-many links from N-gram
tables, perhaps selecting true constituents or dependency
links in a later step. Such methods do not utilise the full
information available in a deep syntactic parse. Additionally, the
goal is typically to build a machine translation system; very few
methods aim at building treebanks for linguistic
studies. Consequently, there is in principle no reason to exclude
links which are not linguistically motivated.
The method described in this thesis, on the other hand, has the
explicit goal of annotating a parallel treebank for linguistic
research. It takes as input parallel sentences with deep
syntactic analyses in Lexical-Functional Grammar. The grammars
giving rise to the analyses are assumed to follow common analysis
guidelines; if so, structural similarity in analyses gives us
evidence that constituents (syntactic phrases) or functional
elements (predicates, arguments, adjuncts) may be linked. A set of
principles for function and constituent alignment is formulated
(keeping our annotation goal in mind), and an implementation of
these principles is given. Finally, the method is evaluated both
manually and automatically, and compared with methods based on
N-gram tables. The results suggest that the method is
promising, but also show that there are specific possibilities for
improvement.