Master's thesis

Syntaktisk fraselenking [Syntactic phrase alignment]

Universitetet i Bergen, Bergen, Norway (2010)

Abstract

This thesis describes a knowledge-based method for automatic phrase alignment, with the aim of annotating a multilingual treebank for linguistic studies. Most current phrase alignment methods extract many-to-many links from N-gram tables, perhaps filtering out true constituents or dependency links in a later step. Such methods do not utilise the full information available in a deep syntactic parse. Additionally, their goal is typically to build a machine translation system; very few methods aim at building treebanks for linguistic studies. For such systems, there is consequently no reason in principle to exclude links which are not linguistically motivated. The method described in this thesis, on the other hand, has the explicit goal of annotating a parallel treebank for linguistic research. It takes as input parallel sentences with deep syntactic analyses in Lexical-Functional Grammar. The grammars giving rise to the analyses are assumed to follow common analysis guidelines; if so, structural similarity between analyses gives us evidence that constituents (syntactic phrases) or functional elements (predicates, arguments, adjuncts) may be linked. A set of principles for function and constituent alignment is formulated (keeping our annotation goal in mind), and an implementation of these principles is given. Finally, the method is evaluated both manually and automatically, and compared with methods based on N-gram tables. The results suggest that the method is promising, but also point to specific areas for improvement.
