Abstract
An important characteristic of written English text is the abundance of noun
compounds: sequences of nouns acting as a single noun, e.g., colon cancer
tumor suppressor protein. While eventually mastered by domain experts, their
interpretation poses a major challenge for automated analysis. Understanding
noun compounds' syntax and semantics is important for many natural language
applications, including question answering, machine translation, information
retrieval, and information extraction. I address the problem of noun compound
syntax by means of novel, highly accurate unsupervised and lightly supervised
algorithms using the Web as a corpus and search engines as interfaces to that
corpus. Traditionally, the Web has been viewed as a source of page hit counts,
used as an estimate for n-gram word frequencies. I extend this approach by
introducing novel surface features and paraphrases, which yield
state-of-the-art results for the task of noun compound bracketing. I also show
how these kinds of features can be applied to other structural ambiguity
problems, such as prepositional phrase attachment and noun phrase coordination. I
address noun compound semantics by automatically generating paraphrasing verbs
and prepositions that make explicit the hidden semantic relations between the
nouns in a noun compound. I also demonstrate how these paraphrasing verbs can
be used to solve various relational similarity problems, and how paraphrasing
noun compounds can improve machine translation.
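To make the idea of using hit counts as n-gram frequency estimates for bracketing concrete, here is a minimal sketch of the classic adjacency model for a three-noun compound w1 w2 w3. It is a standard baseline, not the thesis's own method, which adds surface features and paraphrases on top of counts. The counts in `TOY_HITS` are invented for illustration, and `toy_hits` is a hypothetical stand-in for a search-engine query. No real search API is called.

```python
from typing import Callable, Dict, Tuple

# Invented counts, for illustration only. In the web-as-corpus setting each
# value would be a page hit count for the quoted bigram "a b".
TOY_HITS: Dict[Tuple[str, str], int] = {
    ("colon", "cancer"): 1200,
    ("cancer", "cells"): 900,
    ("adult", "male"): 400,
    ("male", "rat"): 700,
}


def toy_hits(a: str, b: str) -> int:
    """Hypothetical stand-in for a web hit-count query; unseen bigrams get 0."""
    return TOY_HITS.get((a, b), 0)


def bracket_adjacency(w1: str, w2: str, w3: str,
                      hits: Callable[[str, str], int] = toy_hits) -> str:
    """Adjacency model: compare the two adjacent bigrams.

    Returns "left" for [[w1 w2] w3] and "right" for [w1 [w2 w3]].
    Ties (including 0 vs 0) are resolved as "left", an arbitrary choice here.
    """
    left_score = hits(w1, w2)   # evidence for [[w1 w2] w3]
    right_score = hits(w2, w3)  # evidence for [w1 [w2 w3]]
    return "left" if left_score >= right_score else "right"


def show(w1: str, w2: str, w3: str) -> str:
    """Render the predicted bracketing as a string."""
    if bracket_adjacency(w1, w2, w3) == "left":
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


if __name__ == "__main__":
    print(show("colon", "cancer", "cells"))  # [[colon cancer] cells]
    print(show("adult", "male", "rat"))      # [adult [male rat]]
```

Passing a different `hits` function lets the same decision rule run on any count source. Raw bigram counts are a noisy signal, which is one motivation for enriching them with further features.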
Description
[1912.01113] Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics