Abstract
Performing Named Entity Recognition on ancient documents is a
time-consuming, complex and error-prone manual task. It is nevertheless a
prerequisite for identifying related documents and correlating named entities
across distinct sources, which helps to recreate historic events precisely. To
reduce the manual effort, automated classification approaches can be
leveraged. However, classifying terms in ancient documents automatically is
difficult because of the sources' challenging syntax and poor state of
preservation. This paper introduces and evaluates two approaches that cope with
complex syntactic environments by combining statistical information derived
from a term's context with domain-specific heuristic knowledge to classify the
term. Furthermore, both approaches can easily be adapted to new domains.