@jnothman

Extracting Complex Biological Events with Rich Graph-Based Feature Sets

Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task, pages 10--18. Boulder, Colorado, Association for Computational Linguistics (June 2009)

Description

Treats the task as finding nodes (events, entities) and edges (arguments) in a graph. Trigger detection uses an SVM word tagger: known multiword triggers from the training data are handled in postprocessing; tokens with multiple annotations get new double-annotation classes; and postprocessing handles multiple events of the same type that share a trigger. The feature space is large. A β multiplier on the negative class calibrates the precision-recall tradeoff for the whole system. Argument classification uses a multi-class SVM with many features, based on *groupings* of tokens along the shortest dependency path between candidate trigger and argument. Each pair-classification decision is independent. Rule-based postprocessing on the graph produces valid events. The authors considered N-best reranking of candidate graphs but could not build a system with any effect, although it has a potential F-score improvement of 11.5%. System errors are almost evenly split between the trigger and edge detectors. Multi-sentence annotations are ignored, since 95% of all events fall within one sentence. The authors intend to open-source their system.
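
To make the β idea concrete, here is a minimal sketch of one way a negative-class multiplier could shift the precision-recall balance of a token tagger. It is not the paper's implementation: the notes do not say how β is applied, so the sketch assumes non-negative per-class scores (for example probabilities) and scales only the negative score before taking the argmax. The function names, the `neg` label, and the example scores are all hypothetical.

```python
# Sketch only: assumes non-negative per-class scores; the real system's
# scoring and the exact use of beta are not described in these notes.

def predict_with_beta(scores, beta, negative_label="neg"):
    """Return the best label after scaling the negative-class score by beta."""
    adjusted = {
        label: (score * beta if label == negative_label else score)
        for label, score in scores.items()
    }
    return max(adjusted, key=adjusted.get)


def tag_tokens(token_scores, beta):
    """Tag each token independently, as in a per-token trigger classifier."""
    return [predict_with_beta(scores, beta) for scores in token_scores]


# Hypothetical scores for two tokens (illustrative values, not real data).
token_scores = [
    {"neg": 0.50, "Binding": 0.40, "Phosphorylation": 0.10},
    {"neg": 0.70, "Binding": 0.20, "Phosphorylation": 0.10},
]

for beta in (1.0, 0.7, 0.2):
    print(beta, tag_tokens(token_scores, beta))
```

Under these assumptions, lowering β below 1 makes the tagger propose more triggers (favouring recall), while raising it above 1 makes it more conservative (favouring precision). Because downstream edge detection depends on which triggers exist, tuning a single value like this against the final event score is one way to calibrate the whole pipeline.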
