Although term extraction has been researched for more than 20 years, only a few studies focus on under-resourced languages. Moreover, bilingual term mapping from comparable corpora for these languages has only recently attracted researchers' attention. This paper presents methods for term extraction, term tagging in documents, and bilingual term mapping from comparable corpora for four under-resourced languages: Croatian, Latvian, Lithuanian, and Romanian. The methods described are language-independent, provided that the user supplies language-specific parameter data and has access to a part-of-speech or morpho-syntactic tagger.