Compression-Based Parts-of-Speech Tagger for The Arabic Language

I. Alkhazi and W. Teahan. International Journal of Computational Linguistics (IJCL), 10 (1): 1-15 (April 2019)

Abstract

This paper explores the use of compression-based models to train a part-of-speech (POS) tagger for the Arabic language. The newly developed tagger is based on the Prediction by Partial Matching (PPM) compression system, which has already been employed successfully in several NLP tasks. Several models were trained for the new tagger. The first models were trained using silver-standard data from two different Arabic POS taggers, and the second model utilised the BAAC corpus, a 50K-term manually annotated Modern Standard Arabic (MSA) corpus, on which the PPM tagger achieved an accuracy of 93.07%. The tag-based models were also used to evaluate the performance of the new tagger by first tagging different Classical Arabic and Modern Standard Arabic corpora and then compressing the text using tag-based compression models. The results show that the silver-standard models reduced the quality of the tag-based compression by an average of 0.43%, whereas the gold-standard model increased the tag-based compression quality by an average of 4.61% when used to tag Modern Standard Arabic text.

Links and resources

BibTeX key: alkhazi2019compressionbased
entry type: article
year: 2019
month: April
journal: International Journal of Computational Linguistics (IJCL)
number: 1
pages: 1-15
volume: 10
language: English
issn: 2180-1266
url: http://www.cscjournals.org/library/manuscriptinfo.php?mc=IJCL-95
