Compression-Based Parts-of-Speech Tagger for The Arabic Language
International Journal of Computational Linguistics (IJCL) 10 (1): 1 - 15 (April 2019)

This paper explores the use of compression-based models to train a part-of-speech (POS) tagger for the Arabic language. The new tagger is based on the Prediction by Partial Matching (PPM) compression scheme, which has already been applied successfully to several NLP tasks. Several models were trained for the new tagger. The first models were trained on silver-standard data produced by two different Arabic POS taggers. A further model was trained on the BAAC corpus, a manually annotated Modern Standard Arabic (MSA) corpus of 50K terms, on which the PPM tagger achieved an accuracy of 93.07%. The tagger was also evaluated through tag-based compression: different Classical Arabic and MSA corpora were first tagged and then compressed using tag-based compression models. The results show that the silver-standard models reduced the quality of tag-based compression by an average of 0.43%, whereas the gold-standard model improved it by an average of 4.61% when used to tag MSA text.
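The general idea behind compression-based tagging can be illustrated with a small sketch: a model learned from tagged data assigns a code length (in bits) to a candidate tag sequence, and the candidate that compresses best is preferred. The code below is not the paper's implementation. It assumes a heavily simplified order-1 PPM-style model with method-C escape probabilities and no exclusions, and the class name `TinyPPM`, the tag names, and the toy training sequence are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyPPM:
    """Simplified order-1 PPM-style model (escape method C, no exclusions).

    Illustrative only: real PPM coders use higher orders, exclusions,
    and an arithmetic coder. Here we only estimate code lengths.
    """

    def __init__(self, alphabet_size):
        self.alphabet_size = alphabet_size
        self.order1 = defaultdict(Counter)  # previous symbol -> next-symbol counts
        self.order0 = Counter()             # context-free symbol counts

    def _prob(self, prev, sym):
        # Try the order-1 context first, then escape down to order 0,
        # and finally to a uniform distribution over the alphabet.
        p_escape = 1.0
        for counts in (self.order1[prev], self.order0):
            total = sum(counts.values())
            distinct = len(counts)
            if total == 0:
                continue  # empty context: fall through at no cost
            if counts[sym] > 0:
                return p_escape * counts[sym] / (total + distinct)
            p_escape *= distinct / (total + distinct)
        return p_escape / self.alphabet_size

    def update(self, prev, sym):
        self.order1[prev][sym] += 1
        self.order0[sym] += 1

    def code_length(self, seq, learn=True):
        """Return the estimated cost of seq in bits; optionally adapt the model."""
        bits = 0.0
        prev = None
        for sym in seq:
            bits += -math.log2(self._prob(prev, sym))
            if learn:
                self.update(prev, sym)
            prev = sym
        return bits


# Hypothetical toy tag sequence standing in for a tagged training corpus.
train_tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "VERB", "DET", "NOUN"]
model = TinyPPM(alphabet_size=3)
model.code_length(train_tags)  # adaptive pass: the model learns while coding

# Two candidate taggings of the same three-word input.
candidate_a = ["DET", "NOUN", "VERB"]
candidate_b = ["DET", "VERB", "NOUN"]
bits_a = model.code_length(candidate_a, learn=False)
bits_b = model.code_length(candidate_b, learn=False)

# The candidate with the shorter code length is the preferred tagging.
best = candidate_a if bits_a < bits_b else candidate_b
print(f"a: {bits_a:.2f} bits, b: {bits_b:.2f} bits, best: {best}")
```

In this toy setting the candidate that follows the tag order seen in training costs fewer bits than the one that does not. The same code-length measure, applied to whole corpora, is the kind of quantity a compression-quality comparison between tagging models would rest on.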