Exploring Twitter as a Source of an Arabic Dialect Corpus
A. Alshutayri and E. Atwell. International Journal of Computational Linguistics (IJCL), 8 (2):
37-44 (June 2017)
Abstract
Given the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. Moreover, Arabic dialects are increasingly used in social media, so such text is now considered an appropriate source for a corpus. We collected 210,915K tweets from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. This paper explores Twitter as a source and describes the methods that we used to extract tweets and classify them according to the geographic location of the sender. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. Our approach to classifying tweets achieved an accuracy of 79%.
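The abstract describes labelling tweets with one of five dialect groups according to the sender's geographic location. As a minimal, hypothetical sketch of that idea (the country-to-group mapping below is an assumption based only on the five groups named in the abstract, not the paper's actual assignment):

```python
# Hypothetical sketch: assign a dialect-group label to each tweet based on
# the sender's country code. The mapping is an illustrative assumption.
COUNTRY_TO_DIALECT = {
    "SA": "Gulf", "KW": "Gulf", "AE": "Gulf",
    "IQ": "Iraqi",
    "EG": "Egyptian",
    "SY": "Levantine", "LB": "Levantine", "JO": "Levantine",
    "MA": "North African", "DZ": "North African", "TN": "North African",
}

def label_by_location(tweets):
    """Keep tweets whose sender country maps to a known dialect group,
    attaching the group label; tweets from unmapped countries are skipped."""
    labelled = []
    for tweet in tweets:
        group = COUNTRY_TO_DIALECT.get(tweet.get("country_code"))
        if group is not None:
            labelled.append({"text": tweet["text"], "dialect": group})
    return labelled

sample = [
    {"text": "tweet one", "country_code": "EG"},
    {"text": "tweet two", "country_code": "XX"},  # unknown country: skipped
]
print(label_by_location(sample))
```

The labelled output could then be exported (e.g. as ARFF) for classification in WEKA, as the paper does with its own pipeline.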
%0 Journal Article
%1 alshutayri2017exploring
%A Alshutayri, Areej Odah
%A Atwell, Eric
%D 2017
%J International Journal of Computational Linguistics (IJCL)
%K Arabic, Dialect, Dialectal, Media, Multi, Phonological, Social, Tweet, Twitter, Variations
%N 2
%P 37-44
%T Exploring Twitter as a Source of an Arabic Dialect Corpus
%U http://www.cscjournals.org/library/manuscriptinfo.php?mc=IJCL-83
%V 8
%X Given the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. Moreover, Arabic dialects are increasingly used in social media, so such text is now considered an appropriate source for a corpus. We collected 210,915K tweets from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. This paper explores Twitter as a source and describes the methods that we used to extract tweets and classify them according to the geographic location of the sender. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. Our approach to classifying tweets achieved an accuracy of 79%.
@article{alshutayri2017exploring,
abstract = {Given the lack of Arabic dialect text corpora in comparison with what is available for dialects of English and other languages, there is a need to create dialect text corpora for use in Arabic natural language processing. Moreover, Arabic dialects are increasingly used in social media, so such text is now considered an appropriate source for a corpus. We collected 210,915K tweets from five groups of Arabic dialects: Gulf, Iraqi, Egyptian, Levantine, and North African. This paper explores Twitter as a source and describes the methods that we used to extract tweets and classify them according to the geographic location of the sender. We classified Arabic dialects using the Waikato Environment for Knowledge Analysis (WEKA) data analytics tool, which contains many alternative filters and classifiers for machine learning. Our approach to classifying tweets achieved an accuracy of 79%.},
added-at = {2018-12-14T08:22:33.000+0100},
author = {Alshutayri, Areej Odah and Atwell, Eric},
biburl = {https://www.bibsonomy.org/bibtex/2c198832c136904662020ec9c5629af16/cscjournals},
interhash = {0db6ebc04be575509d4b0729910a468d},
intrahash = {c198832c136904662020ec9c5629af16},
issn = {2180-1266},
journal = {International Journal of Computational Linguistics (IJCL)},
keywords = {Arabic, Dialect, Dialectal, Media, Multi, Phonological, Social, Tweet, Twitter, Variations},
language = {English},
month = {June},
number = 2,
pages = {37--44},
timestamp = {2018-12-14T08:22:33.000+0100},
title = {Exploring Twitter as a Source of an Arabic Dialect Corpus},
url = {http://www.cscjournals.org/library/manuscriptinfo.php?mc=IJCL-83},
volume = 8,
year = 2017
}