My primary area of research is Arabic Computational Linguistics. Specifically:
Stemming: Details about the stemmer I have developed for Arabic, with a link to the Java code.
Tagging: Details about the Part-Of-Speech (POS) tagger I am developing for Arabic.
Corpora: Details about the Arabic corpora I am using. I have manually tagged 50,000 words of Arabic newspaper text with the basic tags (noun, verb, particle). I have also tagged 1,700 words with more detailed tags (e.g. singular, masculine, definite common noun). These are available for research purposes; please e-mail me if you would like a copy.
Publications: A couple of my publications, which can be viewed or downloaded.
Z. Sheikh and F. Sánchez-Martínez. Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pages 67–74. Alicante, Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos (2009).