The following articles and conference proceedings describe MADA, TOKAN and related work.
NOTE: When citing MADA or TOKAN in your own publications, please include the MADA/TOKAN version number and the version of SAMA, BAMA or Aramorph you used. Different versions can produce significantly different results, so the versions must be taken into account when comparing to previous work.
Habash, Nizar, Owen Rambow and Ryan Roth. MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization. In Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR), Cairo, Egypt, 2009.
Roth, Ryan, Owen Rambow, Nizar Habash, Mona Diab, and Cynthia Rudin. Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking. In Proceedings of the Association for Computational Linguistics (ACL), Columbus, Ohio, 2008.
Habash, Nizar and Owen Rambow. Arabic Diacritization through Full Morphological Tagging. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), Rochester, New York, 2007.
Habash, Nizar. Arabic Morphological Representations for Machine Translation. Book chapter in Arabic Computational Morphology: Knowledge-based and Empirical Methods. Edited by Antal van den Bosch and Abdelhadi Soudi, 2007.
Habash, Nizar and Owen Rambow. Arabic Tokenization, Morphological Analysis, and Part-of-Speech Tagging in One Fell Swoop. In Proceedings of the Association for Computational Linguistics (ACL'05), Ann Arbor, Michigan, 2005.
Habash, Nizar. Large Scale Lexeme Based Arabic Morphological Generation. In Proceedings of Traitement Automatique du Langage Naturel (TALN-04). Fez, Morocco, 2004.
Habash, Nizar. Introduction to Arabic Natural Language Processing. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2010.