Arabic dialects identification

Arabic is the fourth most used language on the Internet and the official language of more than 20 countries around the world. It has three main varieties: Modern Standard Arabic, which is used in books, news, and education; local dialects, which vary from one region to another; and Classical Arabic, the written language of the Quran. The Maghrebi dialect is the variety of Arabic used in North African countries, where internet users feel more comfortable using local slang than native Arabic. In this study, we present a large dataset of regional dialects from three countries, namely Algeria, Tunisia, and Morocco, and investigate the identification of each dialect using machine learning classifiers with TF-IDF features. The approach shows promising results, achieving an accuracy of up to 96%.
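
As a rough illustration of what classifying dialects from TF-IDF features can look like, the following minimal sketch builds character-trigram TF-IDF vectors and assigns a text to the nearest class centroid by cosine similarity. It is not the authors' implementation: the abstract does not name the classifiers or the tokenization, so the trigram features, the nearest-centroid classifier, the function names, and the romanized toy phrases are all assumptions made for this example, and the phrases are not taken from the dataset.

```python
import math
from collections import Counter, defaultdict

# Toy, romanized phrases used purely for illustration (an assumption of this
# sketch); they are NOT samples from the dataset described in the abstract.
TRAIN = [
    ("wach rak lyoum", "DZ"),
    ("wach rak khouya", "DZ"),
    ("chnowa ahwalek lyoum", "TN"),
    ("chnowa ahwalek sahbi", "TN"),
    ("kidayr labas 3lik", "MA"),
    ("kidayr nta labas", "MA"),
]


def char_ngrams(text, n=3):
    """Split a text into overlapping character n-grams (padded with spaces)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every n-gram."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(char_ngrams(doc)))
    n_docs = len(docs)
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def l2_normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(text, idf):
    """Turn a text into a unit-length TF-IDF vector over known n-grams."""
    tf = Counter(t for t in char_ngrams(text) if t in idf)
    return l2_normalize({t: c * idf[t] for t, c in tf.items()})


def fit_centroids(docs, labels, idf):
    """Sum the TF-IDF vectors of each class and normalize to unit length."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in tfidf(doc, idf).items():
            sums[label][t] += v
    return {label: l2_normalize(dict(vec)) for label, vec in sums.items()}


def predict(text, idf, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = tfidf(text, idf)

    def cosine(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


DOCS = [text for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
IDF = fit_idf(DOCS)
CENTROIDS = fit_centroids(DOCS, LABELS, IDF)

print(predict("wach rak", IDF, CENTROIDS))
```

Character n-grams are chosen here only because they tolerate the spelling variation common in informally written dialect text; word-level TF-IDF with a different classifier would fit the abstract's description equally well.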

Authors:
Berrimi Mohamed, Moussaoui Abdelouahab, Oussalah Mourad, Saidi Mohamed

Publication type:
A4 Article in conference proceedings

Place of publication:
3rd Conference on Informatics and Applied Mathematics, IAM 2020

Keywords:
Arabic dialects, Arabic text processing, feature extraction, Text classification

Published:

Full citation:
Berrimi, M., Moussaoui, A., Oussalah, M., Saidi, M., Arabic dialects identification: North African dialects case study, 3rd Conference on Informatics and Applied Mathematics, IAM 2020, ISSN: 1613-0073, pp. 64-72

DOI:
http://ceur-ws.org/Vol-2748/

Read the publication here:
http://urn.fi/urn:nbn:fi-fe202102154776