SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
Serry Sibaee,Omer Nacar,2 Authors,Wadii Boulila
TLDR
This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
Abstract
The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces \textbf{SHAMI-MT}, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an outstanding average quality score of \textbf{4.01 out of 5.0} when judged by OPENAI model GPT-4.1, demonstrating its ability to produce translations that are not only accurate but also dialectally authentic. This work provides a crucial, high-fidelity tool for a previously underserved language pair, advancing the field of dialectal Arabic translation and offering significant applications in content localization, cultural heritage, and intercultural communication.
