A hybrid approach to translate Moroccan Arabic dialect


Ridouane Tachicart
Mohammadia School of Engineers, Mohamed Vth Agdal University, Rabat, Morocco
tachicart@gmail.com

Karim Bouzoubaa
Mohammadia School of Engineers, Mohamed Vth Agdal University, Rabat, Morocco
karim.bouzoubaa@emi.ac.ma

Abstract

Today, several tools exist for the automatic processing of standard Arabic. In comparison, resources and tools for Arabic dialects, especially the Moroccan dialect, are still lacking. Given the proximity of the Moroccan dialect to standard Arabic, one option is to translate the former into the latter by combining a rule-based approach and a statistical approach, using tools designed for standard Arabic and adapting them to the Moroccan dialect. In this paper we describe an architecture for such a translation system.

Keywords: automatic translation, Moroccan dialect, Standard Arabic, language model, translation model, bilingual dictionary.

I. INTRODUCTION

The Arabic-speaking world is characterized by diglossia [15]. On the one hand, Modern Standard Arabic (MSA) is shared by the entire Arab world, but it is not the mother tongue of any Arab country. It is used in the press, in broadcast news, in official government documents, etc. On the other hand, several Arabic dialects exist and are mother tongues. These dialects are usually not written and therefore have no standard spelling convention, although initiatives have been launched to this end [1]. This situation is problematic for the automatic processing of Arabic dialects, and especially for machine translation, since resources for these languages are almost nonexistent. In addition, directly integrating resources designed for MSA into translation systems for Arabic dialects yields very low performance [1]. It is therefore necessary either to develop new tools and resources for Arabic dialects or to adapt those that exist for MSA.
Research on translation systems for Arabic dialects started only in the 2000s and is still in its early stages. The Moroccan dialect (Darija), for its part, has not been addressed by any of this research, which leaves it isolated and difficult to understand for the majority of people in other Arab countries. In this paper we present an approach that translates the Moroccan dialect into MSA by using tools designed for MSA and adapting them to the Moroccan dialect. On the one hand, this translation is a first step towards the automatic processing of the Moroccan dialect. On the other hand, given the growing amount of content in this dialect on the web, it may facilitate cultural exchanges with people from the other Arab countries.

We present the architecture of the future translation system, which combines a rule-based approach (analysis, transfer and generation) and a statistical approach in order to benefit from the advantages of both. We will proceed to the development of this system as soon as its architecture is validated. In a first step, we use linguistic resources (a morphological analyzer and a bilingual dictionary) to produce possible translations of the Moroccan dialect source text; we then use statistical tools to improve the results of this translation. Finally, the quality of the system will be evaluated using metrics that compare its output with the expected reference translations.

The rest of this article is organized as follows. Section 2 describes the characteristics of the Moroccan dialect. Section 3 presents the existing approaches for building a translation system. Section 4 discusses previous work on the automatic translation of Arabic dialects. Section 5 presents our approach. Finally, section 6 concludes the paper and suggests extensions and future work.

II. MOROCCAN DIALECT

The Moroccan dialect, along with the Amazigh language, is the mother tongue of the Moroccan people.
It is an evolving dialect, as it continues to integrate many French, English and Spanish words, especially in technical areas, such as البيسي /PC/ and بروكرام /program/. We describe the characteristics of the dialect below.

A. Transcription

There are many ways to transcribe the sounds of the Moroccan dialect. First, a simplified writing using MSA characters is very often used. This writing sometimes contains unnecessary characters. For example, a silent letter such as ة (at the end of a word) is pronounced ا in the Moroccan dialect, since this letter has no phonological or grammatical function there, which is not the case in standard Arabic. In addition, this script cannot transcribe phenomena specific to Darija, such as the pronunciation of G, V and P, which we transcribe respectively with the letters ك, ف and ب, as in جلس /sit/, البوطو /bar/ and الفيدانج /drain/.

A second method of transcribing Moroccan Arabic uses the Latin script. Its main drawback is that several letters transcribe the same phoneme, as in the case of K, Q and QU, which can transcribe the letters ق and ط. Conversely, ث and ت can both be represented by T.

The last transcription is the alphanumeric writing, which writes Darija using the Latin alphabet together with some digits in order to note phonemes that do not exist in French, such as ع and ق. For example, the word عقوبة /sanction/ is transcribed as 3o9oba. This writing, in addition to its aberrant shape, is difficult to read and has no digits to represent emphatic consonants such as emphatic 'S' or emphatic 'D'. The example below illustrates the three transcriptions:

Arabic script: ما لقيتش بحالو
Latin script: Malkitch bhalo
Alphanumeric script: Mal9itch b7alo

B. Phonology

The pronunciation of Darija is similar to that of MSA, with two major differences. On the one hand, vowels are often omitted in Darija, especially when they are at the end of an open syllable. This is probably due to interference with the Amazigh language [17]. For example, the word كتبت /to write/ is pronounced 'Katabto' in MSA, while in Darija it is pronounced 'Ktbt'. On the other hand, the Moroccan dialect is characterized by the pronunciation of the consonants G, V and W [16], which do not exist in MSA. For example, 'Gouli' means 'tell me'; its equivalent in MSA is قولي.

C. Vocabulary

The vocabulary of Moroccan Arabic borrows words from several languages, but it is strongly influenced by the vocabulary of MSA. The majority of its words are imported from MSA.
The Moroccan dialect also contains words of French origin, and a few words of Spanish and Amazigh origin. Table I shows the sources of some words of the Moroccan dialect.

TABLE I. VOCABULARY OF MOROCCAN DIALECT

Word | Origin | Darija equivalent
شربت | MSA | شربت
أتمشى | MSA | كنتمشى
فهمتني | MSA | فهمتني
ابجاو | Amazigh | ابجاو
موش | Amazigh | مش
تمارة | Amazigh | تمارة
Autobus | French | طوبيس
Fromage | French | فرماج
Fourchette | French | فورشيطا
Cuerda | Spanish | كوردا
Rueda | Spanish | رويدا
Cocina | Spanish | كوزينة

D. Morpho-syntax

The morphology of the Moroccan dialect is much less complex than that of MSA. First, sentence structure in Darija [17] is dominated by the subject-verb-object order. Second, words of MSA origin either keep their morphology or undergo some modifications in their patterns or affixes. For example, the prefix س in MSA becomes غا or غادي in Darija. In addition, Moroccan dialect verbs are conjugated using a list of prefixes and suffixes, with a slight modification of the word pattern. Unlike in MSA, the feminine plural and the dual do not exist in verb conjugation; they are replaced by the simple plural. Table II compares the conjugation of a verb in both languages.

TABLE II. VERBS CONJUGATION

Form | MSA | MD
Dual | ضربوهما | ضربوهم
Plural | ضربوهم | ضربوهم
Feminine plural | ضربوهن | ضربوهم

Finally, negation in the Moroccan dialect is similar to that of the other Arabic dialects of North Africa: the verb is always enclosed between the prefix ما and the suffix ش, as in مانكتبش /I don't write/.

III. EXISTENT APPROACHES FOR AUTOMATIC TRANSLATION

Developing a system for translating the Moroccan dialect requires choosing a machine translation approach. There are two basic strategies. On the one hand, the rule-based approach uses morphological analysis and exploits linguistic information about the source and target languages.
It proceeds in three phases (Figure 1). The first is the analysis phase, which produces a series of units that determine the grammatical structure of each word; a morphological analyzer is often used in this phase. Then, in the transfer phase, each analysis is associated with one or more analyses in the target language using a bilingual dictionary. Finally, in the generation phase, a target-language text is produced with the appropriate word order.
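The three phases can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names (`analyze`, `transfer`, `generate`, `translate`) and the table layout are hypothetical, the analysis handles nothing but the ما...ش negation circumfix, and the dictionary holds only entries that appear in this paper's worked example. A real system would rely on a full morphological analyzer and bilingual dictionary.

```python
# Toy resources; the entries come from the paper's worked example,
# but the table layout and all function names are hypothetical.
NEG_PREFIX, NEG_SUFFIX = "ما", "ش"           # Darija negation circumfix
BILINGUAL = {"دابا": "الآن", "شوف": "رأى"}    # Darija stem -> MSA stem
MSA_NEGATION = "لم"                          # MSA particle used in the example


def analyze(word):
    """Analysis: strip the negation circumfix and return annotated units."""
    has_circumfix = (
        word.startswith(NEG_PREFIX)
        and word.endswith(NEG_SUFFIX)
        and len(word) > len(NEG_PREFIX) + len(NEG_SUFFIX)
    )
    if has_circumfix:
        stem = word[len(NEG_PREFIX):-len(NEG_SUFFIX)]
        return [("NEG", NEG_PREFIX + "+" + NEG_SUFFIX), ("STEM", stem)]
    return [("STEM", word)]


def transfer(units):
    """Transfer: map each source unit to a target unit via the dictionary."""
    target = []
    for tag, form in units:
        if tag == "NEG":
            target.append((tag, MSA_NEGATION))
        else:
            # Stems missing from the toy dictionary pass through unchanged.
            target.append((tag, BILINGUAL.get(form, form)))
    return target


def generate(units):
    """Generation: linearize the target units (here, simply in order)."""
    return " ".join(form for _, form in units)


def translate(sentence):
    """Run analysis, transfer and generation word by word."""
    return " ".join(generate(transfer(analyze(w))) for w in sentence.split())


if __name__ == "__main__":
    print(analyze("مانكتبش"))
    print(translate("دابا"))
```

The sketch deliberately skips person markers and object pronouns, so it should not be read as producing correct MSA for inflected verbs; it only shows how the output of each phase feeds the next.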

Fig. 1. Rule-based translation

On the other hand, the statistical approach, which is used to reduce the cost of developing translation systems, relies on two analyses: the first is an analysis of a parallel corpus, performed by a translation model; the second is an analysis of a monolingual corpus, performed by a language model. These analyses yield parameters that the decoder uses to compute translation probabilities. The language model captures the constraints imposed by the syntax of the target language and estimates the probability of a sentence in that language, while the translation model models the process of generating a source sentence from a target sentence. The output of these two models is used by a decoder that maximizes the likelihood of the translation of the source sentence into the target sentence in an acceptable time (Figure 2).

Fig. 2. Statistical translation

IV. PREVIOUS WORKS

The amount of research on automatic translation between Arabic dialects and MSA has recently increased, but it remains low compared to research on MSA. Researchers usually rely on three types of approaches to build these systems. First, some have built their translation systems using rules [6, 7]. Others [8, 9] have relied on statistical tools. A third, hybrid approach has been used by others to implement their systems [4]. Nizar Habash and Owen Rambow [2] developed the MAGEAD morphological analyzer for Arabic dialects, which can output the root, the pattern and the affixes for each entry. Along the same lines, the authors of [3] rely on the Buckwalter morphological analyzer [14] to translate the Egyptian dialect into diacritized MSA, combining a statistical approach with knowledge of linguistic rules. Wael Salloum and Nizar Habash used ADAM [4], a morphological analyzer for Arabic dialects, to create ELISSA, a rule-based system that translates Arabic dialects into MSA. Finally, Yahya Alamlahi [5] introduced a system for translating the Yemeni dialect into MSA that uses no external tools, only an algorithm that analyzes the words of this dialect based on a list of affixes.

V. APPROACH OF OUR SYSTEM

A. Adopted approach

A rule-based approach may be particularly suited to certain phrases, while other linguistic phenomena are better handled by a corpus-based approach (statistical method). We opt for a hybrid approach to leverage the strengths of each and produce a satisfactory translation. We proceed through several stages (Figure 3) to translate the Moroccan dialect into MSA.
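For reference, the statistical strategy described in this paper (a language model over the target language and a translation model that generates the source sentence from the target sentence) is commonly written as a noisy-channel decomposition. The symbols below are our own notation, introduced as an assumption for illustration: s is the Moroccan dialect source sentence and t ranges over candidate MSA sentences.

```latex
\hat{t} \;=\; \arg\max_{t} P(t \mid s)
        \;=\; \arg\max_{t} \underbrace{P(s \mid t)}_{\text{translation model}} \;
                           \underbrace{P(t)}_{\text{language model}}
```

The second equality follows from Bayes' rule, since P(s) does not depend on t. Read this way, one plausible interpretation of the hybrid setting is that the rule-based stages supply the set of candidates t, and the language model term P(t) is what ranks them by fluency.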

Fig. 3. Architecture of our system

Our approach is divided into two phases. The first is to collect and produce the linguistic resources necessary for the development of such a translation system, namely a bilingual dictionary (MSA versus Moroccan dialect) and a Moroccan dialect corpus. To prepare the corpus, we will use the scripts of some television productions, and we will use MSA dictionaries to produce the bilingual dictionary. The bilingual dictionary will then be extended by collecting other resources on the web, to ensure maximum coverage of the vocabulary of the Moroccan dialect.

The second phase is to develop the translation system, in several steps. In the first step, identification, our system separates the content of the source text to distinguish dialect content from MSA content, in order to facilitate the translation. Then, in the analysis step, the source text is segmented into annotated dialectal units using the Alkhalil morphological analyzer [10]. In addition to being free and open source, this analyzer has the advantage of its performance (it can analyze ten words per second). It also provides, for each word, all possible analyses based on morphological and syntactic rules. It relies on an independent database containing extensive linguistic resources, which makes it possible to adapt the database to our needs without modifying the core of the analyzer. Specifically, we extend Alkhalil's list of prefixes and suffixes to cover the Moroccan dialect, and we adapt its dictionary using our own dictionary (Moroccan dialect versus MSA), which includes words of several origins (French, Spanish, Amazigh...). In the transfer step, each unit produced above is linked to one or more corresponding units in the target language (MSA) using our bilingual dictionary. Then, in the generation step, the system uses the previous analyses to generate different MSA phrases. After that, it must optimize the result to produce the most fluent sentence, using a language model built with the SRILM toolkit [11]. Note that SRILM is developed under Linux as a set of scripts, is free to use, and allows language models to be built quickly. These models rely on n-gram analysis and can represent and process texts in several languages.

Finally, to measure the translation quality of our system, we use the usual indicators such as Recall, Precision, METEOR [12] and BLEU [13]. The latter has gained the status of a reference automatic metric within the machine translation community, and is characterized by a brevity penalty that penalizes systems that try to artificially boost their scores by producing deliberately short sentences. METEOR, which aims to improve correlation with human judgments at the segment level, is characterized by the harmonic mean of unigram precision and recall.

B. Example of translation

Table III shows an example of the translation of the Moroccan dialect text دابا ماشوفتوهمش /you did not see them now/ into MSA, following the approach just described.

TABLE III. TRANSLATION EXAMPLE

1 | Input | دابا ماشوفتوهمش (You did not see them now?)
2 | Analysis | ماشوفتوهمش = ما + شوف + ت + وهم + ش ; دابا = دابا
3 | | ما: حرف نفي ; شوف: الجذع ; ت: تاء المخاطب ; وهم: ضمير الغائبين ; ش: حرف نفي ; دابا: ظرف زمان
4 | Transfer | الآن: ظرف زمان ; لم: حرف نفي ; رأى: الجذع ; ت: تاء المخاطب ; وهم: ضمير الغائبين
5 | Generation | رأيتموهم لم الآن
6 | | الآن لم رأيتموهم
7 | Language Model | لم تروهم الآن
8 | Output | لم تروهم الآن (You did not see them now?)

In the analysis phase, the system uses the Alkhalil morphological analyzer to split this sentence into two words, دابا and ماشوفتوهمش, and to produce the annotated units of each word, as shown in the third row of Table III. The word دابا is analyzed as ظرف زمان /adverb of time/, while the word ماشوفتوهمش is analyzed as a verbal unit and decomposed into the prefix ما, the suffixes ت, وهم and ش, and the stem شوف. Then, in the transfer phase (row 4), each unit produced in the previous phase is linked to its equivalent in MSA using the bilingual dictionary already created. In our example, the system translates the units دابا, ما+ش, شوف, ت and وهم into their MSA equivalents الآن, لم, رأى, ت and وهم, respectively. Then, in the generation phase (rows 5 and 6), the MSA phrase الآن لم رأيتموهم is generated from the previous analysis. Finally, the language model component (row 7) uses the SRILM toolkit to improve the quality of the translation and leads to the final sentence لم تروهم الآن, which is more fluent than the sentence الآن لم رأيتموهم produced in the generation phase.

VI. CONCLUSION

We have presented an approach for translating the Moroccan dialect into MSA by exploiting processing tools built for MSA. This is the first translation work concerning the Moroccan dialect. Among its extensions, we plan to introduce a grammar and spelling corrector in the identification phase.
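As a supplementary illustration of the language-model step of the translation example (row 7 of Table III), the sketch below ranks candidate sentences with an add-one smoothed bigram model. It is a stand-in written in plain Python, not the SRILM toolkit: the function names are hypothetical, and the two-sentence training corpus is invented purely so that the example runs. A real language model would be estimated from a large MSA corpus.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count context unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])              # every token used as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def most_fluent(candidates, model):
    """Return the candidate with the highest language-model score."""
    return max(candidates, key=lambda c: log_prob(c, model))


if __name__ == "__main__":
    fluent = "لم تروهم الآن"        # final output in Table III
    generated = "الآن لم رأيتموهم"   # candidate from the generation phase
    # Invented toy corpus, an assumption for illustration only.
    model = train_bigram([fluent, "لم تروهم"])
    print(most_fluent([generated, fluent], model))
```

Both candidates here have the same number of tokens, so their scores are directly comparable; for candidates of different lengths, a length-normalized score (for example, per-token log-probability) would be a reasonable alternative design choice.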
REFERENCES

[1] Habash N., Diab M., Rambow O. (2012). Conventional Orthography for Dialectal Arabic. In: Proc. Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 21-27.
[2] Habash N., Rambow O. (2006). MAGEAD: a morphological analyzer and generator for the Arabic dialects. In: Proc. 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Stroudsburg, USA, July 2006.
[3] Abo Bakr H., Shaalan K., Ziedan I. (2008). A Transferring Egyptian Colloquial Dialect into Modern Standard Arabic. In: Proc. International Conference on Recent Advances in Natural Language Processing, RANLP 2007, Borovets, Bulgaria, September 27-29.
[4] Salloum W., Habash N. (2012). Elissa: A Dialectal to Standard Arabic Machine Translation System. In: Proc. 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India, December 2012.
[5] Almalahi Y., Fateh A. (2007). Sana ani Dialect to Modern Standard Arabic: Rule-based Direct Machine Translation.
[6] Abo Bakr H., Shaalan K., Ziedan I. (2008). A Hybrid Approach for Converting Written Egyptian Colloquial Dialect into Diacritized Arabic. In: Proc. 6th International Conference on Informatics and Systems, INFOS 2008, Cairo, Egypt, March 27-28.
[7] Mohamed E., Mohit B., Oflazer K. (2012). Transforming Standard Arabic to Colloquial Arabic. In: Proc. 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012, Jeju, Korea, July 8-14.
[8] Zbib R., Malchiodi E., Devlin J., Stallard D., Matsoukas S., Schwartz R., Makhoul J., Zaidan O., Callison-Burch C. (2012). Machine translation of Arabic dialects. In: Proc. 12th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2012, Montreal, Canada, June 3-8.
[9] Salloum W., Habash N., Plamondon L. (2011). Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation. In: Proc. Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, Scotland, UK, July 27-31.
[10] Boudlal A., Lakhouaja A., Mazroui A., Meziane A., Ould abdallahi ould bebah A., Shoul M. (2010). Alkhalil Morpho Sys1: A Morphosyntactic analysis system for Arabic texts. In: Proc. The International Arab Conference on Information Technology, ACIT 2010, Benghazi, Libya, December 14-15.
[11] Stolcke A. (2002). SRILM: An Extensible language modeling toolkit. In: Proc. 7th International Conference on Spoken Language Processing, ICSLP 2002, Denver, Colorado, USA, September 16-20.
[12] Denkowski M., Lavie A. (2011). Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems. In: Proc. 6th Workshop on Statistical Machine Translation, EMNLP 2011, Edinburgh, Scotland, UK, July 30-31.
[13] Papineni K., Roukos S., Ward T., Zhu W. (2002). BLEU: a method for automatic evaluation of machine translation. In: Proc. 40th Annual Meeting on Association for Computational Linguistics, ACL 2002, Philadelphia, Pennsylvania, USA, July 7-12.
[14] Buckwalter T. (2002). Buckwalter Arabic Morphological Analyzer Version 1.0. Linguistic Data Consortium, University of Pennsylvania.
[15] Ferguson C. (1959). Diglossia. Word 15.
[16] Ennaji M. (1989). Multilingualism, cultural identity, and education in Morocco. p. 130-134.
[17] Ait cherif A., Boukbout M., Mahmoudi M., Ouhmouch A. (2011). Moroccan Arabic Textbook. Peace Corps Morocco.