A Comparative Study on Applying Hierarchical Phrase-based and Phrase-based on Thai-Chinese Translation


2012 Seventh International Conference on Knowledge, Information and Creativity Support Systems

A Comparative Study on Applying Hierarchical Phrase-based and Phrase-based on Thai-Chinese Translation

Prasert Luekhong 1,2, Rattasit Sukhauta 2, Peerachet Porkaew 3, Taneth Ruangrajitpakorn 3 and Thepchai Supnithi 3

1 College of Integrated Science and Technology, Rajamangala University of Technology Lanna, Chiang Mai, Thailand
e-mail: prasert@rmutl.ac.th
2 Computer Science Department, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
e-mail: rattasit.s@cmu.ac.th
3 Language and Semantic Technology Laboratory, National Electronics and Computer Technology Center, Thailand
e-mail: {peerachet.porkaew, taneth.rua, thepchai}@nectec.or.th

Abstract — To set an appropriate goal for SMT research on Thai-based translation, a comparative study of the potential and suitability of phrase-based translation (PBT) and hierarchical phrase-based translation (HPBT) becomes the initial question. The Thai-Chinese language pair is chosen as the experimental subject, since the two languages share most common syntactic patterns and Chinese resources are plentiful. Under a standard setting, we find that 3-gram HPBT gains a significantly better BLEU score than 3-gram PBT, while 3-gram HPBT is approximately equal to 5-gram PBT. Moreover, the results show that Chinese-to-Thai translation obtains better accuracy than Thai-to-Chinese translation under every approach.

Keywords — hierarchical phrase-based translation; SMT; Thai-Chinese translation

I. INTRODUCTION

In the past decades, much research on statistical machine translation (SMT) has been conducted, resulting in several methods and approaches. The major approaches of SMT can be categorized as the word-based approach, the phrase-based approach and the tree-based approach [1]. With the high demand for SMT development, various software toolkits were developed to help implement SMT, such as Moses [2], Phrasal [3], cdec [4], Joshua [5] and Jane [6].
Moses and Phrasal are our focus, since both are open-source and can effectively implement all three above-mentioned approaches, while the others cannot. However, Moses receives more public attention than Phrasal, since it has been applied as a baseline in several venues such as ACL (since 2007), Coling and EMNLP. With a tool such as Moses, an SMT developer needs at least a parallel corpus of some language pair to conveniently implement a baseline statistical translation system. Various language pairs have been tried with SMT in the past, such as English-French, English-Spanish and English-German, and they eventually achieved impressive accuracy results [7], since sufficient and well-made training data exist for them, for instance from the Linguistic Data Consortium (LDC) [8], the parallel corpus for statistical machine translation (Europarl) [9], the JRC-Acquis [10] and the English-Persian Parallel Corpus [11]. Unfortunately, for a low-resource language such as Thai, researchers suffer from insufficient data to conduct a full-scale SMT experiment, and translation accuracy with any other language is thus low; for example, simple phrase-based SMT for English-to-Thai gained a BLEU score of only around 13.11% [12]. Furthermore, Thai currently lacks a syntactic treebank sufficient for the tree-based approach, hence SMT research on Thai is limited to the word-based and phrase-based approaches. Since phrase-based SMT has been shown to outperform the word-based approach [1], the development of word-based SMT for Thai is dismissed. Given the limited resources for a complete Thai tree-based SMT experiment with Moses, hierarchical phrase-based translation (HPBT) becomes more interesting, since its accuracy on other language pairs has repeatedly been reported to be higher than that of the simple phrase-based translation approach (PBT) [13].
Though the high potential of HPBT is well known, no experiment on HPBT for Thai has yet been reported. On the other hand, some documents claim negative results for HPBT as well; for example, the BLEU score of Arabic-English translation using HPBT is reported to be 0.6 BLEU points lower than with PBT [14]. This raises the question of which approach is more suitable for the Thai language. From the linguistic point of view, it is clear that SMT works better with language pairs of the same typology, since the impressive BLEU scores are noticeably obtained from European language pairs [7]. Therefore, to test the suitability of different approaches for Thai, Chinese is selected as the paired language because of its resourcefulness and its resemblance to Thai structures. In this work, a comparative study of Chinese-Thai translation based on the HPBT and PBT approaches is conducted to serve as a flagship for further research on Thai SMT. Moreover, different context sizes (3-gram and 5-gram) will also be studied to compare how they affect the translation result.

978-0-7695-4861-6/12 $26.00 2012 IEEE DOI 10.1109/KICSS.2012.23

The rest of this paper is organized as follows. Section II gives background on past work relevant to HPBT and PBT translation results. Section III explains the methods and set-ups of the HPBT and PBT implementations for the Thai-Chinese pair. Section IV details the experiment setting and presents the results with discussion. Lastly, Section V gives a conclusion and a list of plans for further improvement.

II. BACKGROUND

Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The idea behind statistical machine translation comes from information theory. A document is translated according to a probability distribution, defined by Brown [15], that a string in the target language (for example, English) is the translation of a string in the source language (for example, French). SMT models can be divided into three categories: word-based, phrase-based and tree-based.

A. Word-Based Model

The word-based model is based on lexical translation. Translating words in isolation requires a bilingual dictionary that maps words from one language to another. The main issue with this approach is caused by lexical complexity. In natural language, lexical items with the same surface form do not refer to a single concept but to multiple entities. Even though they can all be defined in the dictionary, the entries are not sufficient to recover the actual meaning in different contexts. For example, the Thai word "koh" can be translated into English as either "island" (noun) or "to stick" (verb). An example of a word-based translation system is the freely available GIZA++ package [16] (GPLed), which includes the training programs for IBM Models 1-5 following the description by Brown [15] and for the hidden Markov model [17].
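To make the lexical-ambiguity problem concrete, here is a toy sketch (not from the paper; the lexical table and probabilities are invented for illustration) of word-by-word dictionary translation:

```python
# Toy word-based translation: pick the highest-probability target word
# for each source word in isolation (IBM-model-style lexical table).
# All entries and probabilities here are invented for illustration.

lex_table = {
    # source word -> {target word: p(target | source)}
    "koh": {"island": 0.6, "stick": 0.4},   # Thai "koh": noun vs. verb sense
    "yai": {"big": 0.9, "grandmother": 0.1},
}

def translate_word_based(words, table):
    """Translate each word independently by its most likely entry."""
    return [max(table[w], key=table[w].get) for w in words if w in table]

print(translate_word_based(["koh", "yai"], lex_table))
# A context-blind model always outputs "island" for "koh",
# even in sentences where the verb sense "stick" is correct.
```

Because the choice is made per word, no surrounding context can flip the decision, which is exactly the weakness the phrase-based model below addresses.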
Recently, attention to the word-based approach has been fading, since it has been shown to give unreliably low results and several methods now surpass it.

B. Phrase-Based Model

As its name suggests, the phrase-based model performs translation based on phrasal units. It gains an advantage over the simple word-based model in selecting appropriate translations from the surrounding context. Koehn states that the currently best-performing statistical machine translation systems are based on phrase-based models [1]. The ability to translate small word sequences at a time is arguably the advantage of phrase-based translation. Though many SMT systems [18][19][20][21][22] have been developed with this approach and show adequate outcomes, the remaining limitation is that it is a purely statistical method without linguistic knowledge, and it can return unexpected errors caused by sparseness and an insufficient amount of training data. Nevertheless, this model remains favored, since it is simple to implement with plain parallel corpora.

C. Tree-Based Model

The tree-based model can be defined as the use of syntactic trees to assist in mapping between different linguistic structures and in contextual word translation by using a synchronous grammar [23][24][25]. It requires a treebank [26] as a resource for the full translation process. Therefore, less informative models, such as tree-to-string [27][28] and string-to-tree [29], or models without linguistic information, such as the hierarchical phrase-based model, were proposed. For resource-rich languages with an adequate treebank, implementing a tree-based model with full linguistic information can be planned. Otherwise, less informative models or models without linguistic information are the only options for a low-resource language.

III. DEVELOPMENT OF THAI-CHINESE SMT

This work aims to study the compatibility with the Thai language of two famous approaches of SMT, i.e.
phrase-based translation (PBT) and hierarchical phrase-based translation (HPBT). We design the system architecture of the experiment as shown in Figure 1. The machine translation process starts with the training process. From a parallel corpus, rules for HPBT and phrases for PBT are separately extracted into tables, while the data in the parallel corpus are also used to train a language model. (Figure 1. System architecture)

In summary, the training process returns three mandatory outputs for the testing process: a rule table for HPBT, a phrase table for PBT, and a language model for both. For the testing process, an input sentence for translation is needed. As the system manages input one sentence at a time, the input is formatted as one sentence per line. To translate with HPBT and PBT, each decoder is executed separately and returns a translation result. Each process is described in more detail in the following sections.

A. Phrase-based Translation (PBT)

Statistical phrase-based MT is an improvement over statistical word-based MT. The word-based approach uses word-to-word translation probabilities to translate the source sentence. The phrase-based approach allows the system to divide the source sentence into segments before translating those segments. Because a segmented translation pair (a so-called phrase translation pair) can capture local reordering and can reduce translation alternatives, the quality of output from the phrase-based approach is generally higher than from the word-based approach. It should be noted that phrase pairs are automatically extracted from the corpus and are not defined in the same way as traditional linguistic phrases. As a baseline for comparison with HPBT, PBT is developed with both 3-gram and 5-gram language models. In this work, the phrase-based translation model proposed in [1] is implemented as follows.

1) Phrase Extraction Algorithm

The process of producing the phrase translation model starts with the phrase extraction algorithm. Below is an overview:

1) Collect word translation probabilities for source-to-target (forward model) and target-to-source (backward model) using IBM Model 4.
2) Use the forward and backward models from step 1) to align words in the source-to-target and target-to-source directions, respectively. Only the highest-probability alignment is chosen for each word.
3) Intersect the forward and backward word alignment points to get highly accurate alignment points.
4) Fill in additional alignment points using a heuristic growing procedure.
5) Collect consistent phrase pairs from step 4).

2) Phrase-based Model

Let $e$ range over possible translation results and $f$ be the source sentence. Finding the best translation $e^*$ can be done by maximizing $P(e|f)$ using Bayes's rule:

$e^* = \arg\max_e P(e|f) = \arg\max_e P(f|e)\,P(e)$ (1)

where $P(f|e)$ is the translation model and $P(e)$ is the target language model. The target language model can be trained from a monolingual corpus of the target language. (1) can be written in the form of a log-linear model to add customized features. For each phrase pair, five features are introduced, i.e. the forward and backward phrase translation probability distributions, the forward and backward lexical weights, and the phrase penalty. With these five features, (1) can be summarized as follows:

$e^* = \arg\max_e \; P_{LM}(e)^{\lambda_{LM}} \prod_{i=1}^{I} \phi(\bar{f}_i|\bar{e}_i)^{\lambda_{\phi}}\,\phi(\bar{e}_i|\bar{f}_i)^{\lambda_{\bar{\phi}}}\,\mathrm{lex}(\bar{f}_i|\bar{e}_i)^{\lambda_{lex}}\,\mathrm{lex}(\bar{e}_i|\bar{f}_i)^{\lambda_{\overline{lex}}}\,\exp(\lambda_{pp})$ (2)

In (2), $\bar{e}_1^I, \bar{f}_1^I$ is a phrase segmentation of $(e, f)$. The terms $\phi(\bar{f}_i|\bar{e}_i)$ and $\phi(\bar{e}_i|\bar{f}_i)$ are the phrase-level conditional probabilities for the forward and backward distributions, with feature weights $\lambda_{\phi}$ and $\lambda_{\bar{\phi}}$, respectively. $\mathrm{lex}(\bar{f}_i|\bar{e}_i)$ and $\mathrm{lex}(\bar{e}_i|\bar{f}_i)$ are the lexical weight scores for the phrase pair, with weights $\lambda_{lex}$ and $\lambda_{\overline{lex}}$; these lexical weights are calculated from the forward and backward word alignment probabilities. The phrase penalty, with feature weight $\lambda_{pp}$, encourages fewer and longer phrase pairs to be selected. $P_{LM}$ is the language model with weight $\lambda_{LM}$. The phrase-level conditional probabilities, or phrase translation probabilities, are obtained from the phrase extraction process:

$\phi(\bar{f}|\bar{e}) = \dfrac{\mathrm{count}(\bar{f},\bar{e})}{\sum_{\bar{f}'} \mathrm{count}(\bar{f}',\bar{e})}$ (3)

The lexical weight is applied to check the quality of an extracted phrase pair. For a given phrase pair with an alignment $a$, the lexical weight is the joint probability of every word alignment; for a source word that aligns to more than one target word, the average probability is used:

$\mathrm{lex}(\bar{f}|\bar{e},a) = \prod_{i=1}^{|\bar{f}|} \dfrac{1}{|\{j : (i,j) \in a\}|} \sum_{(i,j)\in a} w(f_i|e_j)$ (4)

where $w(f_i|e_j)$ is the lexical translation probability of the word pair and $|\bar{f}|$ is the number of words in the phrase.

3) Decoding

The decoder searches for the most likely translation according to the source sentence, the phrase translation model and the target language model. The search can be performed by beam search [30].
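The extraction steps above can be sketched in a few lines; this is a minimal illustration with invented toy alignments, and the heuristic growing procedure of step 4) is omitted for brevity:

```python
# Minimal sketch of alignment intersection (step 3) and consistent
# phrase-pair collection (step 5). The toy alignment links below are
# invented; indices are word positions in source and target.

def consistent_pairs(A, src_len, tgt_len, max_len=3):
    """Enumerate (src_span, tgt_span) pairs consistent with alignment A."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_len, tgt_len)):
                    inside = [(i, j) for (i, j) in A
                              if i1 <= i <= i2 and j1 <= j <= j2]
                    # consistency: every link is either fully inside the
                    # box or fully outside it, and the box is non-empty
                    ok = inside and all(
                        (i1 <= i <= i2) == (j1 <= j <= j2) for (i, j) in A)
                    if ok:
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs

forward = {(0, 0), (1, 2), (2, 1)}    # one-best source-to-target links
backward = {(0, 0), (2, 1)}           # one-best target-to-source links
A = forward & backward                # step 3: high-precision intersection
print(sorted(consistent_pairs(A, src_len=3, tgt_len=3)))
```

The intersection keeps only links proposed in both directions, and the consistency test rejects any box that cuts an alignment link in half, mirroring the criterion described above.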
The main algorithm of beam search starts from an initial hypothesis. The next hypothesis is expanded from the current one, and the expansion need not be the next phrase segment of the source sentence. Words on the path of a hypothesis expansion are marked, and the system produces a translation alternative when a path covers all words. The scores of the alternatives are calculated and the sentence with the highest score is selected. Techniques such as hypothesis recombination and heuristic pruning can be applied to cope with the exponential size of the search space.

B. Hierarchical Phrase-based Translation (HPBT)

Chiang [13] proposed hierarchical phrase-based translation (HPBT), a statistical machine translation model that uses hierarchical phrases. Hierarchical phrases are defined as phrases consisting of two or more sub-phrases that hierarchically link to each other. To create the hierarchical phrase

model, a synchronous context-free grammar (also known as a syntax-directed transduction grammar [31]) is learned from parallel text without any syntactic annotations. A synchronous CFG derivation begins with a pair of linked start symbols. At each step, two linked non-terminals are rewritten using the two components of a single rule; denoting links with boxed indices, the newly introduced symbols are re-indexed apart from the symbols already present. In this work, we follow the implementation instructions of Chiang [13]. The methodology can be summarized as follows. Since a grammar in a synchronous CFG consists of elementary structures that are rewrite rules with aligned pairs of right-hand sides, a rule can be defined as:

$X \rightarrow \langle \gamma, \alpha, \sim \rangle$ (5)

where $X$ is a non-terminal, $\gamma$ and $\alpha$ are both strings of terminals and non-terminals, and $\sim$ is a one-to-one correspondence between non-terminal occurrences in $\gamma$ and non-terminal occurrences in $\alpha$.

1) Rule Extraction Algorithm

The extraction process begins with a word-aligned corpus: a set of triples $\langle f, e, \sim \rangle$, where $f$ is a source sentence, $e$ is a target sentence, and $\sim$ is a (many-to-many) binary relation between positions of $f$ and positions of $e$. The word alignments are obtained by running GIZA++ [16] on the corpus in both directions and forming the union of the two sets of word alignments. From each word-aligned sentence pair, a set of rules consistent with the word alignments is extracted. This involves two main steps.

1) Identify initial phrase pairs using the same criterion as most phrase-based systems [22]: there must be at least one word inside one phrase aligned to a word inside the other, but no word inside one phrase may be aligned to a word outside the other phrase.
2) To obtain rules from the phrases, look for phrases that contain other phrases, and replace the sub-phrases with non-terminal symbols.
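Step 2) can be sketched concretely; the token sequences below are invented placeholders standing in for real source and target words:

```python
# Toy sketch of step 2): turn an initial phrase pair into a hierarchical
# rule by replacing an inner (consistent) phrase pair with the
# non-terminal [X] on both sides. Tokens w1..w3 / v1, v3 are placeholders.

def make_hier_rule(src, tgt, sub_src, sub_tgt):
    """Replace one occurrence of (sub_src, sub_tgt) with [X] on each side."""
    def replace(seq, sub):
        for k in range(len(seq) - len(sub) + 1):
            if seq[k:k + len(sub)] == sub:
                return seq[:k] + ["[X]"] + seq[k + len(sub):]
        return seq
    return replace(src, sub_src), replace(tgt, sub_tgt)

# an initial phrase pair and an inner pair already known to be consistent
src = ["w1", "w2", "w3"]
tgt = ["v3", "v1"]
rule = make_hier_rule(src, tgt, sub_src=["w2", "w3"], sub_tgt=["v3"])
print(rule)   # (['w1', '[X]'], ['[X]', 'v1'])
```

The resulting pair of right-hand sides with a shared [X] slot is exactly the shape of the rule-table entries shown later in Figure 5.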
2) Hierarchical-phrase-based Model

Chiang [13] explains that, given a source sentence $f$, a synchronous CFG will have many derivations that yield $f$ on the source side, and therefore many possible target translations. A model over derivations is therefore defined to predict which translations are more likely than others. Following a log-linear model [32] over derivations $D$:

$P(D) \propto \prod_i \phi_i(D)^{\lambda_i}$ (6)

where the $\phi_i$ are features defined on derivations and the $\lambda_i$ are feature weights. One of the features is an $m$-gram language model $P_{lm}(e)$; each remaining feature is defined as a product of functions on the rules used in the derivation:

$\phi_i(D) = \prod_{(X \rightarrow \langle \gamma, \alpha \rangle) \in D} \phi_i(X \rightarrow \langle \gamma, \alpha \rangle)$ (7)

Thus we can rewrite $P(D)$ as

$P(D) \propto P_{lm}(e)^{\lambda_{lm}} \times \prod_i \prod_{(X \rightarrow \langle \gamma, \alpha \rangle) \in D} \phi_i(X \rightarrow \langle \gamma, \alpha \rangle)^{\lambda_i}$ (8)

The factors other than the language model factor can be put into a particularly convenient form. A weighted synchronous CFG is a synchronous CFG together with a function that assigns weights to rules:

$w(X \rightarrow \langle \gamma, \alpha \rangle) = \prod_i \phi_i(X \rightarrow \langle \gamma, \alpha \rangle)^{\lambda_i}$ (9)

This function induces a weight function over derivations:

$w(D) = \prod_{(X \rightarrow \langle \gamma, \alpha \rangle) \in D} w(X \rightarrow \langle \gamma, \alpha \rangle)$ (10)

With this definition, the probability model becomes

$P(D) \propto P_{lm}(e)^{\lambda_{lm}} \times w(D)$ (11)

3) Training

To estimate the parameters of the phrase-translation and lexical-weighting features, frequencies of phrases are needed for the extracted rules. For each sentence pair in the training data, several of the extracted rules may appear in more than one derivation of the pair. Following Och and others, heuristics are used to hypothesize a distribution of possible rules as though they had been observed in the training data, a distribution that does not necessarily maximize the likelihood of the training data. Och's method [22] gives a count of one to each extracted phrase pair occurrence. Chiang gives a count of one to each initial phrase pair occurrence, and then distributes its weight equally among the rules obtained by subtracting out sub-phrases. Treating this as the observed distribution, relative-frequency estimation is used to obtain $P(\gamma|\alpha)$ and $P(\alpha|\gamma)$.
Finally, the parameters of the log-linear model (6) are learned by minimum-error-rate training [33], which tries to set the parameters so as to maximize the BLEU score [34] on a development set. This gives a weighted synchronous CFG that is ready to be used by the decoder.

4) Decoding

We applied a CKY parser as the decoder, and exploited beam search in the post-process for mapping between source and target derivations. Given a source sentence $f$, the decoder finds the target yield of the single best derivation whose source yield is $f$:

$e^* = e\big(\arg\max_{D:\,f(D)=f} P(D)\big)$ (12)

The decoder finds not only the best derivation for a source sentence but also a list of the k-best derivations. These k-best derivations are used in minimum-error-rate training and for rescoring with the language model, and cube pruning [35] is applied to reduce the search space.
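To illustrate how a single derivation simultaneously yields a source and a target string, here is a toy sketch with an invented grammar; the English and pinyin-like tokens are placeholders, not the Chinese and Thai words of the actual example:

```python
# Toy top-down derivation with a synchronous CFG: each rule rewrites a
# non-terminal into a PAIRED (source, target) right-hand side, so
# expanding the derivation yields both strings at once. All rules here
# are invented placeholders for illustration.

rules = {
    "S":  [(["X1", "X2"], ["X1", "X2"])],      # monotone at the top
    "X1": [(["he"], ["ta"])],
    "X2": [(["eats", "X3"], ["chi", "X3"])],    # reordering could go here
    "X3": [(["rice"], ["fan"])],
}

def derive(symbols, side):
    """Expand non-terminals left to right on one side of the grammar."""
    out = []
    for s in symbols:
        if s in rules:
            out += derive(rules[s][0][side], side)  # single-rule toy grammar
        else:
            out.append(s)
    return out

print(" ".join(derive(["S"], 0)))   # source yield: he eats rice
print(" ".join(derive(["S"], 1)))   # target yield: ta chi fan
```

A real decoder searches over many competing rules per non-terminal and scores each derivation with (11); this sketch fixes one rule per non-terminal so the single derivation is deterministic.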

Figure 2. A sentence example of Chinese-to-Thai translation with the word alignment

5) Example of the Hierarchical Translation Process

To explain the process of hierarchical translation, the translation steps are demonstrated using Thai and Chinese as an example. Figure 2 shows a pair of Chinese and Thai sentences with their word alignment, for readability. From Figure 2, a synchronous CFG extracted from the parallel corpus is selected according to the given words. Taking the highest probability for each word, a list of rules is obtained as shown in Figure 3.

(Figure 3. The hierarchical rules (1)-(10) extracted from the example sentence; the rule strings are not reproduced here.)

In Figure 3, the notation X on the left-hand side is a non-terminal and S is the start symbol. The right-hand side contains two sets of CFG rules separated by a comma: the left set contains the terminal and non-terminal rules of the source language, Chinese in this example, while the other side contains the Thai rules. These hierarchical rules are used in the derivation within the decoding process. Following the example in Figure 3, the derivation of the synchronous CFG shown in Figure 4 proceeds in a top-down manner, expanding from the top non-terminal node through the immediate child non-terminal nodes until a terminal node is found at the leftmost position. The number above each arrow in Figure 4 refers to the rule number in Figure 3. From the actual data, examples of the rule table of HPBT and the phrase table of PBT obtained by training on the Thai-Chinese parallel corpus are shown in Figure 5 and Figure 6, respectively. The difference between Figure 5 and Figure 6 is the [X] notation in Figure 5, which indicates a slot for another word or phrase to derive as a tree.

IV. EXPERIMENT

A. Data Preparation

To experiment with Thai-Chinese translation, a parallel corpus was gathered from two sources: BTEC (Basic Travel Expression Corpus) [36] and the HIT London Olympic Corpus [37]. The former consists of 26,544 English-Chinese sentence pairs and the latter of 62,733 English-Chinese sentence pairs. All English sentences in both corpora were carefully translated into Thai by professional linguists and translators. In total, we obtain a Thai-Chinese parallel corpus of 89,277 sentence pairs. In preprocessing, since neither language has a reliable explicit word boundary, Chinese sentences were word-segmented with the Stanford Chinese Word Segmentation Tool [38], while Thai sentences were segmented with SWATH [39]. We manually selected 877 sentence pairs as a development set and randomly chose 1,000 sentence pairs as a test set. The remaining sentence pairs were used as the training data set.

B. Experiment Setting

This work aims to compare the quality of Thai-Chinese SMT between the phrase-based translation (PBT) approach and the hierarchical phrase-based translation (HPBT) approach. The language modeling tool SRILM [40] was used to generate 3-gram and 5-gram language models for Chinese and Thai. Moses [2] was chosen for phrase extraction, rule-table generation and decoding, and its minimum-error-rate training (MERT) function was applied to tune the weights of both models. The results of Moses for HPBT and PBT are a rule

table and phrase table, respectively. Examples of both tables are given in Figure 5 and Figure 6. The difference between the tables is that the HPBT rule table includes translations of terminal and non-terminal nodes to express the hierarchy, while the PBT phrase table lists translation pairs of phrases with word order.

(Figure 4. A derivational process for the translation of Figure 2 using the rules from Figure 3.)

(Figure 5. An example of the rule table from HPBT for Thai-to-Chinese; each entry pairs source and target patterns containing [X] slots with five feature scores and alignment points. The Thai and Chinese tokens are not reproduced here.)

(Figure 6. An example of the phrase table from PBT for Thai-to-Chinese; each entry pairs source and target phrases with alignment points and five feature scores. The Thai and Chinese tokens are not reproduced here.)

C. Results and Discussion

We evaluate the system in both directions, Chinese-to-Thai and Thai-to-Chinese, for 3-gram PBT, 5-gram PBT and 3-gram HPBT. Table I shows the experimental results on translation accuracy in terms of BLEU score [34].
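As a rough illustration of the BLEU metric [34] used for scoring, here is a minimal single-sentence sketch (modified n-gram precision up to 4-grams, geometric mean, brevity penalty); real BLEU is computed over the whole test corpus, and the smoothing of zero counts here is a simplification:

```python
# Rough single-sentence BLEU sketch: clipped n-gram precisions up to
# 4-grams, combined by geometric mean and scaled by a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())           # clipped matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
print(round(bleu(ref, ref), 2))                 # identical output scores 1.0
print(bleu("the cat".split(), ref) < 1.0)       # short output is penalized
```

The brevity penalty is what punishes the too-short candidate above even though every n-gram it does produce matches the reference.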
TABLE I. THAI-CHINESE TRANSLATION EXPERIMENT RESULTS (BLEU SCORE)

Source-to-target language | PBT 3-gram | PBT 5-gram | HPBT 3-gram
Chinese-to-Thai           | 17.52      | 21.83      | 21.38
Thai-to-Chinese           | 12.85      | 13.76      | 15.00

From the BLEU results in Table I, 3-gram HPBT gives the best score for Thai-to-Chinese and is approximately equal to 5-gram PBT for Chinese-to-Thai. For Chinese-to-Thai translation, 3-gram HPBT overcomes 3-gram PBT by about 3.9 BLEU points, while 3-gram HPBT and 5-gram PBT are approximately equal. In the case of Thai-to-Chinese, 3-gram PBT returns the lowest result, and 3-gram HPBT gains the best BLEU score, beating 3-gram PBT and 5-gram PBT by 2.2 and 1.3 BLEU points, respectively. From the viewpoint of translation direction, Chinese-to-Thai translation is clearly better than Thai-to-Chinese translation. Since the Chinese-to-Thai results of the 3-gram HPBT model and the 5-gram PBT model differ only slightly, it is better to focus further work on the 3-gram HPBT model: with a smaller n-gram order, the generated rules are much smaller, and the required corpus size does not have to cover the sparseness of the surrounding words.

V. CONCLUSION AND FUTURE WORK

In this work, we studied the application of the 3-gram PBT model, the 5-gram PBT model and the 3-gram HPBT model to translating Thai-to-Chinese and Chinese-to-Thai. Comparing the results, we found that 3-gram HPBT shows potential for translating in both directions, since its BLEU scores are the best for Thai-to-Chinese translation. In the case of Chinese-to-Thai, both the 3-gram HPBT model and the 5-gram PBT model return approximately equal BLEU results, which are greater than the 3-gram PBT model by about 4 BLEU points. From the experiment, the results for Chinese-to-Thai are clearly better than the Thai-to-Chinese results. To improve this work, we plan to add some linguistic information to the training data in order to reduce the currently large number of synchronous CFG rules. Moreover, we plan to test the 3-gram HPBT model on Thai sentences of different lengths to study the accuracy ratio by sentence length separately, since Thai sentences are naturally long. To cover all available SMT approaches, a tree-to-string model will be tested for Chinese-to-Thai. Lastly, the English-Thai language pair will be tested with HPBT.

ACKNOWLEDGEMENT

The authors would like to thank the Office of the Higher Education Commission, Thailand, for funding support under the program Strategic Scholarships for Frontier Research Network for the Ph.D. Program. Prasert Luekhong also thanks the Graduate School, Chiang Mai University, Thailand and Rajamangala University of Technology Lanna, Thailand for their funding. Prasert Luekhong is grateful to Dr.
Liu Qun for the opportunity to be a visiting researcher at the ICT Natural Language Processing Research Group, Chinese Academy of Sciences, Beijing, China.

REFERENCES

[1] P. Koehn, Statistical Machine Translation. Cambridge University Press, 2010.
[2] P. Koehn et al., "Moses: open source toolkit for statistical machine translation," in Proceedings of the 45th Annual Meeting of the ACL, Interactive Poster and Demonstration Sessions, 2007, pp. 177-180.
[3] D. Cer, M. Galley, and D. Jurafsky, "Phrasal: a toolkit for statistical machine translation with facilities for extraction and incorporation of arbitrary model features," in Proceedings of the NAACL, 2010, pp. 9-12.
[4] C. Dyer, J. Weese, H. Setiawan, and A. Lopez, "cdec: a decoder, alignment, and learning framework for finite-state and context-free translation models," in Proceedings of the ACL, 2010, pp. 7-12.
[5] L. Schwartz, W. Thornton, and J. Weese, "Joshua: an open source toolkit for parsing-based machine translation," Machine Translation, pp. 135-139, 2009.
[6] D. Vilar, D. Stein, and M. Huck, "Jane: open source hierarchical translation, extended with reordering and lexicon models," in Proceedings of the Joint Workshop on Statistical Machine Translation and Metrics MATR (WMT 2010), 2010, pp. 262-270.
[7] Matrix Euro. [Online]. Available: http://matrix.statmt.org/. [Accessed: 29-May-2012].
[8] M. Liberman and C. Cieri, "The creation, distribution and use of linguistic data: the case of the Linguistic Data Consortium," in Proceedings of the 1st International Conference on Language Resources and Evaluation (LREC), 1998.
[9] P. Koehn, "Europarl: a parallel corpus for statistical machine translation," in MT Summit, vol. 11, 2005.
[10] R. Steinberger, B. Pouliquen, and A. Widiger, "The JRC-Acquis: a multilingual aligned parallel corpus with 20+ languages," arXiv preprint cs/0609058, 2006.
[11] M. V. Yazdchi and H. Faili, "Generating English-Persian parallel corpus using an automatic anchor finding sentence aligner," in Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on, 2010, pp. 1-6.
[12] P. Porkaew and T. Ruangrajitpakorn, "Translation of noun phrases from English to Thai using phrase-based SMT with CCG reordering rules," in Design, 2001.
[13] D. Chiang, "Hierarchical phrase-based translation," Computational Linguistics, vol. 33, no. 2, pp. 201-228, Jun. 2007.
[14] M. Huck, M. Ratajczak, P. Lehnen, and H. Ney, "A comparison of various types of extended lexicon models for statistical machine translation," in Conf. of the Assoc. for Machine Translation in the Americas (AMTA), Denver, CO, 2010.
[15] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer, "The mathematics of statistical machine translation: parameter estimation," Computational Linguistics, vol. 19, no. 2, pp. 263-311, 1993.
[16] F. J. Och and H. Ney, "GIZA++: training of statistical translation models," internal report, RWTH Aachen University, 2000.
[17] S. Vogel, H. Ney, and C. Tillmann, "HMM-based word alignment in statistical translation," in Proceedings of the 16th Conference on Computational Linguistics, vol. 2, 1996, p. 836.
[18] F. J. Och and H. Weber, "Improving statistical natural language translation with categories and rules," in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics, 1998, pp. 985-989.
[19] F. J. Och, C. Tillmann, and H. Ney, "Improved alignment models for statistical machine translation," in Proc. of the Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999, pp. 20-28.
[20] F. J. Och, "Statistical machine translation: from single-word models to alignment templates," Ph.D. dissertation, RWTH Aachen University, 2002.
[21] P. Koehn, F. J. Och, and D. Marcu, "Statistical phrase-based translation," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol. 1, 2003, pp. 48-54.
[22] F. J. Och and H. Ney, "The alignment template approach to statistical machine translation," Computational Linguistics, vol. 30, no. 4, pp. 417-449, Dec. 2004.
[23] S. M. Shieber and Y. Schabes, "Generation and synchronous tree-adjoining grammars," Computational Intelligence, vol. 7, no. 4, pp. 220-228, Nov. 1991.
[24] D. Chiang and K. Knight, "An introduction to synchronous grammars," tutorial at ACL 2006, pp. 1-16, 2006.
[25] P. Blunsom, T. Cohn, and M. Osborne, "Bayesian synchronous grammar induction," in Advances in Neural Information Processing Systems, vol. 21, 2009, pp. 161-168.
[26] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini, "Building a large annotated corpus of English: the Penn Treebank," Computational Linguistics, vol. 19, no. 2, pp. 313-330, 1993.
[27] Y. Liu, Q. Liu, and S. Lin, "Tree-to-string alignment template for statistical machine translation," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006, pp. 609-616.
[28] L. Huang and H. Mi, "Efficient incremental decoding for tree-to-string translation," in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010, pp. 273-283.

[29] R. S. Used, "The UOT system: improve string-to-tree translation using head-driven phrase structure grammar and predicate-argument structures," in mt-archive.info, 2009, pp. 99-106.
[30] P. Koehn, "Pharaoh: a beam search decoder for phrase-based statistical machine translation models," in Machine Translation: From Real Users to Research, 2004, pp. 115-124.
[31] P. M. Lewis II and R. E. Stearns, "Syntax-directed transduction," Journal of the ACM, vol. 15, no. 3, pp. 465-488, 1968.
[32] F. J. Och and H. Ney, "Discriminative training and maximum entropy models for statistical machine translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 295-302.
[33] F. J. Och, "Minimum error rate training in statistical machine translation," in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, vol. 1, 2003, pp. 160-167.
[34] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002, pp. 311-318.
[35] Y. Feng, H. Mi, Y. Liu, and Q. Liu, "An efficient shift-reduce decoding algorithm for phrase-based machine translation," in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 2010, pp. 285-293.
[36] "BTEC Task," International Workshop on Spoken Language Translation. [Online]. Available: http://iwslt2010.fbk.eu/node/32. [Accessed: 27-May-2012].
[37] M. Yang, H. Jiang, and T. Zhao, "Construct trilingual parallel corpus on demand," in Chinese Spoken Language Processing, 2006, pp. 760-767.
[38] P. Chang, M. Galley, and C. D. Manning, "Optimizing Chinese word segmentation for machine translation performance," in Proceedings of the Third Workshop on Statistical Machine Translation, 2008.
[39] P. Charoenpornsawat, SWATH: Smart Word Analysis for Thai, 2003.
[40] A. Stolcke, "SRILM - an extensible language modeling toolkit," in Seventh International Conference on Spoken Language Processing, 2002.