Outline: Factored translation models, N-gram-based translation models, Hiero, Syntax-based translation systems
Transcription
Maxim Khalilov, TAUS Labs, Amsterdam
Marta R. Costa-jussà, Barcelona Media, Barcelona
RuSSIR 2012, August 5-10

Outline
- Factored translation models
- N-gram-based translation models
- Hiero
- Syntax-based translation systems

Factored translation models
Factored translation models are an extension of phrase-based models in which every word is replaced by a vector of factors:
(word) => (word, lemma, POS, morphology, ...)
The translation is now a combination of pure translation (T) and generation (G) steps.
[Diagram: the source factors lemma_f, POS_f and morphology_f are mapped to lemma_e, POS_e and morphology_e by translation steps (T); the target word_e is produced by a generation step (G).]
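The factor vector and the T/G decomposition above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `FactoredWord` type, the single-entry lookup tables and the German-English example are hypothetical, and real factored models use probabilistic phrase tables rather than deterministic dictionaries.

```python
from typing import NamedTuple


class FactoredWord(NamedTuple):
    """A word represented as a vector of factors."""
    word: str
    lemma: str
    pos: str
    morph: str


def translate_factored(src, lemma_table, pos_table, morph_table, generation_table):
    """Apply three translation steps (T), then one generation step (G)."""
    lemma_e = lemma_table[src.lemma]   # T: lemma_f -> lemma_e
    pos_e = pos_table[src.pos]         # T: POS_f -> POS_e
    morph_e = morph_table[src.morph]   # T: morphology_f -> morphology_e
    # G: the surface form is generated from target-side factors only.
    word_e = generation_table[(lemma_e, pos_e, morph_e)]
    return FactoredWord(word_e, lemma_e, pos_e, morph_e)


# Toy tables (hypothetical entries).
lemma_table = {"Haus": "house"}
pos_table = {"NN": "NN"}
morph_table = {"plural": "plural"}
generation_table = {("house", "NN", "plural"): "houses"}

source = FactoredWord(word="Häuser", lemma="Haus", pos="NN", morph="plural")
target = translate_factored(source, lemma_table, pos_table, morph_table, generation_table)
print(target.word)
```

Because the generation table is keyed only on target-side factors, it could in principle be estimated from target-language text alone.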
Factored translation models
What differs in factored translation models (as compared to standard phrase-based models):
- The parallel corpus must be annotated beforehand.
- Extra language models for every factor can also be used.
- Translation steps are accomplished in a similar way.
- Generation steps imply training only on the target side of the corpus.
- Models corresponding to the different factors and components are combined in a log-linear fashion.

Factored translation models: results

English-German
Model                              BLEU
best published result              18.15%
baseline (surface)                 18.04%
surface + POS                      18.15%
surface + POS + morph              18.22%

English-Spanish
Model                              BLEU
baseline (surface)                 23.41%
surface + morph                    24.66%
surface + POS + morph              24.25%

English-Czech
Model                              BLEU
baseline (surface)                 25.82%
surface + all morph                27.04%
surface + case/number/gender       27.45%
surface + CNG/verb/prepositions    27.62%

Factored translation models
[Figure: factored representation. Input and output words are each represented by the factors word, lemma, POS, morphology and word class. The factored model combines transfer and generation steps between these factors.]
N-gram-based translation models
Log-linear combination of feature functions:

$$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(s_1^J, t_1^I) \right\}$$

Feature functions:
- bilingual N-gram translation language model
- target language model
- word bonus model
- source-target lexicon model (IBM1 probabilities)
- target-source lexicon model (IBM1 probabilities)
- target POS language model

N-gram-based translation models
Tuples are extracted from the word alignment:
- A unique, monotonic segmentation of each sentence pair is produced.
- No word in a tuple is aligned to words outside of it.
- No smaller tuples can be extracted without violating the previous constraints.

N-gram-based translation models
The bilingual N-gram model:

$$h_{BM}(s_1^J, t_1^I) = \log \prod_{i=1}^{K} p\big((s,t)_i \mid (s,t)_{i-N+1}, \ldots, (s,t)_{i-1}\big)$$

Given a word alignment, tuples $T_i = (s,t)_i$ are those bilingual units:
- having minimal length
- describing a monotonic segmentation of each sentence pair
The constraints define a unique possible segmentation. Exception: NULL-source tuples.
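The tuple constraints above can be sketched in a few lines. This is a minimal illustration, not the extraction procedure of any particular toolkit: the function name `extract_tuples`, the English-Spanish toy sentence pair and its alignment links are hypothetical, and the sketch assumes every source and target word has at least one alignment link (so NULL-source tuples never arise).

```python
def extract_tuples(src, tgt, links):
    """Split an aligned sentence pair into minimal, monotone bilingual tuples.

    links is a set of (source_index, target_index) pairs.
    Assumption: every word on both sides is aligned at least once.
    """
    cuts = [(0, 0)]
    for j in range(1, len(src)):
        left = [t for s, t in links if s < j]    # targets linked to src[:j]
        right = [t for s, t in links if s >= j]  # targets linked to src[j:]
        # A cut before source position j is legal only if no link crosses it.
        if max(left) < min(right):
            cuts.append((j, max(left) + 1))
    cuts.append((len(src), len(tgt)))
    return [(src[a:c], tgt[b:d]) for (a, b), (c, d) in zip(cuts, cuts[1:])]


src = ["we", "want", "perfect", "translations"]
tgt = ["queremos", "traducciones", "perfectas"]
links = {(0, 0), (1, 0), (2, 2), (3, 1)}

tuples = extract_tuples(src, tgt, links)
for s, t in tuples:
    print(" ".join(s), "|||", " ".join(t))
```

The crossing links between "perfect translations" and "traducciones perfectas" force both words into a single tuple, which is how local reordering ends up inside the bilingual units.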
N-gram-based translation models
Feature functions:
- target language model:

$$h_{TM}(s,t) = h_{TM}(t) = \log \prod_{k=1}^{K} p(w_k \mid w_{k-N+1}, \ldots, w_{k-1})$$

- word bonus model:

$$h_{WB}(s,t) = h_{WB}(t) = K$$

- source-target lexicon model:

$$h_{LEX}(s,t) = \log \frac{1}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p_{IBM1}(t^n_j \mid s^n_i)$$

- target-source lexicon model (analogous)

N-gram-based translation models: results
WMT 09 (large corpus, Es<->En)

Spanish-to-English
System     BLEU         Constr.  Rank.
GOOGLE     0.29         NO       .70
UEDIN      0.26         YES      .56
UPC-TALP   0.26 (2-3)   YES      .59 (2)
NICT       0.22         YES      .37
RBMT       0.20         NO       .55
SAAR       0.20         NO       .51

English-to-Spanish
System     BLEU         Constr.  Rank.
GOOGLE     0.28         NO       .65
NUS        0.25         YES      .59
UEDIN      0.25         YES      .66
UPC-TALP   0.25 (2-4)   YES      .58 (5)
RBMT       0.22         NO       .64
RWTH       0.22         YES      (not preserved)
SAAR       0.20         NO       .48

IWSLT 08 (small amount of data, Zh<->En)
[Table: BLEU on clean and ASR input plus ranking, for the Zh2Es and Zh2(En)2Es tasks. The scores were not preserved in the transcription. Systems listed for Zh2Es: TCH, FBK, DCU, TTK, NICT, PT, UPC-TALP (ranks 7, 6, 6), GREYC. Systems listed for Zh2(En)2Es: TCH, FBK, UPC-TALP (ranks 3, 3-4, 3), NICT, DCU, TTK, GREYC, QMUL.]

N-gram-based translation models
Decoding:
- freely available MARIE decoder [Crego et al., 2005] (beam search with hypothesis recombination, threshold and histogram pruning)
- no rescoring module (1-best output used)
- monotone and reordered search
Feature function weights optimization: Downhill Simplex Method

Reference: José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, Marta R. Costa-jussà. N-gram-based Machine Translation. Computational Linguistics, 2006.
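The source-target lexicon feature listed among the feature functions above can be computed directly from a lexical probability table. The sketch below is illustrative only: the function name, the `NULL` token string, the smoothing floor of 1e-9 for unseen pairs and the toy probabilities are all assumptions, not part of the slides.

```python
import math


def ibm1_lexicon_feature(src_words, tgt_words, prob, null="NULL", floor=1e-9):
    """log [ 1/(I+1)^J * prod_j sum_i p(t_j | s_i) ] for one bilingual unit.

    I is the number of source words (position 0 is the NULL word),
    J is the number of target words. prob maps (target, source) to p(t | s).
    Unseen pairs receive a small floor probability (an assumption).
    """
    sources = [null] + list(src_words)                   # I + 1 entries
    log_score = -len(tgt_words) * math.log(len(sources))  # log 1/(I+1)^J
    for t in tgt_words:
        log_score += math.log(sum(prob.get((t, s), floor) for s in sources))
    return log_score


# Toy lexical table (hypothetical values).
prob = {("house", "casa"): 0.8, ("house", "NULL"): 0.2}
score = ibm1_lexicon_feature(["casa"], ["house"], prob)
print(score)
```

The target-source feature would use the same function with the roles of the two sides swapped and a table estimated in the opposite direction.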
Hiero

A Motivating Example (from Chiang 2007)
A Chinese sentence:
  Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi
Word-by-word gloss:
  Australia is with North Korea have dipl. rels. that few countries one of
The English translation:
  Australia is one of the few countries that have diplomatic relations with North Korea

Why the difference in word order?
  shaoshu guojia zhiyi  <->  one of the few countries
  yu Beihan you bangjiao de  <->  that have diplomatic relations with North Korea
A Solution: Hierarchical Phrases
Output from a phrase-based system:
  [Aozhou] [shi]_1 [yu Beihan]_2 [you] [bangjiao] [de shaoshu guojia zhiyi]
  [Australia] [has] [dipl. rels.] [with North Korea]_2 [is]_1 [one of the few countries]
Hierarchical phrases needed for this example:
  < yu X_1 you X_2 , have X_2 with X_1 >
  < X_1 de X_2 , the X_2 that X_1 >
  < X_1 zhiyi , one of X_1 >
(We'll see how to formalize this next.)
Examples of s-CFG Rules (from Chiang 2007)
  X -> < yu X_1 you X_2 , have X_2 with X_1 >
  X -> < X_1 de X_2 , the X_2 that X_1 >
  X -> < X_1 zhiyi , one of X_1 >
Note: these rules make use of a single non-terminal, X. We use subscripts such as 1, 2 to specify which non-terminals correspond to each other.

Examples of s-CFG Rules
An invalid s-CFG rule:
  VP -> < PP_1 you NP_2 , have NP_2 [symbol lost in transcription]_1 >
This rule is invalid because the PP is co-indexed with a non-terminal of a different category. Non-terminals that correspond to each other must be the same.

Examples of s-CFG Rules
Another valid s-CFG rule:
  VP -> < PP_1 you NP_2 , have NP_2 PP_1 >
In this case three non-terminals, NP, PP and VP, are used. The above rule is perfectly valid in an s-CFG. However, Chiang's grammar only makes use of two non-terminals: X and S.

Intuition Behind Translation with an s-CFG
First step: we can read off a CFG for Chinese from the s-CFG, and parse the Chinese with this CFG.
For example,
  X -> < yu X_1 you X_2 , have X_2 with X_1 >
implies the Chinese-only context-free rule
  X -> yu X you X
and
  X -> < bangjiao , diplomatic relations >
implies the Chinese-only context-free rule
  X -> bangjiao
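Reading off the Chinese-only CFG amounts to keeping the source side of each synchronous rule and dropping the co-indexation. A minimal sketch follows, assuming a hypothetical plain-text rule encoding in which linked non-terminals are written `X_1`, `X_2`; the function name and the rule list are illustrative.

```python
import re


def source_cfg_rule(lhs, source_side):
    """Project a synchronous rule onto its source side, removing indices."""
    symbols = [re.sub(r"_\d+$", "", token) for token in source_side.split()]
    return lhs, symbols


# (left-hand side, source side, target side); encoding is an assumption.
synchronous_rules = [
    ("X", "yu X_1 you X_2", "have X_2 with X_1"),
    ("X", "X_1 de X_2", "the X_2 that X_1"),
    ("X", "X_1 zhiyi", "one of X_1"),
    ("X", "bangjiao", "diplomatic relations"),
]

chinese_cfg = [source_cfg_rule(lhs, src) for lhs, src, _ in synchronous_rules]
for lhs, rhs in chinese_cfg:
    print(lhs, "->", " ".join(rhs))
```

The resulting rules can be handed to any ordinary CFG parser, which is what makes the first step of translation a monolingual parsing problem.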
Intuition Behind Translation with an s-CFG
The resulting CFG for Chinese:
  X -> yu X you X
  X -> X de X
  X -> X zhiyi
  X -> Aozhou
  X -> Beihan
  X -> shi
  X -> bangjiao
  X -> shaoshu guojia

Intuition Behind Translation with an s-CFG
First step: we can read off a CFG for Chinese from the s-CFG, and parse the Chinese with this CFG.
Second step: we use the synchronous rules to map the Chinese parse tree to an English parse tree.

A parse tree for our example:
[Figure: parse tree over "Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi"]

Start bottom-up. For example, X -> < Aozhou , Australia > gives:
[Figure: the same tree with the leaf Aozhou replaced by Australia]
The tree after all the lowest-level rules are applied:
[Figure: tree with leaves Australia, is, N. Korea, dipl. rels., few countries; yu, you, de and zhiyi are not yet rewritten]

Next, apply higher-level rules. For example, use X -> < yu X_1 you X_2 , have X_2 with X_1 > to get:
[Figure: subtree now reads "have dipl. rels. with N. Korea"]

Use X -> < X_1 de X_2 , the X_2 that X_1 > to get:
[Figure: subtree now reads "the few countries that have dipl. rels. with N. Korea"]

Use X -> < X_1 zhiyi , one of X_1 > to get:
[Figure: tree now reads "Australia is one of the few countries that have dipl. rels. with N. Korea"]
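The bottom-up rewriting shown in the figures can be sketched as a recursive walk over a derivation tree, reading either the source or the target side of each rule. The rule names, the tree encoding and the flat three-way `glue` rule at the top are assumptions made for this illustration (Chiang's grammar uses binary glue rules over S and X); the lexical and hierarchical rules themselves are the ones used in the example.

```python
# Each rule: name -> (source side, target side). X_k marks the k-th child.
RULES = {
    "glue": ("X_1 X_2 X_3", "X_1 X_2 X_3"),   # simplified stand-in for glue rules
    "yu-you": ("yu X_1 you X_2", "have X_2 with X_1"),
    "de": ("X_1 de X_2", "the X_2 that X_1"),
    "zhiyi": ("X_1 zhiyi", "one of X_1"),
    "Aozhou": ("Aozhou", "Australia"),
    "shi": ("shi", "is"),
    "Beihan": ("Beihan", "North Korea"),
    "bangjiao": ("bangjiao", "diplomatic relations"),
    "shaoshu guojia": ("shaoshu guojia", "few countries"),
}

SOURCE, TARGET = 0, 1


def realize(node, side):
    """Spell out one side of a derivation; node is (rule_name, children)."""
    name, children = node
    words = []
    for token in RULES[name][side].split():
        if token.startswith("X_"):
            child = children[int(token[2:]) - 1]
            words.append(realize(child, side))
        else:
            words.append(token)
    return " ".join(words)


derivation = (
    "glue",
    [
        ("Aozhou", []),
        ("shi", []),
        ("zhiyi", [
            ("de", [
                ("yu-you", [("Beihan", []), ("bangjiao", [])]),
                ("shaoshu guojia", []),
            ]),
        ]),
    ],
)

chinese = realize(derivation, SOURCE)
english = realize(derivation, TARGET)
print(chinese)
print(english)
```

Since both sides are read from the same derivation, the reordering is carried entirely by where each rule places its linked non-terminals.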
What is missing here, but can be found in the paper:
- Derivation in CFG
- Learning a CFG grammar
- Probability calculation
But here are some results:

Results from Chiang 2007
[Table: BLEU scores on MT03, MT04 and MT05. The system rows and scores were not preserved in the transcription; only a partial row label, "ATA", survives.]
Results are for translation from Chinese to English. MT03, MT04 and MT05 are 3 different test sets. All scores are BLEU scores.

Syntax-based translation systems

Syntax-based MT
Two camps:
- Syntax will improve translation (K. Knight)
- Simpler data-driven models will always win (F. Och)
Joshua: open-source toolkit for parsing-based MT
Syntax-based MT
[Figure (Fox, 2002): English parse tree for "There will be more divisiveness than positive effects" aligned with the French "Elle aura de les effets plus destructifs que positifs". Gloss: It will have effects more destructive than positive.]
Phrases are not coherent in bitexts.

Syntax-based MT
[Figure (Koehn et al., 2003): comparison of an IBM Model system, PBMT, and PBMT with syntactic phrases across training corpus sizes (axis ticks 20k, 40k, 80k, 160k, 320k).]

Syntax-based MT
WHY????

Syntax-based MT
WHAT SHOULD WE DO????
Syntax-based MT
Syntactic translation models incorporate syntax into the source and/or target languages.
[Figure: translation pyramid from foreign words up through foreign syntax and foreign semantics to an interlingua, and down through English semantics and English syntax to English words]
Syntactic phrase-based systems based on tree transducers:
- Tree-to-string. Build mappings from target parse trees to source strings.
- String-to-tree. Build mappings from target strings to source parse trees.
- Tree-to-tree. Mappings from parse trees to parse trees.

Syntax-based MT
Example of a string-to-tree translation system:
Use of English syntax trees [Yamada and Knight, 2001]
- exploit rich resources on the English side
- obtained with statistical parser [Collins, 1997]
- flattened tree to allow more reorderings
- works well with syntactic language model

Syntax-based MT
Advantages of syntax-based translation:
- Reordering for syntactic reasons, e.g., move German object to end of sentence
- Better explanation for function words, e.g., prepositions, determiners
- Conditioning on syntactically related words: translation of a verb may depend on subject or object
- Use of syntactic language models

Syntax-based MT
Example of a string-to-tree translation system:
[Figure: the English tree for "he adores listening to music" is transformed in four steps.
 reorder: PRP VB1 VB2 becomes PRP VB2 VB1; VB TO becomes TO VB; TO MN becomes MN TO
 insert: the function words ha, ga, desu and no are added
 translate: he -> kare, music -> ongaku, to -> wo, listening -> kiku, adores -> daisuki
 take leaves]
Result: Kare ha ongaku wo kiku no ga daisuki desu
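The reorder, insert, translate and take-leaves steps of the string-to-tree example can be traced with a small deterministic sketch. In the model of Yamada and Knight each operation is probabilistic; here every table holds a single hand-picked choice so that the example sentence is reproduced. The tree encoding, the table names and the decision to key insertions on (parent label, child label) are assumptions made for illustration.

```python
# Child reordering, keyed by the sequence of child labels.
REORDER = {
    ("PRP", "VB1", "VB2"): (0, 2, 1),
    ("VB", "TO"): (1, 0),
    ("TO", "MN"): (1, 0),
}
# Function word inserted to the right of a child, keyed by (parent, child).
INSERT_RIGHT = {
    ("VB", "PRP"): "ha",
    ("VB", "VB2"): "ga",
    ("VB", "VB1"): "desu",
    ("VB2", "VB"): "no",
}
# Leaf translation.
TRANSLATE = {
    "he": "kare",
    "music": "ongaku",
    "to": "wo",
    "listening": "kiku",
    "adores": "daisuki",
}


def transform(node):
    """Return the translated leaves of node = (label, word_or_children)."""
    label, body = node
    if isinstance(body, str):                      # leaf: translate the word
        return [TRANSLATE[body]]
    labels = tuple(child[0] for child in body)
    order = REORDER.get(labels, range(len(body)))  # reorder the children
    words = []
    for index in order:
        child = body[index]
        words.extend(transform(child))
        extra = INSERT_RIGHT.get((label, child[0]))  # insert a function word
        if extra:
            words.append(extra)
    return words


english_tree = (
    "VB",
    [
        ("PRP", "he"),
        ("VB1", "adores"),
        ("VB2", [("VB", "listening"), ("TO", [("TO", "to"), ("MN", "music")])]),
    ],
)

japanese = " ".join(transform(english_tree))
print(japanese)
```

Replacing each table lookup by a probability distribution, and summing or maximizing over the alternatives, gives the generative story that decoding has to invert.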
Syntax-based MT
Reordering table p(reorder | original); the probability values were not preserved in the transcription:

Original order    Reordering
PRP VB1 VB2       PRP VB1 VB2
PRP VB1 VB2       PRP VB2 VB1
PRP VB1 VB2       VB1 PRP VB2
PRP VB1 VB2       VB1 VB2 PRP
PRP VB1 VB2       VB2 PRP VB1
PRP VB1 VB2       VB2 VB1 PRP
VB TO             VB TO
VB TO             TO VB
TO NN             TO NN
TO NN             NN TO

Syntax-based MT
Decoding as parsing: Chart Parsing
Input: kare ha ongaku wo kiku no ga daisuki desu
- Pick Japanese words
- Translate into tree stumps
[Figure: chart with the entry PRP "he" above kare]

Syntax-based MT
Decoding as parsing: adding some more entries...
[Figure: chart with the entries PRP "he", NN "music" and TO "to", and a PP entry]
Syntax-based MT
Decoding as parsing: combine entries
[Figure: chart with the entries PRP "he", NN "music", TO "to", VB "listening" and a PP entry above kare ha ongaku wo kiku no ga daisuki desu]
[Figure: the entry VB1 "adores" is added and a VB2 entry is built]

Syntax-based MT
Decoding as parsing: finished when all foreign words are covered
[Figure: a VB entry spans the whole input, with leaves he, music, to, listening, adores]
Syntax-based MT
How realistic is this model? Do English trees match foreign strings?
- Crossings between French and English [Fox, 2002]: the number per sentence depends on how it is measured (figures not preserved in the transcription)
- Can be reduced by
  - flattening the tree, as done by [Yamada and Knight, 2001]
  - detecting phrasal translation
  - special treatment for a small number of constructions
- Most coherence between dependency structures

Syntax-based MT
Other syntax-based systems:
- U. Alberta (Microsoft): treelet translation
  - Translating from English
  - Using a dependency parser in English
  - Project the dependency tree into the foreign language for training
  - Map parts of the dependency tree ("treelets") into the foreign language
- Reranking phrase-based MT output with syntactic features
  - Create n-best lists with phrase-based MT
  - POS-tag and parse candidate translations
  - Rerank with syntactic features

Syntax-based MT
Other syntax-based systems:
- Syntax-aided phrase-based MT (Koehn, 2005)
  - Stick with phrase-based systems
  - Special treatment for special syntactic problems (NP treatment, clause restructuring)
- ISI: extended work of Yamada and Knight
  - More complex rules
  - Performance approaching phrase-based
- Prague: translation via dependency structures
  - Parallel Czech-English treebank
  - Tecto-grammatical translation model

Syntax-based MT
So, syntax: does it help?
- Not yet: the best systems are still phrase-based and treat words as tokens
- Well, maybe...
  - work on reordering German
  - automatically trained tree transfer systems are promising
- Why not yet?
  - if real syntax, we need good parsers: are they good enough?
  - syntactic annotations add a level of complexity: difficult to handle, slow to train and decode
  - few researchers are good at statistical modeling and also understand syntactic theories
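The reranking approach listed above (create n-best lists, annotate the candidates, rerank with syntactic features) reduces to rescoring each hypothesis with a weighted sum of features. The sketch below is a toy under stated assumptions: the feature names `base_score` and `parse_logprob`, the weights and the scores are all invented for illustration, and in practice the syntactic feature would come from an actual tagger or parser.

```python
def rerank(nbest, weights):
    """Sort hypotheses by a weighted sum of their feature values."""
    def score(hypothesis):
        return sum(weights[name] * value
                   for name, value in hypothesis["features"].items())
    return sorted(nbest, key=score, reverse=True)


# Hypothetical 2-best list from a phrase-based system.
nbest = [
    {"text": "he adores listening music to",
     "features": {"base_score": -2.0, "parse_logprob": -9.0}},
    {"text": "he adores listening to music",
     "features": {"base_score": -2.5, "parse_logprob": -4.0}},
]
weights = {"base_score": 1.0, "parse_logprob": 0.5}

best = rerank(nbest, weights)[0]
print(best["text"])
```

With these toy numbers the second hypothesis wins (-4.5 against -6.5) even though the baseline system preferred the first, which is the effect a syntactic reranker is meant to have.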
Next session
Practical workshop: let's get our hands dirty!
More informationAdding syntactic structure to bilingual terminology for improved domain adaptation
Adding syntactic structure to bilingual terminology for improved domain adaptation Mikel Artetxe 1, Gorka Labaka 1, Chakaveh Saedi 2, João Rodrigues 2, João Silva 2, António Branco 2, Eneko Agirre 1 1
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationThe taming of the data:
The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data
More informationConstraining X-Bar: Theta Theory
Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,
More informationDEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS
DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za
More informationIntroduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.
to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationLQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization
LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY
More informationProduct Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments
Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &
More informationModeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures
Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationDerivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.
Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationModeling full form lexica for Arabic
Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling
More informationBeyond the Pipeline: Discrete Optimization in NLP
Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We
More informationUnderlying and Surface Grammatical Relations in Greek consider
0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph
More informationProject in the framework of the AIM-WEST project Annotation of MWEs for translation
Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment
More informationSpeech Recognition at ICSI: Broadcast News and beyond
Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI
More informationInformatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy
Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference
More informationM55205-Mastering Microsoft Project 2016
M55205-Mastering Microsoft Project 2016 Course Number: M55205 Category: Desktop Applications Duration: 3 days Certification: Exam 70-343 Overview This three-day, instructor-led course is intended for individuals
More informationENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist
Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet
More informationAQUA: An Ontology-Driven Question Answering System
AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationThe Interface between Phrasal and Functional Constraints
The Interface between Phrasal and Functional Constraints John T. Maxwell III* Xerox Palo Alto Research Center Ronald M. Kaplan t Xerox Palo Alto Research Center Many modern grammatical formalisms divide
More informationTowards a MWE-driven A* parsing with LTAGs [WG2,WG3]
Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general
More informationCOMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR
COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The
More informationSemantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition
Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Roy Bar-Haim,Ido Dagan, Iddo Greental, Idan Szpektor and Moshe Friedman Computer Science Department, Bar-Ilan University,
More informationTINE: A Metric to Assess MT Adequacy
TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,
More informationThree New Probabilistic Models. Jason M. Eisner. CIS Department, University of Pennsylvania. 200 S. 33rd St., Philadelphia, PA , USA
Three New Probabilistic Models for Dependency Parsing: An Exploration Jason M. Eisner CIS Department, University of Pennsylvania 200 S. 33rd St., Philadelphia, PA 19104-6389, USA jeisner@linc.cis.upenn.edu
More informationCharacter Stream Parsing of Mixed-lingual Text
Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract
More informationInleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3
Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection
More informationPOS tagging of Chinese Buddhist texts using Recurrent Neural Networks
POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationVocabulary Usage and Intelligibility in Learner Language
Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More information