Outline. Factored translation models. Outline. Factored translation models N-gram-based translation models Hiero Syntax-based translation systems

Size: px
Start display at page:

Download "Outline. Factored translation models. Outline. Factored translation models N-gram-based translation models Hiero Syntax-based translation systems"

Transcription

1 Maxim Khalilov TAU Labs Amsterdam Marta R. Costa-jussà Barcelona Media Barcelona Outline Factored translation models N-gram-based translation models yntax-based translation systems RuIR 2012 August 5-10, of 61 Outline Factored translation models N-gram-based translation models yntax-based translation systems Factored translation models Factored translation models are an extension to phrasebased models where every word is substituted by a vector of factors. (word) = (word, lemma, Po, morphology,...) The translation is now a combination of pure translation (T) and generation (G) steps: lemma f Po f morphology f word f T T T G lemma e Po e morphology e word e 2 of 61 4 of 61

2 Factored translation models What differs in factored translation models (as compared to standard phrase-based models) The parallel corpus must be annotated beforehand. Extra language models for every factor can also be used. Translation steps are accomplished in a similar way. Generation steps imply a training only on the target side of the corpus. Models corresponding to the different factors and components are combined in a log-linear fashion. Factored translation models English German Model BLEU best published result 18.15% baseline (surface) 18.04% surface + PO 18.15% surface + PO + morph 18.22% English panish Model BLEU baseline (surface) 23.41% surface + morph 24.66% surface + PO + morph 24.25% English Czech Model BLEU baseline (surface) 25.82% surface + all morph 27.04% surface + case/number/gender 27.45% surface + CNG/verb/prepositions 27.62% 5 of 61 7 of 61 Factored translation models Outline Factored Representation Input Output word lemma word lemma Factored Model: transfer and generation word lemma Input Output word lemma Factored translation models N-gram-based translation models yntax-based translation systems PO PO PO PO morphology morphology morphology morphology word class word class word class word class 6 of 61 8 of 61

3 N-gram-based translation models N-gram-based translation models Log-linear combination of feature functions: { M } ˆt I 1 = arg max t I 1 m=1 λ m h m (s J 1,t I 1) Bilingual N-gram translation language model target language model word bonus model source target lexicon model (ibm1 probs.) target source lexicon model (ibm1 probs.) target PO language model 9 of 61 Tuples are extracted from word alignment A unique, monotonous segmentation of each sentence pair is produced. No word in a tuple is aligned to words outside of it No smaller tuples can be extracted without violating the previous constraints 11 of 61 N-gram-based translation models N-gram-based translation models h BM (s J 1,t I 1) = log K p((s,t) i (s,t) i N+1,...,(s,t) i 1 ) i=1 Given a word alignment, tuples T i = (s,t) i are those bilingual units: having minimal length describing a monotonic segmentation of each sentence pair K h BM (s J 1,t I 1) = log p((s,t) i (s,t) i N+1,...,(s,t) i 1 ) i=1 Given a word alignment, tuples T i = (s,t) i are those bilingual units: having minimal length describing a monotonic segmentation of each sentence pair Constraints define a unique possible segmentation Except: NULL-source tuples Constraints define a unique possible segmentation Except: NULL-source tuples 10 of of 61

4 N-gram-based translation models Feature functions: target language model: h TM (s,t) = h TM (t) = log K p(w k w k N+1,...,w k 1 ) k=1 word bonus model: h WB (s,t) = h WB (t) = K source target lexicon model: h LE (s,t) = log 1 (I + 1) J J j=1 i=0 target source lexicon model (analogous) 13 of 61 I p IBM1 (t n j s n i ) N-gram-based translation models WMT 09 (large corpus, Es<->En): panish-to-english English-to-panish ystem BLEU Constr. Rank. ystem BLEU Constr. Rank. GOOGLE 0.29 NO.70 GOOGLE 0.28 NO.65 UEDIN 0.26 YE.56 NU 0.25 YE.59 UPC-TALP 0.26 (2-3) YE.59 (2) UEDIN 0.25 YE.66 NICT 0.22 YE.37 UPC-TALP 0.25 (2-4) YE.58 (5) RBMT 0.20 NO.55 RBMT 0.22 NO.64 AAR 0.20 NO.51 RWTH 0.22 YE AAR 0.20 NO.48 IWLT 08 (small amount of data, Zh<->En): ystem Zh2Es Zh2(En)2Es Clean AR Clean AR ystem BLEU BLEU Rank. BLEU BLEU Rank. TCH TCH FBK FBK DCU UPC-TALP (3) (3-4) (3) TTK NICT NICT DCU PT TTK UPC-TALP (7) (6) (6) GREYC GREYC QMUL of 61 N-gram-based translation models N-gram-based translation models Decoding: freely available MARIE decoder [Crego et al., 2005] (beam search with hypothesis recombination, threshold and histogram pruning) no rescoring module (1-best output used) monotone and reordered search José B. Mariño, Rafael Banchs, Josep M. Crego, Adrià de Gispert, Patrik Lambert, José A. R. Fonollosa, Marta R. Costa-jussà. N-gram-based Machine Translation. Computational Linguistics, 2006 Feature function weights optimization: Downhill implex Method 14 of of 61

5 Outline Factored translation models N-gram-based translation models yntax-based translation systems A Chinese sentence: Why the difference in word order? Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi Australia is with North Korea have dipl. rels. that few The English translation: countries one of Australia is one of the few countries that have diplomatic relations with North Korea shaoshuguojiazhiyi oneofthefewcountries 17 of of 61 A Motivating Example from Chiang 2007 A Chinese sentence: Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi Why the difference in word order? A Chinese sentence: Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi Australia is with North Korea have dipl. rels. that few countries one of Australia is with North Korea have dipl. rels. that few countries one of The English translation: Australia is one of the few countries that have diplomatic relations with North Korea The English translation: Australia is one of the few countries that have diplomatic relations with North Korea shaoshuguojiazhiyi oneofthefewcountries yubeihan youbangjiao de that have diplomatic relations with North Korea 18 of of 61

6 A Chinese sentence: Why the difference in word order? Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi A Chinese sentence: Why the difference in word order? Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi Australia is with North Korea have dipl. rels. that few countries one of Australia is with North Korea have dipl. rels. that few countries one of The English translation: Australia is one of the few countries that have diplomatic relations with North Korea shaoshu guojia zhiyi one of the few countries yubeihan youbangjiao de that have diplomatic relations with North Korea The English translation: Australia is one of the few countries that have diplomatic relations with North Korea shaoshu guojia zhiyi one of the few countries yubeihan youbangjiao de that have diplomatic relations with North Korea 21 of of 61 A Motivating Example from Chiang 2007 A Chinese sentence: Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi A olution: Hierarchical Phrases A Chinese sentence: Aozhou shi yu Beihan you bangjiao de shaoshu guojia zhiyi Australia is with North Korea have dipl. rels. that few countries one of Australia is with North Korea have dipl. rels. that few countries one of The English translation: Australia is one of the few countries that have diplomatic relations with North Korea Output from a phrase-based system: [Aozhou] [shi] 1 [yubeihan] 2 [you] [bangjiao] [deshaoshuguojiazhiyi] [Australia] [has] [dipl.rels.] [withnorthkorea] 2 [is] 1 [oneofthefewcountries] Hierarchical phrases needed for this example: yu 1 you 2,have 2 with 1 1 de 2,the 2 that 1 1 zhiyi,oneof 1 (We ll see how to formalize this next.) 22 of of 61

7 Examples of s-cfg Rules(from Chiang 2007) yu 1 you 2,have 2 with 1 1 de 2,the 2 that 1 1 zhiyi,oneof 1 An invalid s-cfg rule: Examples of s-cfg Rules V P PP 1 you NP 2,have NP 2 1 This rule is invalid because a PP corresponds to an. Nonterminals that correspond to each other must be the same. Note:theserulesmakeuseofasinglenon-terminal, We use subscripts such as 1, 2 to specify which non-terminals correspond to each other. 25 of of 61 Another valid s-cfg rule: Examples of s-cfg Rules V P PP 1 you NP 2,have NP 2 PP 1 In this case three non-terminals, NP, PP, and V P are used. The above rule is perfectly valid in an s-cfg. However, Chiang s grammar only makes use of two non-terminals: and. Intuition Behind Translation with an s-cfg Firststep:wecanreadoffaCFGforChinesefromthes-CFG, andparsethechinesewiththiscfg For example, yu 1 you 2,have 2 with 1 implies the Chinese-only context-free rule and yu you bangjiao,diplomaticrelations implies the Chinese-only context-free rule bangjiao 26 of of 61

8 Intuition Behind Translation with an s-cfg The resulting CFG for Chinese: yu you de zhiyi Aozhu Beihan shi bangjiao shaoshuguojia Intuition Behind Translation with an s-cfg Firststep:wecanreadoffaCFGforChinesefromthes-CFG, andparsethechinesewiththiscfg econdstep:weusethesynchronousrulestomapthechinese parsetreetoanenglishparsetree 29 of of 61 Aparsetreeforourexample: tart bottom-up. For example, Aozhu, Australia gives: shi zhiyi shi zhiyi Aozhou Australia de de yu you shaoshu guojia yu you shaoshu guojia Beihan bangjiao Beihan bangjiao 30 of of 61

9 The tree after all the lowest-level rules are applied: Use 1 de 2,the 2 that 1 toget: Australia is zhiyi Australia is zhiyi de yu you few countries the that N. Korea dipl. rels. few countries have with 33 of of 61 Next, apply higher-level rules. For example, use yu 1 you 2,have 2 with 1 toget: Use 1 zhiyi,oneof 1 toget: Australia is one of Australia is zhiyi the that few countries have with de dipl. rels. N. Korea have with few countries dipl. rels. N. Korea 34 of of 61

10 What is missing here, but can be found in the paper: Derivation in CFG Learning a CFG grammar Probability calculation But, here are some results: Results from Chiang 2007 Outline Factored translation models N-gram-based translation models yntax-based translation systems MT03 MT04 MT05 ATA Results are for translation from Chinese to English. MT03, MT04,andMT05are3differenttestsets.AllscoresareBleu scores. 37 of of 61 yntax-based MT Two camps: yntax will improve translation (K. Knight) impler data-driven models will always win (F. Och) Joshua: open-source toolkit for parsing-based MT 38 of of 61

11 yntax-based MT NP E There MD will AU be JJR more ADJP VP NN divisiveness ADJP IN than JJ positive NN effects Elle aura de les effets plus destructifs que positifs VP PP Fox (2002) NP Gloss: It will have effects more destructive than positive Phrases are not coherent in bitexts 41 of 61 yntax-based MT WHY???? 43 of 61 yntax-based MT IBM Model PBMT PBMT w/syntactic phrases k 20k 40k 80k 160k 320k Koehn et al (2003) yntax-based MT WHAT HOULD WE DO???? 42 of of 61

12 yntax-based MT yntactic translation models incorporate syntax to the source and/or target languages. yntax-based MT Example of a string-to-tree translation system: interlingua yntactic phrase-based based on tree trasducers: foreign semantics foreign syntax english semantics english syntax Tree-to-string. Build mappings from target parse trees to source strings. tring-to-tree. Build mappings from target strings to source parse trees. Tree-to-tree. Mappings from parse trees to parse trees. foreign words english words Use of English syntax trees[yamada and Knight, 2001] exploit rich resources on the English side obtained with statistical parser[collins, 1997] flattened tree to allow more reorderings works well with syntactic language model 45 of of 61 yntax-based MT Advantages of syntax-based translation: yntax-based MT Example of a string-to-tree translation system: Reordering for syntactic reasons PRP VB VB1 VB2 reorder PRP VB VB2 VB1 e.g., move German object to end of sentence he adores VB TO he TO VB adores Better explanation for function words listening TO MN to music MN music TO to listening e.g., prepositions, determiners Conditioning to syntactically related words VB PRP VB2 VB1 he ha TO VB ga adores desu VB insert PRP VB2 VB1 kare ha TO VB ga daisuki desu translation of verb may depend on subject or object MN music TO to listening no translate MN ongaku TO wo kiku no Use of syntactic language models take leaves Kare ha ongaku wo kiku no ga daisuki desu 46 of of 61

13 yntax-based MT yntax-based MT Decoding as parsing: Original Order Reordering p(reorder original) PRP VB1 VB2 PRP VB1 VB PRPVB1VB2 PRPVB2VB PRP VB1 VB2 VB1 PRP VB PRP VB1 VB2 VB1 VB2 PRP PRP VB1 VB2 VB2 PRP VB PRP VB1 VB2 VB2 VB1 PRP VB TO VB TO VB TO TO VB Chart Parsing PRP NN TO he music to kare ha ongaku wo kiku no ga daisuki desu Pick Japanese words Translate into tree stumps TO NN TO NN TO NN NN TO of of 61 yntax-based MT Decoding as parsing: yntax-based MT Decoding as parsing: Chart Parsing PRP he kare ha ongaku wo kiku no ga daisuki desu PP PRP NN TO he music to kare ha ongaku wo kiku no ga daisuki desu Pick Japanese words Adding some more entries... Translate into tree stumps 50 of of 61

14 yntax-based MT Decoding as parsing: yntax-based MT Decoding as parsing: PP PRP NN TO VB he music to listening kare ha ongaku wo kiku no ga daisuki desu Combine entries VB2 PP PRP NN TO VB VB1 he music to listening adores kare ha ongaku wo kiku no ga daisuki desu 53 of of 61 yntax-based MT Decoding as parsing: yntax-based MT Decoding as parsing: VB VB2 VB2 PP PRP NN TO VB he music to listening kare ha ongaku wo kiku no ga daisuki desu PP PRP NN TO VB VB1 he music to listening adores kare ha ongaku wo kiku no ga daisuki desu Finished when all foreign words covered 54 of of 61

15 yntax-based MT How realistic is this model? Do English trees match foreign strings? Crossings between French-English[Fox, 2002] per sentence, depending on how it is measured Canbereducedby flattening tree, as done by[yamada and Knight, 2001] detecting phrasal translation special treatment for small number of constructions Most coherence between dependency structures yntax-based MT Other syntax-based systems: U.Alberta (Microsoft): treelet translation Translating from English Using dependency parser in English Project dependency treeinto language for training Map parts of the dependency tree ( treelets ) into foreign language Reranking phrase-based MT output with syntactic features Create n-best lists with phrase-based MT PO-tag and parse candidate translation Rerank with sybtactic features 57 of of 61 yntax-based MT Other syntax-based systems: yntax-aided phrase-based MT (Koehn, 2005) tick with phrase-based systems pecial treatment for special syntactic problems (NP treatment, clause restructuring) II: extended work of Yamada and Knight More complex rules Perfromance approaching phrase-based Prague: translation via dependency structures Parallel Czech-English treebank Tecto-grammatical translation model yntax-based MT o, syntax: does it help? Not yet best systems still phrase-based, treat words as tokens Well,maybe... work on reordering German automatically trained tree transfer systems promising Whynotyet? ifrealsyntax,weneedgoodparsers aretheygoodenough? syntactic annotations add a level of complexity difficult to handle, slow to train and decode few researchers good at statistical modeling and understand syntactic theories 58 of of 61

16 Next session Practical workshop: let s make our hands dirty!!! 61 of 61

Context Free Grammars. Many slides from Michael Collins

Context Free Grammars. Many slides from Michael Collins Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures

More information

Training and evaluation of POS taggers on the French MULTITAG corpus

Training and evaluation of POS taggers on the French MULTITAG corpus Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction

More information

The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation

The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation AUTHORS AND AFFILIATIONS MSR: Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova,

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Language Model and Grammar Extraction Variation in Machine Translation

Language Model and Grammar Extraction Variation in Machine Translation Language Model and Grammar Extraction Variation in Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department

More information

Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation

Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation Baskaran Sankaran and Anoop Sarkar School of Computing Science Simon Fraser University Burnaby BC. Canada {baskaran,

More information

The KIT-LIMSI Translation System for WMT 2014

The KIT-LIMSI Translation System for WMT 2014 The KIT-LIMSI Translation System for WMT 2014 Quoc Khanh Do, Teresa Herrmann, Jan Niehues, Alexandre Allauzen, François Yvon and Alex Waibel LIMSI-CNRS, Orsay, France Karlsruhe Institute of Technology,

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

Noisy SMS Machine Translation in Low-Density Languages

Noisy SMS Machine Translation in Low-Density Languages Noisy SMS Machine Translation in Low-Density Languages Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011

The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu

More information

Grammars & Parsing, Part 1:

Grammars & Parsing, Part 1: Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling

Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier 1, Andy Way 2, Josef van Genabith

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017

What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017 What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017 Supervised Training of Neural Networks for Language Training Data Training Model this is an example the cat went to

More information

Overview of the 3rd Workshop on Asian Translation

Overview of the 3rd Workshop on Asian Translation Overview of the 3rd Workshop on Asian Translation Toshiaki Nakazawa Chenchen Ding and Hideya Mino Japan Science and National Institute of Technology Agency Information and nakazawa@pa.jst.jp Communications

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

Accurate Unlexicalized Parsing for Modern Hebrew

Accurate Unlexicalized Parsing for Modern Hebrew Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen

UNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja

More information

The NICT Translation System for IWSLT 2012

The NICT Translation System for IWSLT 2012 The NICT Translation System for IWSLT 2012 Andrew Finch Ohnmar Htun Eiichiro Sumita Multilingual Translation Group MASTAR Project National Institute of Information and Communications Technology Kyoto,

More information

An Introduction to the Minimalist Program

An Introduction to the Minimalist Program An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

Proof Theory for Syntacticians

Proof Theory for Syntacticians Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax

More information

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Takako Aikawa, Lee Schwartz, Ronit King Mo Corston-Oliver Carmen Lozano Microsoft

More information

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Akiko Sakamoto, Kazuhiko Abe, Kazuo Sumita and Satoshi Kamatani Knowledge Media Laboratory,

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

SEMAFOR: Frame Argument Resolution with Log-Linear Models

SEMAFOR: Frame Argument Resolution with Log-Linear Models SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon

More information

LTAG-spinal and the Treebank

LTAG-spinal and the Treebank LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)

More information

What is NLP? CS 188: Artificial Intelligence Spring Why is Language Hard? The Big Open Problems. Information Extraction. Machine Translation

What is NLP? CS 188: Artificial Intelligence Spring Why is Language Hard? The Big Open Problems. Information Extraction. Machine Translation C 188: Artificial Intelligence pring 2006 What is NLP? Lecture 27: NLP 4/27/2006 Dan Klein UC Berkeley Fundamental goal: deep understand of broad language Not just string processing or keyword matching!

More information

Annotation Projection for Discourse Connectives

Annotation Projection for Discourse Connectives SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

Control and Boundedness

Control and Boundedness Control and Boundedness Having eliminated rules, we would expect constructions to follow from the lexical categories (of heads and specifiers of syntactic constructions) alone. Combinatory syntax simply

More information

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)

More information

Language Independent Passage Retrieval for Question Answering

Language Independent Passage Retrieval for Question Answering Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University

More information

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics

Machine Learning from Garden Path Sentences: The Application of Computational Linguistics Machine Learning from Garden Path Sentences: The Application of Computational Linguistics http://dx.doi.org/10.3991/ijet.v9i6.4109 J.L. Du 1, P.F. Yu 1 and M.L. Li 2 1 Guangdong University of Foreign Studies,

More information

The Strong Minimalist Thesis and Bounded Optimality

The Strong Minimalist Thesis and Bounded Optimality The Strong Minimalist Thesis and Bounded Optimality DRAFT-IN-PROGRESS; SEND COMMENTS TO RICKL@UMICH.EDU Richard L. Lewis Department of Psychology University of Michigan 27 March 2010 1 Purpose of this

More information

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona Parallel Evaluation in Stratal OT * Adam Baker University of Arizona tabaker@u.arizona.edu 1.0. Introduction The model of Stratal OT presented by Kiparsky (forthcoming), has not and will not prove uncontroversial

More information

Greedy Decoding for Statistical Machine Translation in Almost Linear Time

Greedy Decoding for Statistical Machine Translation in Almost Linear Time in: Proceedings of HLT-NAACL 23. Edmonton, Canada, May 27 June 1, 23. This version was produced on April 2, 23. Greedy Decoding for Statistical Machine Translation in Almost Linear Time Ulrich Germann

More information

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing. Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank

Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank Dan Klein and Christopher D. Manning Computer Science Department Stanford University Stanford,

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Adding syntactic structure to bilingual terminology for improved domain adaptation

Adding syntactic structure to bilingual terminology for improved domain adaptation Adding syntactic structure to bilingual terminology for improved domain adaptation Mikel Artetxe 1, Gorka Labaka 1, Chakaveh Saedi 2, João Rodrigues 2, João Silva 2, António Branco 2, Eneko Agirre 1 1

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Constraining X-Bar: Theta Theory

Constraining X-Bar: Theta Theory Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.

Introduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions. to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures

Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,

More information

Chapter 4: Valence & Agreement CSLI Publications

Chapter 4: Valence & Agreement CSLI Publications Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).

More information

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many

A Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.

More information

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight. Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models

Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za

More information

Modeling full form lexica for Arabic

Modeling full form lexica for Arabic Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling

More information

Beyond the Pipeline: Discrete Optimization in NLP

Beyond the Pipeline: Discrete Optimization in NLP Beyond the Pipeline: Discrete Optimization in NLP Tomasz Marciniak and Michael Strube EML Research ggmbh Schloss-Wolfsbrunnenweg 33 69118 Heidelberg, Germany http://www.eml-research.de/nlp Abstract We

More information

Underlying and Surface Grammatical Relations in Greek consider

Underlying and Surface Grammatical Relations in Greek consider 0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

Informatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy

Informatics 2A: Language Complexity and the. Inf2A: Chomsky Hierarchy Informatics 2A: Language Complexity and the Chomsky Hierarchy September 28, 2010 Starter 1 Is there a finite state machine that recognises all those strings s from the alphabet {a, b} where the difference

More information

M55205-Mastering Microsoft Project 2016

M55205-Mastering Microsoft Project 2016 M55205-Mastering Microsoft Project 2016 Course Number: M55205 Category: Desktop Applications Duration: 3 days Certification: Exam 70-343 Overview This three-day, instructor-led course is intended for individuals

More information

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist

ENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

The Interface between Phrasal and Functional Constraints

The Interface between Phrasal and Functional Constraints The Interface between Phrasal and Functional Constraints John T. Maxwell III* Xerox Palo Alto Research Center Ronald M. Kaplan t Xerox Palo Alto Research Center Many modern grammatical formalisms divide

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The

More information

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Roy Bar-Haim,Ido Dagan, Iddo Greental, Idan Szpektor and Moshe Friedman Computer Science Department, Bar-Ilan University,

More information

TINE: A Metric to Assess MT Adequacy

TINE: A Metric to Assess MT Adequacy TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,

More information

Three New Probabilistic Models. Jason M. Eisner. CIS Department, University of Pennsylvania. 200 S. 33rd St., Philadelphia, PA , USA

Three New Probabilistic Models. Jason M. Eisner. CIS Department, University of Pennsylvania. 200 S. 33rd St., Philadelphia, PA , USA Three New Probabilistic Models for Dependency Parsing: An Exploration Jason M. Eisner CIS Department, University of Pennsylvania 200 S. 33rd St., Philadelphia, PA 19104-6389, USA jeisner@linc.cis.upenn.edu

More information

Character Stream Parsing of Mixed-lingual Text

Character Stream Parsing of Mixed-lingual Text Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract

More information

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3

Inleiding Taalkunde. Docent: Paola Monachesi. Blok 4, 2001/ Syntax 2. 2 Phrases and constituent structure 2. 3 A minigrammar of Italian 3 Inleiding Taalkunde Docent: Paola Monachesi Blok 4, 2001/2002 Contents 1 Syntax 2 2 Phrases and constituent structure 2 3 A minigrammar of Italian 3 4 Trees 3 5 Developing an Italian lexicon 4 6 S(emantic)-selection

More information

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

More information

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models

Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft

More information

Vocabulary Usage and Intelligibility in Learner Language

Vocabulary Usage and Intelligibility in Learner Language Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand

More information

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma

University of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of

More information