Factored models for phrase-based translation
|
|
- Valentine Goodman
- 6 years ago
- Views:
Transcription
1 Factored models for phrase-based translation LING 575 Lecture 7 Kristina Toutanova MSR & UW May 18, 2010 With slides mostly borrowed from Philip Koehn
2 Assignments Project updates due May 19 Guidelines on on web site A couple of paragraphs to one page update, not graded Paper reviews due May 26 People who did not do a paper presentation Guidelines on web page
3 Overview Motivation for factored models Example Model and Training Alternate Decoding Paths Decoding Applications Enriching output space Translating factored words Enriching input space
4 Statistical machine translation today Best performing methods based on phrases short sequences of words no use of explicit syntactic information no use of morphological information currently best performing method Progress in syntax-based translation tree transfer models using syntactic annotation still shallow representation of words and non-terminals active research, improving performance 2
5 One motivation: morphology 3 Models treat car and cars as completely different words training occurrences of car have no effect on learning translation of cars if we only see car, we do not know how to translate cars rich morphology (German, Arabic, Finnish, Czech,...) many word forms Better approach analyze surface word forms into lemma and morphology, e.g.: car +plural translate lemma and morphology separately generate target surface form
6 Factored represention of words Factored translation models 4 Input Output word lemma part-of-speech morphology word class word lemma part-of-speech morphology word class... Goals Generalization, e.g. by translating lemmas, not surface forms Richer model, e.g. using syntax for reordering, language modeling)...
7 5 Related work Back off to representations with richer statistics (lemma, etc.) [Nießen and Ney, 2001, Yang and Kirchhoff 2006, Talbot and Osborne 2006] Use of additional annotation in pre-processing (POS, syntax trees, etc.) [Collins et al., 2005, Crego et al, 2006] Use of additional annotation in re-ranking (morphological features, POS, syntax trees, etc.) [Och et al. 2004, Koehn and Knight, 2005] we pursue an integrated approach Use of syntactic tree structure [Wu 1997, Alshawi et al. 1998, Yamada and Knight 2001, Melamed 2004, Menezes and Quirk 2005, Chiang 2005, Galley et al. 2006] may be combined with our approach
8 Factored Translation Models 6 Motivation Example Model and Training Decoding Experiments
9 Decomposing translation: example Translate lemma and syntactic information separately 7 lemma lemma part-of-speech part-of-speech morphology morphology
10 Decomposing translation: example Generate surface form on target side 8 surface lemma part-of-speech morphology
11 9 Input: (Autos, Auto, NNS) Translation process: example 1. Translation step: lemma lemma (?, car,?), (?, auto,?) 2. Generation step: lemma part-of-speech (?, car, NN), (?, car, NNS), (?, auto, NN), (?, auto, NNS) 3. Translation step: part-of-speech part-of-speech (?, car, NN), (?, car, NNS), (?, auto, NNP), (?, auto, NNS) 4. Generation step: lemma,part-of-speech surface (car, car, NN), (cars, car, NNS), (auto, auto, NN), (autos, auto, NNS)
12 Factored Translation Models 10 Motivation Example Model and Training Decoding Experiments
13 Model 11 Extension of phrase model Mapping of foreign words into English words broken up into steps translation step: maps foreign factors into English factors (on the phrasal level) generation step: maps English factors into English factors (for each word) Each step is modeled by one or more feature functions fits nicely into log-linear model weight set by discriminative training method Order of mapping steps is chosen to optimize search
14 Phrase-based training Establish word alignment (GIZA++ and symmetrization) 12 natürlich hat john spass am spiel naturally john has fun with the game
15 Extract phrase Phrase-based training 13 natürlich hat john spass am spiel naturally john has fun with the game natürlich hat john naturally john has
16 Factored training Annotate training with factors, extract phrase 14 ADV V NNP NN P NN ADV NNP V NN P DET NN ADV V NNP ADV NNP V
17 Training of generation steps 15 Generation steps map target factors to target factors typically trained on target side of parallel corpus may be trained on additional monolingual data Example: The/det man/nn sleeps/vbz count collection - count(the,det)++ - count(man,nn)++ - count(sleeps,vbz)++ evidence for probability distributions (max. likelihood estimation) - p(det the), p(the det) - p(nn man), p(man nn) - p(vbz sleeps), p(sleeps vbz)
18 Model form In standard phrase-based MT we have scores of phrase-pairs score f 1 f m, e 1 e n = λ 1 P f 1 f m e 1 e n + λ 2 P e 1 e n f 1 f m + λ 3 P lex f 1... f m e 1... e n + λ 4 P lex e 1 e n f 1 f m + λ 5 Now the scores of phrase-pairs are decomposed into scores for translation and generation steps within the phrase pair Take this model:
19 Model form equation f j = f j lf j posf j e i = e i le i pose i me i score f 1 lf 1 posf 1 f m lf m posf m, e 1 le 1 me 1. e n le n pose n me n = score lf 1 lf m, le 1 le n + score(posf 1 posf m, posle 1 me 1 posle n me n )+ score gen e 1, posle 1 me score gen (e n, posle n me n )
20 Factored Translation Models 16 Motivation Example Model and Training Decoding Experiments
21 Phrase-based translation Task: translate this sentence from German into English 17 er geht ja nicht nach hause
22 18 Translation step 1 Task: translate this sentence from German into English er geht ja nicht nach hause er he Pick phrase in input, translate
23 19 Translation step 2 Task: translate this sentence from German into English er geht ja nicht nach hause er ja nicht he does not Pick phrase in input, translate it is allowed to pick words out of sequence (reordering) phrases may have multiple words: many-to-many translation
24 20 Translation step 3 Task: translate this sentence from German into English er geht ja nicht nach hause er geht ja nicht he does not go Pick phrase in input, translate
25 21 Translation step 4 Task: translate this sentence from German into English er geht ja nicht nach hause er geht ja nicht nach hause he does not go home Pick phrase in input, translate
26 Translation options 22 er geht ja nicht nach hause he it, it, he it is he will be it goes he goes is are goes go is are is after all does yes is, of course not is not are not is not a not is not does not do not not do not does not is not to following not after not to after to according to in home under house return home do not house home chamber at home Many translation options to choose from in Europarl phrase table: 2727 matching phrase pairs for this sentence by pruning to the top 20 per phrase, 202 translation options remain
27 Translation options 23 er geht ja nicht nach hause he it, it, he it is he will be it goes he goes is are goes go is are is after all does yes is, of course not is not are not is not a not is not does not do not not do not does not is not to following not after not to after to according to in home under house return home do not The machine translation decoder does not know the right answer Search problem solved by heuristic beam search house home chamber at home
28 Decoding process: precompute translation options er geht ja nicht nach hause 24
29 Decoding process: start with initial hypothesis er geht ja nicht nach hause 25
30 Decoding process: hypothesis expansion er geht ja nicht nach hause 26 are
31 Decoding process: hypothesis expansion er geht ja nicht nach hause 27 he are it
32 Decoding process: hypothesis expansion er geht ja nicht nach hause 28 yes he goes home are does not go home it to
33 Decoding process: find best path er geht ja nicht nach hause 29 yes he goes home are does not go home it to
34 Factored model decoding 30 Factored model decoding introduces additional complexity Hypothesis expansion not any more according to simple translation table, but by executing a number of mapping steps, e.g.: 1. translating of lemma lemma 2. translating of part-of-speech, morphology part-of-speech, morphology 3. generation of surface form Example: haus NN neutral plural nominative { houses house NN plural, homes home NN plural, buildings building NN plural, shells shell NN plural } Each time, a hypothesis is expanded, these mapping steps have to applied
35 Efficient factored model decoding Key insight: executing of mapping steps can be pre-computed and stored as translation options apply mapping steps to all input phrases store results as translation options decoding algorithm unchanged haus NN neutral plural nominative houses house NN plural homes home NN plural buildings building NN plural shells shell NN plural
36 Efficient factored model decoding 32 Problem: Explosion of translation options originally limited to 20 per input phrase even with simple model, now 1000s of mapping expansions possible Solution: Additional pruning of translation options keep only the best expanded translation options current default 50 per input phrase decoding only about 2-3 times slower than with surface model
37 Factored Translation Models 33 Motivation Example Model and Training Decoding Experiments
38 Adding linguistic markup to output 34 Input Output word word part-of-speech Generation of POS tags on the target side Use of high order language models over POS (7-gram, 9-gram) Motivation: syntactic tags should enforce syntactic sentence structure model not strong enough to support major restructuring
39 Some experiments English German, Europarl, 30 million word, test2006 Model BLEU best published result baseline (surface) surface + POS German English, News Commentary data (WMT 2007), 1 million word Model BLEU Baseline With POS LM Improvements under sparse data conditions Similar results with CCG supertags [Birch et al., 2007] 35
40 Sequence models over morphological tags 36 die hellen Sterne erleuchten das schwarze Himmel (the) (bright) (stars) (illuminate) (the) (black) (sky) fem fem fem - neutral neutral male plural plural plural plural sgl. sgl. sgl nom. nom. nom. - acc. acc. acc. Violation of noun phrase agreement in gender das schwarze and schwarze Himmel are perfectly fine bigrams but: das schwarze Himmel is not If relevant n-grams does not occur in the corpus, a lexical n-gram model would fail to detect this mistake Morphological sequence model: p(n-male J-male) > p(n-male J-neutral)
41 Local agreement (esp. within noun phrases) 37 Input Output word word part-of-speech morphology High order language models over POS and morphology Motivation DET-sgl NOUN-sgl good sequence DET-sgl NOUN-plural bad sequence
42 38 Agreement within noun phrases Experiment: 7-gram POS, morph LM in addition to 3-gram word LM Results Method Agreement errors in NP devtest test baseline 15% in NP 3 words BLEU BLEU factored model 4% in NP 3 words BLEU BLEU Example baseline:... zur zwischenstaatlichen methoden... factored model:... zu zwischenstaatlichen methoden... Example baseline:... das zweite wichtige änderung... factored model:... die zweite wichtige änderung...
43 Other result on enriching output [Koehn and Hoang 07] 40K training sent 20K training sent
44 Morphological generation model 39 Input Output word word lemma lemma part-of-speech part-of-speech morphology Our motivating example Translating lemma and morphological information more robust
45 40 Initial results Results on 1 million word News Commentary corpus (German English) System In-doman Out-of-domain Baseline With POS LM Morphgen model What went wrong? why back-off to lemma, when we know how to translate surface forms? loss of information
46 Solution: alternative decoding paths 41 Input Output word lemma or word lemma part-of-speech part-of-speech morphology Allow both surface form translation and morphgen model prefer surface model for known words morphgen model acts as back-off
47 42 Model now beats the baseline: Results System In-doman Out-of-domain Baseline With POS LM Morphgen model Both model paths
48 Specifying factored models in Moses: Example train-factored-phrase-model.perl --corpus factored-corpus/projsyndicate.1000 \ --root-dir pos-decomposed \ --f de --e en \ --lm 0:3:factored-corpus/surface.lm:0 \ --lm 1:3:factored-corpus/pos.lm:0 \ --translation-factors 0-0 \ --generation-factors 0-1 \ --decoding-steps t0,g0
49 Specifying factored models in Moses: Example train-factored-phrase-model.perl --f de --e en \ --lm 0:3:factored-corpus/surface.lm:0 \ --lm 2:3:factored-corpus/pos.lm:0 \ --translation-factors ,3\ --generation-factors 1,2,3-0 \ --decoding-steps t0,t1,g0 \
50 Specifying factored models in Moses: multiple decoding paths train-factored-phrase-model.perl --f de --e en \ --lm 0:3:factored-corpus/surface.lm:0 \ --lm 2:3:factored-corpus/pos.lm:0 \ --translation-factors ,3+0-0,2\ --generation-factors 1,2,3-0 \ --decoding-steps t0,t1,g0:t2 \
51 Adding annotation to the source Source words may lack sufficient information to map phrases English-German: what case for noun phrases? Chinese-English: plural or singular pronoun translation: what do they refer to? Idea: add additional information to the source that makes the required information available locally (where it is needed) see [Avramidis and Koehn, ACL 2008] for details 43
52 Error analysis for an English-Greek baseline phrasal system
53 Case Information for English Greek 44 Input Output word word subject/object case Detect in English, if noun phrase is subject/object (using parse tree) Map information into case morphology of Greek Use case morphology to generate correct word form
54 Obtaining Case Information Use syntactic parse of English input (method similar to semantic role labeling) 45
55 46 Results English-Greek Automatic BLEU scores System devtest test07 baseline enriched Improvement in verb inflection System Verb count Errors Missing baseline % 7.4% enriched % 2.7% Improvement in noun phrase inflection System NPs Errors Missing baseline % 3.2% enriched % 5.0% Also successfully applied to English-Czech
56 Summary Factored translation models make it possible to model words as a set of features (factors) We can use this to build pos-based language models for the target Good empirical improvements with 7-gram LMs over output syntactic factors We can use this to represent translation of phrases as translation of parts of words in the phrases e.g. lemma/morphology Using multiple decoding paths we can avoid the strong independence assumptions Good empirical improvement in small/medium data conditions We can enrich the word representation of an input language to aid translation into a morphologically richer language Good improvements on specific linguistic phenomena, not a huge boost to overall BLEU
57 References Factored Translation Models, Philipp Koehn and Hieu Hoang, EMNLP 2007, pdf. Enriching Morphologically Poor Languages for Statistical Machine Translation, Eleftherios Avramidis and Philipp Koehn, ACL 2008, pdf.
Language Model and Grammar Extraction Variation in Machine Translation
Language Model and Grammar Extraction Variation in Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department
More informationThe Karlsruhe Institute of Technology Translation Systems for the WMT 2011
The Karlsruhe Institute of Technology Translation Systems for the WMT 2011 Teresa Herrmann, Mohammed Mediani, Jan Niehues and Alex Waibel Karlsruhe Institute of Technology Karlsruhe, Germany firstname.lastname@kit.edu
More informationNoisy SMS Machine Translation in Low-Density Languages
Noisy SMS Machine Translation in Low-Density Languages Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of
More informationarxiv: v1 [cs.cl] 2 Apr 2017
Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,
More informationThe MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation
The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation AUTHORS AND AFFILIATIONS MSR: Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova,
More informationCross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels
Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract
More informationExploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data
Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer
More informationTarget Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data
Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se
More informationDomain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling
Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component-Level Mixture Modelling Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier 1, Andy Way 2, Josef van Genabith
More informationThe KIT-LIMSI Translation System for WMT 2014
The KIT-LIMSI Translation System for WMT 2014 Quoc Khanh Do, Teresa Herrmann, Jan Niehues, Alexandre Allauzen, François Yvon and Alex Waibel LIMSI-CNRS, Orsay, France Karlsruhe Institute of Technology,
More informationEnhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities
Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion
More informationTraining and evaluation of POS taggers on the French MULTITAG corpus
Training and evaluation of POS taggers on the French MULTITAG corpus A. Allauzen, H. Bonneau-Maynard LIMSI/CNRS; Univ Paris-Sud, Orsay, F-91405 {allauzen,maynard}@limsi.fr Abstract The explicit introduction
More informationCross Language Information Retrieval
Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................
More informationPrediction of Maximal Projection for Semantic Role Labeling
Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba
More information11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation
tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each
More informationChunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.
NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and
More informationContext Free Grammars. Many slides from Michael Collins
Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures
More informationDeveloping a TT-MCTAG for German with an RCG-based Parser
Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,
More informationThe NICT Translation System for IWSLT 2012
The NICT Translation System for IWSLT 2012 Andrew Finch Ohnmar Htun Eiichiro Sumita Multilingual Translation Group MASTAR Project National Institute of Information and Communications Technology Kyoto,
More informationCS 598 Natural Language Processing
CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@
More informationTheoretical Syntax Winter Answers to practice problems
Linguistics 325 Sturman Theoretical Syntax Winter 2017 Answers to practice problems 1. Draw trees for the following English sentences. a. I have not been running in the mornings. 1 b. Joel frequently sings
More informationParsing of part-of-speech tagged Assamese Texts
IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal
More informationEdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,
More informationRe-evaluating the Role of Bleu in Machine Translation Research
Re-evaluating the Role of Bleu in Machine Translation Research Chris Callison-Burch Miles Osborne Philipp Koehn School on Informatics University of Edinburgh 2 Buccleuch Place Edinburgh, EH8 9LW callison-burch@ed.ac.uk
More informationThe stages of event extraction
The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks
More informationImproved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation
Improved Reordering for Shallow-n Grammar based Hierarchical Phrase-based Translation Baskaran Sankaran and Anoop Sarkar School of Computing Science Simon Fraser University Burnaby BC. Canada {baskaran,
More informationSemi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.
Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link
More informationUsing dialogue context to improve parsing performance in dialogue systems
Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,
More informationLinking Task: Identifying authors and book titles in verbose queries
Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,
More informationBasic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1
Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up
More informationENGBG1 ENGBL1 Campus Linguistics. Meeting 2. Chapter 7 (Morphology) and chapter 9 (Syntax) Pia Sundqvist
Meeting 2 Chapter 7 (Morphology) and chapter 9 (Syntax) Today s agenda Repetition of meeting 1 Mini-lecture on morphology Seminar on chapter 7, worksheet Mini-lecture on syntax Seminar on chapter 9, worksheet
More informationConstructing Parallel Corpus from Movie Subtitles
Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing
More informationBANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS
Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.
More informationChinese Language Parsing with Maximum-Entropy-Inspired Parser
Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art
More informationWords come in categories
Nouns Words come in categories D: A grammatical category is a class of expressions which share a common set of grammatical properties (a.k.a. word class or part of speech). Words come in categories Open
More informationImpact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment
Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Takako Aikawa, Lee Schwartz, Ronit King Mo Corston-Oliver Carmen Lozano Microsoft
More informationIntroduction to HPSG. Introduction. Historical Overview. The HPSG architecture. Signature. Linguistic Objects. Descriptions.
to as a linguistic theory to to a member of the family of linguistic frameworks that are called generative grammars a grammar which is formalized to a high degree and thus makes exact predictions about
More informationAnnotation Projection for Discourse Connectives
SFB 833 / Univ. Tübingen Penn Discourse Treebank Workshop Annotation projection Basic idea: Given a bitext E/F and annotation for F, how would the annotation look for E? Examples: Word Sense Disambiguation
More informationTHE VERB ARGUMENT BROWSER
THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW
More informationSEMAFOR: Frame Argument Resolution with Log-Linear Models
SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon
More informationDEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS
DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za
More informationThe taming of the data:
The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data
More informationAccurate Unlexicalized Parsing for Modern Hebrew
Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The
More information2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases
POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz
More informationProject in the framework of the AIM-WEST project Annotation of MWEs for translation
Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment
More informationUNIVERSITY OF OSLO Department of Informatics. Dialog Act Recognition using Dependency Features. Master s thesis. Sindre Wetjen
UNIVERSITY OF OSLO Department of Informatics Dialog Act Recognition using Dependency Features Master s thesis Sindre Wetjen November 15, 2013 Acknowledgments First I want to thank my supervisors Lilja
More informationWeb as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics
(L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes
More informationSyntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm
Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together
More informationThe Smart/Empire TIPSTER IR System
The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of
More informationExperts Retrieval with Multiword-Enhanced Author Topic Model
NAACL 10 Workshop on Semantic Search Experts Retrieval with Multiword-Enhanced Author Topic Model Nikhil Johri Dan Roth Yuancheng Tu Dept. of Computer Science Dept. of Linguistics University of Illinois
More informationThe Internet as a Normative Corpus: Grammar Checking with a Search Engine
The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a
More informationSome Principles of Automated Natural Language Information Extraction
Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract
More informationA heuristic framework for pivot-based bilingual dictionary induction
2013 International Conference on Culture and Computing A heuristic framework for pivot-based bilingual dictionary induction Mairidan Wushouer, Toru Ishida, Donghui Lin Department of Social Informatics,
More informationSystem Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks
System Implementation for SemEval-2017 Task 4 Subtask A Based on Interpolated Deep Neural Networks 1 Tzu-Hsuan Yang, 2 Tzu-Hsuan Tseng, and 3 Chia-Ping Chen Department of Computer Science and Engineering
More informationNatural Language Processing. George Konidaris
Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans
More informationA Graph Based Authorship Identification Approach
A Graph Based Authorship Identification Approach Notebook for PAN at CLEF 2015 Helena Gómez-Adorno 1, Grigori Sidorov 1, David Pinto 2, and Ilia Markov 1 1 Center for Computing Research, Instituto Politécnico
More informationChapter 4: Valence & Agreement CSLI Publications
Chapter 4: Valence & Agreement Reminder: Where We Are Simple CFG doesn t allow us to cross-classify categories, e.g., verbs can be grouped by transitivity (deny vs. disappear) or by number (deny vs. denies).
More informationTINE: A Metric to Assess MT Adequacy
TINE: A Metric to Assess MT Adequacy Miguel Rios, Wilker Aziz and Lucia Specia Research Group in Computational Linguistics University of Wolverhampton Stafford Street, Wolverhampton, WV1 1SB, UK {m.rios,
More informationESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly
ESSLLI 2010: Resource-light Morpho-syntactic Analysis of Highly Inflected Languages Classical Approaches to Tagging The slides are posted on the web. The url is http://chss.montclair.edu/~feldmana/esslli10/.
More informationWhat Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017
What Can Neural Networks Teach us about Language? Graham Neubig a2-dlearn 11/18/2017 Supervised Training of Neural Networks for Language Training Data Training Model this is an example the cat went to
More informationMETHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS
METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar
More informationModeling full form lexica for Arabic
Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling
More informationL1 and L2 acquisition. Holger Diessel
L1 and L2 acquisition Holger Diessel Schedule Comparing L1 and L2 acquisition The role of the native language in L2 acquisition The critical period hypothesis [student presentation] Non-linguistic factors
More informationGreedy Decoding for Statistical Machine Translation in Almost Linear Time
in: Proceedings of HLT-NAACL 23. Edmonton, Canada, May 27 June 1, 23. This version was produced on April 2, 23. Greedy Decoding for Statistical Machine Translation in Almost Linear Time Ulrich Germann
More informationCase government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG
Case government vs Case agreement: modelling Modern Greek case attraction phenomena in LFG Dr. Kakia Chatsiou, University of Essex achats at essex.ac.uk Explorations in Syntactic Government and Subcategorisation,
More informationMemory-based grammatical error correction
Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,
More informationEnsemble Technique Utilization for Indonesian Dependency Parser
Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id
More informationA Computational Evaluation of Case-Assignment Algorithms
A Computational Evaluation of Case-Assignment Algorithms Miles Calabresi Advisors: Bob Frank and Jim Wood Submitted to the faculty of the Department of Linguistics in partial fulfillment of the requirements
More informationRegression for Sentence-Level MT Evaluation with Pseudo References
Regression for Sentence-Level MT Evaluation with Pseudo References Joshua S. Albrecht and Rebecca Hwa Department of Computer Science University of Pittsburgh {jsa8,hwa}@cs.pitt.edu Abstract Many automatic
More informationUnderlying and Surface Grammatical Relations in Greek consider
0 Underlying and Surface Grammatical Relations in Greek consider Sentences Brian D. Joseph The Ohio State University Abbreviated Title Grammatical Relations in Greek consider Sentences Brian D. Joseph
More informationConstruction Grammar. University of Jena.
Construction Grammar Holger Diessel University of Jena holger.diessel@uni-jena.de http://www.holger-diessel.de/ Words seem to have a prototype structure; but language does not only consist of words. What
More informationApplications of memory-based natural language processing
Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal
More informationDetecting English-French Cognates Using Orthographic Edit Distance
Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National
More informationMultilingual Sentiment and Subjectivity Analysis
Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department
More informationA Minimalist Approach to Code-Switching. In the field of linguistics, the topic of bilingualism is a broad one. There are many
Schmidt 1 Eric Schmidt Prof. Suzanne Flynn Linguistic Study of Bilingualism December 13, 2013 A Minimalist Approach to Code-Switching In the field of linguistics, the topic of bilingualism is a broad one.
More informationProblems of the Arabic OCR: New Attitudes
Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationLTAG-spinal and the Treebank
LTAG-spinal and the Treebank a new resource for incremental, dependency and semantic parsing Libin Shen (lshen@bbn.com) BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA Lucas Champollion (champoll@ling.upenn.edu)
More informationhave to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,
A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994
More informationBasic Syntax. Doug Arnold We review some basic grammatical ideas and terminology, and look at some common constructions in English.
Basic Syntax Doug Arnold doug@essex.ac.uk We review some basic grammatical ideas and terminology, and look at some common constructions in English. 1 Categories 1.1 Word level (lexical and functional)
More informationOverview of the 3rd Workshop on Asian Translation
Overview of the 3rd Workshop on Asian Translation Toshiaki Nakazawa Chenchen Ding and Hideya Mino Japan Science and National Institute of Technology Agency Information and nakazawa@pa.jst.jp Communications
More informationInteractive Corpus Annotation of Anaphor Using NLP Algorithms
Interactive Corpus Annotation of Anaphor Using NLP Algorithms Catherine Smith 1 and Matthew Brook O Donnell 1 1. Introduction Pronouns occur with a relatively high frequency in all forms English discourse.
More informationAn Interactive Intelligent Language Tutor Over The Internet
An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This
More informationThe Role of the Head in the Interpretation of English Deverbal Compounds
The Role of the Head in the Interpretation of English Deverbal Compounds Gianina Iordăchioaia i, Lonneke van der Plas ii, Glorianna Jagfeld i (Universität Stuttgart i, University of Malta ii ) Wen wurmt
More informationOCR for Arabic using SIFT Descriptors With Online Failure Prediction
OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,
More informationFlorida Reading Endorsement Alignment Matrix Competency 1
Florida Reading Endorsement Alignment Matrix Competency 1 Reading Endorsement Guiding Principle: Teachers will understand and teach reading as an ongoing strategic process resulting in students comprehending
More informationMulti-Lingual Text Leveling
Multi-Lingual Text Leveling Salim Roukos, Jerome Quin, and Todd Ward IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 {roukos,jlquinn,tward}@us.ibm.com Abstract. Determining the language proficiency
More informationClickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models Jianfeng Gao Microsoft Research One Microsoft Way Redmond, WA 98052 USA jfgao@microsoft.com Xiaodong He Microsoft
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More information1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class
If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready
More informationThe RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017
The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017 Jan-Thorsten Peter, Andreas Guta, Tamer Alkhouli, Parnia Bahar, Jan Rosendahl, Nick Rossenbach, Miguel
More informationSINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,
More informationEnhancing Morphological Alignment for Translating Highly Inflected Languages
Enhancing Morphological Alignment for Translating Highly Inflected Languages Minh-Thang Luong School of Computing National University of Singapore luongmin@comp.nus.edu.sg Min-Yen Kan School of Computing
More informationUniversity of Alberta. Large-Scale Semi-Supervised Learning for Natural Language Processing. Shane Bergsma
University of Alberta Large-Scale Semi-Supervised Learning for Natural Language Processing by Shane Bergsma A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of
More informationModeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures
Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures Ulrike Baldewein (ulrike@coli.uni-sb.de) Computational Psycholinguistics, Saarland University D-66041 Saarbrücken,
More informationProof Theory for Syntacticians
Department of Linguistics Ohio State University Syntax 2 (Linguistics 602.02) January 5, 2012 Logics for Linguistics Many different kinds of logic are directly applicable to formalizing theories in syntax
More informationGrammars & Parsing, Part 1:
Grammars & Parsing, Part 1: Rules, representations, and transformations- oh my! Sentence VP The teacher Verb gave the lecture 2015-02-12 CS 562/662: Natural Language Processing Game plan for today: Review
More informationDerivational and Inflectional Morphemes in Pak-Pak Language
Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes
More informationExtracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models
Extracting Opinion Expressions and Their Polarities Exploration of Pipelines and Joint Models Richard Johansson and Alessandro Moschitti DISI, University of Trento Via Sommarive 14, 38123 Trento (TN),
More informationDiscriminative Learning of Beam-Search Heuristics for Planning
Discriminative Learning of Beam-Search Heuristics for Planning Yuehua Xu School of EECS Oregon State University Corvallis,OR 97331 xuyu@eecs.oregonstate.edu Alan Fern School of EECS Oregon State University
More information