FACTORED TRANSLATION MODELS. Raj Dabre Raksha Sharma Avishek Dan
2 Purpose of the talk
- To motivate Factor-Based Machine Translation (FBMT)
- To cover the basic concepts of FBMT
- To highlight the factors that can help in translation
- To illustrate the process of FBMT
3 Flow of the Presentation
- Motivation
- Introduction to FBMT
- Decomposing the FBMT Process
  - Lemma Translation
  - Morphology Translation
  - Generation
- Statistical Model
  - Training
  - Combining Components
  - Decoding
- Experiments and Analysis
- Conclusion
4 Motivation
- Consider the example: यह मेरी गाड़ी है {yaha meri gaadi hai} {This is my car}
- गाड़ी + Plural: ये मेरी गाड़ियाँ है {ye meri gaadiyan hai} {These are my cars}
- Utilize factors to overcome data sparsity.
5 FBMT in Vauquois Triangle
6 Another motivating example from the Orient
- わたしのなまえはラジです {Watashi no namae wa Raji desu} {My name is Raj.}
- Without further information, the word mappings are difficult to determine.
- Suppose POS tags are given:
  - わたしの (PRON) なまえは (NN) ラジ (NNP) です (VM/VCOP)
  - My (PRON) name (NN) is (VM/VCOP) Raj (NNP)
- The mappings are now easier to deduce: factors reduce uncertainty.
7 Introduction to FBMT
- Definition: FBMT is an extension of phrase-based statistical machine translation that integrates additional annotation at the word level.
- Annotations can be linguistic markup or automatically generated word classes.
8 Factors to Exploit
- Surface form
- Lemma
- Part-of-speech
- Morphological features: gender, number, and case
- Automatic word classes
- Shallow syntactic tags
- Dedicated factors to ensure agreement
9 Example: गाड़ियाँ from (ये मेरी गाड़ियाँ है)
- Surface form: गाड़ियाँ
- Lemma: गाड़ी
- Part-of-speech: NN
- Morphological features
  - Gender: feminine
  - Number: plural
  - Case: accusative
- Shallow syntactic tag: NP
10 Decomposition of FBMT
11 Decomposition of FBMT
For translating "cars" to गाड़ियाँ:
- Translate input lemmas into output lemmas: car to गाड़ी
- Translate morphological and POS factors:
  - Noun to Noun
  - Plural to Plural
  - Neuter to Feminine
- Generate surface forms given the lemma and linguistic factors: गाड़ी + Noun + Plural + Feminine = गाड़ियाँ
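The three steps on this slide can be sketched as three table lookups. This is a minimal illustration, not the Moses implementation: the table contents are hand-written assumptions, the Hindi forms are romanized as in the talk's transliterations (gaadi, gaadiyan), and the names `LEMMA_TABLE`, `FACTOR_TABLE`, `GENERATION_TABLE` and `translate_word` are hypothetical.

```python
# Toy tables: every entry is an illustrative assumption, not learned from data.
# Hindi forms are romanized as in the talk's transliterations.
LEMMA_TABLE = {"car": "gaadi"}

# Source (POS, number, gender) -> target (POS, number, gender).
# Simplification: a real gender mapping depends on the target lemma.
FACTOR_TABLE = {
    ("NN", "plural", "neuter"): ("NN", "plural", "feminine"),
    ("NN", "singular", "neuter"): ("NN", "singular", "feminine"),
}

# (target lemma, target factors) -> target surface form.
GENERATION_TABLE = {
    ("gaadi", ("NN", "singular", "feminine")): "gaadi",
    ("gaadi", ("NN", "plural", "feminine")): "gaadiyan",
}


def translate_word(lemma, factors):
    """Translate one analyzed source word via the three factored steps."""
    target_lemma = LEMMA_TABLE[lemma]        # step 1: lemma translation
    target_factors = FACTOR_TABLE[factors]   # step 2: factor translation
    return GENERATION_TABLE[(target_lemma, target_factors)]  # step 3: generation
```

With these toy tables, `translate_word("car", ("NN", "plural", "neuter"))` returns `"gaadiyan"`. The surface pair cars/gaadiyan never has to be stored in a translation table, which is how factoring addresses data sparsity.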
12 Statistical Model - Training
- Automatically annotate the parallel corpus with additional factors (POS, morphology).
- Align words using GIZA++.
  - The alignment basis can be specified: POS to POS, theta roles to theta roles, etc.
  - Any combination of factors can be used.
- Three types of tables are generated:
  - Lemma translation (source lemma to target lemma)
  - Morphology translation (source morphology to target morphology)
  - Word generation (target lemma + target morphology to target word form)
13 Annotating the corpus
Use POS taggers, shallow syntactic parsers, UNL and dependency parsers to generate factors.
Example: These are my cars / ये मेरी गाड़ियाँ है

Surface | Lemma | POS | Other factors
These | this | DET | subj
Are | is | VM/VCOP | present
My | me | PRON | possessor
Cars | car | NN | neuter, plural, object
ये | यह | DET | subj
मेरी | मेरा | PRON | possessor
गाड़ियाँ | गाड़ी | NN | feminine, plural, object
है | होना | VM/VCOP | present
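An annotated sentence like the one above is commonly written for Moses as one line of text, with the factors of each token joined by `|`. The sketch below shows that serialization for the English side of the example. The factor order (surface, lemma, POS, morphology), the `+` used to join morphological features, and the names `Token`, `to_factored` and `from_factored` are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str
    lemma: str
    pos: str
    morph: str  # morphological/syntactic features joined with "+"


def to_factored(tokens: List[Token]) -> str:
    """Serialize tokens as 'surface|lemma|pos|morph', separated by spaces."""
    return " ".join("|".join(token) for token in tokens)


def from_factored(line: str) -> List[Token]:
    """Parse a factored line back into tokens (expects four factors per token)."""
    return [Token(*word.split("|")) for word in line.split()]


# English side of the slide's example, lemmas and tags as given on the slide.
english = [
    Token("these", "this", "DET", "subj"),
    Token("are", "is", "VM/VCOP", "present"),
    Token("my", "me", "PRON", "possessor"),
    Token("cars", "car", "NN", "neuter+plural+object"),
]
```

Here `to_factored(english)` yields a line whose last token is `cars|car|NN|neuter+plural+object`, and `from_factored` inverts it. Factor values must not contain spaces or `|`, which is why the comma-separated features from the slide are joined with `+`.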
14 Alignments of Phrases
[Figure: word alignment between यह मेरी गाड़ी है and "This is my car"]
15 Alignments of Factors
[Figure: alignment between the English factors (DET-Subj, VM/VCOP-Pres, PRON-Poss, NN-Neu) and the Hindi factors (DET-Subj, PRON-Poss, NN-Fem, VM/VCOP-Pres)]
16 Translation Tables

Lemma Translation Table
Sr. | English Phrase | Hindi Phrase
1 | This | यह
2 | My Car | मेरी गाड़ी
3 | Is | है

Factor Translation Table
Sr. | English Factors | Hindi Factors
1 | DET-Subj | DET-Subj
2 | PRON-possessor NN-neuter, plural, object | PRON-possessor NN-feminine, plural, object
3 | VM/VCOP-present | VM/VCOP-present
17 Generation Table (Target Language)

Sr. | Lemma | Factors | Surface word
1 | यह | DET+subj | ये
2 | मेरा | PRON+possessor | मेरी
3 | गाड़ी | NN+feminine, plural, object | गाड़ियाँ
4 | होना | VM/VCOP+present | है
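Because a generation table involves only the target language, it can be estimated by counting over annotated target-side text. The sketch below uses relative frequencies over a tiny hand-made corpus. The corpus, the romanized forms and the function name `build_generation_table` are assumptions, and real toolkits may estimate and smooth these probabilities differently.

```python
from collections import Counter, defaultdict


def build_generation_table(tagged_tokens):
    """Estimate P(surface | lemma, factors) by relative frequency.

    tagged_tokens: iterable of (surface, lemma, factors) triples taken from
    annotated target-language text.
    """
    counts = defaultdict(Counter)
    for surface, lemma, factors in tagged_tokens:
        counts[(lemma, factors)][surface] += 1

    table = {}
    for key, surface_counts in counts.items():
        total = sum(surface_counts.values())
        table[key] = {s: c / total for s, c in surface_counts.items()}
    return table


# Hypothetical annotated target-side tokens (romanized Hindi).
toy_corpus = [
    ("gaadiyan", "gaadi", "NN+feminine+plural"),
    ("gaadiyan", "gaadi", "NN+feminine+plural"),
    ("gaadi", "gaadi", "NN+feminine+singular"),
    ("hai", "hona", "VM/VCOP+present"),
    ("hai", "hona", "VM/VCOP+present"),
    ("hai", "hona", "VM/VCOP+present"),
    ("hain", "hona", "VM/VCOP+present"),
]
generation_table = build_generation_table(toy_corpus)
```

In this toy corpus the factor bundle `VM/VCOP+present` does not determine the surface form uniquely, so the entry for `hona` keeps two candidates with probabilities 0.75 and 0.25. Adding a number factor would disambiguate them.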
18 A tougher example
I am going home / मैं घर जा रहा हूँ {Main ghar jaa raha hoon}
- Here "home" is neuter, singular and is mapped to घर, which is masculine, singular. The mapping is non-trivial because the gender differs.
- Here "am going" is mapped to जा रहा हूँ. The mapping is non-trivial because a two-word phrase maps to a three-word phrase.
- "Am going" has factors [(is VM present) (go VAUX gerund, continuous)].
- जा रहा हूँ has factors [(जाना VM) (रहना VAUX continuous) (होना VAUX present)].
- Extracting the factor mappings is therefore also non-trivial.
- The difficulty is greater when short phrases map to long phrases, as happens between morphologically rich and morphologically poor languages.
19 Alignments of Phrases
[Figure: alignment between the Hindi lemmas मैं, घर, जाना, रहना, होना and the English lemmas I, is, go, home]
20 Alignments of Factors
[Figure: alignment between the English factors (PRON; VM,PRES; VAUX,CONT; NN,NEUTER) and the Hindi factors (PRON; NN,FEMININE; VM; VAUX,CONT; VAUX,PRES)]
21 Components of FBMT
22 Combining the Components
23 Decoding
- A beam search decoding algorithm is used.
- Start with an empty hypothesis.
- Generate and add hypotheses until the full sentence is covered.
- The highest-scoring complete hypothesis is the best translation.
- Translation options are limited to 50 per phrase to contain the combinatorial explosion.
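The decoding loop above can be sketched as follows. This is a deliberately simplified version: it translates word by word in source order (no reordering, no multi-word phrases, no language model), and the function name `beam_search`, its parameters, and the toy probabilities are assumptions made for illustration.

```python
import heapq
import math


def beam_search(source, options, beam_size=3, max_options=50):
    """Monotone beam search over per-word translation options.

    source:  list of source words
    options: dict mapping a source word to a list of (target, log_prob) pairs
    Returns the best complete hypothesis as (score, tuple_of_target_words).
    """
    beam = [(0.0, ())]  # start with the empty hypothesis
    for word in source:
        # Keep only the best max_options translation options for this word.
        candidates = heapq.nlargest(max_options, options[word], key=lambda o: o[1])
        # Extend every surviving hypothesis with every candidate.
        expanded = [
            (score + logp, output + (target,))
            for score, output in beam
            for target, logp in candidates
        ]
        # Prune: keep only the beam_size highest-scoring hypotheses.
        beam = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    # All hypotheses now cover the full sentence; return the best one.
    return max(beam, key=lambda h: h[0])


# Hypothetical translation options with made-up probabilities.
toy_options = {
    "these": [("ye", math.log(0.7)), ("yaha", math.log(0.3))],
    "cars": [("gaadiyan", math.log(0.6)), ("gaadi", math.log(0.4))],
}
```

Calling `beam_search(["these", "cars"], toy_options)` returns the hypothesis `("ye", "gaadiyan")`. In a factored model each option would itself be produced by the translation and generation steps, which is why the number of options per phrase has to be capped.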
24 EXPERIMENTS AND RESULTS
25 Corpus
- English-German
  - Training: Europarl corpus
  - Training: News Commentary corpus
  - Test: WMT 2006 test set
- English-Spanish
  - Training: Europarl corpus
- English-Czech
  - WSJ corpus
26 Syntactic Enrichment
- The implementation is part of Moses.
- Factors:
  - Surface word (3-gram)
  - POS (7-gram)
  - Morphological
  - Shallow syntactic
- The higher-order sequence model over these factors supports the syntactic coherence of the output.
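A sequence model over POS tags can use a longer history than one over surface words because the tag vocabulary is small. The sketch below is an add-one-smoothed n-gram model over tag sequences. It is an illustrative stand-in, not the language model used in the experiments; the class name `TagNgramModel` and the toy training data are assumptions, and the example uses n = 2 only so that the toy data suffices (the slide uses 7-grams for POS).

```python
import math
from collections import Counter


def ngrams(tags, n):
    """Return the n-grams of a tag sequence, padded with <s> and </s>."""
    padded = ["<s>"] * (n - 1) + list(tags) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class TagNgramModel:
    """Add-one-smoothed n-gram model over tag sequences."""

    def __init__(self, n):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"</s>"}

    def train(self, sequences):
        for tags in sequences:
            self.vocab.update(tags)
            for gram in ngrams(tags, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def logprob(self, tags):
        """Log-probability of a tag sequence under the smoothed model."""
        v = len(self.vocab)
        return sum(
            math.log((self.ngram_counts[g] + 1) / (self.context_counts[g[:-1]] + v))
            for g in ngrams(tags, self.n)
        )


# Hypothetical training data: the tag order of "ye meri gaadiyan hai".
pos_model = TagNgramModel(2)
pos_model.train([["DET", "PRON", "NN", "VM"]] * 3)
```

A candidate output whose tags follow the trained order scores higher than a scrambled one, so a decoder that adds this score to its hypotheses is nudged toward syntactically coherent output.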
27 Syntactic Enrichment Results
28 Morphological Analysis and Generation
- Translate word lemma and morphology separately.
- A pure lemma/morphology model yields poor results.
- Evidence-based choice of model.
- 21% of unknown word forms translated.
29 Conclusion
- Incorporating linguistic tools in the translation model improves translation accuracy.
- Linguistic tools ensure grammatical coherence.
- Separate translation of lemma and morphology leads to better handling of OOV words.
- Complex factor models lead to a larger search space and increased computation time.
30 References
- Philipp Koehn and Hieu Hoang. Factored Translation Models. Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.
- Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, pages , Suntec, Singapore. Association for Computational Linguistics, 2009.
More information