Machine Translation in Practice. Convertus AB. Anna Sågvall Hein 2017-

Size: px
Start display at page:

Download "Machine Translation in Practice. Convertus AB. Anna Sågvall Hein 2017-"

Transcription

1 Machine Translation in Practice Convertus AB Anna Sågvall Hein 2017-

2 Convertus AB A Swedish Language Technology company specialising in Machine Translation, MT, and automatic Language Checking, LC Founded in 2006 by a group of Computational Linguists at Uppsala University, headed by Professor Anna Sågvall Hein First LC reference installation - Scania Checker, 2000 at Scania CV AB First MT reference installation - the Syllabus Translator, 2007 at Uppsala University

3 Convertus business model Convertus develops and markets software and services in the Language Technology field. The core product is a complete solution to Machine Translation provided as a web-based service. A key product is BTS - a platform for running and managing machine Anna Sågvall translation Hein using different

4 The complete MT solution Preprocessing Translation memory Machine translation Automatic post-editing Manual post-editing

5 Preprocessing Spell checking Grammar checking Reformulation Compliance to controlled language

6 Convertus MT engines Rule-based translation engines, RBMT+ Statistical translation engines, SMT Combinations of RBMT and SMT

7 RBMT The oldest MT type Based on linguistic models Employs language resources Analyses the source language at some level

8 Convertus RBMT+ engines Multra sv-> en Deployed since 2007 Analysis: Handcrafted grammar, chart parser, UCP Fall-back Dictionary: Tagger Parsing: Partial parsing Generation: Language model Maltra fi->en Research prototype

9 Primary modules in Multra and Maltra ANALYSIS lexical and syntactic analysis parsing of source language segments into linguistic structures TRANSFER transfer translation of source language

10 Problems to be handled in MT Lexical ambiguity a word/token may represent different parts of speech, inflectional forms and meanings Variation/Translation ambiguity different words for the same meaning synonymy different ways of formulating a statement paraphrase Language differences lexicon, morphology, syntax

11 An example Sv. Under tiden stannade bilen. En. Meanwhile, the car stopped. Problems Lexical choice / under under, during, miracle Word order Punctuation

12 Multra solution Lexical choice / dictionary, transfer Word order / generation grammar Punctuation / generation grammar

13 Analysis structure

14 Transfer structure

15 Generation structure

16 Dictionary One-word-units and multiword units Hierarchical organisation of the dictionary Dictionary set-up specifies a hierarchy between parts of the dictionary (sub-dictionaries) All unique alternatives are presented to the parser in the preferred order

17 A dictionary hierarchy

18 Tagging Simple syntactic analysis based on N-gram processing filtering out the best alternatives provided by the dictionary (HunPos). då om då.ab-> then.ab/ då.sn- >because.sn om.sn->if.sn/ om.pp->about.pp Default alternatives are provided for words outside the dictionary.

19 Parsing Chart parsing using Uppsala Chart Processor, UCP. UCP is an in-house chart parser with a procedural grammar formalism. The grammar writer inserts active and passive edges in the chart thereby promoting the processing. Processing is non-deterministic, i.e. all combinations of active and passive edges are explored.

20 Ranking rules express priorities between competing analyses formulated in linguistic terms competing analyses are submitted to the transfer module in the order in which they were ranked

21 Transfer Transfer rules are applied to feature structures generated by the parser Rule-application is implemented as unification of feature structures. Transfer rules are expressed in a PATR-like formalism All applicable rules are applied

22 Transfer rule format LABEL name SOURCE path expression(s) TARGET path expression(s) TRANSFER?x1?x2

23 Types of transfer rules Copy a feature Delete a feature Transfer a feature structure Define a target feature structure

24 Copy a feature LABEL phr.cat SOURCE <* phr.cat> = x TARGET <* phr.cat> = x TRANSFER

25 Delete a feature LABEL gender SOURCE <* gender> = ANY TARGET <*> = <*> TRANSFER

26 Transfer a feature structure LABEL subj SOURCE <* subj> =?x1 TARGET <* subj> =?x2 TRANSFER?x1?x2

27 Define a target feature structure LABEL ta.bort-remove SOURCE <* lex sym>=ta.vb+bort.ab.1 <* word.cat>=verb TARGET <* lex>=remove.vb.1 <* word.cat>=verb

28 Generation The generation module operates on the preferred feature structure produced by the transfer component. The module is responsible for creating words from feature bundles and ordering them in the order specified by the English grammar. Generation rules are expressed in a PATR-like formalism. Rule application is based on unification and concatenation.

29 A generation rule % "The process includes reconditioning" LABEL subj-verb-obj.dir x1 ---> x2 x3 x4 : <x1 subj> = <x2> <x1 verb> = <x3> <x1 obj.dir> = <x4> <x3 numb> = <x2 numb> <x3 person> = <x2 person>

30 Language resources Dictionaries Swedish dictionary English dictionary Translation dictionary Tagging rules and correction rules Grammars

31 Fall-back Lexical Out Of Vocabulary units, OOVs Tagger creates default source word description Source word copied into target Parsing No complete parse Use partial parses Generation No complete generation

32 Partial parsing Select the best configuration of partial parses and translate them one by one. The best configuration of partial parses is assumed to be the one with the smallest number of partial parses. The selection uses a greedy search algorithm.

33 Aspects of RBMT Translations close to the source language Translation quality dependent on the quality and coverage of the language resources A labor-intensive task to build the language resources Fall-back mechanisms to account for gaps in the language resources

34 Aspects of SMT Pros Can be built fast on previous translations Idiomatic translations Cons Translation quality dependent on the quality and size of the training data No guaranteed translation relation to the source language Words may be lost Words may be inserted

35 Convertus applications The syllabus translator Multra, Maltra, SMT Technical translation for industrial clients Multra, SMT Gisting for in-house purposes SMT

36 How to run the MT service? Users Integrated in the user s normal work flow the education data bases of the universities The Syllabus Translator as a plug-in to Trados the translators usual way of working Independent service the BTS platform Developers Terminal for large-scale testing

37 Bologna Translation Service, BTS A translation platform developed in the Bologna Project ( Supports dynamic learning of MT See further

38

39 Manual post-editing A platform for manual post-editing of translation segments is provided edit save approve Approved translation segments are stored in the Translation Memory and re-used

40 Translation memory, TM TM comprises approved translation segments. TM grows as the service is used. Search in TM is the first option in the translation process.

41 Automatic post-editing Translation memory is searched for manual postedits at regular intervals. Post edits are reformulated as Automatic Postediting Rules, APEs. APEs are appended to the translation engines contributing to their quality.

42 Project DigInclude Facilitate access to digital information provided by Swedish authorities Convertus role is to provide translation services for immigrant languages Coordinator SICS EAST SWEDISH ICT AB

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm

Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm Syntax Parsing 1. Grammars and parsing 2. Top-down and bottom-up parsing 3. Chart parsers 4. Bottom-up chart parsing 5. The Earley Algorithm syntax: from the Greek syntaxis, meaning setting out together

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

Parsing of part-of-speech tagged Assamese Texts

Parsing of part-of-speech tagged Assamese Texts IJCSI International Journal of Computer Science Issues, Vol. 6, No. 1, 2009 ISSN (Online): 1694-0784 ISSN (Print): 1694-0814 28 Parsing of part-of-speech tagged Assamese Texts Mirzanur Rahman 1, Sufal

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

National Literacy and Numeracy Framework for years 3/4

National Literacy and Numeracy Framework for years 3/4 1. Oracy National Literacy and Numeracy Framework for years 3/4 Speaking Listening Collaboration and discussion Year 3 - Explain information and ideas using relevant vocabulary - Organise what they say

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

English Language and Applied Linguistics. Module Descriptions 2017/18

English Language and Applied Linguistics. Module Descriptions 2017/18 English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,

More information

Controlled vocabulary

Controlled vocabulary Indexing languages 6.2.2. Controlled vocabulary Overview Anyone who has struggled to find the exact search term to retrieve information about a certain subject can benefit from controlled vocabulary. Controlled

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight. Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material

More information

A Framework for Customizable Generation of Hypertext Presentations

A Framework for Customizable Generation of Hypertext Presentations A Framework for Customizable Generation of Hypertext Presentations Benoit Lavoie and Owen Rambow CoGenTex, Inc. 840 Hanshaw Road, Ithaca, NY 14850, USA benoit, owen~cogentex, com Abstract In this paper,

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

An Interactive Intelligent Language Tutor Over The Internet

An Interactive Intelligent Language Tutor Over The Internet An Interactive Intelligent Language Tutor Over The Internet Trude Heift Linguistics Department and Language Learning Centre Simon Fraser University, B.C. Canada V5A1S6 E-mail: heift@sfu.ca Abstract: This

More information

APA Basics. APA Formatting. Title Page. APA Sections. Title Page. Title Page

APA Basics. APA Formatting. Title Page. APA Sections. Title Page. Title Page APA Formatting APA Basics Abstract, Introduction & Formatting/Style Tips Psychology 280 Lecture Notes Basic word processing format Double spaced All margins 1 Manuscript page header on all pages except

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

Cross Language Information Retrieval

Cross Language Information Retrieval Cross Language Information Retrieval RAFFAELLA BERNARDI UNIVERSITÀ DEGLI STUDI DI TRENTO P.ZZA VENEZIA, ROOM: 2.05, E-MAIL: BERNARDI@DISI.UNITN.IT Contents 1 Acknowledgment.............................................

More information

Character Stream Parsing of Mixed-lingual Text

Character Stream Parsing of Mixed-lingual Text Character Stream Parsing of Mixed-lingual Text Harald Romsdorfer and Beat Pfister Speech Processing Group Computer Engineering and Networks Laboratory ETH Zurich {romsdorfer,pfister}@tik.ee.ethz.ch Abstract

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

Candidates must achieve a grade of at least C2 level in each examination in order to achieve the overall qualification at C2 Level.

Candidates must achieve a grade of at least C2 level in each examination in order to achieve the overall qualification at C2 Level. The Test of Interactive English, C2 Level Qualification Structure The Test of Interactive English consists of two units: Unit Name English English Each Unit is assessed via a separate examination, set,

More information

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels Jörg Tiedemann Uppsala University Department of Linguistics and Philology firstname.lastname@lingfil.uu.se Abstract

More information

Intel-powered Classmate PC. SMART Response* Training Foils. Version 2.0

Intel-powered Classmate PC. SMART Response* Training Foils. Version 2.0 Intel-powered Classmate PC Training Foils Version 2.0 1 Legal Information INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

ARNE - A tool for Namend Entity Recognition from Arabic Text

ARNE - A tool for Namend Entity Recognition from Arabic Text 24 ARNE - A tool for Namend Entity Recognition from Arabic Text Carolin Shihadeh DFKI Stuhlsatzenhausweg 3 66123 Saarbrücken, Germany carolin.shihadeh@dfki.de Günter Neumann DFKI Stuhlsatzenhausweg 3 66123

More information

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment

Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Evaluation of a Simultaneous Interpretation System and Analysis of Speech Log for User Experience Assessment Akiko Sakamoto, Kazuhiko Abe, Kazuo Sumita and Satoshi Kamatani Knowledge Media Laboratory,

More information

MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES

MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES MISSISSIPPI OCCUPATIONAL DIPLOMA EMPLOYMENT ENGLISH I: NINTH, TENTH, ELEVENTH AND TWELFTH GRADES Students will: 1. Recognize main idea in written, oral, and visual formats. Examples: Stories, informational

More information

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction

CLASSIFICATION OF PROGRAM Critical Elements Analysis 1. High Priority Items Phonemic Awareness Instruction CLASSIFICATION OF PROGRAM Critical Elements Analysis 1 Program Name: Macmillan/McGraw Hill Reading 2003 Date of Publication: 2003 Publisher: Macmillan/McGraw Hill Reviewer Code: 1. X The program meets

More information

Universiteit Leiden ICT in Business

Universiteit Leiden ICT in Business Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:

More information

PUBLIC CASE REPORT Use of the GeoGebra software at upper secondary school

PUBLIC CASE REPORT Use of the GeoGebra software at upper secondary school PUBLIC CASE REPORT Use of the GeoGebra software at upper secondary school Linked to the pedagogical activity: Use of the GeoGebra software at upper secondary school Written by: Philippe Leclère, Cyrille

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque

Approaches to control phenomena handout Obligatory control and morphological case: Icelandic and Basque Approaches to control phenomena handout 6 5.4 Obligatory control and morphological case: Icelandic and Basque Icelandinc quirky case (displaying properties of both structural and inherent case: lexically

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment

Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Impact of Controlled Language on Translation Quality and Post-editing in a Statistical Machine Translation Environment Takako Aikawa, Lee Schwartz, Ronit King Mo Corston-Oliver Carmen Lozano Microsoft

More information

SIE: Speech Enabled Interface for E-Learning

SIE: Speech Enabled Interface for E-Learning SIE: Speech Enabled Interface for E-Learning Shikha M.Tech Student Lovely Professional University, Phagwara, Punjab INDIA ABSTRACT In today s world, e-learning is very important and popular. E- learning

More information

A First-Pass Approach for Evaluating Machine Translation Systems

A First-Pass Approach for Evaluating Machine Translation Systems [Proceedings of the Evaluators Forum, April 21st 24th, 1991, Les Rasses, Vaud, Switzerland; ed. Kirsten Falkedal (Geneva: ISSCO).] A First-Pass Approach for Evaluating Machine Translation Systems Pamela

More information

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s)) Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other

More information

Derivational and Inflectional Morphemes in Pak-Pak Language

Derivational and Inflectional Morphemes in Pak-Pak Language Derivational and Inflectional Morphemes in Pak-Pak Language Agustina Situmorang and Tima Mariany Arifin ABSTRACT The objectives of this study are to find out the derivational and inflectional morphemes

More information

Update on Soar-based language processing

Update on Soar-based language processing Update on Soar-based language processing Deryle Lonsdale (and the rest of the BYU NL-Soar Research Group) BYU Linguistics lonz@byu.edu Soar 2006 1 NL-Soar Soar 2006 2 NL-Soar developments Discourse/robotic

More information

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona

Parallel Evaluation in Stratal OT * Adam Baker University of Arizona Parallel Evaluation in Stratal OT * Adam Baker University of Arizona tabaker@u.arizona.edu 1.0. Introduction The model of Stratal OT presented by Kiparsky (forthcoming), has not and will not prove uncontroversial

More information

Ensemble Technique Utilization for Indonesian Dependency Parser

Ensemble Technique Utilization for Indonesian Dependency Parser Ensemble Technique Utilization for Indonesian Dependency Parser Arief Rahman Institut Teknologi Bandung Indonesia 23516008@std.stei.itb.ac.id Ayu Purwarianti Institut Teknologi Bandung Indonesia ayu@stei.itb.ac.id

More information

Specifying a shallow grammatical for parsing purposes

Specifying a shallow grammatical for parsing purposes Specifying a shallow grammatical for parsing purposes representation Atro Voutilainen and Timo J~irvinen Research Unit for Multilingual Language Technology P.O. Box 4 FIN-0004 University of Helsinki Finland

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining Dave Donnellan, School of Computer Applications Dublin City University Dublin 9 Ireland daviddonnellan@eircom.net Claus Pahl

More information

A Grammar for Battle Management Language

A Grammar for Battle Management Language Bastian Haarmann 1 Dr. Ulrich Schade 1 Dr. Michael R. Hieb 2 1 Fraunhofer Institute for Communication, Information Processing and Ergonomics 2 George Mason University bastian.haarmann@fkie.fraunhofer.de

More information

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS

METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS METHODS FOR EXTRACTING AND CLASSIFYING PAIRS OF COGNATES AND FALSE FRIENDS Ruslan Mitkov (R.Mitkov@wlv.ac.uk) University of Wolverhampton ViktorPekar (v.pekar@wlv.ac.uk) University of Wolverhampton Dimitar

More information

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR

COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR COMPUTATIONAL COMPLEXITY OF LEFT-ASSOCIATIVE GRAMMAR ROLAND HAUSSER Institut für Deutsche Philologie Ludwig-Maximilians Universität München München, West Germany 1. CHOICE OF A PRIMITIVE OPERATION The

More information

Longman English Interactive

Longman English Interactive Longman English Interactive Level 3 Orientation Quick Start 2 Microphone for Speaking Activities 2 Course Navigation 3 Course Home Page 3 Course Overview 4 Course Outline 5 Navigating the Course Page 6

More information

Modeling full form lexica for Arabic

Modeling full form lexica for Arabic Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling

More information

Language Model and Grammar Extraction Variation in Machine Translation

Language Model and Grammar Extraction Variation in Machine Translation Language Model and Grammar Extraction Variation in Machine Translation Vladimir Eidelman, Chris Dyer, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Salli Kankaanpää, Riitta Korhonen & Ulla Onkamo. Tallinn,15 th September 2016

Salli Kankaanpää, Riitta Korhonen & Ulla Onkamo. Tallinn,15 th September 2016 Official language consultation services in Finland Salli Kankaanpää, Riitta Korhonen & Ulla Onkamo Tallinn,15 th September 2016 Institute for the Languages of Finland (1976 ) KOTUS (www.kotus.fi) Finnish

More information

Implementing a tool to Support KAOS-Beta Process Model Using EPF

Implementing a tool to Support KAOS-Beta Process Model Using EPF Implementing a tool to Support KAOS-Beta Process Model Using EPF Malihe Tabatabaie Malihe.Tabatabaie@cs.york.ac.uk Department of Computer Science The University of York United Kingdom Eclipse Process Framework

More information

(12) United States Patent Bernth et al.

(12) United States Patent Bernth et al. , (12) United States Patent Bernth et al. US006285978B1 (10) Patent N0.: (45) Date of Patent: Sep. 4, 2001 (54) SYSTEM AND METHOD FOR ESTIMATING ACCURACY OF AN AUTOMATIC NATURAL LANGUAGE TRANSLATION (75)

More information

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments

Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Product Feature-based Ratings foropinionsummarization of E-Commerce Feedback Comments Vijayshri Ramkrishna Ingale PG Student, Department of Computer Engineering JSPM s Imperial College of Engineering &

More information

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing.

The presence of interpretable but ungrammatical sentences corresponds to mismatches between interpretive and productive parsing. Lecture 4: OT Syntax Sources: Kager 1999, Section 8; Legendre et al. 1998; Grimshaw 1997; Barbosa et al. 1998, Introduction; Bresnan 1998; Fanselow et al. 1999; Gibson & Broihier 1998. OT is not a theory

More information

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation

11/29/2010. Statistical Parsing. Statistical Parsing. Simple PCFG for ATIS English. Syntactic Disambiguation tatistical Parsing (Following slides are modified from Prof. Raymond Mooney s slides.) tatistical Parsing tatistical parsing uses a probabilistic model of syntax in order to assign probabilities to each

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Chinese Language Parsing with Maximum-Entropy-Inspired Parser

Chinese Language Parsing with Maximum-Entropy-Inspired Parser Chinese Language Parsing with Maximum-Entropy-Inspired Parser Heng Lian Brown University Abstract The Chinese language has many special characteristics that make parsing difficult. The performance of state-of-the-art

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Constraining X-Bar: Theta Theory

Constraining X-Bar: Theta Theory Constraining X-Bar: Theta Theory Carnie, 2013, chapter 8 Kofi K. Saah 1 Learning objectives Distinguish between thematic relation and theta role. Identify the thematic relations agent, theme, goal, source,

More information

Executive summary (in English)

Executive summary (in English) Executive summary (in English) Project description The project "Open Educational Resources in institutional repositories has been carried out in collaboration between Göteborg university, University of

More information

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &,

! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, ! # %& ( ) ( + ) ( &, % &. / 0!!1 2/.&, 3 ( & 2/ &, 4 The Interaction of Knowledge Sources in Word Sense Disambiguation Mark Stevenson Yorick Wilks University of Shef eld University of Shef eld Word sense

More information

THE VERB ARGUMENT BROWSER

THE VERB ARGUMENT BROWSER THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW

More information

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready

More information

Context Free Grammars. Many slides from Michael Collins

Context Free Grammars. Many slides from Michael Collins Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

A Domain Ontology Development Environment Using a MRD and Text Corpus

A Domain Ontology Development Environment Using a MRD and Text Corpus A Domain Ontology Development Environment Using a MRD and Text Corpus Naomi Nakaya 1 and Masaki Kurematsu 2 and Takahira Yamaguchi 1 1 Faculty of Information, Shizuoka University 3-5-1 Johoku Hamamatsu

More information

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words,

have to be modeled) or isolated words. Output of the system is a grapheme-tophoneme conversion system which takes as its input the spelling of words, A Language-Independent, Data-Oriented Architecture for Grapheme-to-Phoneme Conversion Walter Daelemans and Antal van den Bosch Proceedings ESCA-IEEE speech synthesis conference, New York, September 1994

More information

Constructing Parallel Corpus from Movie Subtitles

Constructing Parallel Corpus from Movie Subtitles Constructing Parallel Corpus from Movie Subtitles Han Xiao 1 and Xiaojie Wang 2 1 School of Information Engineering, Beijing University of Post and Telecommunications artex.xh@gmail.com 2 CISTR, Beijing

More information

The Verbmobil Semantic Database. Humboldt{Univ. zu Berlin. Computerlinguistik. Abstract

The Verbmobil Semantic Database. Humboldt{Univ. zu Berlin. Computerlinguistik. Abstract The Verbmobil Semantic Database Karsten L. Worm Univ. des Saarlandes Computerlinguistik Postfach 15 11 50 D{66041 Saarbrucken Germany worm@coli.uni-sb.de Johannes Heinecke Humboldt{Univ. zu Berlin Computerlinguistik

More information

Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank

Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank Parsing with Treebank Grammars: Empirical Bounds, Theoretical Models, and the Structure of the Penn Treebank Dan Klein and Christopher D. Manning Computer Science Department Stanford University Stanford,

More information

Evaluation of Learning Management System software. Part II of LMS Evaluation

Evaluation of Learning Management System software. Part II of LMS Evaluation Version DRAFT 1.0 Evaluation of Learning Management System software Author: Richard Wyles Date: 1 August 2003 Part II of LMS Evaluation Open Source e-learning Environment and Community Platform Project

More information

Noisy SMS Machine Translation in Low-Density Languages

Noisy SMS Machine Translation in Low-Density Languages Noisy SMS Machine Translation in Low-Density Languages Vladimir Eidelman, Kristy Hollingshead, and Philip Resnik UMIACS Laboratory for Computational Linguistics and Information Processing Department of

More information

Tavastia Way of Finnish Language Support during Vocational Studies. Tiina Alhainen Coordinator of Multicultural Issues Tavastia Education Consortium

Tavastia Way of Finnish Language Support during Vocational Studies. Tiina Alhainen Coordinator of Multicultural Issues Tavastia Education Consortium Tavastia Way of Finnish Language Support during Vocational Studies Tiina Alhainen Coordinator of Multicultural Issues Tavastia Education Consortium Background - Needs of New Practices in the Tavastia Operating

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

New Features & Functionality in Q Release Version 3.1 January 2016

New Features & Functionality in Q Release Version 3.1 January 2016 in Q Release Version 3.1 January 2016 Contents Release Highlights 2 New Features & Functionality 3 Multiple Applications 3 Analysis 3 Student Pulse 3 Attendance 4 Class Attendance 4 Student Attendance

More information

Program Matrix - Reading English 6-12 (DOE Code 398) University of Florida. Reading

Program Matrix - Reading English 6-12 (DOE Code 398) University of Florida. Reading Program Requirements Competency 1: Foundations of Instruction 60 In-service Hours Teachers will develop substantive understanding of six components of reading as a process: comprehension, oral language,

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

Formulaic Language and Fluency: ESL Teaching Applications

Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language Terminology Formulaic sequence One such item Formulaic language Non-count noun referring to these items Phraseology The study

More information

Introduction to Moodle

Introduction to Moodle Center for Excellence in Teaching and Learning Mr. Philip Daoud Introduction to Moodle Beginner s guide Center for Excellence in Teaching and Learning / Teaching Resource This manual is part of a serious

More information

Timeline. Recommendations

Timeline. Recommendations Introduction Advanced Placement Course Credit Alignment Recommendations In 2007, the State of Ohio Legislature passed legislation mandating the Board of Regents to recommend and the Chancellor to adopt

More information

Syntactic and Lexical Simplification: The Impact on EFL Listening Comprehension at Low and High Language Proficiency Levels

Syntactic and Lexical Simplification: The Impact on EFL Listening Comprehension at Low and High Language Proficiency Levels ISSN 1798-4769 Journal of Language Teaching and Research, Vol. 5, No. 3, pp. 566-571, May 2014 Manufactured in Finland. doi:10.4304/jltr.5.3.566-571 Syntactic and Lexical Simplification: The Impact on

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Myths, Legends, Fairytales and Novels (Writing a Letter)

Myths, Legends, Fairytales and Novels (Writing a Letter) Assessment Focus This task focuses on Communication through the mode of Writing at Levels 3, 4 and 5. Two linked tasks (Hot Seating and Character Study) that use the same context are available to assess

More information

A relational approach to translation

A relational approach to translation A relational approach to translation Rémi Zajac Project POLYGLOSS* University of Stuttgart IMS-CL /IfI-AIS, KeplerstraBe 17 7000 Stuttgart 1, West-Germany zajac@is.informatik.uni-stuttgart.dbp.de Abstract.

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

Curriculum for the Academy Profession Degree Programme in Energy Technology

Curriculum for the Academy Profession Degree Programme in Energy Technology Curriculum for the Academy Profession Degree Programme in Energy Technology Version: 2016 Curriculum for the Academy Profession Degree Programme in Energy Technology 2016 Addresses of the institutions

More information

SRI LANKA INSTITUTE OF ADVANCED TECHNOLOGICAL EDUCATION REVISED CURRICULUM HIGHER NATIONAL DIPLOMA IN ENGLISH. September 2010

SRI LANKA INSTITUTE OF ADVANCED TECHNOLOGICAL EDUCATION REVISED CURRICULUM HIGHER NATIONAL DIPLOMA IN ENGLISH. September 2010 SRI LANKA INSTITUTE OF ADVANCED TECHNOLOGICAL EDUCATION REVISED CURRICULUM HIGHER NATIONAL DIPLOMA IN ENGLISH September 2010 Produced by MG Consultants (Pvt) Ltd, 267, Pannipitiya Road, Pelawatte, Battaramulla

More information

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks

POS tagging of Chinese Buddhist texts using Recurrent Neural Networks POS tagging of Chinese Buddhist texts using Recurrent Neural Networks Longlu Qin Department of East Asian Languages and Cultures longlu@stanford.edu Abstract Chinese POS tagging, as one of the most important

More information

Refining the Design of a Contracting Finite-State Dependency Parser

Refining the Design of a Contracting Finite-State Dependency Parser Refining the Design of a Contracting Finite-State Dependency Parser Anssi Yli-Jyrä and Jussi Piitulainen and Atro Voutilainen The Department of Modern Languages PO Box 3 00014 University of Helsinki {anssi.yli-jyra,jussi.piitulainen,atro.voutilainen}@helsinki.fi

More information

On-Line Data Analytics

On-Line Data Analytics International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Intermediate Academic Writing

Intermediate Academic Writing Intermediate Academic Writing COURSE DESIGNATOR: MONT 3xxx NUMBER OF CREDITS: 3 LANGUAGE OF INSTRUCTION: French CONTACT HOURS: 45 COURSE DESCRIPTION This class is designed to introduce students to the

More information

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING

THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING SISOM & ACOUSTICS 2015, Bucharest 21-22 May THE ROLE OF DECISION TREES IN NATURAL LANGUAGE PROCESSING MarilenaăLAZ R 1, Diana MILITARU 2 1 Military Equipment and Technologies Research Agency, Bucharest,

More information

Procedia - Social and Behavioral Sciences 154 ( 2014 )

Procedia - Social and Behavioral Sciences 154 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 154 ( 2014 ) 263 267 THE XXV ANNUAL INTERNATIONAL ACADEMIC CONFERENCE, LANGUAGE AND CULTURE, 20-22 October

More information

Epping Elementary School Plan for Writing Instruction Fourth Grade

Epping Elementary School Plan for Writing Instruction Fourth Grade Epping Elementary School Plan for Writing Instruction Fourth Grade Unit of Study Learning Targets Common Core Standards LAUNCH: Becoming 4 th Grade Writers The Craft of the Reader s Response: Test Prep,

More information

M55205-Mastering Microsoft Project 2016

M55205-Mastering Microsoft Project 2016 M55205-Mastering Microsoft Project 2016 Course Number: M55205 Category: Desktop Applications Duration: 3 days Certification: Exam 70-343 Overview This three-day, instructor-led course is intended for individuals

More information

Detecting English-French Cognates Using Orthographic Edit Distance

Detecting English-French Cognates Using Orthographic Edit Distance Detecting English-French Cognates Using Orthographic Edit Distance Qiongkai Xu 1,2, Albert Chen 1, Chang i 1 1 The Australian National University, College of Engineering and Computer Science 2 National

More information