Improved Word Alignments for Statistical Machine Translation
1 Improved Word Alignments for Statistical Machine Translation Institut für Maschinelle Sprachverarbeitung Universität Stuttgart Universität Heidelberg
2 Outline 2 Intro to statistical machine translation (SMT) How to build an SMT system SMT terminology What are word alignments? Improving word alignments for SMT Evaluating quality New model New training algorithm
3 How to Build an SMT System 3 Start with a large parallel corpus Consists of document pairs (document and its translation) Sentence alignment: in each document pair automatically find those sentences which are translations of one another Results in sentence pairs (sentence and its translation) Word alignment: in each sentence pair automatically annotate those words which are translations of one another Results in word-aligned sentence pairs
4 How to Build an SMT System 4 Construct a function g which, given a sentence in the source language and a hypothesized translation into the target language, assigns a goodness score g(die Waschmaschine läuft, the washing machine is running) = high number g(die Waschmaschine läuft, the car drove) = low number
5 How to Build an SMT System 5 Implement a search algorithm which, given a source language sentence, finds the target language sentence which maximizes g Problem: exhaustively searching this space is intractable Need an auxiliary function h that returns an approximate goodness score for only a part of the target sentence Using h, gradually build the target sentence from left to right
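A minimal sketch of that left-to-right construction, assuming two hypothetical callables: extend, which proposes candidate next target words, and h, the approximate partial-hypothesis score (real decoders keep several hypotheses with beam search rather than this purely greedy loop):

    def greedy_decode(source, extend, h, max_len=50):
        # Grow the target sentence left to right, always taking the next word
        # that the approximate score h likes best for the partial hypothesis.
        hypothesis = []
        for _ in range(max_len):
            options = extend(source, hypothesis)   # candidate next target words
            if not options:
                break
            hypothesis.append(max(options, key=lambda w: h(source, hypothesis + [w])))
        return hypothesis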
6 Using the SMT System 6 To use our SMT system to translate a new, unseen sentence, call the search algorithm Returns its determination of the best target language sentence To see if your SMT system works well, do this for a large number of unseen sentences and evaluate the results
7 SMT Models We wish to build a machine translation system which, given a Foreign sentence f, produces its English translation e. We build a model of P(e | f), the probability of the sentence e given the sentence f. To translate a Foreign text f, choose the English text e which maximizes P(e | f).
8 Noisy Channel: Decomposing P(e | f)
argmax_e P(e | f) = argmax_e P(f | e) P(e)
P(e) is referred to as the language model. P(e) can be modeled using standard models (N-grams, etc). Parameters of P(e) can be estimated using large amounts of monolingual text (English). P(f | e) is referred to as the translation model.
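Read as code, the decomposition says: score each candidate translation by the product of a translation-model score and a language-model score. A sketch only, with candidates, translation_model and language_model as assumed inputs:

    def noisy_channel_best(f, candidates, translation_model, language_model):
        # argmax over e of P(f | e) * P(e), searched over an explicit candidate list
        return max(candidates,
                   key=lambda e: translation_model(f, e) * language_model(e))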
9 SMT Terminology
Parameterized model: the form of the function g which is used to determine the goodness of a translation.
g(die Waschmaschine läuft, the washing machine is running) = P(e | f)
P(the washing machine is running | die Waschmaschine läuft) =
n(1 | die) t(the | die) n(2 | Waschmaschine) t(washing | Waschmaschine) t(machine | Waschmaschine) n(2 | läuft) t(is | läuft) t(running | läuft) l(the | START) l(washing | the) l(machine | washing) l(is | machine) l(running | is)
10 SMT Terminology
Parameters: lookup tables used in the function g.
P(the washing machine is running | die Waschmaschine läuft) =
n(1 | die) t(the | die) n(2 | Waschmaschine) t(washing | Waschmaschine) t(machine | Waschmaschine) n(2 | läuft) t(is | läuft) t(running | läuft) l(the | START) l(washing | the) l(machine | washing) l(is | machine) l(running | is)
= 0.1 × 0.5 × 0.8 × 0.7 × 0.1 × 0.1 × 0.1 × …
11 SMT Terminology
Parameters: lookup tables used in the function g.
P(the washing machine is running | die Waschmaschine läuft) =
n(1 | die) t(the | die) n(2 | Waschmaschine) t(washing | Waschmaschine) t(machine | Waschmaschine) n(2 | läuft) t(is | läuft) t(running | läuft) l(the | START) l(washing | the) l(machine | washing) l(is | machine) l(running | is)
= 0.1 × 0.5 × 0.8 × 0.7 × 0.1 × 0.1 × 0.1 × …
Change "washing machine" to "car": n(2 | Waschmaschine) becomes n(1 | Waschmaschine), the t(… | Waschmaschine) factors become t(car | Waschmaschine), and the l(…) language-model factors are also different.
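The score really is just a product of values read out of the lookup tables. A small illustration with invented parameter values (the real values come from training; only a few of the thirteen factors are shown):

    # Hypothetical values for a few of the factors in the product above.
    factors = {
        "n(1 | die)": 0.8,
        "t(the | die)": 0.7,
        "n(2 | Waschmaschine)": 0.5,
        "t(washing | Waschmaschine)": 0.6,
        "t(machine | Waschmaschine)": 0.6,
        "l(the | START)": 0.3,
        "l(washing | the)": 0.1,
    }
    score = 1.0
    for value in factors.values():
        score *= value   # the model score is simply the product of the table entries
    print(score)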
12 SMT Terminology Training: automatically building the lookup tables used in g, using parallel sentences. One way to determine t(the | die): generate a word alignment for each sentence pair; look through the word-aligned sentence pairs; count the number of times die is translated as the; divide by the number of times die is translated. If this is 10% of the time, we set t(the | die) = 0.1.
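A sketch of that count-and-normalize step, assuming the word-aligned data has already been reduced to a list of (source word, target word) links:

    from collections import Counter, defaultdict

    def estimate_t(links):
        # links: (source_word, target_word) pairs read off the word-aligned sentence pairs
        counts = defaultdict(Counter)
        for src, tgt in links:
            counts[src][tgt] += 1
        t = {}
        for src, tgt_counts in counts.items():
            total = sum(tgt_counts.values())
            for tgt, c in tgt_counts.items():
                t[(tgt, src)] = c / total   # t(target | source)
        return t

    # If "die" is linked to "the" in 10% of its occurrences, t(the | die) = 0.1
    t = estimate_t([("die", "the")] * 10 + [("die", "that")] * 90)
    print(t[("the", "die")])   # 0.1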
13 Evaluation Evaluation metric: method for assigning a numeric score to a set of hypothesized translations Automatic evaluation metrics often rely on comparison with previously completed human translations BLEU compares the 1,2,3,4-gram overlap with one to four human translations BLEU penalizes generating long strings BLEU works well for comparing two similar MT systems 13
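A simplified sentence-level, single-reference BLEU sketch (the real metric is computed at the corpus level and, as noted above, can use up to four references):

    import math
    from collections import Counter

    def simple_bleu(hypothesis, reference, max_n=4):
        precisions = []
        for n in range(1, max_n + 1):
            hyp = Counter(tuple(hypothesis[i:i + n]) for i in range(len(hypothesis) - n + 1))
            ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
            overlap = sum((hyp & ref).values())                  # clipped n-gram matches
            precisions.append(max(overlap, 1e-9) / max(sum(hyp.values()), 1))
        brevity = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
        return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

    print(simple_bleu("the washing machine is running".split(),
                      "the washing machine is running".split()))   # 1.0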
14 SMT Last Words Translating is usually referred to as decoding (Warren Weaver). SMT was invented by ASR (Automatic Speech Recognition) researchers. In ASR: P(e) = language model, P(f | e) = acoustic model. However, SMT must deal with word reordering!
15 Word Alignments 15 Recall that we build translation models from word-aligned parallel sentences The statistics involved in state of the art SMT translation models are simple Just count translations in the word-aligned parallel sentences But what is a word alignment, and how do we obtain it?
16 Word alignment is annotation of minimal translational correspondences Annotated in the context in which they occur Not idealized translations! (solid blue lines annotated by a bilingual expert)
17 Word Alignments Mathematically, P(f | e) = Σ_a P(f, a | e). An alignment a represents one way f could be generated from e. But for the models discussed today, we approximate: P(f | e) ≈ max_a P(f, a | e).
18 Automatic word alignments are typically generated using a model called IBM Model 4 No linguistic knowledge No correct answers are supplied to the system unsupervised learning (red dashed line = automatically generated hypothesis)
19 Overview: Improving Word Alignment 19 Solving problems with: Measuring word alignment quality Modeling word alignments Knowledge-free training process
20 How to measure alignment quality? 20 If we want to compare word alignment algorithms, we can generate a word alignment with each algorithm Then build an SMT system from each alignment Compare performance of the SMT systems using BLEU But this is slow, building SMT systems can take days of computation Question: Can we have an automatic metric like BLEU, but for alignment? Answer: there are several metrics already defined, they involve comparison with gold standard alignments
21 Problem: Existing Metrics Do Not Track Translation Quality - Dozens of papers at ACL, NAACL, HLT, COLING, WPT03, WPT05, etc, report word alignment quality increases using various metrics - Contradiction: few of these report translation results - Those that do report inconclusive gains - This is because the two commonly used metrics, Alignment Error Rate (AER) and balanced F-Measure, do not correlate with MT performance! - We will show that these metrics have low correlation with BLEU
22 Measuring Precision and Recall 22 Start by fully linking hypothesized alignments
23 Measuring Precision and Recall 23 Precision is percentage of links in hypothesis that are correct If we hypothesize there are no links, have 100% precision Recall is percentage of correct links we hypothesized If we hypothesize all possible links, have 100% recall We will test metrics which formally define and combine these in different ways
24 Evaluating Alignment Error Rate 24 Does the widely used Alignment Error Rate (AER) metric correlate with BLEU? Use our baseline unsupervised alignment system in combination with three symmetrization heuristics (union, refined, intersection) One of these is usually used to build MT systems Effect is having three very different alignment systems
25 Alignment Error Rate (AER)
(Figure: gold and hypothesis alignment grids over words e1–e4 and f1–f5.)
Precision(A, P) = |A ∩ P| / |A| = 3/4   ((e3, f4) wrong)
Recall(A, S) = |A ∩ S| / |S| = 2/3   ((e2, f3) not in hypothesis)
AER(A, P, S) = 1 − (|A ∩ P| + |A ∩ S|) / (|A| + |S|) = 2/7
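The same computation in code. The link sets are illustrative only, chosen to reproduce the 3/4, 2/3 and 2/7 on the slide (the actual links in the figure are not recoverable from the transcription):

    def precision(A, P):
        return len(A & P) / len(A)

    def recall(A, S):
        return len(A & S) / len(S)

    def aer(A, P, S):
        return 1 - (len(A & P) + len(A & S)) / (len(A) + len(S))

    S = {("e1", "f1"), ("e2", "f3"), ("e4", "f5")}                  # sure gold links
    P = S | {("e2", "f2"), ("e3", "f5")}                            # sure + possible gold links
    A = {("e1", "f1"), ("e2", "f2"), ("e3", "f4"), ("e4", "f5")}    # hypothesis links
    print(precision(A, P), recall(A, S), aer(A, P, S))              # ≈ 0.75, 0.67, 0.29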
26 Experiment
Desideratum: keep everything constant in a set of SMT systems except the word-level alignments. Alignments should be realistic.
Experiment: take a parallel corpus of 8M words of Foreign-English. Word-align it. Build an SMT system. Report AER and BLEU. For better alignments: train the aligner on 16M, 32M, 64M words (but use only the 8M words for MT building). For worse alignments: train on 1/2, 1/4, 1/8 of the 8M-word training corpus.
If AER is a good indicator of MT performance, 1 − AER and BLEU should correlate no matter how the alignments are built (union, intersection, refined): low 1 − AER scores should correspond to low BLEU scores, and high 1 − AER scores should correspond to high BLEU scores.
27 AER is not a good indicator of MT performance 27
28 28 AER is wrongly derived from F-Measure (can be shown analytically) For details see Squib in Comp. Ling. (Sept 2007) Important: AER incorrectly favors sparse alignments (many unlinked words).
29 Fα-score We will try a different evaluation metric called the Fα-score. The alpha refers to a parameter tuned to favor either precision or recall.
30 Fα-score
(Figure: gold and hypothesis alignment grids over words e1–e4 and f1–f5.)
Precision(A, S) = |A ∩ S| / |A| = 3/4   ((e3, f4) wrong)
Recall(A, S) = |A ∩ S| / |S| = 3/5   ((e2, f3) and (e3, f5) not in hypothesis)
F(A, S, α) = 1 / (α / Precision(A, S) + (1 − α) / Recall(A, S))
Called the Fα-score to differentiate it from the ambiguous term F-Measure.
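In code, using the slide's precision of 3/4 and the recall of 3/5 implied by the two missed gold links:

    def f_alpha(precision, recall, alpha):
        # F(A, S, alpha) = 1 / (alpha / Precision + (1 - alpha) / Recall)
        return 1.0 / (alpha / precision + (1 - alpha) / recall)

    print(f_alpha(3 / 4, 3 / 5, alpha=0.4))   # ≈ 0.652; alpha below 0.5 leans toward recall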
31 Fα-score is a good indicator of MT performance (plot shown for α = 0.4)
32 32 We have a way to rapidly measure alignment quality for SMT We will now look at alignment modeling
33 Problem: Existing Models Have the Wrong Structure 33 Existing generative models make false assumptions about alignment structure Proposed discriminative models either: Depend on generative models for their best results Or make false assumptions about structure themselves
34 1-to-N Assumption 34 1-to-N assumption Multi-word cepts (words in one language translated as a unit) only allowed on target side. Source side limited to single word cepts. Forced to create M-to-N alignments using heuristics, e.g. union
35 LEAF Generative Story (all) 35
36 LEAF Generative Story (0) 36
37 LEAF Generative Story (1) Explicitly model three word types: Head word: provides most of the conditioning for translation; a robust representation of multi-word cepts (for this task); this is to semantics as "syntactic head word" is to syntax. Non-head word: attached to a head word. Deleted source words and spurious target words.
38 LEAF Generative Story (2) 38 Stochastically attach the non-head words to a head word (using distance and the non-head word class)
39 LEAF Generative Story (3) 39 Generate exactly one target head word from each source word
40 LEAF Generative Story (4) 40 Decide how big the target cepts will be (using the source head and whether the source cept is only one word)
41 LEAF Generative Story (5) 41 Decide the number of spurious words (use the number of non-spurious words)
42 LEAF Generative Story (6) 42 Generate the spurious words
43 LEAF Generative Story (7) 43 Generate the target non-head words in each cept, conditioned on the source head word and the target head word class
44 LEAF Generative Story (8) 44 For each cept, place the target head word and then non-head words (relative distortion model)
45 LEAF Generative Story (9) 45 Place the spurious words
46 LEAF Can score the same structure in both directions Math in one direction (please do not try to read):
47 Comparing LEAF with Model 4 47 Model 4 does not allow source cepts to be more than one word This requires us to use heuristics to account for multiple word constructions LEAF allows multiple word source cepts LEAF is able to use the head-word relationship to better model both the source cept and the target cept
48 Unsupervised Training with EM 48 Expectation Maximization (EM) Unsupervised learning Maximize the likelihood of the training data Likelihood is (informally) the probability the model assigns to the training data E-Step: predict according to current parameters M-Step: reestimate parameters from predictions Amazing but true: if we iterate E and M steps, we increase likelihood!
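To make the E/M alternation concrete, here is the classic EM loop for the much simpler IBM Model 1, shown purely as an illustration (the NULL source word is omitted for brevity; LEAF's E-step cannot be enumerated exactly and instead needs the approximate search described below):

    from collections import defaultdict

    def ibm_model1_em(sentence_pairs, iterations=5):
        t = defaultdict(lambda: 1.0)   # unnormalized t(f | e); a flat start gives a uniform first E-step
        for _ in range(iterations):
            count = defaultdict(float)
            total = defaultdict(float)
            for e_sent, f_sent in sentence_pairs:
                for f in f_sent:                          # E-step: expected link counts
                    z = sum(t[(f, e)] for e in e_sent)
                    for e in e_sent:
                        c = t[(f, e)] / z
                        count[(f, e)] += c
                        total[e] += c
            for (f, e), c in count.items():               # M-step: renormalize per English word
                t[(f, e)] = c / total[e]
        return t

    pairs = [("the house".split(), "das Haus".split()),
             ("the book".split(), "das Buch".split())]
    t = ibm_model1_em(pairs)
    print(t[("das", "the")])   # climbs toward 1.0: EM links "das" with "the"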
49 The EM Algorithm (Diagram: bootstrapping yields initial parameters; the E-step produces Viterbi alignments from the translation model; the M-step re-estimates refined parameters from those alignments, and the cycle repeats.)
50 Want to learn more about EM? 50 See K. Knight 1999 word alignment tutorial Available from
51 M-Step 51 M-Step: reestimate parameters Count events in the Viterbi Simple smoothing: add a small fractional constant Normalize to sum to 1 Bootstrap (initial M-step) See EMNLP 2007 paper for details
52 E-Step
E-Step: search for Viterbi alignments, solved using a local hillclimbing search. Given a starting alignment, we can permute the alignment by making small changes, such as swapping the incoming links for two words.
Algorithm:
Begin: given the starting alignment A, make a list of possible small changes (e.g. every possible swap of the incoming links for two words)
for each possible small change:
    create a new alignment A2 by copying A and applying the small change
    if score(A2) > score(best) then best = A2
end for
choose the best alignment as the new starting point and go to Begin
See the ACL 2006 paper for an improved local hillclimbing search.
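A compact version of that loop, with neighbours (a function generating all single small changes to an alignment) and score left as assumed callables:

    def hillclimb(start, neighbours, score):
        # Repeatedly take the best-scoring small change; stop when nothing improves.
        current = start
        while True:
            candidates = list(neighbours(current))
            if not candidates:
                return current
            best = max(candidates, key=score)
            if score(best) <= score(current):
                return current
            current = best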
53 Discussion 53 LEAF has powerful features But requires approximate search Correct structure: M-to-N discontiguous First general purpose statistical word alignment model of this structure! Head word assumption allows use of multi-word cepts Gives power of phrase-based models, but decisions robustly decompose over words
54 The story so far We know that better alignments (as measured using the Fα-score) lead to better MT. We have defined LEAF, a generative model of M-to-N discontiguous alignments. LEAF can be trained using approximate EM. What about integrating new knowledge: light supervision (the correct alignments for a few sentence pairs)? Linguistic knowledge?
55 Existing Approaches Cannot Utilize New Knowledge Existing unsupervised alignment techniques cannot use manually annotated data, which could be useful for light supervision. It is difficult to add new knowledge sources to generative models: it requires completely reengineering the generative story for each new knowledge source.
56 Semi-Supervised Training Overview First, decompose the steps of the LEAF generative story into sub-models of a (log-)linear model. This allows us to tune a vector λ which has one scalar per sub-model controlling its contribution. The idea here is that we might trust, for instance, the translation distribution (one sub-model) more than the cept-size distribution (another sub-model). It also allows us to integrate new sub-models unrelated to LEAF and adjust their weights with respect to the other sub-models.
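A sketch of the weighted combination, assuming each sub-model has already produced a log-probability for the candidate alignment (the values and weights below are invented):

    def loglinear_score(sub_model_logprobs, lam):
        # Each lambda scales how much its sub-model contributes to the overall score.
        return sum(weight * logprob for weight, logprob in zip(lam, sub_model_logprobs))

    # Hypothetical: trust the translation table (first sub-model) more than the
    # cept-size distribution (second sub-model).
    print(loglinear_score([-2.3, -5.1], lam=[1.0, 0.2]))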
57 Semi-Supervised Training Overview Define a semi-supervised algorithm which alternates increasing likelihood with decreasing error. Increasing likelihood is similar to EM. Discriminatively bias EM to converge to a local maximum of likelihood which corresponds to better alignments. Better = higher Fα-score on a small gold-standard corpus.
58 The EMD Algorithm
Initialize:
Perform initial M-step: estimate sub-model parameters from the HMM Viterbi alignments (bootstrap)
Perform initial D-step: find λ values which maximize the Fα-score on the small gold-standard word-aligned development corpus
Repeat:
E-step: find Viterbi alignments using the sub-models weighted by λ
M-step: re-estimate sub-model parameters from the new Viterbi alignments
D-step: find λ values that maximize the Fα-score on the small gold word-aligned development corpus
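The alternation can be written schematically as below; e_step, m_step and d_step stand in for the real sub-routines (λ-weighted Viterbi search, count-and-normalize re-estimation, and Fα-maximizing tuning on the gold development set) and are assumed callables, not an existing API:

    def emd(bootstrap_alignments, e_step, m_step, d_step, iterations=3):
        params = m_step(bootstrap_alignments)   # initial M-step from the HMM Viterbi alignments
        lam = d_step(params)                    # initial D-step: tune lambda on the gold dev corpus
        for _ in range(iterations):
            viterbi = e_step(params, lam)       # E-step: lambda-weighted Viterbi alignments
            params = m_step(viterbi)            # M-step: re-estimate sub-model parameters
            lam = d_step(params)                # D-step: re-tune lambda to maximize the F_alpha-score
        return params, lam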
59 The EMD Algorithm (Diagram: bootstrapping yields initial sub-model parameters; the D-step tunes the lambda vector; the E-step produces Viterbi alignments from the weighted translation model; the M-step re-estimates sub-model parameters from those alignments, and the cycle repeats.)
60 Previous Work: Semi-Supervised Usual formulation of semi-supervised learning: using unlabeled data to help supervised learning. Build a supervised system using labeled data; predict on unlabeled data; iterate (estimating from both the labeled data and the predictions on the unlabeled data). We do not have enough gold-standard word alignments to estimate parameters directly! EMD allows us to train a small number of important parameters discriminatively and the rest using likelihood maximization, and allows the two to interact.
61 Story so far We've now presented a new metric, a new model, and a new semi-supervised training algorithm. We've reformulated LEAF as a log-linear model and added additional sub-models. We will train this model using the semi-supervised EMD training algorithm to maximize the Fα-score. How well does this work?
62 Experiments
French/English: LDC Hansard (67M English words); 110 gold-standard aligned sentences; MT: Alignment Templates, phrase-based.
Arabic/English: NIST 2006 task (168M English words); 1000 gold-standard aligned sentences; MT: Hiero, hierarchical phrases.
63 Results
Systems compared: IBM Model 4 (GIZA++) with heuristics; EMD (ACL model) with heuristics; LEAF+EMD.
Metrics reported: for French/English, F-Measure (α = 0.4) and BLEU (1 reference); for Arabic/English, F-Measure (α = 0.1) and BLEU (4 references).
64 Contributions 64 Found a metric for measuring alignment quality which correlates with MT quality Designed LEAF, the first generative model of M-to-N discontiguous alignments Developed a semi-supervised training algorithm, the EMD algorithm Obtained large gains of 1.2 BLEU and 2.8 BLEU points for French/English and Arabic/English tasks
65 Much of the presented work was joint work with Daniel Marcu, ISI (Univ. of Southern California). Thank You! Dankeschön!