From Dependency Parsing to Imitation Learning

CMSC 723 / LING 723 / INST 725
Marine Carpuat
Fig credits: Joakim Nivre, Yoav Goldberg, Hal Daumé III

Today's topics:
- Addressing compounding error
- Improving on the gold parse oracle
- Research highlight: [Goldberg & Nivre, 2012]
- Imitation learning for structured prediction (CIML ch. 18)

Improving the oracle in transition-based dependency parsing
Issues with the oracle we've used so far:
- It is based on the configuration sequence that produces the gold tree.
- What if there are multiple sequences for a single gold tree?
- How can we recover if the parser deviates from the gold sequence?
Goldberg & Nivre [2012] propose an improved oracle.

Exercise: which of these transition sequences produces the gold tree on the left?

[Figure: parser configurations shown as stack, buffer, and dependency arcs; an arc goes from position j to position i, with dependency label l.]

Which of these transition sequences does the oracle algorithm produce?
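To make the exercise concrete, here is a minimal sketch of an arc-eager transition system (SHIFT, LEFT-ARC, RIGHT-ARC, REDUCE) with a static oracle. It assumes the lecture's system is arc-eager (suggested by the RA-IOBJ transition, but not stated in these notes); the toy sentence, the `Config`/`static_oracle` names, and the `lazy_reduce` switch are illustrative. The two REDUCE policies produce different transition sequences that build the same gold tree, which is the "multiple sequences for a single gold tree" issue.

```python
from dataclasses import dataclass, field

ROOT = 0  # artificial root token that starts on the stack

@dataclass
class Config:
    stack: list
    buffer: list
    arcs: dict = field(default_factory=dict)  # dependent -> (head, label)

def initial_config(n):
    """Tokens are numbered 1..n; ROOT starts on the stack."""
    return Config([ROOT], list(range(1, n + 1)))

def apply_transition(c, t):
    """Apply an arc-eager transition t = (name, label) and return a new Config."""
    name, label = t
    stack, buffer, arcs = list(c.stack), list(c.buffer), dict(c.arcs)
    if name == "SH":            # push the buffer front onto the stack
        stack.append(buffer.pop(0))
    elif name == "LA":          # buffer front becomes head of the stack top
        arcs[stack.pop()] = (buffer[0], label)
    elif name == "RA":          # stack top becomes head of the buffer front
        arcs[buffer[0]] = (stack[-1], label)
        stack.append(buffer.pop(0))
    elif name == "RE":          # pop a stack top that already has a head
        stack.pop()
    return Config(stack, buffer, arcs)

def static_oracle(c, gold, lazy_reduce=False):
    """Choose the next transition from the gold tree (dependent -> (head, label))."""
    s, b = c.stack[-1], c.buffer[0]
    if s != ROOT and gold[s][0] == b:
        return ("LA", gold[s][1])
    if gold[b][0] == s:
        return ("RA", gold[b][1])
    if s in c.arcs:             # REDUCE is only legal once s has a head
        below = c.stack[:-1]
        if lazy_reduce:
            # Reduce only when b has a gold link to a word deeper in the stack.
            if gold[b][0] in below or any(k != ROOT and gold[k][0] == b for k in below):
                return ("RE", None)
        elif all(d in c.arcs for d, (h, _) in gold.items() if h == s):
            # Eager variant: reduce as soon as s has collected all its dependents.
            return ("RE", None)
    return ("SH", None)

def oracle_sequence(gold, lazy_reduce=False):
    """Follow the static oracle from the initial configuration until the buffer is empty."""
    c, seq = initial_config(len(gold)), []
    while c.buffer:
        t = static_oracle(c, gold, lazy_reduce)
        seq.append(t)
        c = apply_transition(c, t)
    return seq, c.arcs

# Hypothetical toy gold tree for "He(1) wrote(2) her(3) a(4) letter(5) .(6)".
gold = {1: (2, "sbj"), 2: (ROOT, "root"), 3: (2, "iobj"),
        4: (5, "det"), 5: (2, "dobj"), 6: (2, "p")}
eager, eager_arcs = oracle_sequence(gold)
lazy, lazy_arcs = oracle_sequence(gold, lazy_reduce=True)
print([name for name, _ in eager])  # ['SH', 'LA', 'RA', 'RA', 'RE', 'SH', 'LA', 'RA', 'RE', 'RA']
print([name for name, _ in lazy])   # ['SH', 'LA', 'RA', 'RA', 'SH', 'LA', 'RE', 'RA', 'RE', 'RA']
print(eager_arcs == gold, lazy_arcs == gold)  # True True
```

A static oracle has to commit to one of these sequences (here, whichever REDUCE policy it hard-codes), and training then treats every other sequence as wrong.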


At test time, suppose the 4th transition predicted is SHIFT instead of RA-IOBJ (RIGHT-ARC with label iobj). What happens if we apply the oracle next?

Measuring distance from the gold tree
Labeled attachment loss: the number of arcs in the gold tree that are not found in the predicted tree.
[Figure: two example predicted trees, with Loss = 3 and Loss = 1.]
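A minimal sketch of the labeled attachment loss, assuming a tree is stored as a dict from each dependent to its (head, label) pair. The toy sentence and the two predicted trees are hypothetical, chosen only to give losses of 3 and 1.

```python
def labeled_attachment_loss(gold, predicted):
    """Number of gold arcs (dependent -> (head, label)) missing from the predicted tree.

    An unattached dependent, a wrong head, or a wrong label each costs 1,
    so the loss is a sum of per-arc losses.
    """
    return sum(1 for dep, arc in gold.items() if predicted.get(dep) != arc)

# Hypothetical gold tree for "He(1) wrote(2) her(3) a(4) letter(5) .(6)"; 0 is ROOT.
gold = {1: (2, "sbj"), 2: (0, "root"), 3: (2, "iobj"),
        4: (5, "det"), 5: (2, "dobj"), 6: (2, "p")}

# Hypothetical predictions: one leaves tokens 3, 5 and 6 unattached,
# the other only attaches token 3 to the wrong head.
pred_a = {1: (2, "sbj"), 2: (0, "root"), 4: (5, "det")}
pred_b = {**gold, 3: (4, "det")}

print(labeled_attachment_loss(gold, pred_a))  # 3
print(labeled_attachment_loss(gold, pred_b))  # 1
```

Because the loss is a plain sum over gold arcs, it can be reasoned about one arc at a time, which is what makes transition costs cheap to compute.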


Proposed solution: 2 key changes to the training algorithm
1. Any transition that can possibly lead to a correct tree is considered correct.
2. Explore non-optimal transitions.
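The two changes can be sketched as a single training decision. This is a hedged sketch in the spirit of Goldberg & Nivre's online training, not their exact pseudocode: the function name, the `warmup_epochs` and `explore_prob` parameters, and the perceptron-style update are assumptions. Transitions are treated as opaque hashable values.

```python
import random

def dynamic_oracle_step(scores, zero_cost, epoch, warmup_epochs=1,
                        explore_prob=0.9, rng=None):
    """One training decision with a dynamic oracle (illustrative sketch).

    scores:    dict mapping each legal transition to its model score
    zero_cost: set of transitions that can still reach the best reachable tree
    Returns (update, next_transition). `update` is None when the prediction is
    acceptable, else a (target, predicted) pair: a perceptron would add the
    target's features and subtract the predicted transition's features.
    """
    rng = rng or random.Random(0)
    predicted = max(scores, key=scores.get)
    # Change 1: any zero-cost transition counts as correct, so we only update
    # when the prediction is costly, toward the best-scoring zero-cost transition.
    target = max(zero_cost, key=scores.get)
    update = None if predicted in zero_cost else (target, predicted)
    # Change 2: after a warm-up, sometimes follow the model's own mistake so that
    # training also visits configurations off the gold path.
    explore = epoch > warmup_epochs and rng.random() < explore_prob
    next_transition = predicted if (predicted in zero_cost or explore) else target
    return update, next_transition
```

For example, `dynamic_oracle_step({"SH": 2.0, "RA": 1.0, "RE": 0.5}, {"RA", "RE"}, epoch=0)` returns `(("RA", "SH"), "RA")`: SHIFT is costly, so the weights move toward RA, and during warm-up the parser continues with RA. In later epochs the same call may continue with SH instead, deliberately leaving the gold path.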


Defining the cost of a transition
Cost = the difference between the minimum loss achievable before and after the transition.
The loss of a tree decomposes nicely into losses for individual arcs, so we can compute the transition cost by counting the gold arcs that are no longer reachable after the transition.
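A sketch of that counting for the arc-eager system, assuming (as the slide states) that tree loss decomposes over arcs, so the best reachable tree keeps every gold arc that is still individually reachable. The reachability rules below are our own reading of the arc-eager transitions; the toy sentence and all function names are hypothetical. The example replays the slide's scenario: after three correct transitions, SHIFT instead of RA-IOBJ costs exactly one gold arc.

```python
from dataclasses import dataclass, field

ROOT = 0

@dataclass
class Config:
    stack: list
    buffer: list
    arcs: dict = field(default_factory=dict)  # dependent -> (head, label)

def apply_transition(c, t):
    """Arc-eager transitions SH, LA, RA, RE; t = (name, label)."""
    name, label = t
    stack, buffer, arcs = list(c.stack), list(c.buffer), dict(c.arcs)
    if name == "SH":
        stack.append(buffer.pop(0))
    elif name == "LA":
        arcs[stack.pop()] = (buffer[0], label)
    elif name == "RA":
        arcs[buffer[0]] = (stack[-1], label)
        stack.append(buffer.pop(0))
    elif name == "RE":
        stack.pop()
    return Config(stack, buffer, arcs)

def legal_transitions(c, labels):
    """Arc-eager preconditions; ROOT never receives a head."""
    s, moves = c.stack[-1], []
    if c.buffer:
        moves.append(("SH", None))
        moves += [("RA", l) for l in labels]
        if s != ROOT and s not in c.arcs:
            moves += [("LA", l) for l in labels]
    if s in c.arcs:
        moves.append(("RE", None))
    return moves

def arc_reachable(c, dep, head, label):
    """Can the gold arc head -label-> dep still end up in the final tree?"""
    if dep in c.arcs:                  # already decided, right or wrong
        return c.arcs[dep] == (head, label)
    on_stack, in_buffer = set(c.stack), set(c.buffer)
    if dep in in_buffer:               # dep can still be attached by LA or RA
        return head in on_stack or head in in_buffer
    if dep in on_stack:                # only a later buffer word can become its head
        return head in in_buffer
    return False                       # dep was popped without this arc

def reachable_count(c, gold):
    return sum(arc_reachable(c, d, h, l) for d, (h, l) in gold.items())

def transition_cost(c, t, gold):
    """Number of gold arcs that become unreachable by taking t."""
    return reachable_count(c, gold) - reachable_count(apply_transition(c, t), gold)

gold = {1: (2, "sbj"), 2: (ROOT, "root"), 3: (2, "iobj"),
        4: (5, "det"), 5: (2, "dobj"), 6: (2, "p")}
labels = sorted({l for _, l in gold.values()})
c = Config([ROOT], list(range(1, 7)))
for t in [("SH", None), ("LA", "sbj"), ("RA", "root")]:
    c = apply_transition(c, t)
print(transition_cost(c, ("RA", "iobj"), gold))  # 0: the gold transition
print(transition_cost(c, ("SH", None), gold))    # 1: token 3 can no longer get head 2
print(transition_cost(c, ("RE", None), gold))    # 3: tokens 3, 5 and 6 lose head 2

# After the erroneous SHIFT, the dynamic oracle still offers zero-cost moves:
after_error = apply_transition(c, ("SH", None))
zero_cost = [t for t in legal_transitions(after_error, labels)
             if transition_cost(after_error, t, gold) == 0]
# SHIFT and every LEFT-ARC label have zero cost; every RIGHT-ARC costs 1.
```

Unlike the static oracle, which is only defined on the gold path, this cost is defined for any configuration, so the parser can be trained to make the best of its own earlier mistakes.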


Imitation Learning (aka learning by demonstration)
A sequential decision-making problem. At each point in time t:
- receive input information x_t
- take action a_t
- suffer loss ℓ_t
- move to the next time step, until time T
Goal: learn a policy function f(x_t) = a_t that minimizes the expected total loss over all trajectories enabled by f.
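One way to write this goal as a formula (our notation, assuming the loss at step t depends on the input and the chosen action). The expectation is over trajectories produced by running f itself, which is what separates this from ordinary supervised learning:

```latex
\hat{f} \;=\; \arg\min_{f}\;
\mathbb{E}_{(x_1, a_1, \dots, x_T, a_T) \sim f}
\left[ \sum_{t=1}^{T} \ell_t(x_t, a_t) \right],
\qquad a_t = f(x_t)
```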

Supervised Imitation Learning

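Problem with the supervised approach: compounding error. The policy is trained only on states along the expert's path, so after one mistake it reaches states it never saw during training, where further mistakes become more likely.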

How can we train the system to make better predictions off the expert path?
- We want a policy f that leads to good performance in the configurations that f itself encounters.
- This is a chicken-and-egg problem, which can be addressed by an iterative approach.

DAGGER: simple & effective imitation learning via Data AGGregation
Requires interaction with the expert!
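A toy sketch of both compounding error and the DAGGER fix, on sequence labeling. Everything here is hypothetical: the three-sentence corpus, the table learner that falls back to label O on unseen states, and the function names. The expert is assumed to return the gold label at each step, which is optimal under Hamming loss. The learner's first mistake on "a c" sends it to the state ("c", "X"), which never occurs on the expert path, so supervised imitation also gets the second label wrong. DAGGER labels the states the learner actually visits and repairs the second label. For brevity this sketch rolls in with the learner alone after the first iteration, without the expert/learner mixing that some DAGGER variants use.

```python
from collections import Counter, defaultdict

START, FALLBACK = "<s>", "O"   # hypothetical start symbol and default label

def features(words, t, prev_label):
    """Toy state: current word plus the previously *predicted* label."""
    return (words[t], prev_label)

def train(data):
    """Hypothetical table learner: majority label per state, FALLBACK if unseen."""
    counts = defaultdict(Counter)
    for feat, label in data:
        counts[feat][label] += 1
    table = {feat: c.most_common(1)[0][0] for feat, c in counts.items()}
    return lambda feat: table.get(feat, FALLBACK)

def predict(policy, words):
    """Run the policy left to right, feeding back its own predictions."""
    labels, prev = [], START
    for t in range(len(words)):
        prev = policy(features(words, t, prev))
        labels.append(prev)
    return labels

def expert_data(words, gold):
    """States on the expert's path: the previous label is always the gold one."""
    return [(features(words, t, gold[t - 1] if t else START), gold[t])
            for t in range(len(words))]

def learner_data(policy, words, gold):
    """States the learner visits, each labeled with the expert action gold[t]."""
    visited, prev = [], START
    for t in range(len(words)):
        feat = features(words, t, prev)
        visited.append((feat, gold[t]))
        prev = policy(feat)
    return visited

def supervised_il(corpus):
    return train([p for words, gold in corpus for p in expert_data(words, gold)])

def dagger(corpus, n_iters=3):
    data = [p for words, gold in corpus for p in expert_data(words, gold)]
    policy = train(data)
    for _ in range(n_iters):
        for words, gold in corpus:
            data += learner_data(policy, words, gold)  # aggregate, never discard
        policy = train(data)
    return policy

corpus = [("a b".split(), ["X", "Y"]),
          ("a b".split(), ["X", "Y"]),
          ("a c".split(), ["Z", "Y"])]
print(predict(supervised_il(corpus), ["a", "c"]))  # ['X', 'O']: one error compounds
print(predict(dagger(corpus), ["a", "c"]))         # ['X', 'Y']: recovers after the error
```

The first label stays wrong in both cases (the majority label for "a" at the start is X), but only DAGGER has seen what to do after that mistake.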

When is DAGGER used in practice?
- Interaction with the expert is not always possible.
- Classic use case: the expert is a slow algorithm, and DAGGER is used to learn a faster algorithm that imitates it. Example: game playing, where the expert is brute-force search in simulation mode.
- But also structured prediction.

Sequence labeling via imitation learning
What is the expert here?
- Given a loss function (e.g., Hamming loss), the expert takes the action that minimizes long-term loss: given the output prefix at time t, it picks the action a for which the best reachable output starting with that prefix followed by a has the lowest loss.
- When the expert can be computed exactly, it is called an oracle.
Key advantages:
- Can define features freely
- No restriction to Markov features
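The expert's definition can be turned directly into a brute-force procedure, which also illustrates the "expert = slow algorithm" use case. A sketch assuming a small label set and Hamming loss; the function names and toy tags are hypothetical. The search is exponential in the remaining length. For Hamming loss it always returns the gold label at step t, even after earlier mistakes, which is why the exact oracle is trivial there.

```python
from itertools import product

def hamming(output, gold):
    return sum(a != b for a, b in zip(output, gold))

def best_reachable_loss(prefix, gold, labels, loss=hamming):
    """Loss of the best complete output that starts with `prefix` (brute force)."""
    rest_len = len(gold) - len(prefix)
    return min(loss(list(prefix) + list(rest), gold)
               for rest in product(labels, repeat=rest_len))

def expert_action(prefix, gold, labels, loss=hamming):
    """Pick the action a minimizing the best reachable loss of prefix + [a]."""
    return min(labels,
               key=lambda a: best_reachable_loss(list(prefix) + [a], gold, labels, loss))

labels, gold = ["D", "N", "V"], ["D", "N", "V"]
print(expert_action([], gold, labels))           # D
print(expert_action(["N"], gold, labels))        # N: the earlier mistake is a sunk cost
print(best_reachable_loss(["N"], gold, labels))  # 1
```

Swapping in a loss that does not decompose over positions would make the expert's choice depend on the prefix, while the same brute-force search would still apply.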

Today's topics:
- Improving on the gold parse oracle
- Research highlight: [Goldberg & Nivre, 2012]
- Imitation learning for structured prediction (CIML ch. 18)