Acquiring Competence from Performance Data

Acquiring Competence from Performance Data
Online learnability of OT and HG with simulated annealing
Tamás Biró
ACLC, University of Amsterdam (UvA)
Computational Linguistics in the Netherlands, February 5, 2010

The language acquisition problem

Learning from competence?

Learning from performance!

Distance between teacher's and learner's performance

Overview
1. Modelling linguistic performance
2. Learning
3. Results
4. Conclusions

Errors and mental computations

Competence and performance models

Competence models: $SF(U) = \arg\operatorname{opt}_{w \in Gen(U)} H(w)$
- $C_i(w)$: elementary functions on the candidates ("constraints": a misnomer).
- Optimality Theory: $H(w) = (C_n(w), \ldots, C_1(w))$; arg opt: lexicographic order.
- q-Harmony Grammar: $H(w) = C_n(w)\,q^n + \ldots + C_1(w)\,q$.
  - Large q: OT-like strict domination.
  - Small q: ganging-up effects.

Performance models:
- Exhaustive search: returns the global optimum.
- Simulated annealing: returns some local optimum.
  - Run slowly: frequently the globally optimal one.
  - Run quickly: the global optimum is less frequent, performance errors are more frequent.
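
The contrast between strict domination and ganging-up can be illustrated with a minimal sketch. It assumes that constraints return non-negative violation counts and that "opt" means minimisation; the slides do not fix either convention. The candidates "a" and "b" and their violation profiles are hypothetical.

```python
from typing import Callable, Dict, Tuple

Violations = Tuple[int, ...]  # (C_n(w), ..., C_1(w)), highest-ranked constraint first

# Hypothetical candidate set: "a" violates the top constraint once,
# "b" violates each of the two lower constraints twice.
CANDIDATES: Dict[str, Violations] = {
    "a": (1, 0, 0),
    "b": (0, 2, 2),
}


def ot_harmony(violations: Violations) -> Violations:
    """OT: the violation vector itself; Python compares tuples lexicographically."""
    return tuple(violations)


def q_harmony(violations: Violations, q: float) -> float:
    """q-HG: C_n(w) * q**n + ... + C_1(w) * q."""
    n = len(violations)
    # violations[k] belongs to the constraint with exponent n - k.
    return sum(c * q ** (n - k) for k, c in enumerate(violations))


def optimum(candidates: Dict[str, Violations], harmony: Callable) -> str:
    """Exhaustive search: return the candidate with the smallest harmony value."""
    return min(candidates, key=lambda w: harmony(candidates[w]))


if __name__ == "__main__":
    print("OT    :", optimum(CANDIDATES, ot_harmony))
    for q in (10, 4, 1.5):
        print(f"{q}-HG:", optimum(CANDIDATES, lambda v: q_harmony(v, q)))
```

With these made-up profiles, OT, 10-HG and 4-HG all select "b", because a single violation of the top constraint outweighs everything below it. Under 1.5-HG the four lower-ranked violations of "b" gang up (7.5 against 3.375) and "a" becomes optimal.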

Online learning algorithms

Constraint $C_i$ has rank $r_i$.

In each learning cycle: the learning data (winner) produced by the teacher is compared to the form produced by the learner (loser).

Update rule: update the rank $r_i$ of every constraint $C_i$, depending on whether $C_i$ prefers the winner or the loser.
- Boersma (1997): increase the rank by ε if the constraint is winner-preferring; decrease the rank by ε if it is loser-preferring.
- Magri (2009): increase the rank of all winner-preferring constraints by ε; decrease the rank of the highest-ranked loser-preferring constraint by W·ε, where W is the number of winner-preferring constraints.
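
The two update rules can be sketched as follows. This is an illustration under stated assumptions, not the OTKit implementation: a constraint is taken to prefer the winner when the winner incurs fewer violations of it than the loser, and the function names and the list-based representation of ranks are hypothetical.

```python
from typing import List, Sequence


def boersma_update(ranks: Sequence[float], winner: Sequence[int],
                   loser: Sequence[int], eps: float = 0.1) -> List[float]:
    """Boersma (1997) style: +eps for winner-preferring, -eps for loser-preferring."""
    new_ranks = list(ranks)
    for i, (w, l) in enumerate(zip(winner, loser)):
        if w < l:        # fewer violations by the winner: winner-preferring
            new_ranks[i] += eps
        elif w > l:      # more violations by the winner: loser-preferring
            new_ranks[i] -= eps
    return new_ranks


def magri_update(ranks: Sequence[float], winner: Sequence[int],
                 loser: Sequence[int], eps: float = 0.1) -> List[float]:
    """Magri (2009) style: promote all winner-preferrers by eps, demote only the
    highest-ranked loser-preferrer by W * eps (W = number of winner-preferrers)."""
    winner_pref = [i for i, (w, l) in enumerate(zip(winner, loser)) if w < l]
    loser_pref = [i for i, (w, l) in enumerate(zip(winner, loser)) if w > l]
    new_ranks = list(ranks)
    for i in winner_pref:
        new_ranks[i] += eps
    if loser_pref:
        top = max(loser_pref, key=lambda i: ranks[i])
        new_ranks[top] -= len(winner_pref) * eps
    return new_ranks


if __name__ == "__main__":
    ranks = [3.0, 2.0, 1.0]   # hypothetical ranks r_i
    winner = [1, 0, 0]        # violations of the teacher's form
    loser = [0, 1, 1]         # violations of the learner's form
    print(boersma_update(ranks, winner, loser, eps=0.5))
    print(magri_update(ranks, winner, loser, eps=0.5))
```

In the Magri-style update the total promotion (W·ε) equals the single demotion, so the sum of the ranks stays constant whenever at least one loser-preferring constraint exists.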

Learn until performance converges

Convergence of performance, and not of competence. The child may acquire a different grammar.

Sample of teacher vs. sample of learner (sample size = 100).

Convergence criterion: JSD between a sample produced by the target grammar and a sample produced by the learner's current grammar ≤ average JSD of two samples produced by the target grammar.

Jensen-Shannon divergence: measures the distance of two distributions:

$JSD(P \| Q) = \frac{D(P \| M) + D(Q \| M)}{2}$

where

$D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$ (relative entropy, Kullback-Leibler divergence), and $M(x) = \frac{P(x) + Q(x)}{2}$.

- Symmetric: $JSD(P \| Q) = JSD(Q \| P)$.
- Non-negative: $JSD(P \| Q) \geq 0$.
- $JSD(P \| Q) \leq 1$.
- $JSD(P \| Q) = 0$ if and only if $P(x) = Q(x), \forall x$.
- $JSD(P \| Q) = 1$ if and only if $P(x) \cdot Q(x) = 0, \forall x$.
- Same language: $JSD(L_t \| L_l) = 0$.
- Not a single overlap: $JSD(L_t \| L_l) = 1$.
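
A minimal sketch of the divergence and the convergence check, assuming base-2 logarithms (which give the upper bound of 1 stated above) and relative frequencies estimated from samples of produced forms. The function names and the `baseline` parameter, standing for the average JSD of two teacher samples, are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence

Dist = Dict[Hashable, float]


def distribution(sample: Sequence[Hashable]) -> Dist:
    """Relative frequencies of the forms observed in a sample."""
    counts = Counter(sample)
    return {form: c / len(sample) for form, c in counts.items()}


def kl_divergence(p: Dist, m: Dist) -> float:
    """D(P || M) with base-2 logs; terms with P(x) = 0 contribute nothing."""
    return sum(px * math.log2(px / m[x]) for x, px in p.items() if px > 0)


def jsd(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence: mean of D(P || M) and D(Q || M), M = (P + Q) / 2."""
    support = set(p) | set(q)
    m = {x: (p.get(x, 0.0) + q.get(x, 0.0)) / 2 for x in support}
    return (kl_divergence(p, m) + kl_divergence(q, m)) / 2


def has_converged(teacher_sample: Sequence[Hashable],
                  learner_sample: Sequence[Hashable],
                  baseline: float) -> bool:
    """True if the teacher/learner JSD does not exceed the teacher/teacher baseline."""
    return jsd(distribution(teacher_sample), distribution(learner_sample)) <= baseline


if __name__ == "__main__":
    same = distribution(["x", "x", "y"])
    print(jsd(same, same))                                 # identical distributions
    print(jsd(distribution(["x"]), distribution(["y"])))   # disjoint supports
```

Because M(x) is positive wherever P(x) or Q(x) is, the logarithm is always defined, which is one practical reason to prefer JSD over plain Kullback-Leibler divergence for finite samples.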

Results: number of learning steps until convergence

2000 learning runs (random target, random underlying form) per grammar type × production method × learning method.

Measure the number of learning steps until convergence.

Distribution of the number of required learning steps (1st quartile ; median ; 3rd quartile):

Production        Learning  OT               10-HG            4-HG             1.5-HG
gramm.            M         13 ; 27 ; 45     13 ; 28 ; 46     12 ; 27 ; 48     15 ; 30 ; 47
gramm.            B         23 ; 43 ; 65     22 ; 41 ; 64     22 ; 42 ; 64     23 ; 40 ; 60
sa, t_step = 0.1  M         53 ; 109 ; 233   63 ; 140 ; 328   60 ; 148 ; 366   83 ; 199 ; 508
sa, t_step = 0.1  B         80 ; 171 ; 462   92 ; 240 ; 772   92 ; 239 ; 785   117 ; 290 ; 694
sa, t_step = 1    M         64 ; 131 ; 305   62 ; 134 ; 304   63 ; 137 ; 329   72 ; 163 ; 437
sa, t_step = 1    B         90 ; 212 ; 560   92 ; 233 ; 572   84 ; 212 ; 646   101 ; 242 ; 616

Learning methods: M = Magri (2009), B = Boersma (1997). Production: sa = simulated annealing.
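
The quartile triples reported above can be computed from a list of per-run step counts as in the following sketch. The step counts are made-up illustration values, not data from the experiments, and the choice of the "inclusive" quantile method is an assumption; the slides do not say how quartiles were interpolated.

```python
import statistics
from typing import Sequence, Tuple


def quartile_summary(steps: Sequence[int]) -> Tuple[float, float, float]:
    """Return (1st quartile, median, 3rd quartile) of the learning-step counts."""
    q1, median, q3 = statistics.quantiles(steps, n=4, method="inclusive")
    return q1, median, q3


if __name__ == "__main__":
    # Hypothetical step counts with one very slow run (a long tail).
    steps = [12, 20, 27, 45, 900]
    print(quartile_summary(steps))
    print(statistics.mean(steps))
```

Quartiles are barely affected by a single extreme run, whereas the mean is pulled far above the third quartile. This is one reason to summarise long-tailed distributions with quartiles rather than means.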

Methodological notes

Paradigm:
- Measure the number of learning steps until performance converges.
- Statistics on the distribution of the required number of learning steps.
- Under different learning conditions.

Distributions have extremely long tails.
- Significance of differences: tested with non-parametric tests.

Does learning speed depend on the initial grammar? On the learning data?
- Run two learners learning the same target grammar:
  - with the same initial grammar: strong correlation in the number of learning steps. If the learning data are not the same: slightly decreased correlation.
  - with different initial grammars: correlation (almost) lost.
- Long tail: children must start with the same initial grammar, but need not receive the same (correct or erroneous) data (if the learning algorithm is correct).

Conclusions

Proposed paradigm for the learnability of a grammar framework:
- Competence = grammar framework (e.g., OT or HG).
- Performance = imperfect implementation of the competence model.
- Learning from performance data, which only partially reflect competence.
- The learner does not have direct access to the teacher's competence: converge on performance.
- Convergence measured using Jensen-Shannon divergence.

Argument for the same initial grammar in children?

Implemented on OTKit.

Thank you for your attention!

Tamás Biró: t.s.biro@uva.nl
Work supported by:
Tools for Optimality Theory: http://www.birot.hu/otkit/