A Generative Model for Parsing Natural Language to Meaning Representations

Jake Vasilakes
March 9, 2015

Outline

Background: Key Concepts; Purpose and Structure
Generative Model: Process; Tree probability; Parameters; Decoding
Discriminative reranking: Averaged Perceptron
Evaluation: Methodology; Results

Key Concepts

Meaning Representation (MR): a formal representation of meaning, written using a meaning representation language (MRL). An MR production has three parts: a semantic category, a function symbol, and a list of arguments. Example: NUM : count(state), where NUM is the semantic category, count the function symbol, and state its argument.

Semantic Parsing: the mapping of natural language (NL) sentences to meaning representations.
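As a rough illustration of this structure (a sketch, not code from the paper), an MR production can be represented as a small tree node holding the three parts:

```python
from dataclasses import dataclass, field

@dataclass
class MRNode:
    """One MR production: semantic category, function symbol, arguments."""
    category: str                              # e.g. "NUM"
    symbol: str                                # e.g. "count"
    args: list["MRNode"] = field(default_factory=list)

    def __str__(self) -> str:
        inner = ", ".join(str(a) for a in self.args)
        return f"{self.symbol}({inner})" if self.args else self.symbol

# The slide's example production, NUM : count(state)
state = MRNode("STATE", "state")
mr = MRNode("NUM", "count", [state])
print(mr)  # count(state)
```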

Purpose and Structure

Purpose: learn a generative model to map NL sentences to MR trees, learning an implicit grammar along the way.

System Structure: (shown as a diagram in the original slides)

Process

Goal: simultaneous generation of the NL sentence and the MR structure. Running example: "How many states do not have rivers?"
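The generation produces a hybrid tree whose internal nodes are MR productions and whose children interleave NL words with MR subtrees. A minimal sketch, with a made-up tree shape for the running example (the actual MR for this sentence is not shown in the transcript):

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class HybridNode:
    """An MR production whose children interleave NL words and MR subtrees."""
    category: str
    symbol: str
    # Children are NL words (str) and child HybridNodes, in surface order.
    children: list[Union[str, "HybridNode"]] = field(default_factory=list)

    def words(self) -> list[str]:
        """Read off the NL sentence by an in-order traversal."""
        out: list[str] = []
        for c in self.children:
            out.extend([c] if isinstance(c, str) else c.words())
        return out

# Hypothetical hybrid tree: root follows pattern wY (words, then one child).
leaf = HybridNode("STATE", "exclude", ["states", "do", "not", "have", "rivers"])
tree = HybridNode("NUM", "count", ["how", "many", leaf])
print(" ".join(tree.words()))  # how many states do not have rivers
```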

Tree probability

P(ŵ, m, T) = P(M_a) · P(m_a | M_a) · P(w_1 M_b w_2 M_c | m_a)
             · P(m_b | m_a, arg = 1) · P(... | m_b)
             · P(m_c | m_a, arg = 2) · P(... | m_c)

where ŵ is the sequence of NL words, m is the MR structure, and T is the hybrid tree.

The middle term decomposes further:

P(w_1 M_b w_2 M_c | m_a) = P(m -> wYwZ | m_a) · P(w_1 | m_a) · P(M_b | m_a, w_1)
                           · P(w_2 | m_a, w_1, M_b) · P(M_c | m_a, w_1, M_b, w_2)
                           · P(END | m_a, w_1, M_b, w_2, M_c)
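To make the factorization concrete, here is a tiny numeric sketch that multiplies out the terms above; every probability value is made up for illustration:

```python
import math

# Made-up parameter values for illustration only (not from the paper).
P_Ma      = 1.0   # P(M_a): root semantic category
P_ma_Ma   = 0.5   # P(m_a | M_a)
P_pattern = 0.3   # P(m -> wYwZ | m_a): hybrid pattern choice
P_w1      = 0.2   # P(w_1 | m_a)
P_Mb      = 0.6   # P(M_b | m_a, w_1)
P_w2      = 0.1   # P(w_2 | m_a, w_1, M_b)
P_Mc      = 0.7   # P(M_c | m_a, w_1, M_b, w_2)
P_end     = 0.9   # P(END | m_a, w_1, M_b, w_2, M_c)
P_mb      = 0.4   # P(m_b | m_a, arg = 1)
P_sub_b   = 0.05  # P(... | m_b): probability of the subtree under m_b
P_mc      = 0.3   # P(m_c | m_a, arg = 2)
P_sub_c   = 0.02  # P(... | m_c): probability of the subtree under m_c

# P(w_1 M_b w_2 M_c | m_a), decomposed as on the slide
P_rhs = P_pattern * P_w1 * P_Mb * P_w2 * P_Mc * P_end

# P(ŵ, m, T): the full joint probability of words, MR, and hybrid tree
P_joint = P_Ma * P_ma_Ma * P_rhs * P_mb * P_sub_b * P_mc * P_sub_c
print(f"log P = {math.log(P_joint):.3f}")
```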

Parameters

The model has three kinds of parameters, each normalized to sum to 1:

MR model parameters: Σ_m ρ(m | m_j, arg = k) = 1 for all j and k = 1, 2
Pattern parameters: Σ_r φ(r | m_j) = 1 for all j, where r is a hybrid pattern, e.g. wYwZ
Emission parameters: Σ_t θ(t | m_j, Λ) = 1 for all j, where t is any node in T and Λ is the preceding context

Different contexts Λ result in different models:

Model I (Unigram): θ(t_k | m_j, Λ) = P(t_k | m_j)
Model II (Bigram): θ(t_k | m_j, Λ) = P(t_k | m_j, t_{k-1})
Model III (Interpolation): θ(t_k | m_j, Λ) = ½ · (Model I + Model II)
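A small sketch of the three emission models as lookups over hypothetical probability tables (Model III simply averages the other two):

```python
# Hypothetical emission tables; in the real system these come from EM.
unigram = {("count", "how"): 0.20, ("count", "many"): 0.15}   # P(t_k | m_j)
bigram  = {("count", "how", "many"): 0.35}                    # P(t_k | m_j, t_{k-1})

def theta(t_k: str, m_j: str, t_prev: str, model: str) -> float:
    p1 = unigram.get((m_j, t_k), 0.0)
    p2 = bigram.get((m_j, t_prev, t_k), 0.0)
    if model == "I":         # unigram: ignores the preceding context
        return p1
    if model == "II":        # bigram: conditions on the previous token
        return p2
    return 0.5 * (p1 + p2)   # Model III: interpolation of I and II

print(theta("many", "count", "how", "III"))  # 0.5 * (0.15 + 0.35) = 0.25
```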

Estimation: the MR model parameters are estimated by counting and normalizing. The pattern and emission parameters require the EM algorithm, because the alignment between NL words and MR structures in the training data is unknown.
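Count-and-normalize for the MR model parameters is just relative frequency estimation; a toy sketch with invented events:

```python
from collections import Counter, defaultdict

# Toy training events: (parent symbol, argument position, child symbol)
events = [("count", 1, "state"), ("count", 1, "state"), ("count", 1, "river")]

counts = Counter(events)
totals: dict = defaultdict(int)
for (parent, k, _child), c in counts.items():
    totals[(parent, k)] += c

# rho(child | parent, arg=k) = count(parent, k, child) / count(parent, k)
rho = {e: c / totals[(e[0], e[1])] for e, c in counts.items()}
print(rho[("count", 1, "state")])  # 2/3
```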

EM uses inside and outside probabilities to calculate the estimated counts. One EM iteration takes O(n^6 m) time, where n is the length of the NL sentence and m is the size of the MR structure. A modification was implemented to bring the complexity down to O(n^3 m).

Modification idea: aggregate probabilities over NL-MR subsequences and reuse them in subsequent computations. For a given NL-MR subsequence (m_v, w_v) and a given pattern r (e.g. wYwZ), compute an aggregate probability; this aggregate can be used to calculate the partial inside or outside probability for (m_v, w_v), and summing over all r gives the total inside or outside probability.
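The speed-up is a dynamic-programming effect: each aggregated quantity is computed once per (MR node, NL span, pattern) and then reused. The toy recurrence below is not the paper's actual computation, but it shows how caching span-level results collapses an exponential recursion:

```python
from functools import lru_cache

calls = 0

def naive(i: int, j: int) -> float:
    """Naive recursion over spans: every sub-span is recomputed many times."""
    global calls
    calls += 1
    if j - i <= 1:
        return 1.0
    return sum(naive(i, k) * naive(k, j) for k in range(i + 1, j))

@lru_cache(maxsize=None)
def cached(i: int, j: int) -> float:
    """Same recurrence, but each span's aggregate is computed once and reused."""
    if j - i <= 1:
        return 1.0
    return sum(cached(i, k) * cached(k, j) for k in range(i + 1, j))

naive(0, 12)
cached(0, 12)
print(calls)                 # 177147 calls (3^11): exponential in span length
print(cached.cache_info())   # only O(n^2) distinct spans ever computed
```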

Decoding

Goal: find the most probable MR structure m given the NL sentence ŵ:

m* = argmax_m P(m | ŵ) = argmax_m Σ_T P(m, T | ŵ)

Summing over all possible trees T is expensive, so the sum is approximated by the most likely tree (Viterbi approximation):

m* ≈ argmax_m max_T P(m, T | ŵ) = argmax_m max_T P(ŵ, m, T)

In practice, a ranked list of the k best trees is output.
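In sketch form, the Viterbi-style decoding then reduces to taking the MR of the highest-scoring tree in the k-best list (the MR strings and scores below are invented):

```python
# Hypothetical (MR string, tree log-probability) pairs from the k-best list.
kbest = [
    ("count(exclude(state(all), loc(river(all))))", -12.3),
    ("count(state(all))",                            -14.8),
    ("count(exclude(state(all), loc(river(all))))", -15.1),
]

# argmax over m of max over T: take the MR of the single best-scoring tree.
best_mr = max(kbest, key=lambda pair: pair[1])[0]
print(best_mr)
```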

Averaged Perceptron


The generative model cannot capture long-range dependencies within trees, so a discriminative classifier is used to rerank the list of k best trees generated by the generative model (k = 50): an averaged perceptron with a separating plane.

A feature function maps a given tree T to a feature vector Φ(T), with an associated weight vector w. The tree T with the highest score under w is picked as the output.

Separating Plane: after w is learned, set a threshold score value b and reject a given T if its score is less than b. Choose the b that results in the maximum F-score.
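A compact sketch of the reranking step with a separating plane; the feature vectors and weights below are invented, and in training w would be the average of the weight vectors across all perceptron updates (which reduces overfitting):

```python
import numpy as np

def rerank(kbest_features: list[np.ndarray], w_avg: np.ndarray, b: float):
    """Score each candidate tree's feature vector; apply the separating plane."""
    scores = [float(w_avg @ phi) for phi in kbest_features]
    best = int(np.argmax(scores))
    # Separating plane: reject even the best candidate if its score is below b.
    return best if scores[best] >= b else None

# Toy example: three candidate trees with 4 hypothetical features each.
cands = [np.array([1.0, 0.0, 1.0, 0.3]),
         np.array([0.0, 1.0, 1.0, 0.9]),
         np.array([1.0, 1.0, 0.0, 0.1])]
w = np.array([0.5, -0.2, 0.7, 1.0])   # averaged weights (made up)
print(rerank(cands, w, b=0.8))        # index of the accepted tree, or None
```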

Features: features 1-5 are binary {0, 1}; feature 6 is real-valued.

Methodology

Evaluated on two corpora, GEOQUERY and ROBOCUP; precision, recall, and F-score are reported.

GEOQUERY: an MR structure is considered correct if it retrieves the same answer as the reference MR structure when used as a query to the database, regardless of differences in the string representation.

ROBOCUP: an MR structure is considered correct if it has the same string representation as the reference MR structure.
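In sketch form, the two correctness criteria differ only in the comparison performed; the execute function and MR strings below are placeholders, not a real GEOQUERY interface:

```python
def correct_geoquery(pred_mr: str, gold_mr: str, execute) -> bool:
    """GEOQUERY: compare denotations, i.e. the answers the queries retrieve."""
    return execute(pred_mr) == execute(gold_mr)

def correct_robocup(pred_mr: str, gold_mr: str) -> bool:
    """ROBOCUP: compare string representations directly."""
    return pred_mr == gold_mr

# Two differently written queries can still retrieve the same answer.
fake_db = {"count(state(all))": 50, "count(exclude(state(all), none))": 50}
print(correct_geoquery("count(state(all))",
                       "count(exclude(state(all), none))",
                       fake_db.get))                          # True
print(correct_robocup("count(state(all))",
                      "count(exclude(state(all), none))"))    # False
```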

Results

The results tables, including a comparison to previous work (evaluated on a subset of GEOQUERY), appear in the original slides.

Summary
Learn a generative model which outputs a list of the k best NL-MR hybrid trees for a given NL sentence. Rerank the k-best list according to the score assigned by the averaged perceptron with separating plane. Choose the tree with the highest score as the output.

Appendix: References

[1] W. Lu, H. T. Ng, W. S. Lee, and L. S. Zettlemoyer. A Generative Model for Parsing Natural Language to Meaning Representations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.