Learning to Execute Natural Language
Percy Liang
Computer Forum, April 16, 2013

Why natural language?

Information extraction. Observation: free-form text contains a wealth of information (e.g., news articles, blogs, scientific journals). Goal: extract structured information from unstructured natural language.

Intelligent user interfaces. Tasks: find information, perform data analytics, buy tickets, send email, etc. Goal: allow people to accomplish more complex tasks efficiently using natural language. Example: "Book a non-stop flight from SF to Tokyo when the price drops by 20%."

Web search: broad but shallow.

Blocks world [Winograd 1971]: deep but narrow.
Person: Pick up a big red block.
Computer: OK.
Person: Grasp the pyramid.
Computer: I don't understand which pyramid you mean.
Person (changing their mind): Find a block which is taller than the one you are holding and put it into the box.
Computer: By "it", I assume you mean the block which is taller than the one I am holding.
Computer: OK.
Person: What does the box contain?
Computer: The blue pyramid and the blue block.
Person: What is the pyramid supported by?
Computer: The box.
Person: How many blocks are not in the box?
Computer: Four of them.
Person: Is at least one of them narrower than the one which I told you to pick up?
Computer: Yes, the red cube.

Models in NLP. How do we get deep and broad systems?
Basic models: topic models (e.g., Latent Dirichlet Allocation), n-gram language models, sequence models (e.g., HMMs, conditional random fields).
More structured models (our focus): syntactic models over parse trees, semantic models over logical forms.

Deep question answering via semantic parsing: parse the question into a logical form and execute it as a database query to obtain the answer (the slide's example query returns Egypt). Point: to answer the question, we need to model the logical form.

Training a semantic parser with detailed supervision: manually annotate logical forms for questions such as "What's Bulgaria's capital?", "When was Google started?", "What movies has Tom Cruise been in?". This requires experts, is slow and expensive, and doesn't scale up. Example: the Penn Treebank (50K sentences annotated with parse trees) took 3 years.

Training a semantic parser with shallow supervision: question/answer pairs.
What's Bulgaria's capital? Sofia
When was Google started? 1998
What movies has Tom Cruise been in? Top Gun, Vanilla Sky, ...
Answers can be collected via crowdsourcing (no expertise required) or by scraping the web: fast and cheap (but noisy), and it scales up. The logical forms are modeled as latent variables.

Summary so far: modeling the deep semantics of natural language is important, and we need to learn from natural/weak supervision to obtain broad coverage. Rest of talk: (1) spectral methods for learning latent variable models; (2) learning a broad-coverage semantic parser.
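To make the shallow-supervision idea concrete, here is a minimal sketch in Python. The toy database and candidate logical forms are invented for illustration, not the talk's actual system; the point is that a logical form which executes to the annotated answer serves as positive evidence even though it was never labeled.

```python
# Minimal sketch of answer-only supervision for semantic parsing.
# The database and candidates below are hypothetical illustrations.

toy_db = {
    ("Bulgaria", "capital"): "Sofia",
    ("Google", "founding_year"): "1998",
}

def execute(logical_form, db):
    """Execute an (entity, relation) logical form as a database lookup."""
    return db.get(logical_form)

# Candidate logical forms a parser might propose for one question.
question = "What's Bulgaria's capital?"
candidates = [("Bulgaria", "capital"), ("Bulgaria", "founding_year")]
gold_answer = "Sofia"  # shallow supervision: the answer, not the logical form

# Candidates that execute to the correct answer are treated as (latent)
# positive evidence for training; the rest as negative.
consistent = [lf for lf in candidates if execute(lf, toy_db) == gold_answer]
print(consistent)  # [('Bulgaria', 'capital')]
```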

Latent variable models: natural/weak supervision gives rise to latent variables. Spectral methods for learning latent variable models (joint work with Daniel Hsu, Sham Kakade, Arun Chaganty). Many applications: relation extraction, machine translation, speech recognition, ...

Unsupervised learning: in general, latent variable models lead to non-convex optimization problems (finding the global optimum is NP-hard). Local optimization algorithms (EM, Gibbs sampling, variational methods) can get stuck in local optima; heuristic solutions include careful initialization, annealing, and multiple restarts.

Method of moments [Anandkumar/Hsu/Kakade, 2012]. The algorithm has rigorous theoretical guarantees: compute aggregate statistics over the data (trivial to parallelize), then perform simple linear algebra operations to obtain parameter estimates.

Comparing the approaches on use of data vs. computation: global optimization uses data efficiently but is computationally inefficient; local optimization comes with no guarantees; the method of moments uses data inefficiently but is computationally efficient. In the Big Data regime, the method of moments is a win! Still missing: structural uncertainty and discriminative modeling.
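To illustrate the moment-matching idea, here is a toy sketch (not the spectral tensor algorithm from the paper): an equal-weight mixture of two 1-D Gaussians with known variance. Two empirical moments determine the two unknown means via a quadratic equation, so estimation reduces to aggregate statistics plus simple algebra, with no EM iterations over the data.

```python
import numpy as np

# Toy method-of-moments estimator: equal-weight mixture of two 1-D
# Gaussians with known variance sigma^2 and unknown means mu1, mu2.
#   E[x]   = (mu1 + mu2) / 2
#   E[x^2] = (mu1^2 + mu2^2) / 2 + sigma^2
rng = np.random.default_rng(0)
mu1, mu2, sigma = -1.0, 2.0, 1.0
z = rng.integers(0, 2, size=100_000)              # latent component
x = np.where(z == 0, mu1, mu2) + sigma * rng.standard_normal(z.shape)

m1, m2 = x.mean(), (x ** 2).mean()                # aggregate statistics
s = 2 * m1                                        # estimate of mu1 + mu2
q = 2 * (m2 - sigma ** 2)                         # estimate of mu1^2 + mu2^2
p = (s ** 2 - q) / 2                              # estimate of mu1 * mu2
# mu1, mu2 are the roots of t^2 - s*t + p = 0: plain algebra, no EM.
roots = np.roots([1.0, -s, p])
print(sorted(roots))  # approximately [-1.0, 2.0]
```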

Structural uncertainty. Example: "I like algorithms." (The slide shows two candidate parse-tree structures for this sentence; the trees did not survive transcription.) Our algorithm: unmixing [NIPS 2012].

Discriminative latent variable models. Generative models (e.g., Naive Bayes) vs. discriminative models (e.g., logistic regression, SVMs). Our algorithm: a spectral method for mixtures of linear regressions [ICML 2013].

Semantic parsing (joint work with Jonathan Berant, Andrew Chou): parse the question into a logical form and execute it as a database query (the slide's example query returns Egypt).

Training data: logical forms are expensive, answers are cheap.
Expensive (annotated logical forms): "What is the most populous city in California?", "How many states border Oregon?" paired with their logical forms.
Cheap (answers only): "What is the most populous city in California?" Los Angeles. "How many states border Oregon?" 3.
Can we learn with no annotated logical forms?

Experimental results. Task: US geography question-answering benchmark [Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al., 2010; Wong & Mooney, 2006; Kwiatkowski et al., 2010; Liang et al., 2011]. Punchline: our system (without logical forms) matches previous work (with logical forms).
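One way to see how answers alone can drive learning is the following sketch of a single latent-variable training step. The feature vectors and candidates are invented for illustration; this is a generic log-linear update that treats logical forms as latent, not the talk's exact training procedure.

```python
import numpy as np

# Sketch of one training step: raise the marginal probability of the
# candidate logical forms that execute to the correct answer.
# Features and candidates below are hypothetical.

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Each candidate logical form has a feature vector phi(question, lf).
phi = np.array([[1.0, 0.0],   # candidate 0
                [0.0, 1.0]])  # candidate 1
executes_correctly = np.array([True, False])  # from database execution

w = np.zeros(2)               # log-linear parameters
for _ in range(100):
    p = softmax(phi @ w)      # p(lf | question; w)
    # Renormalize over the consistent candidates (latent "good" set).
    p_good = p[executes_correctly] / p[executes_correctly].sum()
    # Gradient of log marginal likelihood: E_good[phi] - E_all[phi].
    grad = p_good @ phi[executes_correctly] - p @ phi
    w += 0.5 * grad           # gradient ascent step

print(w)  # weight on feature 0 grows: the consistent logical form wins
```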

Towards broad coverage. Collecting a question-answering dataset from the Web: "What shows has David Spade been in?", "What are the main rivers in Egypt?", "What year did Vince Young get drafted?", "In what year was President Kennedy shot?", ...
Compared to previous datasets: domain goes from US geography to general facts; database size from 500 entries to 400,000,000 (Freebase); number of database predicates from 40 to 30,000.

Alignment. Challenge: figure out how words (e.g., "born") map onto predicates (e.g., PlaceOfBirth). Inputs: raw text (1B web pages, containing phrases such as "grew up in", "born in", "married in") and Freebase (400M assertions, with predicates such as DateOfBirth, PlaceOfBirth, Marriage.StartDate, PlacesLived.Location). Output: a noisy mapping from words to predicates. Final step: train the semantic parser using this mapping.

Experimental results. Punchline: using alignment, we can get the same accuracy with 10 times fewer question/answer pairs; state-of-the-art results learning only from question/answer pairs.

Summary. Goal: deep natural-language semantics from shallow supervision. Consequence: we need to learn latent variable models. Spectral methods: from intractable to easy by trading off computation and information, a paradigm shift in learning.

Real-world impact: increasing demand for deep language understanding. Thank you!
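A minimal sketch of the alignment idea, assuming we already have (subject, phrase, object) extractions from text and (subject, predicate, object) assertions from the database: phrases and predicates that co-occur with the same entity pairs become candidate alignments. The tiny "text" and "kb" inputs below are invented for illustration.

```python
from collections import Counter

# Sketch of alignment by shared entity pairs; the data is hypothetical.
text = [  # (subject, phrase, object) extracted from raw text
    ("BarackObama", "born in", "Honolulu"),
    ("ElvisPresley", "born in", "Tupelo"),
    ("BarackObama", "grew up in", "Honolulu"),
]
kb = [    # (subject, predicate, object) assertions from the database
    ("BarackObama", "PlaceOfBirth", "Honolulu"),
    ("ElvisPresley", "PlaceOfBirth", "Tupelo"),
]

kb_index = {(s, o): p for s, p, o in kb}
counts = Counter()
for s, phrase, o in text:
    if (s, o) in kb_index:                     # shared entity pair
        counts[(phrase, kb_index[(s, o)])] += 1

# Noisy phrase -> predicate mapping, ranked by co-occurrence count.
for (phrase, pred), n in counts.most_common():
    print(f"{phrase!r} -> {pred} ({n})")
# 'born in' -> PlaceOfBirth (2); 'grew up in' -> PlaceOfBirth (1)
```

This mapping is noisy ("grew up in" also aligns to PlaceOfBirth here), which is why it is used as a prior for training the semantic parser rather than taken as ground truth.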