Wrapup: IE, QA, and Dialog. Mausam

Grading
- Project: 50% → 40%
- Final exam: 20%
- Regular reviews: 15% → 20%
- Midterm survey: 15% → 10%
- Presentation: 10%
- Extra credit: participation

Plan (1st half of the course)
- Classical papers/problems in IE: bootstrapping, NELL, Open IE
- Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning
- IE++: coreference, paraphrases, inference
Plan (2nd half of the course)
- QA
- Conversational agents

Plan (1st half++ of the course)
- Classical papers/problems in IE: bootstrapping, NELL, Open IE
- Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning
- IE++: coreference, paraphrases
- Inference: random walks, neural models
Plan (2nd half of the course)
- QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network
- Conversational agents: generative hierarchical nets, GANs, MemNets

NLP (or any application course)
Techniques/Models
- Bootstrapping (coupled)
- Semi-SSL
- PGMs: semi-CRF, MultiR, LDA
- Tree kernels
- Multi-instance learning
- Random walks over graphs
- Reinforcement learning
- CNN, LSTM, Bi-LSTM, Recursive NN
- Attention, MemNets, GANs
Problems
- NER
- Entity/Rel/Event extraction
- Open Rel/Event extraction
- Multi-task learning
- KB inference
- Open QA
- Machine comprehension
- Task-oriented dialog w/ KB
- General dialog

How much data?
- Large supervised dataset: supervised learning
  - Trick to construct a large supervised dataset w/o noise
  - Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks (negative data can be artificial)
- Small supervised dataset: semi-supervised learning
  - Bootstrapping, co-training, graph-based SSL
- No supervised dataset: unsupervised learning/rules
  - TwitIE, ReVerb
- Trick to construct a large supervised dataset with noise: distant supervision
  - MultiR, PCNNs

Non-deep L Ideas: Semi-supervision
- Bootstrapping (in a loop): automatic generation of training data by matching known facts
- Multi-view / multi-task co-training: constraints between tasks; agreement between multiple classifiers for the same concept
- Graph-based SSL: agreement between nodes of the graph
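The bootstrapping loop above can be sketched in a few lines of Python. The corpus, seed fact, and template format below are all made up for illustration; a real system (e.g. DIPRE/Snowball-style) would also score each pattern and fact to limit semantic drift.

```python
import re

# Toy corpus and one seed (capital, country) fact; both are made up.
corpus = [
    "Paris is the capital of France",
    "Tokyo is the capital of Japan",
    "Berlin is a city in Germany",
]
seeds = {("Paris", "France")}

def harvest_patterns(facts, corpus):
    """Turn each sentence that mentions a known pair into a template."""
    pats = set()
    for ent, val in facts:
        for sent in corpus:
            if ent in sent and val in sent:
                pats.add(sent.replace(ent, "{X}").replace(val, "{Y}"))
    return pats

def apply_patterns(pats, corpus):
    """Match each template rigidly against the corpus to get new pairs."""
    facts = set()
    for pat in pats:
        rx = re.escape(pat).replace(r"\{X\}", r"(\w+)").replace(r"\{Y\}", r"(\w+)")
        for sent in corpus:
            m = re.fullmatch(rx, sent)
            if m:
                facts.add((m.group(1), m.group(2)))
    return facts

for _ in range(2):  # a couple of bootstrap iterations
    seeds |= apply_patterns(harvest_patterns(seeds, corpus), corpus)

print(sorted(seeds))  # → [('Paris', 'France'), ('Tokyo', 'Japan')]
```

The Tokyo sentence matches the pattern harvested from the Paris seed, so a new fact is extracted; the Berlin sentence does not match any pattern, so nothing drifts in.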

Non-deep L Ideas: Distant Supervision
- KB of facts: known. Extraction supervision: unknown
- Bootstrap a training dataset by matching sentences with facts
- Hypothesis 1: all such sentences are positive training examples for a fact (NOISY)
- Hypothesis 2: all such sentences form a bag; each bag must have a unique relation (BETTER)
- Hypothesis 3: each bag can have multiple labels (EVEN BETTER)
- Multi-instance learning
  - Noisy-OR in PGMs
  - maximize the max probability in the bag
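The "maximize the max probability in the bag" idea can be sketched as follows. The bag, feature names, and linear sentence scorer are all hypothetical; the point is only the at-least-one assumption that a bag is as positive as its best sentence.

```python
import math

# Toy bag: three sentences mentioning the same entity pair, as feature dicts.
bag = [
    {"born_in_pattern": 1.0},   # expresses the relation
    {"visited_pattern": 1.0},   # noise
    {"met_with_pattern": 1.0},  # noise
]
weights = {"born_in_pattern": 2.0, "visited_pattern": -1.0}

def score(sent):
    """Hypothetical linear scorer over sentence features."""
    return sum(weights.get(f, 0.0) * v for f, v in sent.items())

def bag_prob(bag):
    """At-least-one assumption: bag probability = sigmoid of the max sentence score."""
    return 1.0 / (1.0 + math.exp(-max(score(s) for s in bag)))

print(round(bag_prob(bag), 3))  # → 0.881
```

Training against this bag-level probability lets the noisy sentences in the bag be ignored, since only the highest-scoring sentence carries the gradient.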

Non-deep L Ideas: No Intermediate Supervision
- QA tasks: (question, answer) pairs known; inference chain unknown
- Distant supervision: KB fact known; which sentence to extract from unknown
- OQA (which proof is better is not known); random-walk inference (which path is better is not known); MultiR (which sentence in the corpus is not known)
- Approach: create a model that scores each path/proof using weights on properties of each constituent; train using the known supervision (perceptron-style updates)
- Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning
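The "score each path/proof, train with perceptron-style updates" recipe can be sketched as a latent-derivation perceptron. Candidate derivations and feature names below are made up: when the top-scoring derivation yields a wrong answer, weights move toward the best derivation that yields the known gold answer.

```python
# Latent-derivation perceptron sketch: gold answers are known, correct
# derivations (paths/proofs) are not. Each candidate pairs an answer with
# the features of one derivation. Feature names are hypothetical.
weights = {}

def score(feats):
    return sum(weights.get(f, 0.0) for f in feats)

def perceptron_update(candidates, gold_answer, lr=1.0):
    """If the best derivation's answer is wrong, promote the best
    gold-deriving candidate and demote the predicted one."""
    pred_ans, pred_feats = max(candidates, key=lambda c: score(c[1]))
    if pred_ans == gold_answer:
        return
    gold_cands = [c for c in candidates if c[0] == gold_answer]
    if not gold_cands:
        return
    _, gold_feats = max(gold_cands, key=lambda c: score(c[1]))
    for f in gold_feats:
        weights[f] = weights.get(f, 0.0) + lr
    for f in pred_feats:
        weights[f] = weights.get(f, 0.0) - lr

candidates = [("Lyon", ["path:city_in"]), ("Paris", ["path:capital_of"])]
perceptron_update(candidates, "Paris")
print(weights)
```

After one update, the derivation feature that led to the gold answer is rewarded and the wrongly predicted one penalized, even though no derivation was ever labeled directly.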

Non-deep L Ideas: Sparsity
- Tree kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity weighted by a penalty on non-similar elements
- Paraphrase dataset for QA
- Open relations as supplements in KB inference

Deep Learning Models
- Convolutional NNs: handle fixed-length contexts
- Recurrent NNs: handle small variable-length histories
- LSTMs/GRUs: handle larger variable-length histories
- Bi-LSTMs: handle larger variable-length histories and futures
- Recursive NNs: handle variable-length, partially ordered histories
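One way to see why recurrent models handle variable-length histories: a single shared update rule folds any number of inputs into a fixed-size state. A toy scalar cell (a tanh update standing in for an LSTM/GRU; the weights are arbitrary):

```python
import math

def rnn_encode(inputs, w_in=0.5, w_rec=0.8):
    """Fold a variable-length sequence into one fixed-size (here scalar)
    state by applying the same update at every step."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Sequences of any length map to the same kind of state.
h_short = rnn_encode([1.0])
h_long = rnn_encode([1.0, 2.0, 3.0, 4.0])
```

A CNN, by contrast, fixes its context size in the filter width, which is why the slide pairs it with fixed-length contexts.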

Deep Learning Models (contd.)
- Hierarchical recurrent NNs: RNN over RNNs
- Attention models: attach non-uniform importance to histories based on evidence (the question)
- Co-attention models: attach non-uniform importance to histories in two different NNs
- MemNets: add an external storage with explicit reads, writes, and updates
- Generative adversarial nets: a better training procedure using an actor-critic architecture
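The attention idea (non-uniform importance over histories, driven by the evidence) in its simplest dot-product form; the toy query, keys, and values are made up:

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query-key scores, then a
    weighted sum of the value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]
    return [sum(w[i] * v[d] for i, v in enumerate(values))
            for d in range(len(values[0]))]

# The evidence (query) pulls weight toward the matching history entry.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print([round(v, 3) for v in out])  # → [7.311, 2.689]
```

Co-attention applies the same mechanism in both directions (e.g. question over document and document over question); MemNets repeat such reads over an external memory for multiple hops.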

Hierarchical Models
- Semi-CRFs: joint segmentation and labeling
  - A sentence is a sequence of segments, which are sequences of words
  - Allows segment-level features to be added
- HRED: LSTM over LSTM
  - A document is a sequence of sentences, which are sequences of words
  - A conversation is a sequence of utterances, which are sequences of words

RL for Text
Two uses:
- Use 1: search the Web to find easy documents for IE
- Use 2: policy gradient algorithm for updating the generator's weights in GANs
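The policy-gradient update behind use 2 can be sketched as REINFORCE on a two-action toy problem (everything here, logits, rewards, learning rate, is made up): the log-probability gradient of the sampled action is scaled by its reward, which is how a non-differentiable discriminator score can still train the generator.

```python
import math
import random

random.seed(0)
theta = [0.0, 0.0]     # generator "policy": logits over two actions
rewards = [0.0, 1.0]   # action 1 stands in for the useful query/sample

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

for _ in range(200):
    p = softmax(theta)
    a = 0 if random.random() < p[0] else 1   # sample an action
    r = rewards[a]                           # reward from the "critic"
    # REINFORCE: grad of log p[a] w.r.t. theta[k] is (1 if k == a else 0) - p[k]
    for k in range(2):
        theta[k] += 0.1 * r * ((1.0 if k == a else 0.0) - p[k])

print(round(softmax(theta)[1], 2))
```

After training, the policy concentrates on the rewarded action, with no gradient ever flowing through the reward itself.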

Bootstrapping
- [Akshay] Fuzzy matching between seed tuples and text
- [Shantanu] Named entity tags in patterns
- [Gagan, Barun] Confidence level for each pattern and fact
- Semantic drift

NELL
- Never-ending/lifelong learning
- Human supervision to guide the learning
- [many] Multi-view multi-task co-training
- [many] Coupling constraints for high precision
- [Dinesh] Ontology to define the constraints

Open IE
- [many] Ontology-free, scalability
- [Surag] Data-driven research through extensive error analysis
- [Dinesh] Reusing datasets from one task to another
- [Partha] Open relations as supplementary knowledge to reduce sparsity

Tree Kernels
- [Shantanu] Major info about the relation lies in the shortest path of the dependency parse

Semi-CRFs
- [many] Segment-level features in CRFs
- [Dinesh] Joint segmentation and labeling?
- Order-L CRFs vs. Semi-CRFs

MultiR
- [Rishab] Use of a KB to create a training set
- [Surag] Multi-instance learning in PGMs
- [Akshay] Relationship between sentence-level and aggregate extractions
- [Gagan] Viterbi approximation (replace expectation with max)

PCNNs
- [Haroun] Max pooling to make layers independent of sentence size
- [Akshay] Piecewise max pooling to capture arg1, rel, arg2
- [Akshay] Multi-instance learning in neural nets
- Positional embeddings
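The piecewise max pooling note can be made concrete with a sketch over one convolution channel's per-token activations (the scores and argument positions below are made up):

```python
# Piecewise max pooling: split the per-token scores at the two argument
# positions and max-pool each piece separately, so the pooled vector
# preserves (up to arg1, between args, after arg2) structure.
def piecewise_max_pool(scores, arg1_pos, arg2_pos):
    pieces = [scores[:arg1_pos + 1],
              scores[arg1_pos + 1:arg2_pos + 1],
              scores[arg2_pos + 1:]]
    return [max(p) for p in pieces if p]

scores = [0.1, 0.9, 0.3, 0.7, 0.2, 0.5]   # made-up channel activations
print(piecewise_max_pool(scores, 1, 3))   # → [0.9, 0.7, 0.5]
```

Ordinary max pooling would collapse the same channel to the single value 0.9; the piecewise version keeps one value per region, which is what lets the model see the arg1/rel/arg2 structure.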

TwitIE
- [Haroun] Tweets are challenging, but redundancy is good
- [Dinesh] G² test for ranking entities for a given date
- [Shantanu] Event type discovery using topic models

RL for IE
- [many] Active querying for gathering external evidence

PRA for KB Inference
- [Haroun, Akshay] Low-variance sampling
- [Arindam] Learning non-functional relations
- [Nupur] Paths as features in a learning model

Joint MF-TF
- [Akshay, Shantanu] OOV handling
- [Nupur] Loss function in joint modeling

Open QA
- [Surag] Structured perceptron in a pipeline model
- [Akshay] Paraphrase corpus for question rewriting
- [Shantanu] Mining paraphrase operators from a corpus
- [Arindam] Decomposition of scoring over derivation steps

LSTMs
- [Haroun] Attention > depth
- [Akshay] Cool way to construct the dataset
- [Dinesh] Two types of readers

Co-attention
- [many] Iterative refinement of answer span selection

HRED
- [Akshay] Pretraining the dialog model with a QA dataset
- [Arindam] Passing intermediate context improves coherence?
- [Barun] Split between local dialog generator and global state tracker

MSQU
- [many] Partially annotated data
- [many] Natural language -> SQL

GANs
- [many] Teacher forcing
- [Akshay] Interesting heuristics
- [Arindam] Discriminator feedback can be backpropagated despite being non-differentiable

MemNets
- [Surag] Typed OOVs
- [Haroun] Hops
- [Shantanu, Gagan] Subtask-styled evaluation

Open/Next Issues
IE: mature?
- Event extraction
- Temporal extraction
- Rapid retargetability
KB inference: long way to go
- Combining DL and path-based models

Open/Next Issues
QA systems
- Dataset-driven research: [MC] SQuAD, tremendous progress
- Answering in the wild: not clear (large answer spaces?)
- Deep learning for large-scale QA
Conversational agents
- [Task-driven] How to get a DL model to issue a variety of queries
- [General] How to get the system to say something interesting?
DL: what are the systems really capturing!?

Conclusions
- Learn key historical developments in IE
- Learn (some of) the state of the art in IE, inference, QA, and dialog
- Learn how to critique the strengths and weaknesses of a paper
- Learn how to brainstorm next steps and future directions
- Learn how to summarize an advanced area of research
- Learn to do research at the cutting edge

Exam
- Bring a laptop: Internet enabled, PDFLaTeX enabled
- Bring a mobile: for taking a picture
- Extension cords
- It is OK even if you have not deeply understood every paper

Project Presentations
- Motivation & problem definition
- 1 slide of contribution
- Background
- Technical approach
- Experiments
- Analysis
- Conclusions
- Future work