Wrap-up: IE, QA, and Dialog (Mausam)
Grading
- 50%/40% project
- 20%/15% final exam
- 20%/15% regular reviews
- 10% midterm survey
- 10% presentation
- Extra credit: participation
Plan (1st half of the course)
- Classical papers/problems in IE: bootstrapping, NELL, Open IE
- Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning
- IE++: coreference, paraphrases, inference
Plan (2nd half of the course)
- QA
- Conversational agents
Plan (1st half++ of the course)
- Classical papers/problems in IE: bootstrapping, NELL, Open IE
- Important techniques for IE: semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning
- IE++: coreference, paraphrases
- Inference: random walks, neural models
Plan (2nd half of the course)
- QA: open QA, semantic parsing; LSTM, attention, more attention, Recursive NN, deep feature fusion network
- Conversational agents: generative hierarchical nets, GANs, MemNets
NLP (or any application course)
Techniques/models:
- Bootstrapping (coupled)
- Semi-supervised learning (SSL)
- PGMs: semi-CRF, MultiR, LDA
- Tree kernels
- Multi-instance learning
- Random walks over graphs
- Reinforcement learning
- CNN, LSTM, Bi-LSTM, Recursive NN
- Attention, MemNets, GANs
Problems:
- NER
- Entity/relation/event extraction
- Open relation/event extraction
- Multi-task learning
- KB inference
- Open QA
- Machine comprehension
- Task-oriented dialog w/ KB
- General dialog
How much data?
- Large supervised dataset: supervised learning
  - Trick: construct a large supervised dataset w/o noise (Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks; negative data can be artificial)
- Small supervised dataset: semi-supervised learning (bootstrapping, co-training, graph-based SSL)
- No supervised dataset: unsupervised learning/rules (TwitIE, ReVerb)
  - Trick: construct a large supervised dataset with noise: distant supervision (MultiR, PCNNs)
Non-deep learning ideas: semi-supervised
- Bootstrapping (in a loop): automatic generation of training data by matching known facts
- Multi-view / multi-task co-training: constraints between tasks; agreement between multiple classifiers for the same concept
- Graph-based SSL: agreement between nodes of the graph
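A minimal sketch of the bootstrapping loop described above, alternating between inducing patterns from known facts and extracting new facts with those patterns. The corpus, seed tuple, and string-matching heuristics are toy illustrations, not from any real system:

```python
# Bootstrapping: alternate between (1) inducing extraction patterns from
# sentences that match known facts and (2) extracting new facts with the
# induced patterns. Toy corpus and seeds.

corpus = [
    "Paris is the capital of France",
    "Tokyo is the capital of Japan",
    "Tokyo , capital of Japan",
    "Berlin , capital of Germany",
]
seeds = {("Paris", "France")}
patterns = set()

for _ in range(3):  # a few bootstrapping iterations
    # Step 1: induce patterns (the text between the two arguments)
    # from sentences matching known facts.
    for sent in corpus:
        for x, y in seeds:
            if x in sent and y in sent:
                middle = sent.split(x)[1].split(y)[0]
                patterns.add(middle)
    # Step 2: extract new facts with the induced patterns.
    for sent in corpus:
        for pat in patterns:
            if pat.strip() and pat in sent:
                left, _, rest = sent.partition(pat)
                x, y = left.strip(), rest.strip()
                if x and y:
                    seeds.add((x, y))
```

In practice each pattern and fact would carry a confidence score (as in the [Gagan, Barun] point below, slide 15), precisely to limit the semantic drift this naive loop suffers from.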
Non-deep learning ideas: distant supervision
- KB of facts: known. Extraction supervision: unknown
- Bootstrap a training dataset by matching sentences with facts
- Hypothesis 1: all such sentences are positive training for a fact: NOISY
- Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER
- Hypothesis 3: each bag can have multiple labels: EVEN BETTER
- Multi-instance learning: noisy-OR in PGMs; maximize the max probability in the bag
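The two bag-level aggregations named above (the max probability in the bag, and noisy-OR) can be written down directly. The sentence-level probabilities here are toy numbers, assuming numpy:

```python
import numpy as np

# Sentence-level probabilities that each sentence in a bag expresses the
# relation (toy numbers). Multi-instance learning trains on a bag-level
# score instead of trusting every matched sentence as a positive example.
p = np.array([0.1, 0.7, 0.3])

# "At-least-one" aggregation: maximize the max probability in the bag.
bag_max = p.max()

# Noisy-OR aggregation: the bag expresses the relation unless every
# sentence independently fails to express it.
bag_noisy_or = 1.0 - np.prod(1.0 - p)
```

Both aggregations let one confident sentence dominate the bag, which is exactly what makes Hypothesis 2/3 more robust to noisy matches than Hypothesis 1.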
Non-deep learning ideas: no intermediate supervision
- QA tasks: (question, answer) pairs known; inference chain unknown
- Distant supervision: KB fact known; which sentence to extract from unknown
- OQA (which proof is better is not known); random-walk inference (which path is better is not known); MultiR (which sentence in the corpus is not known)
- Approach: create a model for scoring each path/proof using weights on properties of each constituent; train using the known supervision (perceptron-style updates)
- Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning
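The approach above can be sketched as a latent-variable structured perceptron: score each candidate path/proof with a weighted feature vector, predict with the argmax, and update the weights using only the known final answer. The candidate paths and feature names below are illustrative, not from any of the systems discussed:

```python
# Perceptron-style training when only the final answer is supervised,
# never the intermediate path/proof. Candidates and features are toys.

def score(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

# Two candidate inference paths for the same query; only the answer
# each path reaches is known to be right or wrong.
candidates = [
    {"answer": "wrong",   "feats": {"edge:born_in": 1.0, "len2": 1.0}},
    {"answer": "correct", "feats": {"edge:capital_of": 1.0, "len1": 1.0}},
]

w = {}
for _ in range(5):
    best = max(candidates, key=lambda c: score(w, c["feats"]))
    if best["answer"] != "correct":
        gold = next(c for c in candidates if c["answer"] == "correct")
        for f, v in gold["feats"].items():   # promote a path that reaches the answer
            w[f] = w.get(f, 0.0) + v
        for f, v in best["feats"].items():   # demote the wrongly predicted path
            w[f] = w.get(f, 0.0) - v

prediction = max(candidates, key=lambda c: score(w, c["feats"]))
```

The same skeleton covers the variants listed: OQA decomposes the score over edges of a derivation, PRA puts the weight on whole-path features, and MultiR replaces the argmax over paths with an argmax over sentences in a bag.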
Non-deep learning ideas: sparsity
- Tree kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty for the non-matching elements
- Paraphrase dataset for QA
- Open relations as supplements in KB inference
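A toy version of the path-similarity idea: compare two dependency paths element by element, multiplying in a penalty for every non-matching element. This is an illustrative simplification of the kernel idea, not the exact kernel of any particular paper:

```python
# Simplified dependency-path similarity: paths that share many
# constituent elements score high; each mismatch multiplies in a
# penalty. Many path kernels only compare paths of equal length.

def path_similarity(p1, p2, penalty=0.5):
    if len(p1) != len(p2):
        return 0.0
    sim = 1.0
    for a, b in zip(p1, p2):
        sim *= 1.0 if a == b else penalty
    return sim

p1 = ["nsubj", "acquired", "dobj"]
p2 = ["nsubj", "bought", "dobj"]
s = path_similarity(p1, p2)   # one mismatch -> one penalty factor
```

This soft matching is what combats sparsity: exact-match features would treat `p1` and `p2` as unrelated, while the kernel gives them substantial shared weight.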
Deep Learning Models
- Convolutional NNs: handle fixed-length contexts
- Recurrent NNs: handle short variable-length histories
- LSTMs/GRUs: handle longer variable-length histories
- Bi-LSTMs: handle longer variable-length histories and futures
- Recursive NNs: handle variable-length, partially ordered histories
Deep Learning Models (contd.)
- Hierarchical recurrent NNs: RNNs over RNNs
- Attention models: attach non-uniform importance to histories based on evidence (e.g., the question)
- Co-attention models: attach non-uniform importances to histories in two different NNs
- MemNets: add an external storage with explicit reads, writes, and updates
- Generative adversarial nets: a better training procedure using an actor-critic architecture
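The attention bullet above is just a softmax-weighted sum. A minimal numpy sketch with toy dimensions (4 history states of size 8, a random "question" vector as the evidence):

```python
import numpy as np

# Attention: attach non-uniform importance to history states h_1..h_T
# based on evidence (here a question vector q), then summarize the
# history as the weighted sum. Toy dimensions, random values.

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))   # T=4 history states, dimension d=8
q = rng.normal(size=8)        # evidence / question vector

scores = H @ q                            # one relevance score per state
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax: non-uniform importance
context = weights @ H                     # weighted summary of the history
```

Co-attention repeats this in both directions (question attends to document, document attends to question); MemNets re-run the read several times ("hops") against an external memory.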
Hierarchical Models
- Semi-CRFs: joint segmentation and labeling. A sentence is a sequence of segments, each a sequence of words; allows segment-level features to be added
- HRED: LSTM over LSTM. A document is a sequence of sentences, each a sequence of words; a conversation is a sequence of utterances, each a sequence of words
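The semi-CRF idea is clearest in its Viterbi recurrence: V[i] is the best score of any segmentation-plus-labeling of the first i words, maximizing over the length and label of the last segment. The toy scoring function below stands in for learned segment-level features; words, labels, and scores are illustrative:

```python
# Semi-CRF Viterbi sketch: jointly choose a segmentation and labeling
# by scoring whole segments, so segment-level features can be used.
# V[i] = best score covering words 0..i-1. Toy scoring function.

words = ["New", "York", "is", "big"]
labels = ["LOC", "O"]
MAX_LEN = 3  # maximum segment length

def seg_score(segment, label):
    # Toy segment-level feature: reward the known multi-word location.
    if label == "LOC" and segment == ("New", "York"):
        return 5.0
    if label == "O" and len(segment) == 1 and segment[0].islower():
        return 1.0
    return -1.0

n = len(words)
V = [float("-inf")] * (n + 1)
V[0] = 0.0
back = [None] * (n + 1)
for i in range(1, n + 1):
    for l in range(1, min(MAX_LEN, i) + 1):   # length of the last segment
        seg = tuple(words[i - l:i])
        for y in labels:
            s = V[i - l] + seg_score(seg, y)
            if s > V[i]:
                V[i], back[i] = s, (i - l, seg, y)

# Recover the best labeled segmentation from the backpointers.
segments, i = [], n
while i > 0:
    j, seg, y = back[i]
    segments.append((seg, y))
    i = j
segments.reverse()
```

Note how the segment-level feature on ("New", "York") is exactly what an order-1 word-level CRF could not express directly, which is the point of the slide.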
RL for Text: two uses
- Use 1: search the Web to find easy documents for IE
- Use 2: policy-gradient algorithm for updating the generator's weights in GANs
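Use 2 rests on the REINFORCE policy-gradient trick: the discriminator's reward reaches the generator through a non-differentiable sampling step via the update reward × ∇ log π(action). A toy sketch with a 3-action policy and a hard-coded reward standing in for the discriminator:

```python
import numpy as np

# REINFORCE sketch: update logits by reward * grad(log pi(action)),
# which lets a critic's scalar reward train a sampler even though
# sampling itself is non-differentiable. Toy 3-action policy; the
# "discriminator" reward vector is hard-coded.

rng = np.random.default_rng(0)
logits = np.zeros(3)
rewards = np.array([0.0, 1.0, 0.0])  # the critic only rewards action 1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(200):
    probs = softmax(logits)
    a = rng.choice(3, p=probs)        # sample an action (non-differentiable)
    grad_logp = -probs                # gradient of log pi(a) w.r.t. logits
    grad_logp[a] += 1.0
    logits += 0.1 * rewards[a] * grad_logp

probs = softmax(logits)
```

In a text GAN the "action" is an emitted word and the reward is the discriminator's score for the finished sequence, but the update has this same shape.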
Bootstrapping
- [Akshay] fuzzy matching between seed tuples and text
- [Shantanu] named-entity tags in patterns
- [Gagan, Barun] confidence level for each pattern and fact
- Semantic drift
NELL
- Never-ending/lifelong learning
- Human supervision to guide the learning
- [many] multi-view multi-task co-training
- [many] coupling constraints for high precision
- [Dinesh] ontology to define the constraints
Open IE
- [many] ontology-free, scalability
- [Surag] data-driven research through extensive error analysis
- [Dinesh] reusing datasets from one task to another
- [Partha] open relations as supplementary knowledge to reduce sparsity
Tree Kernels
- [Shantanu] major information about the relation lies in the shortest path of the dependency parse
Semi-CRFs
- [many] segment-level features in a CRF
- [Dinesh] joint segmentation and labeling? Order-L CRFs vs. semi-CRFs
MultiR
- [Rishab] use of a KB to create a training set
- [Surag] multi-instance learning in PGMs
- [Akshay] relationship between sentence-level and aggregate extractions
- [Gagan] Viterbi approximation (replace expectation with max)
PCNNs
- [Haroun] max pooling to make layers independent of sentence size
- [Akshay] piecewise max pooling to capture arg1, rel, arg2
- [Akshay] multi-instance learning in neural nets
- Positional embeddings
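The piecewise max pooling bullet can be shown on a toy convolution output: instead of one max over the whole sentence, split the positions at the two arguments and max-pool each of the three pieces, keeping coarse before/between/after structure. The feature map, argument positions, and exact split boundaries here are illustrative:

```python
import numpy as np

# Piecewise max pooling (PCNN idea): split the convolutional feature
# map at the two argument positions and max-pool each piece separately,
# so the pooled vector preserves arg1 / rel / arg2 structure.
# Toy feature map: 8 token positions x 3 filters.

conv = np.arange(24, dtype=float).reshape(8, 3)
arg1, arg2 = 2, 5   # toy positions of the two entity arguments

pieces = [conv[:arg1 + 1], conv[arg1 + 1:arg2 + 1], conv[arg2 + 1:]]
pooled = np.concatenate([p.max(axis=0) for p in pieces])
```

Plain max pooling would give a length-3 vector here; the piecewise version gives 3 x 3 = 9 values, one max per filter per piece, which is what makes the output sentence-length independent yet structure aware.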
TwitIE
- [Haroun] tweets are challenging, but redundancy is good
- [Dinesh] G² test for ranking entities for a given date
- [Shantanu] event-type discovery using topic models
RL for IE
- [many] active querying for gathering external evidence
PRA for KB inference
- [Haroun, Akshay] low-variance sampling
- [Arindam] learning non-functional relations
- [Nupur] paths as features in a learning model
Joint MF-TF
- [Akshay, Shantanu] OOV handling
- [Nupur] loss function in joint modeling
Open QA
- [Surag] structured perceptron in a pipeline model
- [Akshay] paraphrase corpus for question rewriting
- [Shantanu] mining paraphrase operators from a corpus
- [Arindam] decomposition of scoring over derivation steps
LSTMs
- [Haroun] attention > depth
- [Akshay] cool way to construct the dataset
- [Dinesh] two types of readers
Co-attention
- [many] iterative refinement of answer-span selection
HRED
- [Akshay] pretraining the dialog model with a QA dataset
- [Arindam] passing intermediate context improves coherence?
- [Barun] split of local dialog generator and global state tracker
MSQU
- [many] partially annotated data
- [many] natural language -> SQL
GANs
- [many] teacher forcing
- [Akshay] interesting heuristics
- [Arindam] discriminator feedback can be backpropagated to the generator even though the sampling step is non-differentiable
MemNets
- [Surag] typed OOVs
- [Haroun] hops
- [Shantanu, Gagan] subtask-styled evaluation
Open/Next Issues
- IE: mature? Event extraction, temporal extraction, rapid retargetability
- KB inference: a long way to go; combining DL and path-based models
Open/Next Issues
- QA systems: dataset-driven research ([MC] SQuAD: tremendous progress); answering in the wild: not clear (large answer spaces?); deep learning for large-scale QA
- Conversational agents: [task-driven] how to get a DL model to issue a variety of queries; [general] how to get the system to say something interesting?
- DL: what are the systems really capturing!?
Conclusions
- Learn key historical developments in IE
- Learn (some of) the state of the art in IE, inference, QA, and dialog
- Learn how to critique strengths and weaknesses of a paper
- Learn how to brainstorm next steps and future directions
- Learn how to summarize an advanced area of research
- Learn to do research at the cutting edge
Exam
- Bring a laptop: Internet-enabled, PDFLaTeX-enabled
- Bring a mobile: for taking a picture
- Extension cords
- It is OK even if you have not deeply understood every paper
Project Presentations
- Motivation & problem definition
- 1 slide of contributions
- Background
- Technical approach
- Experiments
- Analysis
- Conclusions
- Future work