Wrapup: IE, QA, and Dialog. Mausam

1 Wrapup: IE, QA, and Dialog Mausam

2 Grading: 40% project (revised from 50%), 20% final exam, 20% regular reviews (revised from 15%), 10% midterm survey (revised from 15%), 10% presentation. Extra credit: participation.

3 Plan (1st half of the course)
- Classical papers/problems in IE: Bootstrapping, NELL, Open IE
- Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning
- IE++: coreference, paraphrases, inference
Plan (2nd half of the course)
- QA
- Conversational agents

4 Plan (1st half++ of the course)
- Classical papers/problems in IE: Bootstrapping, NELL, Open IE
- Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning
- IE++: coreference, paraphrases
- Inference: random walks, neural models
Plan (2nd half of the course)
- QA: open QA, semantic parsing; LSTM, attention, more attention, Recursive NN, deep feature fusion network
- Conversational agents: Gen. Hierarchical nets, GANs, MemNets

5 NLP (or any application course)
Techniques/Models:
- Bootstrapping, (coupled) Semi-SSL
- PGMs: semi-CRF, MultiR, LDA
- Tree kernels
- Multi-instance learning
- Random walks over graphs
- Reinforcement learning
- CNN, LSTM, Bi-LSTM, Recursive NN
- Attention, MemNets, GANs
Problems:
- NER
- Entity/Rel/Event extraction
- Open Rel/Event extraction
- Multi-task learning
- KB inference
- Open QA
- Machine comprehension
- Task-oriented dialog w/ KB
- General dialog

6 How much data?
- Large supervised dataset: supervised learning; trick to compute a large supervised dataset w/o noise
  - Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks (negative data can be artificial)
- Small supervised dataset: semi-supervised learning
  - Bootstrapping, co-training, graph-based SSL
- No supervised dataset: unsupervised learning/rules
  - TwitIE, ReVerb
- Trick to compute a large supervised dataset with noise: distant supervision
  - MultiR, PCNNs

7 Non-deep Learning Ideas: Semi-supervised
- Bootstrapping (in a loop): automatic generation of training data by matching known facts
- Multi-view / multi-task co-training: constraints between tasks; agreement between multiple classifiers for the same concept
- Graph-based SSL: agreement between nodes of the graph
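The bootstrapping loop above can be sketched on a toy corpus. This is a minimal illustration, not the method of any specific paper from the course: the corpus, the seed fact, the exact-string patterns, and the names `find_patterns`, `apply_patterns`, and `bootstrap` are all assumptions made for the example. It has no pattern or fact confidence, so it does nothing to prevent semantic drift.

```python
import re

# Toy corpus (made up for illustration).
CORPUS = [
    "Paris is the capital of France .",
    "Berlin is the capital of Germany .",
    "Rome , the capital city of Italy , is old .",
    "Madrid , the capital city of Spain , is warm .",
    "Paris , the capital city of France , is busy .",
]

def find_patterns(corpus, facts):
    """Collect the text between the two arguments of every known fact."""
    patterns = set()
    for x, y in facts:
        for s in corpus:
            if x in s and y in s and s.index(x) < s.index(y):
                patterns.add(s[s.index(x) + len(x): s.index(y)])
    return patterns

def apply_patterns(corpus, patterns):
    """Extract (word, word) pairs that surround any learned pattern."""
    facts = set()
    for p in patterns:
        regex = re.compile(r"(\w+)" + re.escape(p) + r"(\w+)")
        for s in corpus:
            for m in regex.finditer(s):
                facts.add((m.group(1), m.group(2)))
    return facts

def bootstrap(corpus, seeds, rounds=2):
    """Alternate between learning patterns from facts and facts from patterns."""
    facts = set(seeds)
    for _ in range(rounds):
        patterns = find_patterns(corpus, facts)
        facts |= apply_patterns(corpus, patterns)
    return facts

if __name__ == "__main__":
    for fact in sorted(bootstrap(CORPUS, {("Paris", "France")})):
        print(fact)
```

Starting from one seed, the two contexts in which it occurs become patterns, and those patterns pull in the other capital/country pairs.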

8 Non-deep Learning Ideas: Distant Supervision
- KB of facts: known. Extraction supervision: unknown
- Bootstrap a training dataset by matching sentences with facts
- Hypothesis 1: all such sentences are positive training examples for a fact: NOISY
- Hypothesis 2: all such sentences form a bag; each bag must have a unique relation: BETTER
- Hypothesis 3: each bag can have multiple labels: EVEN BETTER
- Multi-instance learning: noisy-OR in PGMs; maximize the max probability in the bag
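A small sketch of the bag construction and the two bag-level aggregations mentioned above (noisy-OR and max). The KB, the sentences, and the function names are invented for illustration, and the per-sentence probabilities would come from a trained extractor that is not shown here.

```python
from collections import defaultdict

# Toy KB: (arg1, arg2) -> set of relations. A pair may carry several labels,
# which is the "multiple labels per bag" setting of Hypothesis 3.
KB = {
    ("Obama", "Hawaii"): {"born_in"},
    ("Obama", "USA"): {"born_in", "president_of"},
}

SENTENCES = [
    "Obama was born in Hawaii .",
    "Obama visited Hawaii last year .",
    "Obama was elected president of the USA .",
    "Obama was born in the USA .",
]

def build_bags(kb, sentences):
    """Group every sentence that mentions both arguments of a KB pair into one bag."""
    bags = defaultdict(list)
    for arg1, arg2 in kb:
        for s in sentences:
            tokens = s.split()
            if arg1 in tokens and arg2 in tokens:
                bags[(arg1, arg2)].append(s)
    return dict(bags)

def noisy_or(probs):
    """Probability that at least one sentence in the bag expresses the relation."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

def bag_max(probs):
    """Score the bag by its single most confident sentence."""
    return max(probs)

if __name__ == "__main__":
    for pair, bag in build_bags(KB, SENTENCES).items():
        print(pair, KB[pair], bag)
    print(noisy_or([0.5, 0.5]), bag_max([0.2, 0.9]))
```

Note that the bag for ("Obama", "Hawaii") contains "Obama visited Hawaii last year .", which does not express born_in; treating it as a positive example is exactly the noise of Hypothesis 1.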

9 Non-deep Learning Ideas: No Intermediate Supervision
- QA tasks: (question, answer) pairs known; inference chain unknown
- Distant supervision: KB fact known; which sentence to extract from unknown
- Examples: OQA (which proof is better is not known); random walk inference (which path is better is not known); MultiR (which sentence in the corpus expresses the fact is not known)
- Approach: create a model that scores each path/proof using weights on properties of each constituent; train using the known supervision (perceptron-style updates)
- Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning
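One way to picture a perceptron-style update when only the final answer is supervised: score every candidate proof, and if the top-scoring one gives a wrong answer, move the weights toward the best-scoring proof that gives the right answer. This is a generic latent-variable perceptron sketch, not the exact training procedure of OQA, PRA, or MultiR; the feature names and the function `latent_perceptron_update` are hypothetical.

```python
def score(weights, feats):
    """Linear score of one candidate proof/path from its feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def latent_perceptron_update(weights, candidates, lr=1.0):
    """One update step.

    candidates: list of (features, is_correct) pairs, where is_correct says
    whether the proof yields the known answer. Which correct proof is "the"
    right one is never observed, so the current model picks it.
    Returns True if the weights were changed.
    """
    predicted = max(candidates, key=lambda c: score(weights, c[0]))
    if predicted[1]:
        return False  # top-scoring proof already gives the right answer
    correct = [c for c in candidates if c[1]]
    if not correct:
        return False  # no proof reaches the answer; nothing to learn from
    target = max(correct, key=lambda c: score(weights, c[0]))
    for f, v in target[0].items():
        weights[f] = weights.get(f, 0.0) + lr * v
    for f, v in predicted[0].items():
        weights[f] = weights.get(f, 0.0) - lr * v
    return True

if __name__ == "__main__":
    w = {}
    cands = [({"uses_paraphrase": 1.0}, False), ({"exact_match": 1.0}, True)]
    print(latent_perceptron_update(w, cands), w)
    print(latent_perceptron_update(w, cands), w)
```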

10 Non-deep Learning Ideas: Sparsity
- Tree kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is weighted by a penalty for non-similar elements
- Paraphrase dataset for QA
- Open relations as supplements in KB inference
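A toy illustration of the similarity idea above. It is not an actual tree kernel and is not claimed to be a valid (positive semi-definite) kernel: it counts elements shared in order between two dependency paths and applies a decay for each unmatched element. The scoring formula, the `decay` parameter, and the example paths are assumptions made for this sketch.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def path_similarity(a, b, decay=0.5):
    """Shared elements, down-weighted by `decay` per element that is not shared."""
    shared = lcs_length(a, b)
    unmatched = len(a) + len(b) - 2 * shared
    return shared * decay ** unmatched

if __name__ == "__main__":
    p = ["nsubj", "born", "prep_in"]
    q = ["nsubj", "born", "advmod", "prep_in"]
    print(path_similarity(p, p), path_similarity(p, q))
```

Two paths that differ by one extra element still get a non-zero score, which is how this family of similarities softens the sparsity of exact path matching.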

11 Deep Learning Models
- Convolutional NNs: handle fixed-length contexts
- Recurrent NNs: handle small variable-length histories
- LSTMs/GRUs: handle larger variable-length histories
- Bi-LSTMs: handle larger variable-length histories and futures
- Recursive NNs: handle variable-length, partially ordered histories

12 Deep Learning Models (contd)
- Hierarchical Recurrent NNs: RNN over RNNs
- Attention models: attach non-uniform importance to histories based on evidence (question)
- Co-attention models: attach non-uniform importances to histories in two different NNs
- MemNets: add an external storage with explicit read, write, and update operations
- Generative Adversarial Nets: a better training procedure using an actor-critic architecture
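The "non-uniform importance" in attention can be shown with plain dot-product attention over a list of hidden states. This is a minimal sketch in pure Python; real models usually add learned projections, and the vectors here are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, states):
    """Dot-product attention.

    query:  vector summarising the evidence (e.g. the question)
    states: list of hidden-state vectors (the history)
    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states))
               for i in range(dim)]
    return weights, context

if __name__ == "__main__":
    weights, context = attend([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
    print(weights, context)
```

The state that aligns with the query receives almost all of the weight, so the context vector is dominated by it.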

13 Hierarchical Models
- Semi-CRFs: joint segmentation and labeling
  - A sentence is a sequence of segments, which are sequences of words
  - Allows segment-level features to be added
- HRED: LSTM over LSTM
  - A document is a sequence of sentences, each of which is a sequence of words
  - A conversation is a sequence of utterances, each of which is a sequence of words
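Joint segmentation and labeling can be sketched as semi-Markov Viterbi decoding: the dynamic program ranges over segments rather than single tokens, so a score can look at a whole segment at once. The sketch below only decodes with a hand-written scoring function; it does not train a Semi-CRF. The gazetteer, the scores, and the names `semi_markov_viterbi` and `toy_score` are assumptions for illustration.

```python
def semi_markov_viterbi(tokens, labels, seg_score, max_len=3):
    """Return the best list of (start, end, label) segments covering tokens."""
    n = len(tokens)
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of tokens[:i] ending in label y
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for y in labels:
                if start == 0:
                    prevs = [(None, 0.0)]
                else:
                    prevs = [(p, best[start][p]) for p in labels]
                for prev, prev_score in prevs:
                    if prev_score == neg_inf:
                        continue
                    s = prev_score + seg_score(tokens, start, end, y, prev)
                    if s > best[end][y]:
                        best[end][y] = s
                        back[end][y] = (start, prev)
    # Follow back-pointers from the best final label.
    y = max(labels, key=lambda lab: best[n][lab])
    segments, end = [], n
    while end > 0:
        start, prev = back[end][y]
        segments.append((start, end, y))
        end, y = start, prev
    return segments[::-1]

GAZETTEER = {"New York"}  # toy segment-level feature source

def toy_score(tokens, start, end, label, prev_label):
    """Hand-set scores: reward gazetteer matches as LOC, single tokens as O."""
    text = " ".join(tokens[start:end])
    if label == "LOC":
        return 3.0 if text in GAZETTEER else -1.0
    return 0.5 if end - start == 1 else -1.0

if __name__ == "__main__":
    print(semi_markov_viterbi("I love New York".split(), ["LOC", "O"], toy_score))
```

The gazetteer lookup on "New York" is a segment-level feature: a token-level model would have to reconstruct it from per-token decisions.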

14 RL for Text: two uses
- Use 1: search the Web to find easy documents for IE
- Use 2: policy gradient algorithm for updating the generator's weights in GANs
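The policy gradient idea can be illustrated with the REINFORCE update on a softmax policy over a handful of actions. This is a bandit-sized sketch, not a text generator or a GAN: the reward function, learning rate, and the names `reinforce_step` and `train` are assumptions for the example. In the GAN setting, the reward would come from the discriminator rather than from a fixed function.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(prefs, action, reward, lr=0.1):
    """One REINFORCE update: prefs += lr * reward * grad log pi(action).

    For a softmax policy, grad log pi(action) w.r.t. preference i is
    (1 if i == action else 0) - pi(i).
    """
    probs = softmax(prefs)
    return [p + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
            for i, p in enumerate(prefs)]

def train(reward_fn, n_actions, steps=500, lr=0.1, seed=0):
    """Sample actions from the policy and update after each sampled reward."""
    rng = random.Random(seed)
    prefs = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(n_actions), weights=probs)[0]
        prefs = reinforce_step(prefs, action, reward_fn(action), lr)
    return softmax(prefs)

if __name__ == "__main__":
    # Hypothetical reward: only action 1 is ever rewarded.
    print(train(lambda a: 1.0 if a == 1 else 0.0, n_actions=2))
```

Because the update only needs a sampled action and a scalar reward, it works even when the reward signal is not differentiable with respect to the sampled output.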

15 Bootstrapping [Akshay] Fuzzy matching between seed tuples and text [Shantanu] Named entity tags in patterns [Gagan, Barun] Confidence level for each pattern and fact Semantic drift

16 NELL Never-ending/lifelong learning Human supervision to guide the learning [many] multi-view multi-task co-training [many] coupling constraints for high precision. [Dinesh] ontology to define the constraints

17 Open IE [many] ontology-free, scalability [Surag] data-driven research through extensive error analysis [Dinesh] reusing datasets from one task to another [Partha] open relations as supplementary knowledge to reduce sparsity

18 Tree Kernels [Shantanu] major info about the relation lies in the shortest path of the dependency parse

19 Semi-CRFs [many] segment level features in CRF [Dinesh] joint segmentation and labeling? Order L CRFs vs Semi-CRFs

20 MultiR [Rishab] Use of KB to create a training set [Surag] multi-instance learning in PGMs [Akshay] relationship between sentence-level and aggregate extractions [Gagan] Viterbi approximation (replace expectation with max)

21 PCNNs [Haroun] Max pooling to make layers independent of sentence size [Akshay] Piecewise max pooling to capture arg1, rel, arg2 [Akshay] Multi-instance learning in neural nets Positional embeddings
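Piecewise max pooling can be sketched for a single convolution filter: instead of one max over the whole sentence, take one max per piece, where the pieces are delimited by the two argument positions. The output size is fixed (three values per filter) regardless of sentence length. The exact piece boundaries used below (each argument token closes its piece) and the fallback value for an empty piece are assumptions of this sketch.

```python
def piecewise_max_pool(feature_map, arg1_pos, arg2_pos):
    """Pool one filter's outputs into three values.

    feature_map: list of floats, one per token position
    arg1_pos, arg2_pos: token indices of the two arguments, arg1_pos < arg2_pos
    """
    left = feature_map[: arg1_pos + 1]
    middle = feature_map[arg1_pos + 1: arg2_pos + 1]
    right = feature_map[arg2_pos + 1:]
    # An empty piece (e.g. arg2 is the last token) falls back to 0.0.
    return [max(piece) if piece else 0.0 for piece in (left, middle, right)]

def max_pool(feature_map):
    """Ordinary max pooling, for comparison: one value per filter."""
    return max(feature_map)

if __name__ == "__main__":
    fm = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
    print(max_pool(fm), piecewise_max_pool(fm, arg1_pos=1, arg2_pos=3))
```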

22 TwitIE [Haroun] tweets are challenging, but redundancy is good [Dinesh] G² test for ranking entities for a given date [Shantanu] event type discovery using topic models
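For reference, the G² (log-likelihood ratio) statistic on a contingency table is G² = 2 Σ O ln(O / E), where O are observed counts and E are the counts expected under independence. The sketch below computes it for a table such as entity-mentioned vs. tweet-refers-to-date; how the counts are collected is not specified on the slide, so the tables here are made up.

```python
import math

def g2(table):
    """G² statistic for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g

if __name__ == "__main__":
    print(g2([[10, 10], [10, 10]]))  # independent counts
    print(g2([[20, 0], [0, 20]]))    # perfectly associated counts
```

A larger value means the entity and the date co-occur more than chance would predict, which is what makes the statistic usable as a ranking score.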

23 RL for IE [many] active querying for gathering external evidence

24 PRA for KB inference [Haroun, Akshay] low variance sampling [Arindam] learning non-functional relations [Nupur] paths as features in a learning model
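"Paths as features" can be sketched as follows: for a source entity and a relation path (a sequence of edge types), the feature value for a target entity is the probability that a uniform random walk following that path ends at the target. The sketch computes this exactly on a toy graph; PRA approximates it by sampling on large graphs. The graph, relation names, and the function `path_prob` are invented for illustration.

```python
def path_prob(graph, source, path):
    """Distribution over end nodes of a random walk that follows `path`.

    graph: {node: {relation: [neighbour, ...]}}
    path:  list of relation names to follow in order
    At each step the walker picks uniformly among the neighbours reachable
    through the required relation; mass with no such neighbour is dropped.
    """
    dist = {source: 1.0}
    for rel in path:
        nxt = {}
        for node, p in dist.items():
            neighbours = graph.get(node, {}).get(rel, [])
            for nb in neighbours:
                nxt[nb] = nxt.get(nb, 0.0) + p / len(neighbours)
        dist = nxt
    return dist

if __name__ == "__main__":
    graph = {
        "alice": {"works_at": ["acme"]},
        "acme": {"located_in": ["paris", "berlin"]},
    }
    # Feature for a candidate fact lives_in(alice, X), using one path type.
    print(path_prob(graph, "alice", ["works_at", "located_in"]))
```

Each path type yields one such feature per (source, target) pair, and a classifier over these features scores candidate facts.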

25 Joint MF-TF [Akshay, Shantanu] OOV handling [Nupur] loss function in joint modeling

26 Open QA [Surag] structured perceptron in a pipeline model [Akshay] paraphrase corpus for question rewriting [Shantanu] mining paraphrase operators from corpus [Arindam] decomposition of scoring over derivation steps

27 LSTMs [Haroun] attention > depth [Akshay] cool way to construct the dataset [Dinesh] two types of readers

28 Co-attention [many] iterative refinement of answer span selection*

29 HRED [Akshay] pretraining dialog model with a QA dataset [Arindam] passing intermediate context improves coherence? [Barun] split of local dialog generator and global state tracker

30 MSQU [many] partially annotated data [many] natural language -> SQL

31 GANs [many] teacher forcing [Akshay] interesting heuristics [Arindam] discriminator feedback can be backpropagated despite being non-differentiable

32 MemNets [Surag] typed OOVs [Haroun] hops [Shantanu, Gagan] subtask-styled evaluation

33 Open/Next Issues
- IE: mature?
  - Event extraction
  - Temporal extraction
  - Rapid retargetability
- KB inference
  - Long way to go
  - Combining DL and path-based models

34 Open/Next Issues
- QA systems
  - Dataset-driven research: [MC] SQuAD, tremendous progress
  - Answering in the wild: not clear (large answer spaces?)
  - Deep learning for large-scale QA
- Conversational agents
  - [Task driven] how to get a DL model to issue a variety of queries
  - [General] how to get the system to say something interesting?
- DL: what are the systems really capturing!?

35 Conclusions
- Learn key historical developments in IE
- Learn (some) state of the art in IE, inference, QA and dialog
- Learn how to critique strengths and weaknesses of a paper
- Learn how to brainstorm next steps and future directions
- Learn how to summarize an advanced area of research
- Learn to do research at the cutting edge

36 Exam
- Bring a laptop: Internet enabled, PDFLaTeX enabled
- Bring a mobile: for taking a picture
- Extension cords
- It is OK even if you have not deeply understood every paper

37 Project Presentations: Motivation & Problem definition; 1 slide of Contribution; Background; Technical Approach; Experiments; Analysis; Conclusions; Future Work
