
Natural language processing: syntactic and semantic tagging IFT 725 - Réseaux neuronaux

WORD TAGGING
Topics: word tagging
- In many NLP applications, it is useful to augment text data with syntactic and semantic information: we would like to add a syntactic/semantic label to each word
- This problem can be tackled using a conditional random field with neural network unary potentials
- We will describe the model developed by Ronan Collobert and Jason Weston in "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning" (Collobert and Weston, 2008; see "Natural Language Processing (Almost) from Scratch" for the journal version)

WORD TAGGING
Topics: part-of-speech tagging
- Tag each word with its part-of-speech category: noun, verb, adverb, etc.
  - might want to distinguish between singular/plural, present tense/past tense, etc.
  - see the Penn Treebank POS tag set for an example
- Example:
  The little yellow dog barked at the cat
  DT  JJ     JJ     NN  VBD    IN DT  NN
(from Stanislas Lauly)
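As a quick aside (not from the lecture), an off-the-shelf tagger such as NLTK's produces exactly this kind of annotation; the output shown below is what the slide's tags suggest, and resource names can vary across NLTK versions:

```python
import nltk

# One-time model downloads (resource names may differ by NLTK version)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The little yellow dog barked at the cat")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('little', 'JJ'), ('yellow', 'JJ'), ('dog', 'NN'),
#  ('barked', 'VBD'), ('at', 'IN'), ('the', 'DT'), ('cat', 'NN')]
```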

WORD TAGGING
Topics: chunking
- Segment sentences into syntactic phrases: noun phrase, verb phrase, etc.
- Segments are identified with the IOBES encoding
  - single-word phrase: S- prefix. Ex.: S-NP
  - multiword phrase: B-, I-, E- prefixes. Ex.: B-VP I-VP I-VP E-VP
  - words outside of syntactic phrases: O
- Example: [NP He] [VP reckons] [NP the current account deficit]
  He   reckons the  current account deficit
  S-NP S-VP    B-NP I-NP    I-NP    E-NP
(from Stanislas Lauly)
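To make the encoding concrete, here is a minimal sketch (not from the lecture; names are mine) that converts labeled spans into IOBES tags:

```python
def spans_to_iobes(n_words, spans):
    """Convert (start, end, label) spans (end exclusive) to IOBES tags.

    Words outside any span get 'O'; single-word spans get 'S-label';
    multiword spans get 'B-', 'I-'..., 'E-' prefixes.
    """
    tags = ["O"] * n_words
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = "S-" + label
        else:
            tags[start] = "B-" + label
            for i in range(start + 1, end - 1):
                tags[i] = "I-" + label
            tags[end - 1] = "E-" + label
    return tags

# "He reckons the current account deficit" with NP / VP / NP chunks
print(spans_to_iobes(6, [(0, 1, "NP"), (1, 2, "VP"), (2, 6, "NP")]))
# ['S-NP', 'S-VP', 'B-NP', 'I-NP', 'I-NP', 'E-NP']
```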

WORD TAGGING
Topics: named entity recognition (NER)
- Identify phrases referring to a named entity
  - person
  - location
  - organization
- Example:
  U.N.  official Ekeus heads for Baghdad
  S-ORG O        S-PER O     O   S-LOC
(from Stanislas Lauly)

WORD TAGGING
Topics: semantic role labeling (SRL)
- For each verb, identify the role of the other words with respect to that verb
- Example roles: V: verb, A0: acceptor, A1: thing accepted, A2: accepted from, A3: attribute, AM-MOD: modal, AM-NEG: negation
- Example:
  He   would    n't      accept anything of   value
  S-A0 S-AM-MOD S-AM-NEG V      B-A1     I-A1 E-A1
(from Stanislas Lauly)

WORD TAGGING
Topics: labeled corpus
The raw data looks like this (columns: word, POS tag, chunk tag, NER tag, and one SRL column per verb, here "faces" and "explore"):

The        DT    B-NP  O      B-A0  B-A0
$          $     I-NP  O      I-A0  I-A0
1.4        CD    I-NP  O      I-A0  I-A0
billion    CD    I-NP  O      I-A0  I-A0
robot      NN    I-NP  O      I-A0  I-A0
spacecraft NN    E-NP  O      E-A0  E-A0
faces      VBZ   S-VP  O      S-V   O
a          DT    B-NP  O      B-A1  O
six-year   JJ    I-NP  O      I-A1  O
journey    NN    E-NP  O      I-A1  O
to         TO    B-VP  O      I-A1  O
explore    VB    E-VP  O      I-A1  S-V
Jupiter    NNP   S-NP  S-ORG  I-A1  B-A1
and        CC    O     O      I-A1  I-A1
its        PRP$  B-NP  O      I-A1  I-A1
16         CD    I-NP  O      I-A1  I-A1
known      JJ    I-NP  O      I-A1  I-A1
moons      NNS   E-NP  O      E-A1  E-A1
.          .     O     O      O     O
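A minimal sketch for loading this format (assuming whitespace-separated columns as above; function and variable names are mine):

```python
def read_conll_columns(lines):
    """Parse rows of 'word POS chunk NER srl...' into parallel columns.

    Returns one list per column; the number of SRL columns (one per verb)
    is inferred from the first row.
    """
    rows = [line.split() for line in lines if line.strip()]
    n_cols = len(rows[0])
    return [[row[i] for row in rows] for i in range(n_cols)]

sample = [
    "The DT B-NP O B-A0 B-A0",
    "faces VBZ S-VP O S-V O",
]
words, pos, chunk, ner, srl1, srl2 = read_conll_columns(sample)
print(words, srl1)  # ['The', 'faces'] ['B-A0', 'S-V']
```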

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
- How to model each label sequence?
  - could use a CRF with neural network unary potentials, based on a window (context) of words
  - not appropriate for semantic role labeling, because the relevant context might be very far away
- Collobert and Weston suggest a convolutional network over the whole sentence
  - the prediction at a given position can exploit information from any word in the sentence
[Figure: network architecture. The input sentence ("The cat sat on the mat", with padding) and its features are mapped through per-feature lookup tables LT_W1 ... LT_WK, a convolution (M1), a max-over-time pooling layer, and Linear (M2) / HardTanh / Linear (M3) layers, ending in n3_hu = #tags output units.]

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
- Each word can be represented by more than one feature
  - a feature for the word itself
  - substring features
    - prefix: eating -> eat
    - suffix: eating -> ing
  - gazetteer features
    - whether the word belongs to a list of known locations, persons, etc.
- These features are treated like word features, with their own lookup tables

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
- The features must encode for which word we are making a prediction
  - done by adding the relative position i - pos_w, where pos_w is the position of the current word
  - this feature also has its own lookup table
- For SRL, we must know for which verb we are predicting the roles
  - also add the relative position of that verb, i - pos_v
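As a small sketch of how these position features could be built (the clipping to a maximum distance is my assumption; names are hypothetical):

```python
def position_features(n_words, pos_w, pos_v=None, clip=5):
    """Relative-position features i - pos_w (and i - pos_v for SRL).

    Distances are clipped to [-clip, clip] and shifted to be valid
    indices into a small lookup table of 2*clip + 1 embeddings.
    """
    def rel(i, pos):
        return max(-clip, min(clip, i - pos)) + clip

    word_feat = [rel(i, pos_w) for i in range(n_words)]
    verb_feat = [rel(i, pos_v) for i in range(n_words)] if pos_v is not None else None
    return word_feat, verb_feat

print(position_features(6, pos_w=2, pos_v=1))
# ([3, 4, 5, 6, 7, 8], [4, 5, 6, 7, 8, 9])  with clip=5
```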

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
- Lookup table: for each word, concatenate the representations of its features
- Convolution: at every position, compute linear activations from a window of representations
  - this is a convolution in 1D
- Max pooling: obtain a fixed-size hidden layer with a max across positions

SENTENCE NEURAL NETWORK
Topics: sentence convolutional network
- Regular neural network: the pooled representation serves as the input of a regular neural network
  - they proposed using a "hard" version of the tanh activation function
- The outputs are used as the unary potentials of a chain CRF over the labels
  - no connections between the CRFs of the different tasks (one CRF per task)
  - a separate neural network is used for each task
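Putting the pieces together, here is a minimal PyTorch sketch of the pipeline (a simplification under my own assumptions about layer sizes and feature handling, not the authors' code; the chain CRF on top is omitted):

```python
import torch
import torch.nn as nn

class SentenceConvNet(nn.Module):
    """Lookup -> 1D convolution -> max-over-time -> Linear/HardTanh/Linear."""

    def __init__(self, vocab_sizes, emb_dim=50, window=5, n_hidden=300, n_tags=45):
        super().__init__()
        # One lookup table per feature (word, prefix, suffix, position, ...)
        self.lookups = nn.ModuleList(nn.Embedding(v, emb_dim) for v in vocab_sizes)
        d = emb_dim * len(vocab_sizes)  # concatenated feature representation
        self.conv = nn.Conv1d(d, n_hidden, kernel_size=window, padding=window // 2)
        self.head = nn.Sequential(
            nn.Linear(n_hidden, n_hidden),
            nn.Hardtanh(),                 # "hard" tanh activation
            nn.Linear(n_hidden, n_tags),   # unary potentials, one score per tag
        )

    def forward(self, feats):
        # feats: list of LongTensors, each (batch, n_words), one per feature
        x = torch.cat([lt(f) for lt, f in zip(self.lookups, feats)], dim=-1)
        h = self.conv(x.transpose(1, 2))   # (batch, n_hidden, n_words)
        pooled, _ = h.max(dim=2)           # max over time -> (batch, n_hidden)
        return self.head(pooled)           # scores for the current prediction

net = SentenceConvNet(vocab_sizes=[10000, 11])  # word ids + clipped positions
scores = net([torch.randint(0, 10000, (1, 6)), torch.randint(0, 11, (1, 6))])
print(scores.shape)  # torch.Size([1, 45])
```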

SENTENCE NEURAL NETWORK
Topics: multitask learning
- Could share the vector representations of the features across tasks
  - simply use the same lookup tables across tasks
  - the other parameters of the neural networks are not tied
- This is referred to as multitask learning
  - the idea is to transfer knowledge learned within the word representations across the different tasks
[Figure: two task networks (Task 1, Task 2) with separate Linear/HardTanh layers and output sizes n2_hu,(t1) = #tags and n2_hu,(t2) = #tags, sharing the same lookup tables LT_W1 ... LT_WK.]
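Reusing the SentenceConvNet from the sketch above, tying only the lookup tables across two tasks could look like this (again an assumption about the wiring, not the authors' code):

```python
# Shared lookup tables, task-specific networks
pos_net = SentenceConvNet(vocab_sizes=[10000, 11], n_tags=45)
ner_net = SentenceConvNet(vocab_sizes=[10000, 11], n_tags=17)
ner_net.lookups = pos_net.lookups  # tie the word representations only

# Gradients from either task now update the same embeddings, while the
# convolution and output layers remain task-specific.
```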

SENTENCE NEURAL NETWORK
Topics: language model
- We can design other tasks without any labeled data
  - identify whether the middle word of a window of text is an impostor: "cat sat on the mat" vs "cat sat think the mat"
  - can generate impostor examples from unlabeled text (Wikipedia)
    - pick a window of words from the unlabeled corpus
    - replace the middle word with a different, randomly chosen word
  - train a neural network (with word representations) to assign a higher score to the original window, by minimizing the ranking loss
    max(0, 1 - f_θ(x) + f_θ(x^(w)))
    where x is the original window and x^(w) is the impostor window with middle word w
- This is similar to language modeling, except we predict the word in the middle
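A minimal sketch of this pairwise ranking loss (the scorer and the sampling of impostor words are assumptions; scorer(list_of_words) -> float):

```python
import random

def ranking_loss(scorer, window, vocab):
    """Hinge loss: the true window must outscore an impostor by a margin of 1."""
    center = len(window) // 2
    impostor = list(window)
    impostor[center] = random.choice(vocab)  # corrupt the middle word
    return max(0.0, 1.0 - scorer(window) + scorer(impostor))

# Toy usage with a dummy scorer that prefers "on" in the middle
score = lambda w: 1.0 if w[2] == "on" else 0.0
print(ranking_loss(score, ["cat", "sat", "on", "the", "mat"], ["think", "blue"]))
# 0.0 (the impostor scores lower by at least the margin)
```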

SENTENCE NEURAL NETWORK
Topics: experimental comparison
From "Natural Language Processing (Almost) from Scratch" by Collobert et al.:

Approach                  POS (PWA)  CHUNK (F1)  NER (F1)  SRL (F1)
Benchmark Systems         97.24      94.29       89.31     77.92
NN+SLL (window approach)  96.37      90.33       81.47     70.99
NN+SLL+LM2                97.12      93.37       88.78     74.15
NN+SLL+LM2+MTL            97.22      93.75       88.27     74.29
NN+SLL+LM2+Suffix2        97.29      -           -         -
NN+SLL+LM2+Gazetteer      -          -           89.59     -
NN+SLL+LM2+POS            -          94.32       88.67     -
NN+SLL+LM2+CHUNK          -          -           -         74.72

SENTENCE NEURAL NETWORK
Topics: experimental comparison
Nearest neighbors in word representation space (the number under each query word is its frequency rank):

FRANCE       JESUS    XBOX         REDDISH    SCRATCHED  MEGABITS
454          1973     6909         11724      29869      87025
AUSTRIA      GOD      AMIGA        GREENISH   NAILED     OCTETS
BELGIUM      SATI     PLAYSTATION  BLUISH     SMASHED    MB/S
GERMANY      CHRIST   MSX          PINKISH    PUNCHED    BIT/S
ITALY        SATAN    IPOD         PURPLISH   POPPED     BAUD
GREECE       KALI     SEGA         BROWNISH   CRIMPED    CARATS
SWEDEN       INDRA    PSNUMBER     GREYISH    SCRAPED    KBIT/S
NORWAY       VISHNU   HD           GRAYISH    SCREWED    MEGAHERTZ
EUROPE       ANANDA   DREAMCAST    WHITISH    SECTIONED  MEGAPIXELS
HUNGARY      PARVATI  GEFORCE      SILVERY    SLASHED    GBIT/S
SWITZERLAND  GRACE    CAPCOM       YELLOWISH  RIPPED     AMPERES

For a 2D visualization: http://www.cs.toronto.edu/~hinton/turian.png

CONCLUSION
We saw a particular architecture for tagging words with syntactic and semantic information:
- it exploits the idea of learning vector representations of words
- it uses a convolutional architecture, in order to use the whole sentence as context
- it demonstrates that unsupervised learning can help a lot in learning good representations
- it can incorporate additional features that are known to work well on certain NLP problems
  - even without them, it almost reaches state-of-the-art performance