Processing/Speech, NLP and the Web


CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 24: WSD)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
5th March, 2012

Layers of NLP / NLP Trinity
[Figure: three axes. Problem (in order of increased complexity of processing): Morphology, POS tagging, Chunking, Parsing, Semantics, Discourse and Coreference. Language: Hindi, Marathi, English, French. Algorithm: HMM, MEMM, CRF.]

Motivation
WSD: At the Heart of NLP
[Figure: WSD shown together with CLIR, MT, SRL, TE, NER, SP and SA.]
SRL: Semantic Role Labeling
TE: Text Entailment
CLIR: Cross Lingual Information Retrieval
NER: Named Entity Recognition
MT: Machine Translation
SP: Shallow Parsing
SA: Sentiment Analysis
WSD: Word Sense Disambiguation
CFILT - IITB

LEARNING BASED v/s HYBRID APPROACHES
Knowledge Based Approaches
- Rely on knowledge resources like WordNet, Thesaurus etc.
- May use grammar rules for disambiguation.
- May use hand-coded rules for disambiguation.
Machine Learning Based Approaches
- Rely on corpus evidence.
- Train a model using tagged or untagged corpus.
- Probabilistic/Statistical models.
Hybrid Approaches
- Use corpus evidence as well as semantic relations from WordNet.

Bird's eye view
[Figure: tree of WSD Approaches. Branches: Machine Learning (Supervised, Unsupervised, Semi-supervised), Knowledge Based, Hybrid.]

KNOWLEDGE BASED APPROACHES

WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS
Sense 1: This airline serves dinner in the evening flight.
  serve (Verb): agent; object: edible
Sense 2: This airline serves the sector between Agra & Delhi.
  serve (Verb): agent; object: sector
Requires exhaustive enumeration of:
- Argument-structure of verbs.
- Selectional preferences of arguments.
- Description of properties of words such that meeting the selectional preference criteria can be decided.
E.g. This flight serves the region between Mumbai and Delhi.
How do you decide if "region" is compatible with "sector"?
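The "serve" example above can be sketched as a lookup against a property lexicon. This is a minimal illustration, not the lecture's implementation: the sense labels, the `WORD_PROPERTIES` table, and the function name `disambiguate_serve` are all assumptions made for the sketch.

```python
# Toy property lexicon. Every entry is an illustrative assumption,
# not data taken from a real lexical resource.
WORD_PROPERTIES = {
    "dinner": {"edible"},
    "sector": {"sector"},
    "region": {"sector"},  # someone has to record that a region can fill the 'sector' slot
}

# Hypothetical sense inventory for "serve": each sense names the property
# that its object argument must have (its selectional preference).
SERVE_SENSES = {
    "serve#1 (provide food)": "edible",
    "serve#2 (operate on a route)": "sector",
}


def disambiguate_serve(object_noun):
    """Return the senses of 'serve' whose object preference the noun satisfies."""
    props = WORD_PROPERTIES.get(object_noun, set())
    return [sense for sense, required in SERVE_SENSES.items() if required in props]


print(disambiguate_serve("dinner"))
print(disambiguate_serve("region"))
print(disambiguate_serve("passengers"))  # not in the lexicon, so no sense is licensed
```

The last call shows the cost the slide points to: any noun missing from the hand-built lexicon leaves the verb undisambiguated.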

SELECTIONAL PREFERENCES (INDIAN TRADITION)
- Desire of some words in the sentence ("aakaangksha").
  I saw the boy with long hair.
  The verb "saw" and the noun "boy" desire an object here.
- Appropriateness of some other words in the sentence to fulfil that desire ("yogyataa").
  I saw the boy with long hair.
  The PP "with long hair" can be appropriately connected only to "boy" and not to "saw".
- In case the ambiguity is still present, proximity ("sannidhi") can determine the meaning.
  E.g. I saw the boy with a telescope.
  The PP "with a telescope" can be attached to both "boy" and "saw", so the ambiguity is still present. It is then attached to "boy" using the proximity check.

SELECTIONAL PREFERENCES (RECENT LINGUISTIC THEORY)
- There are words which demand arguments, like verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.
- Arguments must have the property to fulfil the demand. They must satisfy selectional preferences.
Example
  Give (verb): agent (animate), direct object, indirect object
  I gave him the book.
  I gave him the book (yesterday in the school) -> adjunct
How does this help in WSD?
- One type of contextual information is the information about the type of arguments that a word takes.

Verb Argument frame
- The structure expressing the desire of a word is called the Argument Frame.
- Selectional Preference: properties of the "Supply Words" meeting the desire of the previous set.

Argument frame (example)
Sentence: I am fond of X
Fond {
  Arg1: Prepositional Phrase (PP) {
    PP: of NP {
      N: somebody/something
    }
  }
}

Verb Argument frame (example)
Verb: give
Give {
  agent: <the giver> animate
  direct object: <the thing given>
  indirect object: <beneficiary> animate/organization
}
[I]agent gave a [book]dobj to [Ram]iobj.
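One way to hold such a frame in code is as a list of slots, each carrying the properties its filler may have. The sketch below is an assumption-laden illustration: the `Slot` class, the `PROPERTIES` lexicon, and `unmet_slots` are hypothetical names, and the property assignments are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    role: str                          # e.g. "agent"
    description: str                   # e.g. "the giver"
    allowed: frozenset = frozenset()   # empty set means any filler is accepted


# The frame for "give" as described above.
GIVE = (
    Slot("agent", "the giver", frozenset({"animate"})),
    Slot("direct object", "the thing given"),
    Slot("indirect object", "beneficiary", frozenset({"animate", "organization"})),
)

# Hypothetical property lexicon for the example sentence.
PROPERTIES = {"I": {"animate"}, "Ram": {"animate"}, "book": {"inanimate"}}


def unmet_slots(frame, fillers):
    """Return the roles whose filler is missing or has none of the allowed properties."""
    unmet = []
    for slot in frame:
        word = fillers.get(slot.role)
        if word is None:
            unmet.append(slot.role)
        elif slot.allowed and not (PROPERTIES.get(word, set()) & slot.allowed):
            unmet.append(slot.role)
    return unmet


# [I]agent gave a [book]dobj to [Ram]iobj.
print(unmet_slots(GIVE, {"agent": "I", "direct object": "book", "indirect object": "Ram"}))
# Swapping in an inanimate agent violates the agent slot's preference.
print(unmet_slots(GIVE, {"agent": "book", "direct object": "book", "indirect object": "Ram"}))
```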

Resources for Verbs
- VerbNet (http://verbs.colorado.edu/~mpalmer/projects/verbnet.html)
- Propbank (http://en.wikipedia.org/wiki/propbank)
- VerbOcean (http://demo.patrickpantel.com/demos/verbocean/)

CRITIQUE
Requires exhaustive enumeration in machine-readable form of:
- Argument-structure of verbs.
- Selectional preferences of arguments.
- Description of properties of words such that meeting the selectional preference criteria can be decided.
E.g. This flight serves the region between Mumbai and Delhi.
How do you decide if "region" is compatible with "sector"?
Accuracy: 44% on Brown corpus.

OVERLAP BASED APPROACHES
- Require a Machine Readable Dictionary (MRD).
- Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).
- These features could be sense definitions, example sentences, hypernyms etc.
- The features could also be given weights.
- The sense which has the maximum overlap is selected as the contextually appropriate sense.
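The sense-bag/context-bag comparison, including optional feature weights, can be sketched in a few lines. The function names, the toy "bank" bags, and the weight values below are assumptions chosen only to show the mechanics.

```python
def weighted_overlap(sense_bag, context_bag, weights=None, default_weight=1.0):
    """Sum the weights of the features shared by the sense bag and the context bag."""
    weights = weights or {}
    shared = set(sense_bag) & set(context_bag)
    return sum(weights.get(feature, default_weight) for feature in shared)


def best_sense(sense_bags, context_bag, weights=None):
    """Pick the sense whose bag has the maximum (weighted) overlap with the context."""
    return max(sense_bags, key=lambda s: weighted_overlap(sense_bags[s], context_bag, weights))


# Hypothetical bags for an ambiguous word.
sense_bags = {
    "bank#river": {"river", "slope", "water"},
    "bank#finance": {"money", "deposit", "institution"},
}
context_bag = {"deposit", "money", "river"}

print(best_sense(sense_bags, context_bag))                  # unweighted overlap
print(best_sense(sense_bags, context_bag, {"river": 5.0}))  # 'river' weighted heavily
```

With uniform weights the finance sense wins on count alone; weighting a single feature heavily can flip the decision, which is why the choice of weights matters.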

LESK'S ALGORITHM
Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.
Context Bag: contains the words in the definition of each sense of each context word.
E.g. On burning coal we get ash.
From WordNet:
The noun ash has 3 senses (first 2 from tagged texts)
1. (2) ash -- (the residue that remains when something is burned)
2. (1) ash, ash tree -- (any of various deciduous pinnate-leaved ornamental or timber trees of the genus Fraxinus)
3. ash -- (strong elastic wood of any of various ash trees; used for furniture and tool handles and sporting goods such as baseball bats)
The verb ash has 1 sense (no senses from tagged texts)
1. ash -- (convert into ashes)
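A minimal sketch of this procedure on the "ash" example follows. The three noun glosses are the WordNet glosses quoted above; the glosses for the context words "burning" and "coal", the stopword list, and the function names are assumptions invented for the illustration, not WordNet output.

```python
import re

STOPWORDS = {"a", "an", "the", "that", "when", "is", "it", "of", "or", "and",
             "any", "for", "as", "such", "so", "on"}


def bag(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


# Glosses of the noun senses of "ash", as quoted on the slide.
ASH_SENSES = {
    "ash#1": "the residue that remains when something is burned",
    "ash#2": "any of various deciduous pinnate-leaved ornamental or timber trees "
             "of the genus Fraxinus",
    "ash#3": "strong elastic wood of any of various ash trees; used for furniture and "
             "tool handles and sporting goods such as baseball bats",
}

# Hypothetical glosses for the context words (one sense each, for brevity).
CONTEXT_GLOSSES = {
    "burning": ["the act of setting something on fire so that it is burned"],
    "coal": ["fossil fuel consisting of carbonized vegetable matter that is burned for heat"],
}


def lesk(sense_glosses, context_glosses):
    """Return (best sense, scores): overlap of each sense bag with the context bag."""
    context_bag = set()
    for glosses in context_glosses.values():
        for gloss in glosses:
            context_bag |= bag(gloss)
    scores = {sense: len(bag(gloss) & context_bag)
              for sense, gloss in sense_glosses.items()}
    return max(scores, key=scores.get), scores


best, scores = lesk(ASH_SENSES, CONTEXT_GLOSSES)
print(best, scores)
```

Under these assumed context glosses, the residue sense is selected because its definition shares "something" and "burned" with the context bag, while the two tree-related senses share nothing.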

CRITIQUE
- Proper nouns in the context of an ambiguous word can act as strong disambiguators.
  E.g. "Sachin Tendulkar" will be a strong indicator of the category "sports".
  Sachin Tendulkar plays cricket.
- Proper nouns are not present in the thesaurus. Hence this approach fails to capture the strong clues provided by proper nouns.
- Accuracy: 50% when tested on 10 highly polysemous English words.

Extended Lesk's algorithm
- The original algorithm is sensitive to the exact words in the definition.
- The extension includes glosses of semantically related senses from WordNet (e.g. hypernyms, hyponyms, etc.).
- The scoring function becomes:

\[ \mathrm{score}_{ext}(s) = \sum_{s' \in rel(s) \ \text{or}\ s' \equiv s} \left| context(w) \cap gloss(s') \right| \]

where
- gloss(s) is the gloss of sense s from the lexical resource,
- context(w) is the gloss of each sense of each context word,
- rel(s) gives the senses related to s in WordNet under some relations.
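The extended score sums the overlap over a sense's own gloss and the glosses of its related senses. The sketch below assumes a tiny hand-built `GLOSS` and `REL` table: the "ash", "fly ash" and "bone ash" glosses are the WordNet glosses quoted in this lecture, while the context glosses for "combustion" and "coal", the stopword list, and all function names are assumptions made for the illustration.

```python
import re

STOPWORDS = {"a", "the", "that", "when", "is", "of", "are", "into", "in",
             "which", "with", "to", "and", "or", "any", "as"}


def bag(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


GLOSS = {
    "ash#1": "the residue that remains when something is burned",
    "ash#2": "any of various deciduous pinnate-leaved ornamental or timber trees "
             "of the genus Fraxinus",
    "fly ash": "fine solid particles of ash that are carried into the air "
               "when fuel is combusted",
    "bone ash": "ash left when bones burn; high in calcium phosphate; "
                "used as fertilizer and in bone china",
}

# rel(s): senses related to s (here, the hyponyms of the residue sense).
REL = {"ash#1": ["fly ash", "bone ash"]}

# Hypothetical glosses of the context words "combustion" and "coal".
CONTEXT_GLOSSES = [
    "a process in which a fuel reacts with oxygen to give heat and light",
    "fossil fuel consisting of carbonized vegetable matter",
]
CONTEXT_BAG = set().union(*(bag(g) for g in CONTEXT_GLOSSES))


def score_plain(sense):
    """Original Lesk: overlap of the sense's own gloss with the context bag."""
    return len(CONTEXT_BAG & bag(GLOSS[sense]))


def score_ext(sense):
    """Extended Lesk: sum the overlap over the sense itself and every sense in rel(sense)."""
    return sum(len(CONTEXT_BAG & bag(GLOSS[s2])) for s2 in [sense] + REL.get(sense, []))


for sense in ("ash#1", "ash#2"):
    print(sense, score_plain(sense), score_ext(sense))
```

With these assumed context glosses the residue sense has no overlap from its own gloss, and only the "fly ash" hyponym gloss (through "fuel") gives it a non-zero extended score.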

WordNet Sub-Graph
[Figure: sub-graph centred on "house, home" (gloss: "A place that serves as the living quarters of one or more families"), connected by hypernymy, hyponymy and meronymy links to: dwelling, abode; kitchen; backyard; veranda; bedroom; study; guestroom; hermitage; cottage.]

Example: Extended Lesk
On combustion of coal we get ash.
From WordNet:
The noun ash has 3 senses (first 2 from tagged texts)
1. (2) ash -- (the residue that remains when something is burned)
2. (1) ash, ash tree -- (any of various deciduous pinnate-leaved ornamental or timber trees of the genus Fraxinus)
3. ash -- (strong elastic wood of any of various ash trees; used for furniture and tool handles and sporting goods such as baseball bats)
The verb ash has 1 sense (no senses from tagged texts)
1. ash -- (convert into ashes)

Example: Extended Lesk (contd.)
On combustion of coal we get ash.
From WordNet (through hyponymy):
ash -- (the residue that remains when something is burned)
  => fly ash -- (fine solid particles of ash that are carried into the air when fuel is combusted)
  => bone ash -- (ash left when bones burn; high in calcium phosphate; used as fertilizer and in bone china)

Critique of Extended Lesk
- Larger region of matching in WordNet.
- Increased chance of matching,
- BUT increased chance of topic drift.