
Semantic Textual Similarity & more on Alignment
CMSC 723 / LING 723 / INST 725
Marine Carpuat
marine@cs.umd.edu

2 topics today
1. P3 task: Semantic Textual Similarity, including monolingual alignment
2. Beyond IBM word alignment: synchronous CFGs

Semantic Textual Similarity
A series of tasks at the International Workshop on Semantic Evaluation (SemEval), since 2012
http://alt.qcri.org/semeval2017/task1/

What is Semantic Textual Similarity?
[Slide shows paragraphs of deliberate gibberish in Latin, Hangul, Arabic, and Cyrillic scripts, with meaningful sentences hidden inside:]
"Welcome to my world, trust me you will never be disappointed"
"Yo! Come over here, you will be pleasantly surprised"
Korean (roughly): "Hello, I called you but it was no use... hope you were enjoying your time"
Arabic (roughly): "Come on over, don't worry, you will have the greatest fun"
Russian: "Welcome to my world, believe me you will never be disappointed"
What we want from Semantic Similarity:
Quantitative: graded similarity score, confidence score
Principled: interpretability, i.e. which semantic components/features led to the results (hopefully this will lead to us gaining a better understanding of semantics)

Why Semantic Textual Similarity?
Most NLP applications need some notion of semantic similarity to overcome brittleness and sparseness
Provides evaluation beyond surface text processing
A hub for semantic processing, usable as a black box in applications beyond NLP
Lends itself to an extrinsic evaluation of scattered semantic components

What is STS?
The graded process by which two snippets of text (t1 and t2) are deemed semantically equivalent, i.e. bear the same meaning
An STS system quantifiably tells us how similar t1 and t2 are, producing a similarity score
An STS system tells us why t1 and t2 are similar, giving a nuanced interpretation of similarity based on the contributions of semantic components

What is STS?
Word similarity has been relatively well studied. For example, according to WN:
cord / smile: 0.02
rooster / voyage: 0.04
noon / string: 0.04
fruit / furnace: 0.05
...
hill / woodland: 1.48
car / journey: 1.55
cemetery / mound: 1.69
...
cemetery / graveyard: 3.88
automobile / car: 3.92
(higher score = more similar)

What is STS?
Fewer datasets for similarity between sentences:
"A forest is a large area where trees grow close together." vs. "The coast is an area of land that is next to the sea." [0.25]

What is STS?
Fewer datasets for similarity between sentences:
"A forest is a large area where trees grow close together." vs. "Woodland is land with a lot of trees." [2.51]

What is STS?
Fewer datasets for similarity between sentences:
"Once there was a Czar who had three lovely daughters." vs. "There were three beautiful girls, whose father was a Czar." [4.3]

Related tasks
Paraphrase detection: are two sentences equivalent in meaning?
Textual entailment: does premise P entail hypothesis H?
STS provides graded similarity judgments.

Annotation: crowd-sourcing

Annotation: crowd-sourcing
English annotation process:
Pairs annotated in batches of 20
Annotators paid $1 per batch
5 annotations per pair
Workers need to have the MTurk Master qualification
Defining gold standard judgments:
Median value of annotations
After filtering low-quality annotators (<0.80 correlation with leave-one-out gold & <0.20 Kappa)
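As a concrete sketch of this gold-standard procedure (median after filtering annotators by correlation with the leave-one-out gold), one might write the following; the worker IDs and score values are invented for illustration, and the Kappa filter is omitted for brevity:

```python
from statistics import median, mean

# Hypothetical annotations: 5 workers scoring the same 5 sentence pairs (0-5)
scores = {
    "w1": [5.0, 2.0, 0.0, 4.0, 1.0],
    "w2": [4.5, 2.5, 0.5, 4.0, 1.5],
    "w3": [5.0, 1.5, 0.0, 3.5, 1.0],
    "w4": [0.0, 5.0, 4.0, 1.0, 3.5],   # a low-quality worker
    "w5": [4.0, 2.0, 0.5, 4.5, 0.5],
}

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def leave_one_out_gold(scores, worker):
    # Gold judgments computed from everyone except `worker`
    others = [v for w, v in scores.items() if w != worker]
    return [median(col) for col in zip(*others)]

# Keep only annotators who correlate >= 0.80 with the leave-one-out gold
kept = {w: v for w, v in scores.items()
        if pearson(v, leave_one_out_gold(scores, w)) >= 0.80}
# Final gold judgment per pair = median over the remaining annotators
gold = [median(col) for col in zip(*kept.values())]
print(sorted(kept))  # w4 is filtered out
print(gold)
```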

Diverse data sources

Evaluation: a shared task
[Table: subset of 2016 results; score = Pearson correlation]

STS models: from word to sentence vectors
Can we perform STS by comparing sentence vector representations?
This approach works well for word-level similarity
But can we capture the meaning of a sentence in a single vector?

Composing by averaging
g("shots fired at residence") = 1/4 (v(shots) + v(fired) + v(at) + v(residence))
[Tai et al. 2015, Wieting et al. 2016]
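This averaging composition is easy to sketch in code. The 4-dimensional vectors and the comparison sentence below are made up for illustration; real systems use trained embeddings (e.g. the PARAGRAM vectors of Wieting et al.):

```python
from math import sqrt

# Toy 4-dimensional word vectors (invented values for illustration only)
vecs = {
    "shots":     [0.9, 0.1, 0.0, 0.2],
    "fired":     [0.8, 0.2, 0.1, 0.1],
    "at":        [0.1, 0.1, 0.1, 0.1],
    "residence": [0.2, 0.1, 0.9, 0.3],
    "gunfire":   [0.9, 0.2, 0.1, 0.2],
    "near":      [0.1, 0.2, 0.1, 0.1],
    "the":       [0.1, 0.1, 0.1, 0.0],
    "house":     [0.3, 0.1, 0.8, 0.3],
}

def g(sentence):
    """Sentence vector = average of its word vectors."""
    words = sentence.split()
    return [sum(vecs[w][d] for w in words) / len(words) for d in range(4)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

s1 = g("shots fired at residence")
s2 = g("gunfire near the house")
print(round(cosine(s1, s2), 3))  # close to 1 for paraphrase-like pairs
```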

How can we induce word vectors for composition?
English paraphrases [Wieting et al. 2016]: e.g. x1 "By our fellow members" / "Thus in fact by our fellow members", x2 "By our colleagues"
Bilingual sentence pairs [Hermann & Blunsom 2014]: e.g. "Así que podríamos nuestra colega disputado"
Bilingual phrase pairs: e.g. "by our fellow member" / "de nuestra colega"

STS models: monolingual alignment

Idea
One (of many) approaches to monolingual alignment:
Exploit not only similarity between words
But also similarity between their contexts
See Sultan et al. 2013, https://github.com/masultan/
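A toy sketch of this word-plus-context intuition (the similarity table, the 0.7/0.3 weighting, and the threshold are invented for illustration, not Sultan et al.'s actual model):

```python
# Score each candidate word pair by its own similarity plus the
# similarity of its immediate neighbors (the "context").
SIM = {
    frozenset(["lovely", "beautiful"]): 0.9,
    frozenset(["daughters", "girls"]): 0.8,
}

def word_sim(a, b):
    if a == b:
        return 1.0
    return SIM.get(frozenset([a, b]), 0.0)

def context_sim(s1, i, s2, j):
    # similarity of the left and right neighbors of s1[i] and s2[j]
    left = word_sim(s1[i - 1], s2[j - 1]) if i > 0 and j > 0 else 0.0
    right = (word_sim(s1[i + 1], s2[j + 1])
             if i + 1 < len(s1) and j + 1 < len(s2) else 0.0)
    return (left + right) / 2

def align(s1, s2, threshold=0.5):
    pairs = []
    for i, w1 in enumerate(s1):
        for j, w2 in enumerate(s2):
            score = 0.7 * word_sim(w1, w2) + 0.3 * context_sim(s1, i, s2, j)
            if score >= threshold:
                pairs.append((w1, w2, round(score, 2)))
    return pairs

pairs = align("three lovely daughters".split(),
              "three beautiful girls".split())
print(pairs)
```

Even non-identical words ("daughters" / "girls") get aligned here because both their own similarity and their neighborhoods agree.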

2 topics today
1. P3 task: Semantic Textual Similarity, including monolingual alignment
2. Beyond IBM word alignment: synchronous CFGs

Aligning words & constituents
Alignment: a mapping between spans of text in language 1 and spans of text in language 2
Sentences in document pairs
Words in sentence pairs
Syntactic constituents in sentence pairs
Today: 2 methods for aligning constituents: parse-and-match, and biparsing

Parse & Match

Parse(-Parse)-Match
Idea: align spans that are consistent with existing structure
Pros: builds on existing NLP tools
Cons: assumes availability of lots of resources; assumes that representations can be matched

Aligning words & constituents
2 methods for aligning constituents:
Parse-and-match: assumes existing parses and alignment
Biparse: alignment = structure

A straw man hypothesis: All languages have the same grammar

The biparsing hypothesis: All languages have nearly the same grammar

Example for the biparsing hypothesis: All languages have nearly the same grammar

The biparsing hypothesis: All languages have nearly the same grammar
(Dekai Wu and Pascale Fung, IJCNLP-2005, HKUST Human Language Technology Center)

The biparsing hypothesis: All languages have nearly the same grammar
Permuted SDTG/SCFG:
VP → VV PP ; 1 2
VP → VV PP ; 2 1
Indexed SDTG/SCFG notation:
VP → VV(1) PP(2) , VV(1) PP(2)
VP → VV(1) PP(2) , PP(2) VV(1)
SDTG/SCFG notation:
VP → VV PP , VV PP
VP → VV PP , PP VV
ITG shorthand:
VP → [ VV PP ]
VP → ⟨ VV PP ⟩
(Dekai Wu and Pascale Fung, IJCNLP-2005, HKUST Human Language Technology Center)

Synchronous Context Free Grammars
Context-free grammars (CFG): a common way of representing syntax in (monolingual) NLP
Synchronous context-free grammars (SCFG):
Generate pairs of strings
Align sentences by parsing them
Translate sentences by parsing them
Key algorithm: how to parse with SCFGs?
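As a minimal sketch, a rank-2 SCFG can be represented as paired right-hand sides plus a permutation giving the target-side order; generating from it yields a pair of strings. The representation below is my own, and the toy rules mirror the fat-cats grammar used later in these slides (restricted to one rule per nonterminal so generation is deterministic):

```python
# Lexical rules pair a source word with a target word; binary rules pair
# two nonterminals with a permutation for the target-side order.
lex = {"A": ("fat", "gordos"), "N": ("cats", "gatos"), "VP": ("eats", "comen")}
binary = {
    "NP": (("A", "N"), (1, 0)),   # NP -> A(1) N(2) , N(2) A(1)   (inverted)
    "S": (("NP", "VP"), (0, 1)),  # S -> NP(1) VP(2) , NP(1) VP(2) (straight)
}

def generate(sym):
    """Expand a nonterminal into a (source, target) pair of token lists."""
    if sym in lex:
        e, f = lex[sym]
        return [e], [f]
    children, perm = binary[sym]
    pairs = [generate(c) for c in children]
    src = [tok for e, _ in pairs for tok in e]        # source in rule order
    tgt = [tok for i in perm for tok in pairs[i][1]]  # target reordered
    return src, tgt

src, tgt = generate("S")
print(" ".join(src))  # fat cats eats
print(" ".join(tgt))  # gatos gordos comen
```

A single synchronous derivation thus produces both sentences at once, with the reordering (adjective after noun in Spanish) coming from the inverted NP rule.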

SCFG trade-off
Expressiveness: SCFGs cannot represent all sentence pairs in all languages
Efficiency: SCFGs let us view alignment as parsing & benefit from a well-studied formalism

Synchronous parsing cannot represent all sentence pairs

A subclass of SCFGs: Inversion Transduction Grammars
ITGs are the subclass of SDTGs/SCFGs:
with only straight and inverted transduction rules
equivalently, with only transduction rules of rank ≤ 2
equivalently, with only transduction rules of rank ≤ 3
ITGs are context-free (like SCFGs).

For length-4 phrases (or frames), ITGs can express 22 out of 24 permutations!
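The 22-out-of-24 claim can be checked by brute force: a permutation is ITG-expressible exactly when it can be split recursively into two parts whose values each form a contiguous range, composed either straight or inverted. The check below is my own sketch of that criterion:

```python
from itertools import permutations

def contiguous(p):
    # True if the values of p form a contiguous range, e.g. (2, 0, 1)
    return max(p) - min(p) + 1 == len(p)

def itg_expressible(p):
    # Expressible iff p splits into two contiguous-range halves,
    # each itself ITG-expressible (straight or inverted composition).
    if len(p) <= 1:
        return True
    return any(contiguous(p[:i]) and contiguous(p[i:])
               and itg_expressible(p[:i]) and itg_expressible(p[i:])
               for i in range(1, len(p)))

perms4 = list(permutations(range(4)))
ok = [p for p in perms4 if itg_expressible(p)]
bad = [p for p in perms4 if not itg_expressible(p)]
print(len(ok))  # 22
print(bad)      # the two "inside-out" permutations, 2413 and 3142
```

The two inexpressible permutations (0-indexed (1, 3, 0, 2) and (2, 0, 3, 1)) are exactly the inside-out alignments that synchronous binary parsing cannot represent.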

ITGs enable efficient DP algorithms [Wu 1995]
[Figure, built up across several slides: a dynamic-programming chart over an English sentence e0 ... e7 and a Chinese sentence c0 ... c6]

Biparsing with CKY
Given the following SCFG:
A → fat , gordos
A → thin , delgados
N → cats , gatos
VP → eats , comen
NP → A(1) N(2) , N(2) A(1)
S → NP(1) VP(2) , NP(1) VP(2)
Let's parse a sentence pair: "fat cats eats" / "gatos gordos comen"
(Example by Matt Post, JHU)

Biparsing with CKY (same grammar)
Chart: English "fat cats eats" (positions 1-3) against Spanish "gatos gordos comen" (positions 1-3)
The chart now enumerates pairs of spans.

Biparsing with CKY (same grammar)
Apply lexical rules:
A: English span (1,1), Spanish span (2,2)
N: English span (2,2), Spanish span (1,1)
VP: English span (3,3), Spanish span (3,3)

Biparsing with CKY (same grammar)
For each block, apply straight & inverted rules:
NP: English span (1,2), Spanish span (1,2)
S: English span (1,3), Spanish span (1,3)

Biparsing with CKY
Complexity: O(G·N³·M³) for grammar size G and sentence lengths N and M
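The chart computation these slides walk through can be sketched as a synchronous-CKY recognizer. The data structures and indexing below are my own (spans are 0-indexed half-open intervals, unlike the 1-indexed slide charts):

```python
# Lexical synchronous rules: (English word, Spanish word) -> nonterminal
lex = {
    ("fat", "gordos"): "A",
    ("thin", "delgados"): "A",
    ("cats", "gatos"): "N",
    ("eats", "comen"): "VP",
}
# Binary rules: (LHS, B, C, inverted). "inverted" means the target-side
# order of B and C is swapped, as in NP -> A(1) N(2) , N(2) A(1).
rules = [
    ("NP", "A", "N", True),
    ("S", "NP", "VP", False),
]

def biparse(e, f):
    """CKY over pairs of spans: chart[(i, j, k, l)] holds the nonterminals
    covering English words e[i:j] and Spanish words f[k:l]."""
    chart = {}
    for i, ew in enumerate(e):
        for k, fw in enumerate(f):
            nt = lex.get((ew, fw))
            if nt:
                chart.setdefault((i, i + 1, k, k + 1), set()).add(nt)
    for span_e in range(2, len(e) + 1):
        for span_f in range(2, len(f) + 1):
            for i in range(len(e) - span_e + 1):
                for k in range(len(f) - span_f + 1):
                    j, l = i + span_e, k + span_f
                    for s in range(i + 1, j):      # English split point
                        for t in range(k + 1, l):  # Spanish split point
                            for lhs, B, C, inv in rules:
                                if inv:  # B pairs with the later target span
                                    okB = B in chart.get((i, s, t, l), ())
                                    okC = C in chart.get((s, j, k, t), ())
                                else:
                                    okB = B in chart.get((i, s, k, t), ())
                                    okC = C in chart.get((s, j, t, l), ())
                                if okB and okC:
                                    chart.setdefault((i, j, k, l), set()).add(lhs)
    return chart

e = "fat cats eats".split()
f = "gatos gordos comen".split()
chart = biparse(e, f)
print("S" in chart.get((0, 3, 0, 3), set()))  # True: the pair is biparsable
```

The six nested loops over (span pair, position pair, split pair) are exactly where the O(G·N³·M³) complexity comes from.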

Aligning words & constituents
2 different ways of looking at this problem:
Parse-parse-match: assumes existing parses and alignment
Biparse: alignment = structure