Textual Entailment. Alina Petrova. February 22, 2012. EMCL TUD, HLT FBK.

Introduction
Textual Entailment (TE): What is it? A notion from classical logic applied to natural language using NLP technologies.
Which techniques can be applied? Relevant features for detecting TE via machine learning.
What is done by the community? The RTE Challenge.
Fondazione Bruno Kessler, Human Language Technology group: participation in the RTE-7 Challenge.

Natural Language Processing Nowadays
Definition: NLP is an interdisciplinary field which seeks to enable computers to process, understand and generate natural language.
Modern NLP consists of multiple subareas which can be defined by the tasks they aim to solve:
  Machine Translation
  Information Retrieval
  Question Answering
  Word Sense Disambiguation
  ...
  Recognizing Textual Entailment

Intuition: Recognizing Textual Entailment is a generic task that captures major semantic inference between pieces of text.
Definition: Given two text fragments, Text (T) and Hypothesis (H), T entails H iff the meaning of H can be inferred from the meaning of T by human reading.
Notes: why human reading? What is a text fragment?
Example:
  T: If you help the needy, God will reward you.
  H: Giving money to a poor man has good consequences.

TE: How-To
Two opposite approaches:
Using formal semantics:
  translation of natural language fragments into some logical system
  classical approach which brings together logic, language and psychology
  successful for narrow domains, but does not work on comprehensive data
  little training data
Using surface structure:
  counterintuitive, but has proved to be fruitful. Why? A wide range of entailments follow general patterns that arise from surface (lexical and syntactic) considerations.

Surface approach
The main feature is lexical similarity:
  naive word overlap
  n-gram (= sequence of neighboring words) overlap
    Ex: A student Computational Logic workshop took place in Vienna. / Workshop took place in Vienna.
  normalized forms
    working = work, brought = bring
  paraphrasing (different lexical forms with similar meaning)
    Ex: A student workshop was organised in the capital of Austria. / A student workshop took place in Vienna.
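The word and n-gram overlap features listed above can be sketched in a few lines of Python. This is only an illustration, not the scoring used by any particular RTE system: the function names (`tokens`, `ngrams`, `ngram_overlap`), the simple regular-expression tokenizer, the choice to normalize by the number of hypothesis n-grams, and the assignment of the slide's example sentences to T and H are all assumptions made here.

```python
import re


def tokens(text):
    """Lowercase a text fragment and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(words, n):
    """Return the set of n-grams (as tuples) found in a list of tokens."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(text, hypothesis, n=1):
    """Fraction of the distinct hypothesis n-grams that also occur in the text."""
    hyp_grams = ngrams(tokens(hypothesis), n)
    if not hyp_grams:
        return 0.0
    text_grams = ngrams(tokens(text), n)
    return len(hyp_grams & text_grams) / len(hyp_grams)


# Example pairs from the slide; which sentence is T and which is H is assumed.
T1 = "A student Computational Logic workshop took place in Vienna."
H1 = "Workshop took place in Vienna."
T2 = "A student workshop was organised in the capital of Austria."
H2 = "A student workshop took place in Vienna."

word_overlap_1 = ngram_overlap(T1, H1, n=1)    # every word of H1 occurs in T1
bigram_overlap_1 = ngram_overlap(T1, H1, n=2)  # every bigram of H1 occurs in T1
word_overlap_2 = ngram_overlap(T2, H2, n=1)    # paraphrase: only part of H2 is matched
```

On the paraphrase pair, plain overlap misses "took place in Vienna" versus "was organised in the capital of Austria", which is the gap that normalization and paraphrase resources are meant to close.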

Surface Approach (cont'd)
The entailment holds iff the word overlap reaches a certain threshold. The threshold is set via supervised learning.
Statistics on F-measure (2010 data):
  best performance: 48.01%
  average performance: 33.77%
  up to 40% using only lexical matching
But this seems to be the limit for lexical matching.
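One simple way to set such a threshold from labeled data is to try every observed overlap score as a cut-off and keep the one with the best F-measure on the training pairs. The sketch below assumes this selection criterion, since the slides do not say how the threshold is learned; the function names and the toy scores and labels are invented for illustration.

```python
def f_measure(gold, predicted):
    """Balanced F-measure (F1) of boolean predictions against boolean gold labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(scores, gold):
    """Pick the overlap score that, used as a cut-off, maximizes F1 on the training pairs."""
    best_threshold, best_f = None, -1.0
    for candidate in sorted(set(scores)):
        # A pair is predicted as entailment when its overlap reaches the candidate.
        predicted = [s >= candidate for s in scores]
        f = f_measure(gold, predicted)
        if f > best_f:
            best_threshold, best_f = candidate, f
    return best_threshold, best_f


# Toy training data (invented): overlap score per T-H pair and its gold label.
train_scores = [0.2, 0.4, 0.5, 0.7, 0.8, 1.0]
train_labels = [False, False, True, False, True, True]

threshold, train_f = fit_threshold(train_scores, train_labels)


def entails(overlap_score):
    """Decision rule from the slide: entailment holds iff the overlap reaches the threshold."""
    return overlap_score >= threshold
```

A real system would tune the threshold on a development set and report F-measure on held-out pairs rather than on the training data.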

NLP vs. Textual Entailment

NLP contribution to TE
Using extra features from other areas of NLP improves lexical match results:
  Semantic Roles
  Named Entity Recognition
  lexical knowledge bases (VerbOcean, WordNet)
  coreference
  syntactic parsing
  etc.

Applications
Textual entailment recognition is used in several NLP tasks:
  Question Answering
  Information Extraction
  Information Retrieval
  Text Summarization
  and many more.
What is it? How is TE used?
Example:
  T: The technological triumph known as GPS was incubated in the mind of Ivan Getting.
  entails (1)
  H: X invented the GPS

Textual Entailment in the Community
Recognizing Textual Entailment (RTE) challenge.
Main Task: given a corpus of T (real data) and a set of H, determine the T-H pairs in which one fragment entails the other.
  compares the performance of TE systems
  launched in 2004 by FBK
  supported by Microsoft Research
Mehdad, Negri, de Souza, Petrova. FBK Participation in the RTE-7 Main Task. Text Analysis Conference, 2011.

FBK System for RTE-7
Multifeature system with lexical similarity being the key feature.
An algorithm to compute n-gram match scores for every level of n:
  start from 5-grams
  eliminate a string when matched
  repeat for the (n-1) level
Extra NLP features: Semantic Roles, Named Entities, WordNet, Syntactic Dependencies
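The three steps of the n-gram matching procedure can be read in more than one way; the sketch below shows one possible reading, not the FBK implementation. It assumes that matching runs over the hypothesis tokens, that a matched n-gram is eliminated from the hypothesis so its words cannot be matched again at lower levels, and that the score at each level is the share of examined hypothesis n-grams found in the text. The function name `ngram_match_scores` and the example sentences are hypothetical.

```python
def ngram_match_scores(text_words, hyp_words, max_n=5):
    """Compute one match score per n-gram level, from max_n down to 1.

    A hypothesis n-gram found in the text is eliminated (its positions are
    set to None) so that it is not counted again at lower levels.
    """
    remaining = list(hyp_words)  # None marks an eliminated hypothesis token
    scores = {}
    for n in range(max_n, 0, -1):
        text_ngrams = {
            tuple(text_words[i:i + n]) for i in range(len(text_words) - n + 1)
        }
        examined = 0
        matched = 0
        i = 0
        while i + n <= len(remaining):
            window = remaining[i:i + n]
            if None in window:
                # Never build an n-gram across an eliminated span.
                i += 1
                continue
            examined += 1
            if tuple(window) in text_ngrams:
                matched += 1
                remaining[i:i + n] = [None] * n  # eliminate the matched string
                i += n
            else:
                i += 1
        scores[n] = matched / examined if examined else 0.0
    return scores


# Hypothetical example: the hypothesis shares one 5-gram with the text.
text = "a student computational logic workshop took place in vienna".split()
hyp = "the workshop took place in vienna yesterday".split()
scores = ngram_match_scores(text, hyp)
```

Here the 5-gram "workshop took place in vienna" is matched and removed at the first level, so the lower levels only see "the" and "yesterday", neither of which occurs in the text.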

Conclusion
TE is an example of how a logical notion can be projected onto natural language.
It is an area of active research.
Straightforward surface techniques outperform semantic representation approaches...
... but a clever way of computing lexical similarity has yet to be found in order to achieve high performance.

Bibliography
Mehdad, Negri, de Souza, Petrova. FBK Participation in the RTE-7 Main Task. Text Analysis Conference, 2011.
Jia, Huang, Ma, Wan, Xiao. PKUTM Participation at TAC 2010 RTE and Summarization Track. Text Analysis Conference, 2010.
Majumdar, Bhattacharyya. Lexical Based Text Entailment System for Main Task of RTE6. Text Analysis Conference, 2010.