Semantic Inference at the Lexical-Syntactic Level

Roy Bar-Haim and Ido Dagan, Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel. {barhai, dagan}@cs.biu.ac.il
Iddo Greental, Linguistics Department, Tel Aviv University, Ramat Aviv 69978, Israel. greenta@post.tau.ac.il
Eyal Shnarch, Computer Science Department, Bar-Ilan University, Ramat-Gan 52900, Israel. shey@cs.biu.ac.il

Abstract

Semantic inference is an important component in many natural language understanding applications. Classical approaches to semantic inference rely on complex logical representations. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, but lack a principled inference framework. We propose a generic semantic inference framework that operates directly on syntactic trees. New trees are inferred by applying entailment rules, which provide a unified representation for varying types of inferences. Rules were generated by manual and automatic methods, covering generic linguistic structures as well as specific lexical-based inferences. Initial empirical evaluation in a Relation Extraction setting supports the validity of our approach.

Introduction

According to the traditional formal semantics approach, inference is conducted at the logical level: texts are first translated into some logical form, and then new propositions are inferred from interpreted texts by a logical theorem prover. However, practical text understanding systems usually employ shallower lexical and lexical-syntactic representations, sometimes augmented with partial semantic annotations like word senses, named-entity classes and semantic roles. This state of affairs was clearly demonstrated in the recent PASCAL Recognizing Textual Entailment (RTE) Challenges (Dagan, Glickman, & Magnini 2006; Bar-Haim et al. 2006), a popular framework for evaluating application-independent semantic inference, where only a few systems applied logical inference (Raina, Ng, & Manning 2005; Tatu & Moldovan 2006; Bos & Markert 2006).
While practical semantic inference is mostly performed over linguistic rather than logical representations, such practices are typically partial and quite ad-hoc, and lack a clear formalism that specifies how inference knowledge should be represented and applied. The current paper proposes a step towards filling this gap, by defining a principled semantic inference mechanism over parse-based representations.

Within the textual entailment setting, a system is required to recognize whether a hypothesized statement h can be inferred from an asserted text t. Overall, the task consists of two different types of inference. Some inferences can be based on available knowledge, such as information about synonyms, paraphrases, world knowledge relationships, etc. In the general case, however, knowledge gaps arise and it is not possible to derive a complete proof based on available inference knowledge. Such situations are typically handled through approximate matching methods. This paper focuses on the first type of knowledge-based inference. We define a proof system that operates over syntactic parse trees. New trees are derived using entailment rules, which provide a principled and uniform mechanism for incorporating a wide variety of critical inference knowledge. Notably, this approach allows easy incorporation of rules learned by unsupervised methods, which seems essential for scaling inference systems. Interpretation into stipulated semantic representations, which is often difficult and is inherently a supervised semantic task for learning, is circumvented altogether. Our overall research goal is to explore how far we can get with such an inference approach, and identify the scope in which semantic interpretation may not be needed.

Copyright (c) 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
The remainder of the paper presents our inference framework, the incorporated entailment rules, which address both generic linguistic structures and lexical-based inferences, an initial evaluation that supports the proposed approach, and some comparison to related work.

Inference Framework

Given two syntactically parsed text fragments, termed text (t) and hypothesis (h), the goal of the inference system (or prover) is to determine whether t entails h. The prover tries to generate h from t by applying entailment rules that aim to transform t into h, through a sequence of intermediate parse trees. If such a proof is found, the prover concludes that entailment holds. Like logic-based systems, our inference framework is composed of propositions and inference rules. The propositions include t (the assumption), h (the goal), and intermediate premises inferred during the proof. The inference (entailment) rules define how new propositions are derived from previously established ones.
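Viewed procedurally, the prover is a search over derived trees. The following is a minimal sketch, assuming a toy encoding of trees as nested tuples and rules as functions from a tree to the derived trees they license; these names and representations are illustrative, not the paper's actual implementation:

```python
from collections import deque

def proves(text_tree, hyp_tree, rules, max_depth=4):
    """Breadth-first proof search: repeatedly apply entailment rules to
    the text tree and every derived tree, and report whether the
    hypothesis tree is generated. Each rule is a callable returning the
    list of trees it derives from a given tree (empty if it does not
    match). A full prover would also try to match rules at every subtree,
    not only at the root, as this sketch does."""
    seen = {text_tree}
    frontier = deque([(text_tree, 0)])
    while frontier:
        tree, depth = frontier.popleft()
        if tree == hyp_tree:          # h was generated: entailment holds
            return True
        if depth == max_depth:        # bound the proof chain length
            continue
        for rule in rules:
            for derived in rule(tree):
                if derived not in seen:
                    seen.add(derived)
                    frontier.append((derived, depth + 1))
    return False
```

For example, a single lexical rule such as buy -> purchase, encoded as a function rewriting the root lemma, suffices for the prover to derive "John purchase car" from "John buy car" in one step.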

[Figure 1: Application of an inference rule. POS and relation labels are based on Minipar (Lin 1998). (a) Passive-to-active tree transformation. Source: "it rained when beautiful Mary was seen by John yesterday"; derived: "it rained when John saw beautiful Mary yesterday". (b) Passive-to-active substitution rule; the dotted arc represents alignment.]

Propositions

The general inference framework assumes that propositions are represented by some form of parse trees. In this paper we focus on dependency tree representation, which is often preferred for capturing predicate-argument relations directly (Figure 1(a)). Nodes represent words and hold a set of features and their values. These features include the word lemma and part-of-speech, and additional features that may be added during the proof process. Edges are annotated with dependency relations.

Entailment Rules

At each step of the proof an entailment rule generates a derived tree d from a source tree s. A rule L -> R is primarily composed of two templates, termed left-hand-side (L) and right-hand-side (R). Templates are dependency subtrees which may contain variables. Figure 1(b) shows an entailment rule, where V, N1 and N2 are common variables shared by L and R. L specifies the subtree of s to be modified, and R specifies the new generated subtree. Rule application consists of the following steps:

L matching. The prover first tries to match L in s. L is matched in s if there exists a one-to-one node mapping function f from L to s, such that: (i) for each node u in L, f(u) has the same features and feature values as u (variables match any lemma value in f(u)); (ii) for each edge u -> v in L, there is an edge f(u) -> f(v) in s, with the same dependency relation. If matching fails, the rule is not applicable to s. Otherwise, successful matching induces a variable binding b(X), for each variable X in L, defined as the full subtree rooted in f(X) if X is a leaf, and f(X) alone otherwise. We denote by l the subtree in s to which L was mapped (as illustrated in bold in the left part of Figure 1(a)).

R instantiation. An instantiation of R, which we denote r, is generated in two steps: (i) creating a copy of R; (ii) replacing each variable X with a copy of its binding b(X) (as set during L matching). In our example this results in the subtree "John saw beautiful Mary".

Alignment copying. Part of the rule definition is an alignment relation between pairs of nodes in L and R that specifies which modifiers in l that are not part of the rule structure need to be copied to the generated r. Formally, for any two nodes u in l and v in r whose matching nodes in L and R are aligned, we copy the daughter subtrees of u in s, which are not already part of l, to become daughter subtrees of v in r. The bold nodes in the right part of Figure 1(b) correspond to r after alignment copying. "yesterday" was copied to r due to the alignment of its parent verb node.

Derived tree generation by rule type. Our formalism has two methods for generating the derived tree, substitution and introduction, as specified by the rule type. With substitution rules, the derived tree d is obtained by making a local modification to the source tree s. Except for this modification, s and d are identical (a typical example is a lexical rule, such as buy -> purchase). For this type, d is formed by copying s while replacing l (and the descendants of l's nodes) with r. This is the case for the passive rule. The right part of Figure 1(a) shows the derived tree for the passive rule application. By contrast, introduction rules are used to make inferences from a subtree of s, while the other parts of s are ignored and do not affect d. A typical example is inference of a proposition embedded as a relative clause in s. In this case the derived tree d is simply taken to be r. Figure 2 presents such a rule, which enables deriving propositions that are embedded within temporal modifiers. Note that the derived tree does not depend on the main clause. Applying this rule to the right part of Figure 1(b) yields the proposition "John saw beautiful Mary yesterday".

[Figure 2: Temporal clausal modifier extraction (introduction rule).]

Annotation Rules

Annotation rules add features to parse tree nodes, and are used in our system to annotate negation and modality. Annotation rules do not have an R, but rather each node of L may contain annotation features. If L is matched in a tree then the annotations are copied to the matched nodes. Annotation rules are applied to the original text t, and to each inferred premise, prior to any entailment rule application. Since the annotated features would be checked during subsequent L matching, these additional features may block inappropriate subsequent rule applications, such as for negated predicates.

Template Hypotheses

For many applications it is useful to allow the hypothesis h to be a template rather than a proposition, that is, to contain variables. The variables in this case are existentially quantified: t entails h if there exists a proposition h', obtained from h by variable instantiation, so that t entails h'. The obtained variable instantiations may stand for sought answers in questions or slots to be filled in relation extraction. For example, applying this framework in a question-answering setting, the question "Who killed Kennedy?" may be translated into the hypothesis "X killed Kennedy".
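The L-matching and R-instantiation steps above can be sketched as follows. This is a minimal illustration over plain dicts, with made-up relation labels (subjpass, by) rather than Minipar's actual tag set; alignment copying and the per-node feature checks are omitted:

```python
class Var:
    """Template variable: matches any node and binds to its full subtree."""
    def __init__(self, name):
        self.name = name

def match(template, node, binding):
    """Try to map template L onto node (a dict with a 'lemma' and a dict
    of labeled 'children'). Returns an extended binding dict, or None if
    matching fails. Extra children of node are allowed: L only has to be
    embeddable as a subtree."""
    if isinstance(template, Var):
        bound = binding.get(template.name)
        if bound is not None and bound != node:
            return None                      # variable bound inconsistently
        binding = dict(binding)
        binding[template.name] = node
        return binding
    if template["lemma"] != node["lemma"]:
        return None
    for rel, sub_t in template["children"].items():
        if rel not in node["children"]:
            return None                      # required edge missing in s
        binding = match(sub_t, node["children"][rel], binding)
        if binding is None:
            return None
    return binding

def instantiate(template, binding):
    """Build r: copy R, replacing each variable with its bound subtree."""
    if isinstance(template, Var):
        return binding[template.name]
    return {"lemma": template["lemma"],
            "children": {rel: instantiate(sub, binding)
                         for rel, sub in template["children"].items()}}
```

Under these toy labels, a passive-to-active rule would have L = approve(subjpass: Y, by: X) and R = approve(subj: X, obj: Y); matching L against a parse of "the budget was approved by the parliament" binds X to parliament and Y to budget, and instantiating R yields the active form.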
A successful proof of h from the sentence "The assassination of Kennedy by Oswald shook the nation" would instantiate X with Oswald.

Rules for Generic Linguistic Structures

Based on the above framework we have manually created a rule base for generic linguistic phenomena. The current rule base was developed under the assumption that the hypothesis h has a relatively simple structure and is positive (non-negated) and non-modal, which is often the case in applications such as question answering and information extraction. Accordingly, the rules aim to simplify and decompose the source proposition, and to block inference from negated and modal predicates.

Syntactic-Based Rules

These rules capture entailment inferences associated with common syntactic structures. The rules have three major functions: (1) simplification and canonization of the source tree (categories 6 and 7 in Table 1); (2) extracting embedded propositions (categories 1, 2, 3); (3) inferring propositions from non-propositional subtrees of the source tree (category 4).

Polarity-Based Rules

Consider the following two examples:
John knows that Mary is here -> Mary is here.
John believes that Mary is here -/-> Mary is here.
Valid inference of propositions embedded as verb complements depends on the verb properties, and the polarity of the context in which the verb appears (positive, negative, or unknown) (Nairn, Condoravdi, & Karttunen 2006). We extracted from the polarity lexicon of Nairn et al. a list of verbs for which inference is allowed in positive polarity context, and generated entailment rules for these verbs (category 8 in Table 1). The list was complemented with a few reporting verbs, such as say and announce, since information in the news domain is often given in reported speech, while the speaker is usually considered reliable.

Negation and Modality Annotation Rules

We use annotation rules to mark negation and modality of predicates (mainly verbs), based on their descendent modifiers.
Since annotation rules may capture subtrees of any size, we can use them to identify negation and modality phenomena in complex subtrees where the source of the phenomenon is not in the immediate daughter node of the predicate. Negation rules identify full and contracted verbal negation, as well as negation implied by certain determiners and nouns. Modality rules identify modality expressed by the use of modal verbs such as should, as well as conditional sentences and modal adverbials. Category 9 in Table 1 illustrates a negation rule, annotating the verb seen for negation due to the presence of never.

Generic Default Rules

Generic default rules are used to define default behavior, in situations where no case-by-case rules are available. We used one default rule that allows removal of any modifiers from nodes. Desirably, specific rules should be specified in future work to capture more precisely many cases that are currently handled by this default rule.

Table 1: Summary of rule base for generic linguistic structures (source example => derived example).
1. Conjunctions: "Helena is very experienced and has played a long time on the tour." => "Helena has played a long time on the tour."
2. Clausal modifiers: "But celebrations were muted as many Iranians observed a Shi'ite mourning month." => "Many Iranians observed a Shi'ite mourning month."
3. Relative clauses: "The assailants fired six bullets at the car, which carried Vladimir Skobtsov." => "The car carried Vladimir Skobtsov."
4. Appositives: "Frank Robinson, a one-time manager of the Indians, has the distinction for the NL." => "Frank Robinson is a one-time manager of the Indians."
5. Determiners: "The plaintiffs filed their lawsuit last year in U.S. District Court in Miami." => "The plaintiffs filed a lawsuit last year in U.S. District Court in Miami."
6. Passive: "We have been approached by the investment banker." => "The investment banker approached us."
7. Genitive modifier: "Malaysia's crude palm oil output is estimated to have risen by up to six percent." => "The crude palm oil output of Malaysia is estimated to have risen by up to six percent."
8. Polarity: "Yadav was forced to resign." => "Yadav resigned."
9. Negation, modality: "What we've never seen is actual costs come down." => "What we've never seen is actual costs come down." (blocking "What we've seen is actual costs come down.")

Lexical-Syntactic Rules

Lexical-syntactic rules include open-class lexical components within varying syntactic structures. Accordingly, these rules are numerous compared to the generic rules of the previous section, and have been acquired either lexicographically or automatically (e.g. paraphrases). We incorporated several sources of such rules.

Nominalization Rules

Entailment rules such as "X's acquisition of Y -> X acquired Y" capture the relations between verbs and their nominalizations. These rules were derived automatically (Ron 2006) from Nomlex, a hand-coded database of English nominalizations (Macleod et al. 1998), and from WordNet.

Automatically Learned Rules

DIRT (Lin & Pantel 2001) and TEASE (Szpektor et al. 2004) are two state-of-the-art unsupervised algorithms that learn lexical-syntactic inference rules. [1] Some of the learned rules are linguistic paraphrases, e.g.
X confirm Y -> X approve Y, while others capture world knowledge, e.g. X file lawsuit against Y -> X accuse Y. These algorithms do not learn the entailment direction, which reduces their accuracy when applied in any given direction. For each system, we considered the top 15 bi-directional rules learned for each template.

[1] Their output is publicly available at the ACLWiki Textual Entailment Resources Pool.

Evaluation

As the current work is concerned with performing exact proofs, we should evaluate its precision over text-hypothesis pairs for which a complete proof chain is found, using the available rules. We note that the PASCAL RTE datasets are not suitable for this purpose. These rather small datasets include many pairs for which entailment recognition requires approximate matching, as currently it is not realistic to assume sufficient knowledge that will enable a complete exact proof. As an alternative we chose a Relation Extraction (RE) setting, for which complete proofs can be achieved for a large number of corpus sentences. In this setting, the system needs to identify in sentences pairs of arguments for a target semantic relation (e.g. X buy Y).

Evaluation Process

We use a sample of test template hypotheses that correspond to typical RE relations, such as X approve Y. We then identify in a large test corpus sentences from which an instantiation of the test hypothesis is proved. For example, the sentence "the budget was approved by the parliament" is found to prove the instantiated hypothesis "parliament approve budget". Finally, a sample of such sentence-hypothesis pairs is judged manually for true entailment. The process was repeated to compare different system configurations. We aimed to test hypotheses that are covered by all our lexical-syntactic resources. Since the publicly available output of TEASE is much smaller than the other resources, we selected from this resource 9 transitive verbs that may correspond to typical RE predicates, [2] forming test templates by adding subject and object variable nodes.
For each test template h we need to identify in the corpus sentences from which it is proved. To find proof chains that generate h from corpus sentences efficiently, we combined forward and backward (breadth-first) search over the available rules. First, backward search is used over the lexical-syntactic rules, starting with rules whose right-hand-side is identical to the test template. By backward chaining the DIRT/TEASE and nominalization rules, this process generates a set of templates ti, all of them proving (deriving) h. For example, for the hypothesis X approve Y we may generate the template X confirm Y, through backward application of a DIRT/TEASE rule, and then further generate the template "confirmation of Y by X", through a nominalization rule. Since the templates ti are generated by lexical-syntactic rules, which modify open-class lexical items, they may be considered as lexical expansions of h.

[2] The verbs are approach, approve, consult, lead, observe, play, seek, sign, strike.
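The backward-chaining step that produces the lexical expansions can be sketched as follows, assuming (for illustration only) that templates are kept as flat strings and rules as (lhs, rhs) pairs; a real implementation would match dependency templates with variables rather than compare strings:

```python
from collections import deque

def lexical_expansions(hypothesis, rules, max_steps=2):
    """Backward breadth-first search over lexical-syntactic rules.
    Starting from the test template, apply each rule lhs -> rhs in
    reverse (its rhs matched against the current template) to collect
    the set of templates that derive the hypothesis."""
    expansions = {hypothesis}
    frontier = deque([(hypothesis, 0)])
    while frontier:
        template, steps = frontier.popleft()
        if steps == max_steps:        # bound the chain of rule applications
            continue
        for lhs, rhs in rules:
            if rhs == template and lhs not in expansions:
                expansions.add(lhs)
                frontier.append((lhs, steps + 1))
    return expansions
```

With the two rules from the example above ("X confirm Y" -> "X approve Y" and "confirmation of Y by X" -> "X confirm Y"), backward search from "X approve Y" yields all three templates as expansions.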

Next, for each specific ti we generate a search engine query composed of the open-class words in ti. This query fetches from the corpus candidate sentences, from which ti might be proven using the generic linguistic rules (recall that these rules do not modify open-class words). To that end we apply a forward search that applies the generic rules, starting from a candidate sentence s and trying to derive ti by a sequence of rule applications. If successful, this process instantiates the variables in ti with the appropriate variable bindings to elements in s. Consequently, we know that, under the same variable instantiations, h can be proved from s (since s derives ti, which in turn derives h). The above search for sentences that prove each test template was performed over the Reuters RCV1 corpus, CD#2, applying Minipar (Lin 1998) for parsing. Through random sampling we obtained 30 sentences that prove each of the 9 test templates, yielding a total of 270 pairs of a sentence and an instantiated hypothesis for each of the four tested configurations (1080 pairs overall). These pairs were split for entailment judgment between two human annotators. The annotators achieved, on a sample of 100 shared examples, agreement of 87%, and a Kappa value of 0.71 (corresponding to "substantial agreement").

Results

We tested 4 configurations of the proof system:
1. BASELINE: The baseline configuration follows the prominent approach in graph-based entailment systems (see next section): the system simply tries to embed the given hypothesis anywhere in the text tree, while only modality or negation (detected by the annotation rules) may block embedding.
2. PROOF: The basic configuration of our prover. h has to be strictly generated from t, rather than embedded in t. The only inference rule available is the default rule for removing modifiers (annotation rules are active as in BASELINE).
3. +GEN: As PROOF, plus generic linguistic rules.
4. +GEN+LEXSYN: As +GEN, plus lexical-syntactic rules.

For each system configuration we measure precision, the percentage of examples judged as correct (entailing), and average extrapolated yield, which is the expected number of truly entailing sentences in the corpus that would be proved as entailing by the system. [3] We note that, similar to IR evaluations, it is not possible to compute true recall in our setting since the total number of entailing sentences in the corpus is not known (recall is equal to the yield divided by this total). However, it is straightforward to measure relative recall differences among different configurations based on the yield. Thus, using these two measures estimated from a large corpus it is possible to conduct a robust comparison between different configurations, and reliably estimate the impact of different rule types. Such analysis is not possible with the RTE datasets, which are rather small, and their hand-picked examples do not represent the actual distribution of linguistic phenomena.

[3] The extrapolated yield for a specific template is calculated as the number of sample sentences judged as entailing, multiplied by the sampling proportion. The average is calculated over all test templates.

Table 2: Empirical evaluation results.
# | Configuration | Precision | Yield
1 | BASELINE     | 67.0%     | 2,414
2 | PROOF        | 78.5%     | 1,426
3 | +GEN         | 74.8%     | 2,967
4 | +GEN+LEXSYN  | 23.6%     | 18,809

The results are reported in Table 2. First, it is observed that the requirement for an exact proof rather than embedding improves the precision considerably over the baseline (by 11.5%), while reducing the yield by nearly 40%. Remarkably, using the generic inference rules, our system is able to gain back the lost yield in PROOF and further surpass the yield of the baseline configuration. In addition, a higher precision than the baseline is obtained (a 7.8% difference), which is significant at a p < 0.05 level, using a z test for proportions. This demonstrates that our principled proof approach appears to be superior to the more heuristic baseline embedding approach, and exemplifies the contribution of our generic rule base. Overall, generic rules were used in 46% of the proofs. Adding the lexical-syntactic rules, the prover was able to increase the yield by a factor of six(!).
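The extrapolated yield measure from footnote 3 amounts to a short computation, sketched below; the per-template numbers in the usage assertions are illustrative, not the paper's data:

```python
def extrapolated_yield(judged_entailing, sample_size, corpus_matches):
    """Footnote 3: the number of sample sentences judged as entailing,
    scaled up by the sampling proportion (total matching corpus
    sentences divided by the sample size)."""
    return judged_entailing * (corpus_matches / sample_size)

def average_yield(per_template):
    """Average the extrapolated yield over all test templates.
    per_template: list of (judged_entailing, sample_size, corpus_matches)."""
    yields = [extrapolated_yield(*t) for t in per_template]
    return sum(yields) / len(yields)
```

For instance, if 24 of 30 sampled sentences for a template are judged entailing and the template matched 3,000 corpus sentences, the extrapolated yield for that template is 24 * (3000 / 30) = 2,400.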
This shows the importance of acquiring lexical-syntactic variability patterns. However, the precision of DIRT and TEASE is currently quite low, causing overall low precision. Manual filtering of rules learned by these systems is currently required in order to obtain reasonable precision. Error analysis revealed that for the third configuration (+GEN), a significant 65% of the errors are due to parsing errors, most notably incorrect dependency relation assignment, incorrect POS assignment, incorrect argument selection, incorrect analysis of complex verbs (e.g. play down in the text vs. play in the hypothesis) and ungrammatical sentence fragments. Another 30% of the errors represent conditionals, negation and modality phenomena, most of which could be handled by additional rules, some making use of more elaborate syntactic information such as verb tense. The remaining, and rather small, 5% of the errors represent truly ambiguous sentences which would require considerable world knowledge for successful analysis.

Related Work

Most previous work on lexical-syntactic entailment focused on approximate matching. In particular, many works tried to directly match or embed the hypothesis within the text, using tree-edit distance or other cost functions to measure the distance between the text and hypothesis (Kouylekov & Magnini 2005; Haghighi, Ng, & Manning 2005). A rather limited amount of inference knowledge was utilized (as observed at the RTE-2 Challenge), typically to determine cost values. These mechanisms do not provide a clear separation between approximate matching heuristics and justified inferences based on available knowledge.

An initial theoretical proposal for an inference system based on lexical-syntactic rules was outlined in (Dagan & Glickman 2004). The current work may be viewed as a realization of that general direction. (de Salvo Braz et al. 2005) were the first to incorporate augmented syntactic-based entailment rules in a comprehensive entailment system. In their system, entailment rules are applied as one of several inference mechanisms, over hybrid syntactic-semantic structures called concept graphs. When the left hand side of a rule is matched in the concept graph, the graph is augmented with an instantiation of the right hand side of the rule, creating a complex structure whose semantics were not fully specified. Eventually, their system attempts to embed the hypothesis in the graph. By contrast, we presented a clearly formalized framework which is based solely on entailment rules, derives a single proposition at a time, and fully generates the hypothesis itself rather than heuristically embedding it in the text. (Romano et al. 2006) investigated simple application of a small set of generic syntactic-based entailment rules together with lexical-syntactic entailment rules produced by TEASE. They tested their method on a single relation (protein interaction). While following their general approach, the current work substantially extends their preliminary work by introducing a detailed inference formalism and a much richer spectrum of entailment rules. Finally, none of the earlier works has presented a robust component-wise evaluation of a variety of entailment rule sources, based on samples from a large corpus.

Conclusion

This paper defined a novel framework for semantic inference at the lexical-syntactic level. Our formalism was found suitable for describing a wide spectrum of entailment rules, both automatically derived and manually created. We also presented a much-needed evaluation methodology for individual components in knowledge-based inference systems.
The empirical results demonstrate that our exact proof approach is feasible for real-world applications such as relation extraction, and outperforms the more heuristic common practice of hypothesis embedding. We plan to enhance our framework to allow inference from multiple sentences, as well as to incorporate additional types of rules, such as lexical rules (e.g. dog -> animal). Future research will also investigate integration of the proof system with different methods for approximate matching, which would enable its application in additional settings.

Acknowledgments

This work was partially supported by ISF grant 1095/05, the IST Programme of the European Community under the PASCAL Network of Excellence IST-2002-506778, and the Israel Internet Association (ISOC-IL), grant 9022. We are grateful to Cleo Condoravdi for making the polarity lexicon developed at PARC available for this research. We also thank Dan Roth and Idan Szpektor for helpful discussions.

References

Bar-Haim, R.; Dagan, I.; Dolan, B.; Ferro, L.; Giampiccolo, D.; Magnini, B.; and Szpektor, I. 2006. The Second PASCAL Recognising Textual Entailment Challenge. In Second PASCAL Challenges Workshop on Recognizing Textual Entailment.
Bos, J., and Markert, K. 2006. When logical inference helps determining textual entailment (and when it doesn't). In Second PASCAL Challenges Workshop on Recognizing Textual Entailment.
Dagan, I., and Glickman, O. 2004. Probabilistic textual entailment: Generic applied modeling of language variability. In PASCAL Workshop on Text Understanding and Mining.
Dagan, I.; Glickman, O.; and Magnini, B. 2006. The PASCAL Recognising Textual Entailment Challenge. In Quinonero-Candela et al., eds., MLCW 2005, LNAI Volume 3944. Springer-Verlag. 177-190.
de Salvo Braz, R.; Girju, R.; Punyakanok, V.; Roth, D.; and Sammons, M. 2005. An inference model for semantic entailment in natural language. In AAAI, 1043-1049.
Haghighi, A. D.; Ng, A. Y.; and Manning, C. D. 2005. Robust textual inference via graph matching. In Proceedings of EMNLP 2005, 387-394.
Kouylekov, M., and Magnini, B. 2005. Tree edit distance for textual entailment. In Recent Advances in Natural Language Processing (RANLP).
Lin, D., and Pantel, P. 2001. Discovery of inference rules for question answering. Natural Language Engineering 4(7):343-360.
Lin, D. 1998. Dependency-based evaluation of Minipar. In Proceedings of the Workshop on Evaluation of Parsing Systems at LREC 1998.
Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L.; and Reeves, R. 1998. Nomlex: A lexicon of nominalizations. In EURALEX.
Nairn, R.; Condoravdi, C.; and Karttunen, L. 2006. Computing relative polarity for textual inference. In Proceedings of the International Workshop on Inference in Computational Semantics (ICoS-5).
Raina, R.; Ng, A. Y.; and Manning, C. D. 2005. Robust textual inference via learning and abductive reasoning. In AAAI, 1099-1105.
Romano, L.; Kouylekov, M.; Szpektor, I.; Dagan, I.; and Lavelli, A. 2006. Investigating a generic paraphrase-based approach for relation extraction. In Proceedings of EACL 2006, 409-416.
Ron, T. 2006. Generating entailment rules based on online lexical resources. Master's thesis, Computer Science Department, Bar-Ilan University, Ramat-Gan, Israel.
Szpektor, I.; Tanev, H.; Dagan, I.; and Coppola, B. 2004. Scaling web-based acquisition of entailment relations. In Proceedings of EMNLP 2004, 41-48.
Tatu, M., and Moldovan, D. 2006. A logic-based semantic approach to recognizing textual entailment. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 819-826.