Update on Soar-based language processing


Deryle Lonsdale (and the rest of the BYU NL-Soar Research Group)
BYU Linguistics, lonz@byu.edu
Soar 2006

NL-Soar

NL-Soar developments
- Discourse/robotic dialogue: ICSLP, DoD BRIMS poster
- Running on Soar 8.5.2
- Some NLG chunking issues remain
- Having trouble getting to 8.6.1
- Fresh start with 8.6.2...

LG-Soar

Overview
- Link-Grammar Soar
- Implements syntactic and shallow semantic processing
- Used for information extraction
- Components:
  - Soar architecture
  - Link Grammar parser
  - Discourse Representation Theory for discourse modeling
- Discussed at Soar 21 and Soar 22

LG-Soar developments
- Predicate extraction in the biomedical texts domain (www.clinicaltrials.gov); NLDB 2006
- Two stages:
  - Identify and extract predicate logic forms from medical clinical trial (in)eligibility criteria
  - Match up the information with other data, e.g. patients' medical records
- Result: a tool for helping match patients with clinical trials
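The slides do not show LG-Soar's extraction rules. As a minimal sketch of the general idea only, assuming the links are already available as hand-written (label, left word, right word) triples rather than produced by the Link Grammar parser itself, subject (S) and object (O) links can be folded into predicate-argument forms. The function names and the sample criterion are hypothetical.

```python
def major_type(label: str) -> str:
    """Uppercase part of a link label, e.g. "Ss" -> "S", "Os" -> "O"."""
    return "".join(ch for ch in label if ch.isupper())


def extract_predicates(links: list[tuple[str, str, str]]) -> list[str]:
    """Turn S and O links into verb(subject[, object]) strings.

    Hypothetical helper: links of any other type (D, A, M, ...) are ignored.
    """
    subjects: dict[str, str] = {}  # verb -> subject (subject is left of verb)
    objects: dict[str, str] = {}   # verb -> object (object is right of verb)
    for label, left, right in links:
        kind = major_type(label)
        if kind == "S":
            subjects[right] = left
        elif kind == "O":
            objects[left] = right
    predicates = []
    for verb, subject in subjects.items():
        args = [subject]
        if verb in objects:
            args.append(objects[verb])
        predicates.append(f"{verb}({', '.join(args)})")
    return predicates


# Hand-written links for the criterion "The patient has diabetes".
links = [("Ds", "the", "patient"), ("Ss", "patient", "has"), ("Os", "has", "diabetes")]
print(extract_predicates(links))  # ['has(patient, diabetes)']
```

Matching the resulting forms against patient records (the second stage) is not sketched here.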

XNL-Soar

Our goals
- Integrate the Minimalist Program (MP) into a cognitive modeling engine
- Explore language/task integrations using the MP
- Test cross-linguistic implementation possibilities with the MP
- Ultimately, determine whether the MP supports incremental, operator-based processing

Our approach
- Map syntactic parsing onto operators
- Integrate external knowledge sources
- Strengths:
  - We have already done this for NL-Soar
  - The MP has an operator-like feel to it
- Weaknesses:
  - MP literature is sketchy on incremental parsing
  - External knowledge sources are incommensurable
  - Our scant knowledge of human performance data

Derivational principles
- Minimalist principles (Chomsky 1995): Merge, Move
- Hierarchy of Projections (Adger 2003):
  - Nominal: D > (Poss) > n > N
  - Clausal: C > T > (Neg) > (Perf) > (Prog) > (Pass) > v > V
- Governed by features: strong and weak features
- NP/VP symmetry, including shells
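The Hierarchy of Projections can be read bottom-up as a lookup table. A minimal sketch, assuming parenthesized levels are optional and may be skipped: given the current category, return every level that could come next, up to and including the first obligatory one. The names `NOMINAL`, `CLAUSAL`, and `next_targets` are hypothetical; the XNL-Soar HoP operator itself is written as Soar productions and is not shown on the slides.

```python
# Hierarchies from the slide, listed from the lexical head upward.
# Each entry is (category, optional?); parenthesized levels are optional.
NOMINAL = [("N", False), ("n", False), ("Poss", True), ("D", False)]
CLAUSAL = [("V", False), ("v", False), ("Pass", True), ("Prog", True),
           ("Perf", True), ("Neg", True), ("T", False), ("C", False)]


def next_targets(category: str) -> list[str]:
    """Return the categories that may project next above `category`.

    Optional levels can be skipped, so the result contains each optional
    level in order, ending with the first obligatory level (if any).
    """
    for hierarchy in (NOMINAL, CLAUSAL):
        labels = [cat for cat, _ in hierarchy]
        if category in labels:
            targets = []
            for cat, optional in hierarchy[labels.index(category) + 1:]:
                targets.append(cat)
                if not optional:
                    break
            return targets
    raise ValueError(f"{category!r} is not in either hierarchy")


print(next_targets("n"))  # ['Poss', 'D']
print(next_targets("v"))  # ['Pass', 'Prog', 'Perf', 'Neg', 'T']
```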

Operators and operator types
XNL-Soar operator types and functions:
- Lexical access: retrieve and store lexically related information
- Merge: construct syntax via MP-specified merge operations
- Movehead: perform head-to-head movement (via adjunction)
- HoP: consult the hierarchy of projections, return the next possible target level
- Project: create a bare-structure maximal projection from a lexical item
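A minimal sketch of the first and last operator types, lexical access and Project, assuming a two-word toy lexicon in place of the WordNet/LCS lookups. The `Node` structure, `LEXICON`, and the function names are hypothetical. The projection's label is taken from its head's category (written N to NP for readability), which is the bare-phrase-structure idea that the label comes from the head rather than from a separate rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # category of a head, or of its projection
    word: Optional[str] = None      # set only on lexical heads
    children: List["Node"] = field(default_factory=list)


# Hypothetical toy lexicon standing in for the external knowledge sources.
LEXICON = {"exonerated": "V", "defendants": "N"}


def lexical_access(word: str) -> Node:
    """Retrieve a word's category and return it as a bare head node."""
    return Node(label=LEXICON[word], word=word)


def project(head: Node) -> Node:
    """Create a maximal projection whose label is determined by its head."""
    return Node(label=f"{head.label}P", children=[head])


np_node = project(lexical_access("defendants"))
print(np_node.label, np_node.children[0].word)  # NP defendants
```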

Nominal projections
- DPs
- NP shells
- Feature checking
- HoP for nominals
- Projection of N to NP
- Bare phrase structure for lexical heads
- Operators: Project, Merge1, Merge2, Check-Root, and HoP

[Figure: Projection of a DP]

Verbal projections
- VP shells
- Theta roles and the LCS lexicon
- ucat grids
- The HoP for the clause
- Operators: Merge1, Merge2, Check-Root, HoP

Merge
- 1st Merge: complement merge based on ucat features
- 2nd Merge: specifier merge based on a second ucat feature
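The two merges can be sketched as consuming a head's ordered list of ucat (uninterpretable category) features: the first merge checks the first feature against a complement, the second checks the next one against a specifier. This is a minimal sketch under that assumption; `Phrase`, `merge1`, and `merge2` are hypothetical names, and the little-v example (selecting a VP complement and a DP specifier) follows Adger-style analyses rather than XNL-Soar's actual feature inventory.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Phrase:
    category: str                                   # category of the head, e.g. "v"
    ucat: List[str] = field(default_factory=list)   # unchecked selectional features, in order
    parts: List[Tuple[str, "Phrase"]] = field(default_factory=list)  # (role, constituent)


def _merge(head: Phrase, other: Phrase, role: str) -> Phrase:
    """Merge `other` if it matches the head's next unchecked ucat feature."""
    if not head.ucat or head.ucat[0] != other.category:
        raise ValueError(f"{head.category} has no unchecked ucat feature for {other.category}")
    # Checking removes the feature; the merged constituent is stored with its role.
    return Phrase(head.category, head.ucat[1:], head.parts + [(role, other)])


def merge1(head: Phrase, complement: Phrase) -> Phrase:
    return _merge(head, complement, "complement")


def merge2(head: Phrase, specifier: Phrase) -> Phrase:
    return _merge(head, specifier, "specifier")


v = Phrase("v", ["V", "D"])
vp = merge2(merge1(v, Phrase("V")), Phrase("D"))
print(vp.ucat, [role for role, _ in vp.parts])  # [] ['complement', 'specifier']
```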

[Figure: Projection of a VP: HoP & Merge]

Move
- Governed by strong ucat features
- Two types of movement:
  - Head-head movement: creates new structure at the head level
  - Phrasal movement
- Operators: Copy, Hadjunction
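Head-to-head movement via adjunction can be sketched under the copy theory of movement: when the higher head bears a strong feature, a copy of the lower head adjoins to it and the lower copy stays in place unpronounced. This is a minimal sketch assuming a strong feature can be reduced to a boolean flag; `Head`, `hadjoin`, and `spell_out` are hypothetical names, not the XNL-Soar operators themselves.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Head:
    category: str
    word: Optional[str] = None
    strong: bool = False                # strong feature: forces overt movement
    adjoined: Optional["Head"] = None   # head adjoined to this one, if any
    pronounced: bool = True


def hadjoin(higher: Head, lower: Head) -> Tuple[Head, Head]:
    """Adjoin `lower` to `higher` if `higher` has a strong feature.

    Returns (new higher head, copy left in the lower position).
    """
    if not higher.strong:
        return higher, lower            # weak feature: no overt movement
    complex_head = replace(higher, adjoined=lower)
    lower_copy = replace(lower, pronounced=False)
    return complex_head, lower_copy


def spell_out(head: Head) -> str:
    """Pronounce a (possibly complex) head, skipping silent copies."""
    parts = []
    if head.adjoined is not None and head.adjoined.pronounced:
        parts.append(spell_out(head.adjoined))
    if head.word and head.pronounced:
        parts.append(head.word)
    return " ".join(parts)


v_head, v_copy = hadjoin(Head("v", strong=True), Head("V", word="exonerated"))
print(repr(spell_out(v_head)), repr(spell_out(v_copy)))  # 'exonerated' ''
```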

External knowledge sources
- WordNet 2.0 (wordnet.princeton.edu)
  - Lexical semantics: part of speech, word senses, subcategorization
  - Inflectional and derivational morphology
- English LCS lexicon (www.umiacs.umd.edu/~bonnie/verbs-english.lcs)
  - Thematic information: θ-grids, θ-roles
  - Used to derive uninterpretable features
  - Triggers syntactic construction
  - Aligned with WordNet information
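Deriving uninterpretable features from a θ-grid needs the grid parsed first. A minimal sketch, assuming the LCS notation marks an obligatory role with "_", an optional role with ",", and the introducing preposition in parentheses (as in _ag_th,mod-poss(of)). The mapping from roles to features (bare roles select a DP, prepositional roles select a P, optional ones marked "?") is purely illustrative; `parse_grid` and `ucat_features` are hypothetical names.

```python
import re

# Assumed grid syntax: marker ("_" obligatory, "," optional), role name,
# then an optional "(prep)" giving the introducing preposition.
ROLE = re.compile(r"([_,])([a-z]+(?:-[a-z]+)*)(?:\(([a-z]*)\))?")


def parse_grid(grid: str) -> list[dict]:
    """Split a theta grid into roles with obligatoriness and preposition."""
    return [
        {"role": role, "obligatory": marker == "_", "prep": prep or None}
        for marker, role, prep in ROLE.findall(grid)
    ]


def ucat_features(grid: str) -> list[str]:
    """Hypothetical role-to-feature mapping, for illustration only."""
    features = []
    for r in parse_grid(grid):
        feature = f"uP({r['prep']})" if r["prep"] else "uD"
        features.append(feature if r["obligatory"] else feature + "?")
    return features


print(ucat_features("_ag_th,mod-poss(of)"))  # ['uD', 'uD', 'uP(of)?']
```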

English LCS lexicon data
10.6.a#1#_ag_th,mod-poss(of)#exonerate#exonerate#exonerate#exonerate+ed# (2.0,00874318_exonerate%2:32:00::)

10.6.a Verbs of Possessional Deprivation: Cheat Verbs/-of
WORDS (absolve acquit balk bereave bilk bleed burgle cheat cleanse con cull cure defraud denude deplete depopulate deprive despoil disabuse disarm disencumber dispossess divest drain ease exonerate fleece free gull milk mulct pardon plunder purge purify ransack relieve render rid rifle rob sap strip swindle unburden void wean)
((1 "_ag_th,mod-poss()") (1 "_ag_th,mod-poss(from)") (1 "_ag_th,mod-poss(of)"))
"He!!+ed the people (of their rights); He!!+ed him of his sins"
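The first line of the entry above packs the LCS class, the θ-grid, the word forms, and a WordNet alignment into one "#"-separated record. A minimal parsing sketch, assuming the third field is the grid and the remaining fields before the annotation are word forms; the second field ("1") is not explained on the slide, so it is skipped. The annotation is read as a version ("2.0", presumably matching WordNet 2.0), a number that is presumably a synset offset, and a WordNet sense key, whose documented format is lemma%ss_type:lex_filenum:lex_id:head_word:head_id. `parse_entry` and `SAMPLE` are hypothetical names.

```python
import re

SAMPLE = ("10.6.a#1#_ag_th,mod-poss(of)#exonerate#exonerate#exonerate#"
          "exonerate+ed# (2.0,00874318_exonerate%2:32:00::)")

# "(version,offset_sensekey)" annotation at the end of the record.
WN_NOTE = re.compile(r"\((?P<version>[\d.]+),(?P<offset>\d+)_(?P<key>[^)]+)\)")
# WordNet ss_type codes used in sense keys.
SS_TYPES = {"1": "noun", "2": "verb", "3": "adjective",
            "4": "adverb", "5": "adjective satellite"}


def parse_entry(line: str) -> dict:
    body, _, note = line.rpartition("#")
    fields = body.split("#")
    entry = {"class": fields[0], "grid": fields[2], "forms": fields[3:]}
    match = WN_NOTE.search(note)
    if match:
        lemma, lex_sense = match["key"].split("%", 1)
        ss_type, lex_filenum, lex_id = lex_sense.split(":")[:3]
        entry["wordnet"] = {
            "version": match["version"],
            "offset": match["offset"],
            "lemma": lemma,
            "pos": SS_TYPES.get(ss_type, ss_type),
            "lex_filenum": int(lex_filenum),
            "lex_id": int(lex_id),
        }
    return entry


print(parse_entry(SAMPLE)["wordnet"]["pos"])  # verb
```

The parsed grid could then be handed to a θ-grid-to-feature step like the one described on the previous slide.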

Similar work
- Incremental parsing in general (Phillips 2003)
- Other linguistic theories for incremental parsing:
  - GB (Kolb 1991)
  - Dependency grammar (Milward 1994; Ait-Mokhtar et al. 2002)
  - Categorial Grammar (Izuo 2004)
  - Finite-state methods (Ait-Mokhtar & Chanod 1997)

Similar work
- Minimalist parsing in other frameworks (Stabler 1997; Harkema 2001)
- Thematic information and parsing (Schlesewsky & Bornkessel 2004)
- Cross-linguistic considerations in incremental parsing (Schneider 2000)
- Human studies on ambiguity and reanalysis:
  - Eye tracking (Kamide, Altmann, & Haywood 2003)
  - ERP (Bornkessel, Schlesewsky, & Friederici 2003)


Building a full sentence
agent> init-soar
agent> r
0: ==>S: S1
2: O: O1 (getword)
Input a word: exonerated
4: O: O2 (getword)
Input a word: defendants
5: O: O3 (project) --> NP
7: O: O5 (hop)
8: O: O8 (merge1)
10: O: O9 (merge2) --> np
12: O: O14 (hadjoin) --> move N
13: O: O13 (hop)
14: O: O16 (merge1)
16: O: O17 (merge2) --> DP
18: O: O21 (merge1)
20: O: O23 (merge2) --> VP
22: O: O27 (hop)
23: O: O30 (merge1)
25: O: O31 (merge2) --> vp
27: O: O36 (hadjoin) --> move V
28: O: O35 (hop)
29: O: O38 (merge1)
31: O: O39 (merge2) --> TP
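The hop-driven part of the trace follows a regular pattern: for each level above a lexical projection, consult the HoP, merge twice, and, when the new head is a light head (n or v), move the lexical head up by head adjunction. A minimal sketch that replays that order for the nominal and the verbal spine, assuming the spines are hard-coded and optional levels are never built (as in this trace, which also stops at TP). `SPINES`, `ATTRACTORS`, and `derive` are hypothetical; the real system decides each step through Soar productions and features, and the trace writes the light projections as np and vp.

```python
# Obligatory spine of each hierarchy, bottom-up; optional levels omitted.
SPINES = {"N": ["N", "n", "D"], "V": ["V", "v", "T"]}
# Light heads that attract the lexical head by head-to-head movement.
ATTRACTORS = {"n", "v"}


def derive(lexical_cat: str) -> list[str]:
    """List the operator steps above a lexical projection, in order."""
    ops = []
    for level in SPINES[lexical_cat][1:]:
        ops += ["hop", "merge1", f"merge2 -> {level}P"]
        if level in ATTRACTORS:
            ops.append(f"hadjoin -> move {lexical_cat}")
    return ops


for step in derive("N"):
    print(step)
```

For "N" this prints hop, merge1, merge2 -> nP, hadjoin -> move N, hop, merge1, merge2 -> DP, the same order as decisions 7 to 16 above; derive("V") gives the order of decisions 22 to 31.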

Current status
- Proof of concept (POC) for fundamental syntactic structures
- Basic sentence types (transitives, unergatives, unaccusatives)
- All functional and lexical projections in syntactic structure
- Most feature percolation and feature checking
- Current system: about 60 productions (cf. 3500 in NL-Soar)
- External knowledge sources: interfaced via 1000+ lines of Tcl/Perl

Issue
- Finding a balance between generation and parsing
- Most MP descriptions are generative, not recognitional, in focus
- Is it advisable and well motivated to undo or reverse movements?
- If not, is generate-and-test the right mechanism for parsing input?
- What are the implications for learning and bootstrapping language capabilities (e.g. parsing in the service of generation)?

Future functionality
- XP adjunction
- Assigners/receivers set?
- Wider coverage of complex constructions: ditransitives, resultatives, causatives, etc.
- More semantics/deeper semantics:
  - Quantifier raising
  - Scopal relationships
  - C-command and other interpretive mechanisms
- More detailed LCS structures
- Web-based Minimalist Parser grapher

Future applications (cf. NL-Soar)
- Explore human parser robustness, processing of ambiguity, learning
- Integrate syntax/semantics into the discourse/conversation component
- Bootstrapping: parsing and generation
- Develop human-agent and agent-agent communication
- Parameterize XNL-Soar for processing of languages other than English
- Model cognition in reading
- Model real-time language/task integrations

Conclusion
Coal:
- Performance?
- MP not fully explored
- Reconciling disparate lexical resources (WordNet + LCS) is nontrivial
- Redoing learning in Soar 8
- Graphing is more complicated
Nuggets:
- Better coverage (English and cross-linguistically)
- New start in Soar 8
- State-of-the-art syntax
- Interest: CUNY Sentence Processing, CogSci