The building blocks of HPSG grammars: Head-Driven Phrase Structure Grammar (HPSG)

In HPSG, sentences, words, phrases, and multisentence discourses are all represented as signs = complexes of phonological, syntactic/semantic, and discourse information.
Alternative Syntactic Theories, Spring 2010

We can (and will) view HPSG grammars in two different ways:
1. From a linguistic perspective
2. From a formal perspective
Historical note: HPSG is based on Generalized Phrase Structure Grammar (GPSG) (Gazdar et al. 1985).

HPSG grammars from a linguistic perspective
From a linguistic perspective, an HPSG grammar consists of:
a) a lexicon licensing basic words (which are themselves complex objects)
b) lexical rules licensing derived words
c) immediate dominance (ID) schemata licensing constituent structure
d) linear precedence (LP) statements constraining word order
e) a set of grammatical principles expressing generalizations about linguistic objects
HPSG is nonderivational, but in some sense, HPSG has several different levels (layers of features).

HPSG (typed) feature structures
A feature structure is a directed acyclic graph (DAG), with arcs representing features going between values. Each of these feature values is itself a complex object: the type sign has the features PHON and SYNSEM appropriate for it. The SYNSEM feature has a value of type synsem. This type itself has relevant features (LOC and NON-LOC).

Skeleton of a typed feature structure
In attribute-value matrix (AVM) form, here is the skeleton of a sign object:

  [sign
    PHON    list
    SYNSEM  [synsem
              LOC      local
              NON-LOC  non-local ] ]

Abbreviated skeleton
Things are often abbreviated when written down (although the object itself still contains the same things), with paths such as SYNSEM|LOC collapsing intermediate nodes:

  [sign
    PHON            list
    SYNSEM|LOC      local
    SYNSEM|NON-LOC  non-local ]
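The DAG-and-AVM picture can be made concrete in a few lines of code. The following is a minimal sketch of my own (not from the course materials; `fs` and the `_type` key are invented names) that encodes a typed feature structure as nested Python dicts, with feature paths as chained lookups:

```python
# Sketch: a typed feature structure as nested dicts. The "_type" key
# plays the role of the node's type; all other keys are feature arcs.

def fs(type_name, **features):
    """Build a feature-structure node with a type and feature arcs."""
    node = {"_type": type_name}
    node.update(features)
    return node

# Skeleton of a sign, mirroring the AVM in the text:
sign = fs("sign",
          PHON=["she"],
          SYNSEM=fs("synsem",
                    LOC=fs("local"),
                    NON_LOC=fs("non-local")))

# A feature path like SYNSEM|LOC is just a chain of lookups:
loc = sign["SYNSEM"]["LOC"]
```

In this encoding, following a path through the AVM is simply dictionary indexing, and two AVM paths lead to the same node exactly when the lookups return the same Python object.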

An example tree
Let's walk through an example to illustrate how feature structures can be used, starting with this rather impoverished tree: she drinks wine.

Example tree with feature structures

  she:    [PHON ⟨she⟩]  with SYNSEM tagged [1]
  drinks: [PHON ⟨drinks⟩
           SYNSEM|LOC|CAT [HEAD   verb[fin]
                           SUBCAT ⟨[1], [2]⟩ ] ]
  wine:   [PHON ⟨wine⟩]  with SYNSEM tagged [2]

Some things to note about the tree
- Phonology (PHON) is kept separate from syntax and semantics (SYNSEM), allowing different processes to operate on them.
- We say that drinks is a finite verb by specifying its HEAD type (verb) and that the value of its VFORM feature is fin.
- We have some way to say that parts of the tree share identical information, e.g., that a VP and its daughter V have many of the same properties (HEAD).
- We use lists to encode subcategorization information, and these items are identified with elements in the tree. Note, too, how selection is kept local.

Phrase structure grammar?
Even though it is called Head-driven Phrase Structure Grammar, the name is a misnomer. Nothing about the formalism forces you to use PS trees. In fact, technically, there are no trees as such, only features which encode objects akin to trees. Types license particular schemata (e.g., head-comps-struc), and a DTRS list keeps track of the constituent daughters. For ease of representation, we often display things in trees, but the example two slides back is more accurately represented as follows:

  [phrase
    SYNSEM|LOC|CAT [HEAD verb[fin], SUBCAT ⟨⟩]
    DTRS [head-comps-struc
           COMP-DTRS ⟨[PHON ⟨she⟩, SYNSEM [1]]⟩
           HEAD-DTR  [phrase
                       SYNSEM|LOC|CAT|SUBCAT ⟨[1]⟩
                       DTRS [head-comps-struc
                              HEAD-DTR  [PHON ⟨drinks⟩
                                         SYNSEM|LOC|CAT|SUBCAT ⟨[1], [2]⟩]
                              COMP-DTRS ⟨[PHON ⟨wine⟩, SYNSEM [2]]⟩ ] ] ] ]

Lexicalized grammar
How do we start deriving such complex representations? One tenet of HPSG (akin to LFG) is that the lexicon contains complex representations of words. So, when words are built into phrases, we have all this information at our hands. We can see this in the lexical entry on the next page, taken from Levine and Meurers (2006): for example, we can see that each word relates its syntactic argument structure (valence) with its semantics (CONTENT).

Lexical entry for put

  [PHON ⟨put⟩
   SYNSEM|LOC
     [CAT [HEAD [verb, AUX −]
           VALENCE [SUBJ  ⟨NP[1]⟩
                    COMPS ⟨NP[2], PP[3]⟩ ] ]
      CONT [put-relation
            PUTTER      [1]
            THING-PUT   [2]
            DESTINATION [3] ] ] ]

Capturing dependencies
A grammatical framework needs to be able to capture the different grammatical dependencies of natural languages (cf. Levine and Meurers 2006):
- Local dependencies: limited syntactic domain and largely lexical in nature
- Non-local dependencies: arbitrarily large syntactic domain and independent of the lexicon
HPSG seems well-suited for this.

Local dependencies
As with the other frameworks we've looked at, HPSG deals with local dependencies via the selectional properties of lexical heads (head-driven). For example:
- Raising verbs select for an argument with which they share a subject
- Control (or equi) verbs select for an argument which has a co-indexed subject

Raising verb example

  [PHON ⟨seem⟩
   SYNSEM|LOC
     [CAT|VAL [SUBJ  [1]
               COMPS ⟨[LOC [CAT|VAL [SUBJ [1], COMPS ⟨⟩]
                            CONT [2]]]⟩ ]
      CONT [seem, ARG [2]] ] ]

Control/Equi verb example

  [PHON ⟨try⟩
   SYNSEM|LOC
     [CAT|VAL [SUBJ  ⟨NP[1]⟩
               COMPS ⟨[LOC [CAT|VAL [SUBJ ⟨NP[1]⟩, COMPS ⟨⟩]
                            CONT [2]]]⟩ ]
      CONT [try, TRYER [1], TRIED [2]] ] ]

Non-local dependencies
Instead of using transformations, HPSG analyzes unbounded dependency constructions (UDCs) by linking a filler with a gap. The analysis relies on the feature SLASH. The general idea is:
- A trace lexical entry puts its LOCAL value into a non-local SLASH set
- This information is shared among the nodes in a tree
- When the filler is realized, the information is removed from the SLASH set

HPSG grammars from a formal perspective
As with other frameworks we've examined, HPSG sets out to model the domain:
- Models of empirically observable objects need to be established, and
- Theories need to constrain which models actually exist.
Thus, from a formal perspective, an HPSG grammar consists of:
- the signature as declaration of the domain, and
- the theory constraining the domain.

The signature
The signature defines the ontology ("declaration of what exists"): which kinds of objects are distinguished, and which properties of which objects are modeled. It consists of the type (or sort) hierarchy and the appropriateness conditions, defining which type has which appropriate attributes (or features) with which appropriate values. Some atomic types have no feature appropriate for them.

Example excerpt of a signature
Here, we leave out the appropriateness conditions and just show a hierarchy of types:

  object
    boolean: +, −
    head
      subst(antive): noun, verb, ...
      func(tional): marker, determiner

Sort-resolved
Based on the example signature, the following two descriptions are equivalent:
(1) a. func
    b. marker ∨ determiner
That is, a type (or sort) is really a disjunction of its maximally specific subtypes.

Models of linguistic objects
As mentioned, the objects are modelled by feature structures, which are depicted as directed graphs. Since these models represent objects in the world (and not knowledge about the world), they are total with respect to the ontology declared in the signature. Technically, one says that these feature structures are:
- totally well-typed: every node has all the attributes appropriate for its type and each attribute has an appropriate value (note that this is different from LFG), and
- sort-resolved: every node is of a maximally specific type.

Structure sharing
The main explanatory mechanism in HPSG is that of structure-sharing, equating two features as having the exact same value (token-identical):

  [PHON ⟨walks⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][3rd,sing]⟩]
               CONT [walk, WALKER [1]]]]

The index of the NP on the SUBCAT list is said to unify with the value of WALKER.
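Structure sharing is straightforward to emulate in code: a tag like [1] corresponds to one object referenced from two places, so the sharing is token identity, not mere equality of values. A hypothetical sketch of my own (AVMs encoded as nested Python dicts; all names invented):

```python
# Sketch: the tag [1] in the walks example becomes one Python object
# that is reused at two paths in the same feature structure.

index_1 = {"_type": "index", "PER": "3rd", "NUM": "sing"}   # the tag [1]

walks = {
    "_type": "word",
    "PHON": ["walks"],
    "SYNSEM": {"LOC": {
        "CAT": {"SUBCAT": [{"HEAD": "noun", "CASE": "nom",
                            "INDEX": index_1}]},          # NP[nom][1]
        "CONT": {"_type": "walk", "WALKER": index_1},     # WALKER [1]
    }},
}

# Token identity: both paths lead to the very same node.
subcat_index = walks["SYNSEM"]["LOC"]["CAT"]["SUBCAT"][0]["INDEX"]
walker = walks["SYNSEM"]["LOC"]["CONT"]["WALKER"]
```

Because the two paths share one node, any information added at one path (say, fixing the number) is automatically visible at the other, which is exactly the behavior the AVM tags are meant to express.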

Descriptions
A description language and its abbreviating attribute-value matrix (AVM) notation is used to talk about sets of objects. Descriptions consist of three building blocks:
- Type descriptions single out all objects of a particular type, e.g., word
- Attribute-value pairs describe objects that have a particular property. The attribute must be appropriate for the particular type of object, and the value can be any kind of description, e.g., [SPOUSE|NAME mary]
- Tags (structure sharing) to specify token identity, e.g., [1]

Descriptions (cont.)
Complex descriptions are obtained by combining descriptions with the help of conjunction (∧), disjunction (∨) and negation (¬). In the AVM notation, conjunction is implicit.
A theory (in the formal sense) is a set of description language statements, often referred to as the constraints. The theory singles out a subset of the objects declared in the signature, namely those which are grammatical. A linguistic object is admissible with respect to a theory iff it satisfies each of the descriptions in the theory and so does each of its substructures.

Description example
A verb, for example, can specify that its subject be masculine singular (as Russian past tense verbs do):
(2) a. Ya spal.
       I.masc.sg slept.masc.sg
    b. On spal.
       he.masc.sg slept.masc.sg
(3) On the verb's SUBJ list:

  [LOC [CAT|HEAD noun
        CONT|INDEX [NUM sing
                    GEN masc]]]

Subsumption
Feature structure descriptions have subsumption relations between them. A more general description subsumes a more specific one. A more general description usually means that fewer features are specified: it doesn't specify the entire (totally well-typed) feature structure, just what needs to be true in the feature structure.

Subsumption example
The description in (3) is said to subsume both of the following more specific (partial) feature structures:

  (4) a. [LOC [CAT|HEAD noun
               CONT|INDEX [PER 1st
                           NUM sing
                           GEN masc]]]
      b. [LOC [CAT|HEAD noun
               CONT|INDEX [PER 3rd
                           NUM sing
                           GEN masc]]]

HPSG from a linguistic perspective (again)
Now that we have these feature structures, how do we use them for linguistic purposes?
1. Specify a signature/ontology which allows us to make linguistically relevant distinctions and puts appropriate features in the appropriate places.
2. Specify a theory which constrains that signature for a particular language:
   - The lexicon specifies each word and the different properties that it has
   - There can also be relations (so-called lexical rules) between words in the lexicon
   - Phrasal rules, or principles, allow words to combine into phrases
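Subsumption over such descriptions can be sketched as a recursive check: a general description subsumes a specific one iff every feature the general one specifies is present, with a subsuming value, in the specific one. A rough illustration of my own (AVMs as nested Python dicts; types and tags are ignored; `subsumes` is an invented helper):

```python
# Sketch: subsumption between dict-encoded AVM descriptions.

def subsumes(general, specific):
    """True iff everything 'general' says is also true of 'specific'."""
    if isinstance(general, dict) and isinstance(specific, dict):
        return all(key in specific and subsumes(value, specific[key])
                   for key, value in general.items())
    return general == specific  # atomic values must match exactly

# The Russian example: the verb's description of its subject...
subj_descr = {"CAT": {"HEAD": "noun"},
              "CONT": {"INDEX": {"NUM": "sing", "GEN": "masc"}}}

# ...subsumes a fully specified 1st-person singular masculine NP:
first_sg = {"CAT": {"HEAD": "noun"},
            "CONT": {"INDEX": {"PER": "1st", "NUM": "sing", "GEN": "masc"}}}
```

Note the asymmetry: the more specific description does not subsume the more general one, because it mentions a PER value that the general description leaves open.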

A tour of Pollard and Sag (1994)
We'll start with the signature and theory from Pollard and Sag (1994). In the next series of slides, you should:
- begin to understand what everything means
- begin to understand the connection between linguistic theory and its formalization in HPSG
- begin to gain an appreciation for a completely worked-out theory

An ontology of linguistic objects

  sign:      PHON list(phon-string), SYNSEM synsem
             (subtypes: word; phrase, which additionally has DTRS constituent-structure)
  synsem:    LOC local, NON-LOC non-local
  local:     CAT category, CONT content, CONX context
  category:  HEAD head, SUBCAT list(synsem)

Why the complicated structure?
- Part of speech (HEAD information)
- local & non-local: most linguistic constructions can be handled locally, but non-local constructions (e.g., extraction) require different mechanisms
- category, content, and context: roughly, these correspond to syntactic, semantic, and pragmatic notions, all of which are locally determined
- HEAD and SUBCAT: a word's syntactic information comes in two parts: its own lexical information (part of speech, etc.) and information about its arguments

The head hierarchy:

  head
    substantive [PRD boolean]
      noun [CASE case]
      verb [VFORM vform, AUX boolean, INV boolean]
      preposition [PFORM pform]
      adjective
    functional
      marker
      determiner

Properties of particular parts of speech

  vform: finite, infinitive, base, gerund, present-participle, past-participle, passive-participle
  case:  nominative, accusative
  pform: of, to

What SUBCAT does
The SUBCAT list can be thought of as akin to a word's valency requirements. Items on the SUBCAT list are ordered by obliqueness (akin to LFG), not necessarily by linear order. The Subcat Principle, described below, will provide a way for a head to combine with its arguments. That is, we will still need a way to go from the SUBCAT specification to some sort of tree structure.
NB: Here, we will use a single SUBCAT list, but later we will switch to a VALENCE feature, which contains both a SUBJ and a COMPS list.

Locality of SUBCAT information
SUBCAT selects a list of synsem values, not sign values. If you work through the ontology, this means that a head does not have access to the DTRS list of the items on its own SUBCAT list. Intuitively, this means that a head cannot dictate properties of the daughters of its daughters. Constructions are thus restricted to local relations.

The feature CONTENT
The feature CONTENT specifies different semantic information. A feature appropriate for nominal-object objects (a subtype of content objects) is INDEX (as shown on the next slide). Agreement features can be stated through the INDEX feature. Note that CASE was put somewhere else (within HEAD), so case agreement is treated differently than person, number, and gender agreement (at least in English).

Semantic representations

  content
    psoa
    nom-obj: INDEX index, RESTRICTION set(psoa)

  index: PERSON person, NUMBER number, GENDER gender

  Some psoa subtypes and their roles:
    laugh: LAUGHER ref
    think: THINKER ref, THOUGHT psoa
    give:  GIVER ref, GIVEN ref, GIFT ref
    drink: DRINKER ref, DRUNKEN ref

Indices

  index:  referential, there, it
  person: first, second, third
  number: singular, plural
  gender: masculine, feminine, neuter

Auxiliary data structures
Before we move on to some linguistic examples, a few other objects need to be defined:

  boolean: true, false
  list
    empty-list
    non-empty-list: FIRST object, TAIL list

Abbreviations for describing lists
empty-list is abbreviated as e-list or ⟨⟩; non-empty-list is abbreviated as ne-list.

  [FIRST [1], TAIL e-list]                    is abbreviated as ⟨[1]⟩
  [FIRST [1], TAIL [FIRST [2], TAIL e-list]]  is abbreviated as ⟨[1], [2]⟩

Attention: ⟨[1]⟩ and [FIRST [1], TAIL e-list] describe all lists of length one!
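The FIRST/TAIL encoding and its angle-bracket abbreviation can be sketched in a few lines (my own illustration; `ne_list`, `angle`, and `E_LIST` are invented names, with lists again encoded as nested dicts):

```python
# Sketch: HPSG-style lists built from e-list and ne-list nodes.

E_LIST = {"_type": "e-list"}                       # the empty list <>

def ne_list(first, tail):
    """A non-empty-list node with FIRST and TAIL features."""
    return {"_type": "ne-list", "FIRST": first, "TAIL": tail}

def angle(*items):
    """Expand the abbreviation <x1, ..., xn> into FIRST/TAIL structure."""
    result = E_LIST
    for item in reversed(items):
        result = ne_list(item, result)
    return result

one_two = angle(1, 2)   # the abbreviation <1, 2>
```

Unfolding `angle(1, 2)` gives [FIRST 1, TAIL [FIRST 2, TAIL e-list]], exactly the structure the abbreviation ⟨[1], [2]⟩ stands for.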

Abbreviations of common AVMs
Pollard and Sag (1994) use some abbreviations to describe synsem objects:

  Abbreviation   Abbreviated AVM
  NP[1]          [LOC|CAT [HEAD noun, SUBCAT ⟨⟩], ... INDEX [1]]
  S:[1]          [LOC|CAT [HEAD verb, SUBCAT ⟨⟩], CONT [1]]
  VP:[1]         [LOC|CAT [HEAD verb, SUBCAT ⟨synsem⟩], CONT [1]]

The Lexicon
The basic lexicon is defined by the Word Principle as part of the theory. It defines which of the ontologically possible words are grammatical:

  word → (lexical-entry_1 ∨ ... ∨ lexical-entry_n)

with each of the lexical entries being descriptions, such as, e.g.:

  [PHON ⟨laughs⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][3rd,sing]⟩]
               CONT [laugh, LAUGHER [1]]]]

An example lexicon

  [PHON ⟨gives⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][sing], NP[acc][2], PP[to][3]⟩]
               CONT [give, GIVER [1], GIFT [2], GIVEN [3]]]]

  [PHON ⟨drinks⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][3rd,sing], NP[acc][2]⟩]
               CONT [drink, DRINKER [1], DRUNKEN [2]]]]

  [PHON ⟨drink⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][plur], NP[acc][2]⟩]
               CONT [drink, DRINKER [1], DRUNKEN [2]]]]

  [PHON ⟨she⟩
   SYNSEM|LOC [CAT [HEAD [noun, CASE nom]]
               CONT [INDEX [PER third, NUM sing]]]]

  [PHON ⟨wine⟩
   SYNSEM|LOC [CAT [HEAD noun]
               CONT [INDEX [PER third, NUM sing]]]]

  [PHON ⟨to⟩
   SYNSEM|LOC [CAT [HEAD [preposition, PFORM to]
                    SUBCAT ⟨NP[acc][1]⟩]
               CONT [INDEX [1]]]]

  [PHON ⟨think⟩
   SYNSEM|LOC [CAT [SUBCAT ⟨NP[nom][1][plur], S[fin]:[2]⟩]
               CONT [think, THINKER [1], THOUGHT [2]]]]

A very first sketch of an example
Here's that impoverished tree again: she drinks wine. We're going to see how the theory licenses this structure...

Types of phrases
In order to put words from our lexicon into a sentence, we have to define what makes an acceptable sentence structure. Each phrase has a DTRS attribute (words do not have this attribute), which has a constituent-structure value. This value loosely corresponds to what we normally view in a tree as daughters. Additionally, tree branches contain grammatical role information (adjunct, complement, etc.). By distinguishing different kinds of constituent-structures, we define what kinds of phrases exist in a language.

An ontology of phrases

  constituent-structure
    headed-structure: HEAD-DTR sign, COMP-DTRS list(phrase)
      head-comps-struc
      head-marker-struc:  HEAD-DTR phrase, MARKER-DTR word, COMP-DTRS e-list
      head-adjunct-struc: HEAD-DTR phrase, ADJUNCT-DTR phrase, COMP-DTRS e-list
    coordinate-structure

Sketch of an example for head-complement structures

  she:    [PHON ⟨she⟩, SYNSEM [1]]
  drinks: [PHON ⟨drinks⟩, SYNSEM|LOC|CAT|SUBCAT ⟨[1], [2]⟩]
  wine:   [PHON ⟨wine⟩, SYNSEM [2]]

Universal Principles
But how exactly did that last example work? drinks has HEAD information specifying that it is a verb and so forth, and it also has subcategorization information specifying that it needs a subject and an object.
- The HEAD information gets percolated up (the Head Feature Principle)
- The subcategorization information gets checked off as you move up in the tree (the Subcat Principle)

Head Feature Principle
In prose: the HEAD feature of any headed phrase is structure-shared with the HEAD value of the head daughter. Specified as a constraint:

  [phrase, DTRS headed-structure]
    → [SYNSEM|LOC|CAT|HEAD [1]
       DTRS|HEAD-DTR|SYNSEM|LOC|CAT|HEAD [1]]

Such principles are treated as linguistic universals in HPSG.
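As a toy illustration of the Head Feature Principle (my own sketch; the real principle constrains feature structures, not Python dicts, and `satisfies_hfp` is an invented name), the required structure sharing can be checked as token identity of the two HEAD nodes:

```python
# Sketch: checking that a phrase's HEAD value is token-identical to
# the HEAD value of its head daughter, as the HFP demands.

def satisfies_hfp(phrase):
    mother_head = phrase["SYNSEM"]["LOC"]["CAT"]["HEAD"]
    dtr_head = phrase["DTRS"]["HEAD_DTR"]["SYNSEM"]["LOC"]["CAT"]["HEAD"]
    return mother_head is dtr_head   # structure sharing = same object

verb_head = {"_type": "verb", "VFORM": "fin"}           # the tag [1]
drinks = {"SYNSEM": {"LOC": {"CAT": {"HEAD": verb_head}}}}
vp = {"SYNSEM": {"LOC": {"CAT": {"HEAD": verb_head}}},  # shared with dtr
      "DTRS": {"HEAD_DTR": drinks}}
```

A phrase whose HEAD value is merely an equal copy of the daughter's HEAD, rather than the same node, would fail this check, which is the difference between type identity and the token identity the principle requires.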

Subcat Principle
In a headed phrase, the SUBCAT value of the head daughter is the concatenation of the phrase's SUBCAT list with the list (in order of increasing obliqueness) of SYNSEM values of the complement daughters:

  [DTRS headed-structure]
    → [SYNSEM|LOC|CAT|SUBCAT [1]
       DTRS [HEAD-DTR|SYNSEM|LOC|CAT|SUBCAT [1] ⊕ [2]
             COMP-DTRS [2]]]

with ⊕ standing for list concatenation, i.e., append, defined as follows:

  e-list ⊕ [1] := [1]
  [FIRST [1], REST [2]] ⊕ [3] := [FIRST [1], REST [2] ⊕ [3]]

Fallout from these Principles
Note that agreement is handled neatly, simply by the fact that the SYNSEM values of a head's daughters are token-identical to the head's SUBCAT items.
One question remains before we can get the structure we have above: how exactly do we decide on a syntactic structure? I.e., why is it that the object was checked off low and the subject was checked off at a higher point? Answer: because of the ID schemata used.

Immediate Dominance (ID) Schemata
There is an inventory of valid ID schemata in a language. Every headed phrase must satisfy exactly one of the ID schemata. Which ID schema is used depends on the type of the DTRS attribute; this goes back to the ontology of phrases we saw earlier. Formally, though, these constraints are phrased as the universal principles were.

Immediate Dominance Principle (for English):

  [phrase, DTRS headed-struc] →
      [SYNSEM|LOC|CAT [HEAD verb[INV −], SUBCAT ⟨⟩]
       DTRS [head-comps-struc, HEAD-DTR phrase, COMP-DTRS ⟨sign⟩]]   (Head-Subject)
    ∨ [SYNSEM|LOC|CAT [HEAD verb[INV −], SUBCAT ⟨synsem⟩]
       DTRS [head-comps-struc, HEAD-DTR word]]                       (Head-Complement)
    ∨ [SYNSEM|LOC|CAT|HEAD verb[INV +]
       DTRS [head-comps-struc, HEAD-DTR word]]                       (Head-Subject-Complement)
  (continued on next page)

Towards Head-Adjunct Structures
Lexical entry of an attributive adjective:

  [PHON ⟨red⟩
   SYNSEM|LOC
     [CAT [HEAD [adj, PRD −
                 MOD [LOC [CAT [HEAD noun
                                SUBCAT ⟨[CAT|HEAD det]⟩]
                           CONT [nom-obj, INDEX [1], RESTR [2] list]]]]
           SUBCAT ⟨⟩]
      CONT [nom-obj
            INDEX [1]
            RESTR [2] ∪ {[red-rel, ARG1 [1]]}]]]
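The append relation and the cancellation effect of the Subcat Principle can be sketched procedurally (illustrative only, with SUBCAT lists as plain Python lists of placeholder synsems; `project` is an invented helper that checks the complements off the end of the obliqueness-ordered list):

```python
# Sketch: append (⊕) and Subcat-Principle-style cancellation.

def append(xs, ys):
    """The recursive definition of ⊕ above, transcribed for Python lists."""
    return ys if not xs else [xs[0]] + append(xs[1:], ys)

def project(head_subcat, comp_synsems):
    """Mother's SUBCAT = head daughter's SUBCAT minus realized complements."""
    cut = len(head_subcat) - len(comp_synsems)
    # Subcat Principle: head_subcat == mother_subcat ⊕ comp_synsems
    assert head_subcat[cut:] == comp_synsems, "complements don't match"
    return head_subcat[:cut]

# she drinks wine, bottom-up:
drinks_subcat = ["she_synsem", "wine_synsem"]          # obliqueness order
vp_subcat = project(drinks_subcat, ["wine_synsem"])    # head-complement
s_subcat = project(vp_subcat, ["she_synsem"])          # head-subject
```

Working bottom-up, the object is cancelled at the VP level and the subject at the S level, leaving an empty SUBCAT list at the top, which is exactly the saturation the Head-Subject schema demands.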

Lexical entry of an attributive adjective
Version without redundant specifications:

  [PHON ⟨red⟩
   SYNSEM|LOC
     [CAT [HEAD [adj, PRD −
                 MOD [LOC [CAT [HEAD noun
                                SUBCAT ⟨[LOC|CAT|HEAD det]⟩]
                           CONT [INDEX [1], RESTR [2]]]]]]
      CONT [INDEX [1]
            RESTR [2] ∪ {[red-rel, ARG1 [1]]}]]]

Sketch of an example for a head-adjunct structure

  red book: [PHON ⟨red, book⟩
             SYNSEM|LOC|CAT|SUBCAT ⟨[1]⟩]
    adjunct daughter: [PHON ⟨red⟩
                       SYNSEM|LOC|CAT|HEAD [adj, PRD −, MOD [2]]]
    head daughter:    [PHON ⟨book⟩
                       SYNSEM [2][LOC|CAT [HEAD noun
                                           SUBCAT ⟨[1][LOC|CAT|HEAD det]⟩]]]

Sketch of an example with an auxiliary

  John can go: [PHON ⟨John, can, go⟩]
    complement daughter: [PHON ⟨John⟩, SYNSEM [1]]
    head daughter: [PHON ⟨can, go⟩
                    SYNSEM|LOC|CAT|SUBCAT ⟨[1]⟩]
      head daughter: [PHON ⟨can⟩
                      SYNSEM|LOC|CAT [HEAD [verb, AUX +, INV −]
                                      SUBCAT ⟨[1]NP[nom], [2]VP[bse]⟩]]
      complement daughter: [PHON ⟨go⟩, SYNSEM [2]]

Sketch of an example with an inverted auxiliary

  can John go: [PHON ⟨can, John, go⟩]
    head daughter: [PHON ⟨can⟩
                    SYNSEM|LOC|CAT [HEAD [verb, AUX +, INV +]
                                    SUBCAT ⟨[1]NP[nom], [2]VP[bse]⟩]]
    complement daughters: [PHON ⟨John⟩, SYNSEM [1]] and [PHON ⟨go⟩, SYNSEM [2]]

SPEC Principle
If a marker daughter or complement daughter has a functional HEAD value, then its SPEC value is token-identical with the SYNSEM of the head daughter:

  [phrase
   DTRS [(MARKER-DTR ∨ COMP-DTRS|FIRST)|SYNSEM|LOC|CAT|HEAD functional]]
    → [DTRS [(MARKER-DTR ∨ COMP-DTRS|FIRST)|SYNSEM|LOC|CAT|HEAD|SPEC [1]
             HEAD-DTR|SYNSEM [1]]]

Marking Principle
In a headed phrase, the MARKING value is token-identical with that of the marker daughter if there is one, and with that of the head daughter otherwise:

  [phrase, DTRS head-mark-struc]
    → [SYNSEM|LOC|CAT|MARKING [1]
       DTRS|MARKER-DTR|SYNSEM|LOC|CAT|MARKING [1]]

  [phrase, DTRS headed-structure (other than head-mark-struc)]
    → [SYNSEM|LOC|CAT|MARKING [1]
       DTRS|HEAD-DTR|SYNSEM|LOC|CAT|MARKING [1]]

Lexical entry of the marker that

  [PHON ⟨that⟩
   SYNSEM|LOC [CAT [HEAD [marker
                          SPEC [LOC|CAT [HEAD verb[bse]
                                         MARKING unmarked]]]
                    MARKING that]]]

Sketch of an example for a head-marker structure

  that John laughs: [PHON ⟨that, John, laughs⟩
                     SYNSEM|LOC|CAT|MARKING [1]]
    marker daughter: [PHON ⟨that⟩
                      SYNSEM|LOC|CAT [HEAD [marker, SPEC [2]]
                                      MARKING [1]that]]
    head daughter:   [PHON ⟨John, laughs⟩
                      SYNSEM [2][LOC|CAT [HEAD verb
                                          MARKING unmarked]]]

A few more points on HPSG
We can view a grammar as a set of constraints: formulas which have to be true in order for a feature structure to be well-formed. With such a view, parsing with HPSG falls into the realm of constraint-based processing.
Two important notions relating descriptions are subsumption and unification, loosely defined as:
- subsumption: the description F subsumes the description G iff G entails F, i.e., F is more general than G
- unification: the descriptions F and G unify iff their values are compatible
Closed World Assumption: there are no linguistic species beyond what is specified in the type hierarchy.
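Unification of two compatible descriptions can be sketched as a recursive merge whose result carries the information of both inputs (my own illustration over dict-encoded AVMs with atomic values only; no reentrancy or type hierarchy, and `unify` is an invented helper):

```python
# Sketch: naive unification of dict-encoded AVM descriptions.

def unify(f, g):
    """Merge two descriptions; fail if they carry conflicting values."""
    if isinstance(f, dict) and isinstance(g, dict):
        result = dict(f)
        for key, g_val in g.items():
            result[key] = unify(f[key], g_val) if key in f else g_val
        return result
    if f == g:
        return f
    raise ValueError("unification failure")

# Two compatible partial descriptions of the same NP:
np_sing = {"HEAD": "noun", "INDEX": {"NUM": "sing"}}
np_3rd = {"HEAD": "noun", "INDEX": {"PER": "3rd"}}
```

Unifying `np_sing` with `np_3rd` yields a description specified for both NUM and PER, while unifying descriptions with clashing atomic values (e.g. sing vs. plur) fails, mirroring the compatibility condition in the definition above.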