Building Applied Natural Language Generation Systems. Robert Dale and Ehud Reiter



Overview
1. An Introduction to NLG
2. Requirements Analysis for NLG
3. NLG Architecture and System Design
4. A Case Study
5. A Closer Look at the Component Tasks
6. Conclusions and Pointers

Component Tasks in NLG
- Text Planning
  - Content determination
  - Discourse planning
- Sentence Planning
  - Sentence aggregation
  - Lexicalisation
  - Referring expression generation
- Linguistic Realization
  - Syntactic and morphological realization
  - Orthographic realization

Realisation
- Goal: to convert sentence plans into actual text
- Purpose: to hide the peculiarities of English (or whatever the target language is) from the rest of the NLG system

Realisation Tasks
- Insert function words
- Choose correct inflection of content words
- Order words within a sentence
- Apply orthographic rules

Realisation Techniques
- Bi-directional Grammar Specifications
- Grammar Specifications tuned for Generation
- Templates

Realisation vs Parsing
Realisation is easier than parsing:
- no need to handle the full range of syntax that a human might use
- no need to resolve ambiguities
- no need to cater for ill-formed input

Bi-directional Grammar Specifications
- Key idea: one grammar specification used for both realisation and parsing
- Generally expressed as a declarative set of correspondences between semantic and syntactic structures
- Different processes applied for generation and analysis

Bi-directional Grammar Specifications
- Variety of algorithms, including semantic-head driven
- Algorithms often perform lexicalisation as well as realisation
- Theoretically elegant approach
- To date, sometimes used in machine-translation systems, but almost never used in other applied NLG systems

Problems with the Bi-directional Approach
- Output of an NLU parser (a semantic form) is very different from the input to an NLG realiser (a sentence plan)
- Debatable whether lexicalisation should be integrated with realisation
- Difficult in practice to engineer large bi-directional grammars
- Difficulties handling fixed phrases

Grammar Specifications tuned for Generation
- Grammar provides a set of choices for realisation
- Choices are made on the basis of the input sentence plan
- Grammar can only be used for NLG
- Widely used in practice (including FoG, PlanDoc, and AlethGen)
- Working software is available

Systemic Grammar
- Emphasises the functional organisation of language
- Surface forms are viewed as the consequences of selecting a set of abstract functional features
- Choices correspond to minimal grammatical alternatives
- The interpolation of an intermediate abstract representation allows the specification of the text to accumulate gradually

Systemic Grammar
[System network diagram]
Mood
  Major
    Indicative
      Declarative
        Bound
        Relative
      Interrogative
        Polar
        Wh-
    Imperative
  Minor
    Present-Participle
    Past-Participle
    Infinitive

Systemic Grammar Clause Choices
- Major indicative declarative: The cat is on the mat.
- Major indicative declarative relative: [He didn't see the cat] that chased the rat.
- Major indicative declarative bound: [It only hurts] when I laugh.
- Major indicative interrogative polar: Has anybody seen my seagull?
- Major imperative: Don't be ridiculous.
- Minor present-participle: [You'll enjoy] having more free time.
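Each clause choice above can be read as a path through the mood network. The following is a minimal sketch of that reading, assuming a nested-dictionary encoding of the network. The names NETWORK and is_valid_selection are hypothetical, and the placement of Wh-, Past-Participle and Infinitive is an assumption, since no example sentence confirms it.

```python
# Hypothetical encoding of the mood network as nested dicts (an assumption,
# not the representation used by any real systemic grammar toolkit).
# Each key is a feature; its value holds the more specific features below it.
NETWORK = {
    "major": {
        "indicative": {
            "declarative": {"bound": {}, "relative": {}},
            "interrogative": {"polar": {}, "wh": {}},
        },
        "imperative": {},
    },
    "minor": {
        "present-participle": {},
        "past-participle": {},
        "infinitive": {},
    },
}


def is_valid_selection(features, network=NETWORK):
    """Return True if the feature sequence is a path starting at the root."""
    node = network
    for feature in features:
        if feature not in node:
            return False
        node = node[feature]
    return True


# "Has anybody seen my seagull?" is major indicative interrogative polar.
print(is_valid_selection(["major", "indicative", "interrogative", "polar"]))
# There is no imperative option below "minor" in this network.
print(is_valid_selection(["minor", "imperative"]))
```

A path may stop before a leaf, so "major indicative declarative" counts as a valid selection in this sketch, matching the first example sentence.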

KPML
How it works:
- choices are made using INQUIRY SEMANTICS
- for each choice system in the grammar, a set of predicates known as CHOOSERS is defined
- these tests are functions from the internal state of the realiser and host generation system to one of the features in the system the chooser is associated with
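The idea of a chooser as a function from state to feature can be sketched in a few lines. This is an illustration only, not KPML's actual interface: the function names, the "speech_act" key and the two-system grammar are all assumptions made for the example.

```python
# Hypothetical choosers: each inspects the sentence plan (standing in for the
# state of the realiser and host system) and returns one feature of its system.
def mood_chooser(plan):
    """Choose between the features of the (assumed) mood system."""
    if plan.get("speech_act") == "command":
        return "imperative"
    return "indicative"


def indicative_chooser(plan):
    """Choose between the features of the (assumed) indicative system."""
    if plan.get("speech_act") == "question":
        return "interrogative"
    return "declarative"


# One chooser per choice system; the keys name the systems.
CHOOSERS = {"mood": mood_chooser, "indicative": indicative_chooser}


def traverse(plan):
    """Walk the two systems, entering the second only if its entry feature was chosen."""
    features = [CHOOSERS["mood"](plan)]
    if features[-1] == "indicative":
        features.append(CHOOSERS["indicative"](plan))
    return features


print(traverse({"speech_act": "question"}))
```

Keeping the decision logic in choosers means the grammar itself only lists the alternatives, while the questions asked of the host system live in one place per choice system.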

KPML
Realisation statements: small grammatical constraints at each choice point build up to a grammatical specification
- Insert SUBJECT: an element functioning as subject will be present
- Conflate SUBJECT ACTOR: the constituent functioning as SUBJECT is the same as the constituent that functions as ACTOR
- Order FINITE SUBJECT: FINITE must immediately precede SUBJECT
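One way to picture how these three statements accumulate into a specification is the sketch below. It is a toy model under stated assumptions: the Structure class and its attribute names are hypothetical and do not reflect how KPML stores its constraints.

```python
class Structure:
    """Hypothetical store for the constraints that realisation statements add."""

    def __init__(self):
        self.constituent_of = {}  # function name -> constituent id
        self.adjacency = []       # (left, right): left immediately precedes right
        self._next_id = 0

    def insert(self, function):
        """Insert: a constituent with this function will be present."""
        if function not in self.constituent_of:
            self.constituent_of[function] = self._next_id
            self._next_id += 1

    def conflate(self, first, second):
        """Conflate: both functions are filled by the same constituent."""
        self.insert(first)
        target = self.constituent_of[first]
        old = self.constituent_of.get(second)
        # Re-point every function that shared the old constituent, if any.
        for function in list(self.constituent_of):
            if old is not None and self.constituent_of[function] == old:
                self.constituent_of[function] = target
        self.constituent_of[second] = target

    def order(self, left, right):
        """Order: left must immediately precede right."""
        self.insert(left)
        self.insert(right)
        self.adjacency.append((left, right))


spec = Structure()
spec.insert("SUBJECT")
spec.conflate("SUBJECT", "ACTOR")
spec.order("FINITE", "SUBJECT")
print(spec.constituent_of, spec.adjacency)
```

Each statement only adds a small constraint; the full specification is whatever has accumulated once every choice point has been visited.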

Inputs and Outputs
Input:
  (S1/ThereBe
    :object (O1/train
      :cardinality 20
      :relations ((R1/period :value daily)
                  (R2/source :value Aberdeen)
                  (R3/destination :value Glasgow))))
Output:
  There are 20 trains daily from Aberdeen to Glasgow.

FUF/SURGE
- FUF: a unification-based linguistic realisation toolkit
- SURGE: a systemic-based unification grammar of English

FUF/SURGE
Basic idea:
- input specification in the form of a FUNCTIONAL DESCRIPTION (FD), a recursive attribute-value matrix
- the grammar is a large functional description with alternations representing choice points
- realisation is achieved by unifying the input FD with the grammar FD

FUF: An Input FD
  ((cat clause)
   (process ((type composite)
             (relation possessive)
             (lex "hand")))
   (participants ((agent ((cat pers_pro) (gender feminine)))
                  (affected ((cat np) (lex "editor")))
                  (possessor )
                  (possessed ((cat np) (lex "draft"))))))
Output: She hands the draft to the editor.

FUF: A Grammar Fragment
  ((cat np)
   (n ((cat noun)
       (number {^ ^ number})))
   (alt (
     ;; Proper names don't need an article
     ((proper yes)
      (pattern (n)))
     ;; Common names do
     ((proper no)
      (pattern (det n))
      (det ((cat article) (lex "the")))))))
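The unification step that FUF relies on can be illustrated with a much simplified sketch. It assumes FDs encoded as nested Python dicts and the alternation encoded as a list of branches. The names unify and unify_with_alt are hypothetical, and real FUF additionally handles paths such as {^ ^ number}, pattern constraints and backtracking, none of which appear here.

```python
def unify(fd1, fd2):
    """Unify two functional descriptions encoded as nested dicts.

    Returns the merged FD, or None if two values conflict.
    """
    result = dict(fd1)
    for attr, value in fd2.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            sub = unify(result[attr], value)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != value:
            return None  # atomic clash, e.g. (proper yes) against (proper no)
    return result


def unify_with_alt(input_fd, branches):
    """Try each branch of an alternation in order; return the first that unifies."""
    for branch in branches:
        merged = unify(input_fd, branch)
        if merged is not None:
            return merged
    return None


# The two branches of the NP alternation, transcribed by hand into dicts.
NP_BRANCHES = [
    {"proper": "yes", "pattern": ("n",)},
    {"proper": "no", "pattern": ("det", "n"),
     "det": {"cat": "article", "lex": "the"}},
]

common = unify_with_alt({"cat": "np", "proper": "no", "n": {"lex": "draft"}},
                        NP_BRANCHES)
print(common["pattern"], common["det"]["lex"])
```

The input FD fixes (proper no), so the first branch fails on that attribute and the second branch contributes the determiner and the pattern.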

Templates
- No grammar at all: instead, the NLG system is based on templates
- Text Planner produces trees with template-like leaves
- Sentence Planner performs usual operations
- Realisation may do some morphology and orthography, but nothing else
- Text planning and sentence planning may still be very complex

A Template-driven Example
Text Planner Output:
  [X depart(num=num(X)) at Y]
  X = CalExpress, Y = 1000
Sentence Planner Output:
  [X depart(num=num(X)) at Y]
  X = [the Caledonian Express] (num=singular)
  Y = [10AM]
Realiser Output:
  The Caledonian Express departs at 10AM.
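The last step of this example, from sentence planner output to text, might look roughly like the sketch below. The function name and its parameters are hypothetical, and the single agreement rule covers only the verb "depart" used in this template.

```python
def realise_departure(subject, number, time):
    """Fill the template [X depart(num=num(X)) at Y] and apply orthography.

    subject and time are the strings chosen by the sentence planner;
    number is "singular" or "plural" (an assumed encoding).
    """
    verb = "departs" if number == "singular" else "depart"
    sentence = f"{subject} {verb} at {time}"
    # Orthography: capitalise the first letter and add a full stop.
    return sentence[0].upper() + sentence[1:] + "."


print(realise_departure("the Caledonian Express", "singular", "10AM"))
# The Caledonian Express departs at 10AM.
```

The realiser here does only morphology (verb agreement) and orthography, which is all the template approach asks of it.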

Templates: Pros and Cons
Pros
- easy for non-NLP people, such as domain experts, to understand
- easy and fast to implement
Cons
- lots of templates required if there is a lot of syntactic variability
- restricts sentence-planning possibilities

Morphology and Orthography
Realiser must be able to:
- inflect words
- apply standard orthographic spelling changes
- add punctuation
- apply standard punctuation rules
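As a rough illustration of these tasks, the sketch below inflects regular English verbs for third person singular and applies sentence-level orthography. The function names are hypothetical, and the rules are deliberately incomplete: irregular verbs such as "be" and "have" would need a lexicon, and consonant doubling is not handled.

```python
VOWELS = "aeiou"


def third_singular(verb):
    """Inflect a regular English verb for third person singular present."""
    # Spelling change: sibilant endings (and final -o) take -es.
    if verb.endswith(("s", "sh", "ch", "x", "z", "o")):
        return verb + "es"
    # Spelling change: consonant + y becomes -ies.
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"
    return verb + "s"


def orthography(words):
    """Join words, capitalise the first letter and add a full stop."""
    text = " ".join(words)
    if not text:
        return ""
    return text[0].upper() + text[1:] + "."


print(orthography(["the", "train", third_singular("depart"), "at", "10AM"]))
```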

Research Questions
- How can the different techniques be combined?
- How much sentence planning can be done with templates?
- How do layout issues affect realisation?

Linguistic Realization
- For the WeatherReporter system, complexity of the realiser depends on which linguistic decisions were already made by the sentence planner
- Simplest realiser just walks around the sentence plans in the text plan in a top-down, left-to-right manner, realising leaf nodes
- More sophistication is possible
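The simplest realiser described above amounts to a depth-first, left-to-right tree walk. The sketch below assumes a text plan encoded as a list of sentence plans, each a nested list whose leaves are already-inflected strings. This encoding and the function names are assumptions for illustration, not the WeatherReporter data structures.

```python
def leaves(node):
    """Collect the leaf strings of a plan top-down, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node:
        words.extend(leaves(child))
    return words


def realise_text(text_plan):
    """Realise each sentence plan in order and join the resulting sentences."""
    sentences = []
    for sentence_plan in text_plan:
        words = leaves(sentence_plan)
        if not words:
            continue  # skip empty sentence plans
        text = " ".join(words)
        sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)


plan = [
    [["the", "month"], ["was", ["cooler", "than", "average"]]],
    [["it"], ["was", "dry"]],
]
print(realise_text(plan))
# The month was cooler than average. It was dry.
```

This only works when the sentence planner has already made every linguistic decision, which is the dependency the slide points out.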

An SPL input to KPML
  (l / greater-than-comparison
     :tense past
     :exceed-q (l a) exceed
     :command-offer-q notcommandoffer
     :proposal-q notproposal
     :domain (m / one-or-two-d-time :lex month :determiner the)
     :standard (a / quality :lex average :determiner zero)
     :range (c / sense-and-measure-quality :lex cool)
     :inclusive (r / one-or-two-d-time
                   :lex day
                   :number plural
                   :property-ascription (r / quality :lex rain)
                   :size-property-ascription (av / scalable-quality :lex the-av-no-of)))
Output: The month was cooler than average with the average number of rain days.

An input FD for SURGE
  ((cat clause)
   (proc ((type ascriptive) (mode attributive)))
   (partic ((carrier ((cat common) (lex "month")))
            (attribute ((cat ap)
                        (complex conjunction)
                        (distinct ~(((lex "cool") (comparative yes))
                                    ((lex "dry") (comparative yes)))))))))
Output: The month was cooler and drier.

Summary
We've seen:
- techniques for text planning
- techniques for sentence planning
- techniques for linguistic realisation