Generating a Sentence from a Thought

W. Faris and K.H. Cheng
Computer Science Department, University of Houston, Houston, Texas, USA
Research supported in part by GAANN: Doctoral Training in Computer and Computational Sciences, US Department of Education, grant #P200A070377.

Abstract

It is desirable for an intelligent program to communicate with humans using a natural language. We have recently developed a program to parse a sentence into a thought using a learned English grammar. Without learning a separate grammar, this paper discusses how we convert the learned grammatical structures into role structures. These role structures can then be used by our algorithm to generate a sentence that reflects the contents of a given thought. Roles define the purpose of grammar terms and link grammatical knowledge to semantic knowledge. Since this linkage separates semantics completely from grammatical structures, the thoughts used in generating a sentence need only be logical thoughts. Consequently, the creator of a thought may define the thought's content based simply on its semantics, without being concerned with the grammatical details of the natural language.

Keywords: Grammar, Sentence Generation, Semantic Representation, Natural Language Processing

1 Introduction

One important objective for most artificial intelligence programs is to possess the ability to communicate with humans using a natural language. The communication problem has two major objectives: comprehending the intention of a given sentence, and generating a sentence for a thought that the program wants to express. Recently, we have developed a communication sub-system for the A Learning Program System (ALPS) project [1]. The goal of ALPS is to learn all types of knowledge, including the knowledge involved in communication. Our system does not use any pre-coded grammatical knowledge, but instead acquires it during program execution. The system first learns the grammar terms (parts of speech) of the English language along with their details, and then uses this grammatical knowledge to parse a given sentence [2]. Subsequently, we have developed solutions to understand declarative sentences [3, 4], including identifying the correct knowledge referenced by various forms of pronouns [5].

In this paper, we discuss how we transform the grammatical knowledge acquired for parsing a sentence into a bidirectional grammar [6], and present how to use it to generate a sentence when given a thought. Reiter [7] identifies four stages in generating a sentence: content determination, sentence planning, surface generation, and morphology. Content determination involves forming an internal representation of the information to be expressed, such as feature structures [8]. Sentence planning is the task of mapping the semantic representation to a linguistic one. For instance, it is responsible for identifying the determiner, adjective, and noun when given a semantic representation such as "the red ball". Kasper [9] uses sentence planning to develop an interface between a semantic domain and the Penman sentence generator [10]; however, it requires specialized modifications based on the domain's knowledge base. Surface generation is the process of properly correlating grammatical terms to one another, such as recognizing that an article must precede its noun. The semantic head-driven generation algorithm [11] is an example of a surface generation algorithm.
However, it requires the use of a grammar that does not provide a clear separation between the grammatical and the semantic knowledge. In addition, the grammar used serves only sentence generation, so a separate grammar is required for parsing. TG/2 [12] is another surface generation tool, but it is limited to a predefined grammar. Finally, morphology requires the use of linguistic rules to produce the correct form of a word; examples can be found in [13].

One common approach to a bidirectional grammar uses the same grammatical structure for both parsing and generating a sentence. In other words, when a sentence is being interpreted, parsers identify grammar terms using a set of acceptable sequences and alternatives provided by the grammar, and these same grammar terms are used to generate a sentence. The problem is that grammars are traditionally designed for parsing, taking text as input, whereas in generating a sentence the input is semantic knowledge. Trying to use the same components of these grammars to generate a sentence has its limitations [14]. Instead, our definition of a bidirectional grammar simply requires that one grammar be used for both purposes; it may use different components of the same grammar to accomplish the two tasks. The original definition of our grammar uses the grammatical structures to parse a sentence. During the learning of this grammar, we construct knowledge for generating sentences using components known as roles. The role of a grammar term defines the purpose of the term and acts as a bridge between grammatical knowledge and its semantics. Specifically, the knowledge created includes role sets and role sequences. A role set represents the alternative semantic representations of a grammar term, while a role sequence defines a precise way to express a role. Our program compiles this knowledge and sets up the bridge between a grammar term and its associated role. By using these roles, role sets, role sequences, and bridges, the generation process removes its dependency on grammar terms.

The process of generating a sentence begins by identifying a sequence of roles that best matches the information stored in the given thought. Each role is then called sequentially to produce a textual representation of that role. Finally, each word is transformed into the right form based on the properties of the knowledge that it represents. Because of the existence of roles, the grammar acquired by our solution does not require special knowledge of the domain. In addition, since a role may be associated with multiple grammar terms, it allows a simple implementation of a complex grammar. The principles behind creating role sequences ensure the correct grammatical relationships between the terms in a sequence; this is our form of surface generation. As a result, the generated sentence automatically follows the structure taught in the grammar when the chosen sequence is used to express the given thought. Our approach has the advantage that no surface generation step is needed at the time of generating a sentence, and a thought may be created based on its logical meaning instead of the grammatical requirements of the natural language.

Note that we are not interested in how or why the program produces a thought, but in how to generate a sentence according to the contents of a given thought. Currently, our program may use properly constructed thoughts created in three situations. The first situation occurs when a sentence presented to the program by a human user is parsed, creating a thought. We have tested our solution on thoughts such as declarations and questions with various kinds of verbs and pronouns. The second situation occurs when the program is responding to a question: it uses the question thought to create a declaration thought that includes the found answer. The third situation happens during the understanding of a sentence that involves an action. Currently, when certain actions [3] are learned, their effects on the various logical objects involved in the action are presented to our program as a sequence of English sentences. The stored effects are individual thoughts created from parsing each sentence. When the program attempts to understand a given sentence that uses that action verb, the thought for each actual effect may be created easily from the prototype effect thought. The actual effect sentence is then generated using our proposed solution. For example, one effect of the action buy is "The buyer gives the price to the seller." Given the sentence "John buys Jack 2 apples from Mary for 4 dollars.", our solution produces the sentence "John gives Mary 4 dollars." as one actual effect.

The rest of this paper is organized as follows. In the next section, we discuss key components of the learned grammar and describe how roles are used to express the intention of a grammar term. In addition, we demonstrate how multiple roles may be combined to form the structure of the thoughts expected by our solution. Section 3 discusses the processing needed, at the time of learning the English grammar, to form a bidirectional grammar. This includes building role sets, role sequences, and sequence collections.
Section 4 presents the algorithm to generate an English sentence from a given thought. It then describes the algorithm's three major steps in detail: selecting the best-fit role sequence to express the given role, identifying the words representing each role in the selected sequence, and transforming each word into the right form based on the properties of the knowledge that the word represents. Finally, Section 5 concludes the paper and presents some challenges and future objectives of the project.

2 Grammar

The learning of the English grammar is done incrementally in ALPS; that is, our program first learns a subset of the English grammar, and details may then be added to increase the kinds and complexity of the sentences that the program can handle. Our program originally uses the learned grammar to parse and understand English sentences. It first learns grammar terms such as sentence, complete subject, verb, noun phrase, and preposition. Each grammar term has several components that define the term and how to use it. Some major components introduced in [2] are the structure, role, and rule. The structure of a grammar term defines the exact grammatical format of the term; the two major possibilities are a sequence of grammar terms and an alternative among grammar terms. A rule specifies a condition that must be satisfied by either the grammar term or its structure. The role of a grammar term defines the purpose of the term. Another component, introduced in [3, 4], is the control: the grammar term in a sequence that carries out an important duty in understanding the semantics of the term. We will show in this paper how to use the same grammar to generate an English sentence when given a thought to express.

A role associated with a grammar term has three important aspects: the grammatical role, the semantic role, and its structure. The grammatical role, identified by italics, is the label given to the role according to the grammatical purpose of the term. For example, the role of the first noun phrase in a sentence is labeled subject. The semantic role, called a logical object, is the label given to the role in relation to its semantic purpose within a higher-level role. For example, the intent of a declarative sentence using a be verb is to define some object of interest. As a result, its role, a declaration, requires two logical objects: definition and object-of-interest. The grammatical subject of the sentence serves the semantic purpose of object-of-interest. On the other hand, in a declaration using an action verb, the same subject serves the purpose of actor if the declaration is in active voice.

The structure of a role reflects the content of the role, and consists of several related logical objects that represent what the role is expressing. For example, in a thought that reflects an action, the logical objects are the actor, the act-on object, and the act-for object. Consider another example: the aspect-of-an-object structure is one possible structure realizing the role associated with a noun phrase. The knowledge-of-interest referred to by an aspect-of-an-object structure represents an aspect or characteristic of a specific object, the object-of-interest. Consequently, aspect and object-of-interest are the two logical objects in this structure. Given specific values for these logical objects, the knowledge-of-interest may be inferred easily. For instance, if the logical objects aspect and object-of-interest refer to the height concept and Mt. Everest, respectively, then the intent of this role is to express the height of Mt. Everest. One way to express an aspect-of-an-object in English is to use a noun phrase with the preposition of. As a result, the grammar term for the preposition of is associated with an aspect-of-an-object role structure. Table 1 shows some role structures and their usage in ALPS.

The above example shows a role structure whose logical objects are already provided. However, identifying the logical objects within a structure depends on the associated grammar term. For example, when attached to the preposition of and used in a noun phrase, the aspect should be the simple subject grammatical role, while the object-of-interest should be the object-of-the-preposition. On the other hand, a role with this structure could be attached to the possessive noun to express phrases such as "John's weight". In this case, the object-of-interest is the grammatical role possessor, while the aspect should point to the role of the noun being modified, labeled term. In order to create the correct associations, a dynamically generated bridge is created for each instance of the structure, linking logical objects to grammatical roles. The structure associated with of has a bridge that links aspect to simple subject and object-of-interest to object-of-the-preposition, while the bridge for the structure in the second example links aspect to term and object-of-interest to possessor. By using different bridges, we may apply the same role structure to different grammar terms to express the same knowledge in various ways. The usages of this role structure with its bridges are shown in Figure 1. Note that these bridges link in both directions: one direction is used in generating a sentence, and the other in creating the thought while parsing a sentence.

It is important to note that the thought given to our program to generate a sentence must be a logical thought, i.e., its content is identified by logical objects instead of grammatical terms. As a result, the creator of the thought only needs to define the logical meaning of the thought, and need not be concerned with the grammatical details of the natural language. We will use the thought for the sentence "The next prime number after 7 is 11." as an example. Since this is a declarative sentence, the thought is a declaration. Every declaration depends on its verb role, which is a define role in this example. A define role uses a definition to define an object-of-interest.
Therefore, in the declaration, the logical object object-of-interest is "the next prime number after 7", and the definition is the number 11. The definition object may be represented easily by a whole-object role that contains the number 11. The logical object object-of-interest for the phrase "the next prime number after 7" may be constructed in a way similar to a declaration. We use a relationship role structure for this expression, since it depicts a number related to another number. A relationship role structure is defined by three logical objects: an object-of-interest, a reference-object, and a relation. In this example, 7 is the reference-object, after is the relation, and "the next prime number" is the object-of-interest. Finally, "the next prime number" is represented by a role which contains the knowledge prime number, the modifier the, which represents a property of uniqueness, and the adjective next.

Table 1. Example role structures in ALPS

Declaration category:
- Define: defines a state, category, or property of an object; logical objects: object-of-interest, definition; example: "Apples are fruit."
- Act: expresses an action; logical objects: actor, act-on, act-for; example: "John gave Jill a gift."
- Possess: defines the possession of one object by another; logical objects: possessor, possession; example: "Mary has 2 homes."

Usage category:
- Aspect-of-an-Object: represents an aspect or property of an object; logical objects: aspect, object-of-interest; example: "his weight"
- Relationship: represents an object in relation to another object; logical objects: reference-object, object-of-interest; example: "the ball on the table"

[Figure 1. Two bridges of the aspect-of-an-object structure: for the preposition of, aspect maps to simple subject and object-of-interest to object-of-the-preposition ("weight of John"); for the possessive noun, aspect maps to term and object-of-interest to possessor ("John's weight").]
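To make the bridge idea concrete, the following is a minimal sketch, not the ALPS implementation, of how a role structure, its two bridges from Figure 1, and the example thought above might be represented; all class and field names here are our own illustration.

```python
# Illustrative sketch of role structures and bidirectional bridges.
# These names are ours; the paper describes the design but gives no code.

class RoleStructure:
    """A role structure names its logical objects, e.g. aspect-of-an-object."""
    def __init__(self, name, logical_objects):
        self.name = name
        self.logical_objects = logical_objects  # e.g. ["aspect", "object-of-interest"]

class Bridge:
    """A bidirectional mapping between logical objects and grammatical roles."""
    def __init__(self, structure, links):
        self.structure = structure
        self.to_grammar = dict(links)                              # logical -> grammatical
        self.to_logic = {g: l for l, g in self.to_grammar.items()} # grammatical -> logical

aspect_of_an_object = RoleStructure("aspect-of-an-object",
                                    ["aspect", "object-of-interest"])

# Bridge for the preposition "of" ("the weight of John"):
of_bridge = Bridge(aspect_of_an_object,
                   {"aspect": "simple subject",
                    "object-of-interest": "object-of-the-preposition"})

# Bridge for the possessive noun ("John's weight"):
possessive_bridge = Bridge(aspect_of_an_object,
                           {"aspect": "term",
                            "object-of-interest": "possessor"})

# Generation follows to_grammar; parsing follows to_logic.
assert of_bridge.to_grammar["aspect"] == "simple subject"
assert possessive_bridge.to_logic["possessor"] == "object-of-interest"

# The logical thought for "The next prime number after 7 is 11.", written as
# nested structures identified purely by logical objects, never grammar terms:
thought = {"role": "define",
           "object-of-interest": {"role": "relationship",
                                  "relation": "after",
                                  "reference-object": 7,
                                  "object-of-interest": {"role": "whole-object",
                                                         "knowledge": "prime number",
                                                         "modifier": "the",
                                                         "adjective": "next"}},
           "definition": {"role": "whole-object", "knowledge": 11}}
```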

3 Sequence collection

At the time our program learns the grammar, the collection of role sequences for a grammar term is generated according to its grammatical structure. For a grammar term that is defined by a list of alternative grammar terms, its role set is composed of the role structures for the roles associated with its descendants. For example, two alternatives of preposition are of and relational prepositions such as above and before. The structures for their associated roles, aspect-of-an-object and relationship, are added to the role set for preposition. Note that the same grammar may be learned incrementally in many different orders. It is possible that a role is attached to a grammar term that is already known to be a descendant of another term; alternatively, a grammar term already having an associated role may later be taught as a descendant of another term. In both cases, the role structure for that role is added to the role set stored at the root grammar term. For instance, suppose the define role has already been associated with the be grammar term, and verb is initially taught to have two alternatives: action verb and linking verb. When the act role is taught to associate with the action verb, it is added to the role set for verb as an alternative. Similarly, the define role is added to the role set for verb when be is taught as a child alternative of linking verb. As a result, the role set for verb contains the act and define role structures as its alternatives.

For a grammar term whose grammatical structure is a sequence, a role sequence is created based on the sequence's order and occurrence. Each item in the role sequence is a grammatical role, with a pointer to the role set of the corresponding grammar term. When a new role structure is added to a role set, it becomes available dynamically to any role sequence that contains that role set. For instance, a noun phrase may be defined by the sequence [nominal, prepositional phrase], where nominal is compulsory and has the simple subject role, and prepositional phrase is optional and defined in turn by the sequence [preposition, nominal], with this nominal having the object-of-the-preposition role. The corresponding role sequence created for noun phrase is [simple subject, prepositional, object-of-the-preposition], with the first item compulsory and the last two items optional. Unlike roles and role sets, which are attached to grammar terms, a role sequence is attached to each role structure in the role set of the control. Once a new role sequence is attached to a role structure, that structure may be expressed by the originating grammatical sequence. For example, the role sequence for noun phrase is attached to each alternative role structure in the role set of preposition, such as aspect-of-an-object and relationship. Recall that a role structure such as aspect-of-an-object may be used by multiple terms, each having a different role sequence; the collection of these role sequences is called the sequence collection for that role structure. Any role sequence found in its sequence collection may be used to express the content of the role structure. For example, since the aspect-of-an-object structure may also be used by the possessive noun, the role sequence for possessive noun, [possessor, term], is also added to the sequence collection for the aspect-of-an-object role structure.
Figure 2 shows the sequence structures that may be taught for a noun phrase and the corresponding role sequences created for the aspect-of-an-object role structure. These dual structures allow one grammar both to parse and to generate a sentence, using the grammatical structures and the role sequences, respectively. The next section will show how a specific role sequence is chosen from a sequence collection to produce a statement that properly reflects the contents of the role.

[Figure 2. An example dual structure of grammatical structures and role sequences: the noun phrase structures (nominal + prepositional phrase, and possessive noun + noun) correspond to the aspect-of-an-object role sequences (simple subject + prepositional + object-of-the-preposition, and possessor + term).]

Finally, when a sequence is taught as a grammatical structure for a specific kind of a grammar term, the role structures in the role set of the control are not the appropriate place to store the generated role sequence. For example, the control of a sequence applicable to many different kinds of sentences is the verb. Consequently, the generated role sequence [subject, verb, predicate] is associated with each role structure in the role set for verb, in particular define. On the other hand, decision question is a kind of sentence that has a special sequence structure applying only to it, yet it has the same control, the verb. If the generated role sequence [verb, subject, predicate] were also stored in each role structure of the role set of the control, such as define, then two errors could occur: either the special structure is used erroneously to generate sentences of other kinds, or decision questions are generated erroneously by the generic structure. To prevent these errors, when a sequence is taught for a kind of a grammar term, the generated role sequence is added to the sequence collection of that kind's unique role.
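As a rough illustration of this dual structure, here is a minimal sketch, under our own naming rather than the ALPS code, of role sets, role sequences, and sequence collections as they might be built while the grammar above is learned.

```python
# Illustrative sketch of the structures built at grammar-learning time.
# All names and shapes are our own; ALPS itself is not published as code.

class SequenceItem:
    def __init__(self, grammatical_role, role_set, compulsory):
        self.grammatical_role = grammatical_role  # e.g. "simple subject"
        self.role_set = role_set                  # role set of the matching grammar term
        self.compulsory = compulsory

class RoleSequence:
    def __init__(self, items):
        self.items = items

# Role sets: alternative role structures attached to a grammar term.
preposition_role_set = ["aspect-of-an-object", "relationship"]
verb_role_set = ["define", "act"]   # grows as alternatives are taught

# Role sequence generated from the noun phrase structure [nominal, prepositional phrase]:
noun_phrase_seq = RoleSequence([
    SequenceItem("simple subject", None, compulsory=True),
    SequenceItem("prepositional", preposition_role_set, compulsory=False),
    SequenceItem("object-of-the-preposition", None, compulsory=False),
])

# Sequence collections live on role structures, keyed here by structure name.
sequence_collections = {"aspect-of-an-object": [], "relationship": []}

# Because the control of the noun phrase sequence is the preposition, the new
# role sequence is attached to every structure in the preposition's role set:
for structure in preposition_role_set:
    sequence_collections[structure].append(noun_phrase_seq)

# The possessive noun later contributes [possessor, term] to the same structure:
possessive_seq = RoleSequence([SequenceItem("possessor", None, True),
                               SequenceItem("term", None, True)])
sequence_collections["aspect-of-an-object"].append(possessive_seq)
```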

4 Sentence generation

Given that a thought is a collection of roles, each representing a unique semantic element to be expressed, the task of producing an English sentence reflecting that thought involves expressing the intent of each internal role. The intent of each role is represented by the structure of the role. Our algorithm to generate a sentence consists of two major steps: generate and make. The generate step, called by every role structure that needs to be expressed, is further subdivided into two steps: select and generatelist. The select step is our version of sentence planning, as it selects the role sequence from the sequence collection that best fits the given thought or role. In general, the sequence collection is retrieved from the role to be expressed; for thoughts, however, the location of the sequence collection depends on the thought's kind. The sequence that best fits is the one that expresses all the information contained within the role. The generatelist step assembles a list of word-units according to the chosen sequence of roles. The make step performs the duty of morphology: it selects the correct form of each word based on the properties stored in its word-unit, and applies any syntactic rules needed to generate a complete sentence.

Recall that the sequence collection of a role contains multiple role sequences that may be used to express what the role represents. However, not all sequences apply to a given role, and some may be more desirable than others. The select step chooses the sequence that best fits the given role. For example, suppose declaration has the following sequences in its collection: [subject, verb, predicate], [subject, verb, direct object], and [subject, verb, indirect object, direct object]. Now, given a thought that expresses the act of a boy selling lemonade to his neighbors: although the first sequence may be used to express a declaration that defines a subject, it is not suitable for a declaration about an action. The second sequence is for an action, but does not express all the information stored in the thought; specifically, the indirect object representing the neighbors cannot be expressed. The last sequence is the best fit for this thought and should be the one selected to generate the sentence.

We use an algorithm, checkavailability, to eliminate all sequences that do not match the given role by determining the availability of each element in the role sequence. A role sequence is suitable to express a given role if the given role contains all the logical objects required by the role sequence; if a required element is missing, the role sequence cannot be used and is eliminated. Recall that an element in a sequence can be a single role, a role set, or another role sequence. If it is a role sequence, checkavailability is called recursively on the inner sequence. If the element is a role set, checkavailability is called for each role structure in the set; as long as one alternative is available, the entire element is considered available. If the element is a single role, a bridge is needed to determine whether the element exists in the given role. The reason is that role sequences use grammatical roles, whereas the roles to be expressed contain logical objects. The correct bridge may be determined by either the given role or the role of the control term.
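A minimal sketch of how such an availability check might look is given below; the paper describes the algorithm but gives no code, so the data shapes and the lemonade example encoding are our own assumptions.

```python
# Hedged sketch of the checkavailability idea: a role sequence survives only
# if every compulsory element maps, through a bridge, to an unused logical
# object present in the role being expressed.

class Element:
    def __init__(self, kind, compulsory=True, grammatical_role=None,
                 inner=None, alternatives=None):
        self.kind = kind                    # "role", "role_set", or "sequence"
        self.compulsory = compulsory
        self.grammatical_role = grammatical_role
        self.inner = inner                  # nested RoleSequence
        self.alternatives = alternatives or []

class RoleSequence:
    def __init__(self, items):
        self.items = items

def check_availability(sequence, logical_objects, bridge, used):
    for e in sequence.items:
        if e.kind == "sequence":            # nested role sequence: recurse
            ok = check_availability(e.inner, logical_objects, bridge, used)
        elif e.kind == "role_set":          # one available alternative suffices
            ok = any(check_availability(s, logical_objects, bridge, used)
                     for s in e.alternatives)
        else:                               # single role: consult the bridge
            lo = bridge.get(e.grammatical_role)
            ok = lo in logical_objects and lo not in used
            if ok:
                used.add(lo)                # mark used: prevents duplicate phrases
        if not ok and e.compulsory:
            return False                    # missing compulsory element: eliminate
    return True

# The lemonade example: an act role with actor, act-on, and act-for objects.
act_bridge = {"subject": "actor", "direct object": "act-on",
              "indirect object": "act-for", "verb": "act"}
thought = {"actor", "act", "act-on", "act-for"}  # boy sells lemonade to neighbors

seqs = [RoleSequence([Element("role", grammatical_role=g) for g in gs])
        for gs in (["subject", "verb", "predicate"],
                   ["subject", "verb", "direct object"],
                   ["subject", "verb", "indirect object", "direct object"])]

valid = [s for s in seqs if check_availability(s, thought, act_bridge, set())]
best = max(valid, key=lambda s: len(s.items))   # expresses the most information
print([e.grammatical_role for e in best.items])
# -> ['subject', 'verb', 'indirect object', 'direct object']
```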
Continuing with our example thought about an action: since the control of a sentence is the verb and the role for an action verb is the act role, the bridge for an act role is used. Using this bridge, the program knows that the subject maps to the actor, the direct object to the act-on object, and the indirect object to the act-for object. As shown in Figure 3, these logical objects are all present in the given thought and can be found, indicating that the role sequence is valid.

[Figure 3. Matching a role sequence with logical objects: the act role bridge links subject to the actor (boy), direct object to the act-on object (lemonade), and indirect object to the act-for object (neighbors) of the declaration.]

Once a logical object is found, its corresponding object in the role is marked as used, preventing another element from using it. For instance, a sentence may have multiple prepositional phrases; if not marked, the same role could be used by each prepositional phrase, resulting in duplicate phrases. If the logical object exists in the given role, the role element is available. However, if an element in a role sequence is not available and its occurrence is compulsory, then the role sequence is eliminated. On the other hand, if its occurrence is optional, it is still possible to use the role sequence, so it is not eliminated. For example, the first sequence in the sequence collection for sentence is not acceptable because the compulsory predicate cannot be mapped to any object in the act role. After removing all invalid sequences, several valid alternatives may remain. In our example, two valid sequences may still be used: [subject, verb, direct object] and [subject, verb, indirect object, direct object]. The latter sequence is chosen since it expresses the most information from the given thought.

After the best sequence of roles to express the thought has been selected, the generatelist step works on each role according to its order in the sequence. There are two categories of roles: composite and non-composite. A composite role such as aspect-of-an-object contains multiple sub-roles representing the logical objects of the role, while a non-composite role is atomic. If the role is composite, the generate function is called on that role, and the process of selecting a sequence and composing a word-unit list is handled at the level of the composite role. If the role is non-composite, a word-unit is created by extracting the content of the role, including the name of the knowledge and any properties associated with that knowledge. For example, a role representing the person John would produce a word-unit containing "John" and the properties singular, third-person, male, and unique. Note that some of these properties may be stored within the thought role; for example, the tense property of a verb may be obtained from the tense of the sentence. The object-oriented nature of ALPS allows each type of non-composite role to have its own unique function to determine what properties are needed and where to locate them.
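A compact sketch of the generatelist recursion follows, again under hypothetical data shapes rather than the ALPS source: composite roles recurse, and non-composite roles yield word-units carrying a word and its properties.

```python
# Hedged sketch of the generatelist step (illustrative shapes, not ALPS code).

class WordUnit:
    def __init__(self, word, properties=()):
        self.word = word
        self.properties = set(properties)
    def __repr__(self):
        return f"WordUnit({self.word!r}, {sorted(self.properties)})"

def generate_list(role):
    """Walk the role's chosen sequence in order, producing a word-unit list."""
    units = []
    for sub in role["sequence"]:          # the sequence chosen by the select step
        if "sequence" in sub:             # composite role: generate at its level
            units.extend(generate_list(sub))
        else:                             # non-composite role: extract its content
            units.append(WordUnit(sub["knowledge"], sub.get("properties", ())))
    return units

# "John sells lemonade": a declaration whose select step already produced
# the order [subject, verb, direct object].
declaration = {"sequence": [
    {"knowledge": "John", "properties": ("singular", "third-person",
                                         "male", "unique")},
    {"knowledge": "sell", "properties": ("present", "third-person",
                                         "singular")},  # tense from the thought
    {"knowledge": "lemonade"},
]}
print(generate_list(declaration))
```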

The make step selects the correct form of each word based on the properties stored in its word-unit, and applies any syntactic rules needed to generate a grammatically correct sentence. Currently, we have focused on properties such as case for nouns, and both case and tense for verbs. Two categories of morphology rules, regular and irregular, are used in English to distinguish the various values of a property for a word. Regular words follow a set of rules that determine how affixes are added to root words to express a property; this allows the same affix to be applied to a large number of words. For example, the suffix -s can be applied to nouns to form the plural. Our program is taught a set of rules used both to understand a word based on its root form and any affixes found, and to produce the correct word form when given the property. Each morphology rule is taught with a condition that determines whether the rule is applicable to the current word, the transformation to apply to the word, and the new property that the word acquires. For instance, one rule for forming the plural of a word could have the condition that the word ends in -y, with the transformation of replacing the -y with -ies. For irregular words, the difference between the root word and its varying forms does not adhere to any consistent rules. For instance, even though the plural of house is houses, the plural of mouse is mice, not mouses, and dice is the plural form of die. It is estimated that there are around 250 irregular verbs alone in the English language [15], and each special case needs to be learned individually. In our program, specialized lexicons, identified by their implied properties, are learned and used to map a word between its base form and its irregular form. For instance, a plural lexicon containing words and their irregular plural forms would contain pairings such as goose/geese, ox/oxen, and radius/radii. A word-unit that does not contain any properties indicates that the word is already in the correct form and no conversion is needed. If it has properties, the make step first tries to convert the base form of the stored word into its irregular form. For each property in the word-unit, the corresponding lexicon is looked up. If the word in question is found within that lexicon, the property can be applied, and the irregular form found in the lexicon is used in the final sentence. If all properties have been tried and no irregular form is found, the make step then tries the morphology rules for regular words.
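The following sketch illustrates this irregular-first, regular-rules-second order of the make step; the lexicon contents and rule encodings are our own examples, not the taught ALPS knowledge.

```python
# Hedged sketch of the make step's word transformation: irregular lexicons
# are consulted first, then regular affix rules.

IRREGULAR_LEXICONS = {
    "plural": {"goose": "geese", "ox": "oxen", "radius": "radii",
               "mouse": "mice", "die": "dice"},
    "past": {"give": "gave", "buy": "bought"},
}

# Regular rules: (property, condition on the word, transformation).
REGULAR_RULES = [
    ("plural", lambda w: w.endswith("y"), lambda w: w[:-1] + "ies"),
    ("plural", lambda w: True,            lambda w: w + "s"),
    ("past",   lambda w: w.endswith("e"), lambda w: w + "d"),
    ("past",   lambda w: True,            lambda w: w + "ed"),
]

def make(word, properties):
    """Produce the correct surface form of a word from its properties."""
    if not properties:                # no properties: already the correct form
        return word
    for prop in properties:           # 1) try each property's irregular lexicon
        if word in IRREGULAR_LEXICONS.get(prop, {}):
            return IRREGULAR_LEXICONS[prop][word]
    for prop, applies, transform in REGULAR_RULES:   # 2) fall back to rules
        if prop in properties and applies(word):
            return transform(word)
    return word

assert make("house", {"plural"}) == "houses"   # regular -s rule
assert make("city", {"plural"}) == "cities"    # regular -y -> -ies rule
assert make("mouse", {"plural"}) == "mice"     # irregular lexicon hit
assert make("John", set()) == "John"           # no properties, no change
```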
Finally, whether or not a word has been transformed, all word-units are tested against syntactic rules on capitalization; for example, the first word in a sentence and proper nouns are capitalized. One final complication of our solution arises from the fact that not all grammar terms have a role associated with them, yet such a grammar term still needs to be represented in a role sequence in order to produce a proper expression. For example, in our grammar the grammar term punctuation is a part of a sentence but does not have an associated role. In this instance, the role sequences for a sentence have an empty role set reflecting the punctuation. In order to produce a valid sentence, compulsory terms without roles use rules that indicate the proper way of generating the correct output. For example, two rules taught for punctuation are that a declarative sentence must end in a period and that an interrogative sentence must end with a question mark. As a result, whenever a role set is empty, the algorithm uses the applicable rule to determine the correct word-unit to complete the sequence.

5 Conclusion

In future versions of our sentence generating algorithms, we aim to tackle subtle problems in sentence selection, such as generating passive voice sentences, and in morphology, such as property prioritizing. When deciding to express a thought in the passive voice, a variety of bridges may be used depending on the structure of the passive voice. For instance, take an original thought expressing that John gave a gift to Mary. In the thought, John is identified by the logical object actor, the gift is the act-on object, and Mary is the act-for object, or beneficiary. This thought could be expressed in several passive voice sentences, such as "A gift was given to Mary." or "Mary was given a gift." To express the first case, the bridge would need to link subject to the act-on object; in the second, the bridge would need to link subject to the act-for object. As a result, a method needs to be developed to determine which object is the focus of the sentence, and thus to choose the correct bridge to express the sentence in that manner.

The issue of property prioritizing occurs when certain modalities, or helping verbs, are present. In certain cases, a property of a word may not need to be applied. For instance, verbs generally take the same case as the subject of the sentence. However, most verbs do not differentiate between singular and plural forms when preceded by a helping verb: "He may run a marathon.", rather than "He may runs a marathon." Similarly, the be verb takes the root form, ignoring all properties, when it is preceded by a helping verb: "John could be a civil engineer." instead of "John could is a civil engineer." The distinction of when properties can and should be applied is one that needs to be taught through more sophisticated rules.

Traditional bidirectional grammars rely on the same structure that was used in parsing a sentence to produce a sentence; this limits the flexibility and usefulness of the learned grammar. Instead, our program converts the learned grammar structure into a secondary type of knowledge that may be used to generate a sentence. This is accomplished by combining roles into sets and sequences, creating a parallel structure based solely on roles instead of grammar terms. The purpose of a role set is to collect the alternative semantic representations of a grammar term, while role sequences define the various ways the intent of a role can be expressed. A role can then select an applicable role sequence from its collection to create a list of word-units that expresses its contents. These word-unit lists then propagate up to the thought to produce a list that expresses the entire thought. Finally, by applying properties gathered from the roles and the thought, the proper form of each word may be produced to create the final output sentence. Learning grammar incrementally allows more complex structures to be added as desired, in turn expanding the types of sentences that can be parsed and generated. By using roles, with a clear separation between grammatical and semantic purposes, along with the use of the proper bridge, our approach has the advantage that the same logical structure may be expressed in multiple ways in the natural language. The principles behind creating role sequences ensure the correct grammatical relationships between the terms in a sequence. As a result, no surface generation step is needed at the time of generating a sentence, and thoughts may be created based on their logical meaning instead of the grammatical requirements of the natural language.

6 References

[1] K. Cheng. "An Object-Oriented Approach to Machine Learning"; International Conference on Artificial Intelligence, 487-492, 2000.
[2] W. Faris & K. Cheng. "An Object-Oriented Approach in Representing the English Grammar and Parsing"; International Conference on Artificial Intelligence, 325-331, 2008.
[3] E. Ahn, W. Faris, & K. Cheng. "Recognizing the Effects caused by an Action in a Declarative Sentence"; International Conference on Artificial Intelligence, 149-155, 2009.
[4] W. Faris & K. Cheng. "Understanding and Executing a Declarative Sentence involving a forms-of-be Verb"; IEEE International Conference on Systems, Man, and Cybernetics, 1695-1700, 2009.
[5] W. Faris & K. Cheng. "Understanding Pronouns"; International Conference on Artificial Intelligence, 850-856, 2010.
[6] D. Appelt. "Bidirectional grammars and the design of natural language generation systems"; Proceedings of the 1987 Workshop on Theoretical Issues in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, 206-212, 1987.
[7] E. Reiter. "Has a consensus NL generation architecture appeared, and is it psycholinguistically plausible?"; Proceedings of the Seventh International Workshop on Natural Language Generation, Association for Computational Linguistics, Stroudsburg, PA, USA, 163-170, 1994.
[8] S. Shieber. An Introduction to Unification-Based Approaches to Grammar; CSLI Lecture Notes 4, Stanford University, 1986.
[9] R. Kasper. "A flexible interface for linking applications to Penman's sentence generator"; Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, Stroudsburg, PA, USA, 153-158, 1989.
[10] E. Hovy. "The current status of the Penman language generation system"; Proceedings of the Workshop on Speech and Natural Language, Association for Computational Linguistics, Stroudsburg, PA, 1989.
[11] S. Shieber, G. van Noord, F. Pereira, & R. Moore. "Semantic-head-driven generation"; Computational Linguistics, 16(1), 30-42, 1990.
[12] S. Busemann. "Best-first surface realization"; Proceedings of the Eighth International Workshop on Natural Language Generation, 101-110, 1996.
[13] S. Russell & P. Norvig. Artificial Intelligence: A Modern Approach, 2nd Ed.; Prentice Hall, 2003.
[14] G. Russell, S. Warwick, & J. Carroll. "Asymmetry in Parsing and Generating with Unification Grammars: Case Studies from ELU"; 28th Annual Meeting of the Association for Computational Linguistics, 205-211, 1990.
[15] R. Quirk, S. Greenbaum, G. Leech, & J. Svartvik. A Comprehensive Grammar of the English Language; Longman, 1985.