

The Verbmobil Semantic Database

Karsten L. Worm
Univ. des Saarlandes, Computerlinguistik
Postfach 15 11 50, D-66041 Saarbrücken, Germany
worm@coli.uni-sb.de

Johannes Heinecke
Humboldt-Univ. zu Berlin, Computerlinguistik
Jägerstraße 10/11, D-10099 Berlin, Germany
heinecke@compling.hu-berlin.de

Abstract

This paper describes the development and use of a lexical semantic database for the Verbmobil speech-to-speech machine translation project. The motivation is to provide a common information source for the distributed development of the semantics, transfer and semantic evaluation modules, and to store lexical semantic information independently of any particular application.

1 Introduction

The distributed development of the modules of a large natural language processing system at different sites makes interface definitions a vital issue. It becomes even more urgent when several modules with the same intended functionality are developed in parallel and should be compatible with respect to their input-output behaviour.

The research reported in this paper was supported by the German Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie under contracts 01 IV 101 R and 01 IV 101 G6. We wish to thank our colleagues in the lexicon, syntax/semantics and transfer groups in the project.

[Figure 1: The Verbmobil architecture (simplified). Modules shown: SynSem, Transfer, Generation and Semantic Evaluation; VITs are passed from SynSem to Transfer and from Transfer to Generation.]

Another important issue is the acquisition and maintenance of lexical information, which should be stored independently of an application in order to make it (re-)usable for different purposes. This paper describes the design and use of the Verbmobil Semantic Database, which we developed in order to deal with these issues in the area of lexical semantics in Verbmobil.

2 The Verbmobil Project

The Verbmobil project [Wah93, BGL+96] aims at the development of a speech-to-speech machine translation system for face-to-face appointment scheduling dialogues. It employs a semantic transfer approach to translation [DE96], i.e., an input utterance is syntactically analyzed, a semantic representation of the content is built up, and this source language semantic representation is mapped to a target language semantic representation by the transfer module. This representation is the input for the target language generation. Additionally, a semantic evaluation module answers disambiguation queries (cf. figure 1).

3 Motivation for the Semantic Database

The architecture of Verbmobil makes it necessary for the semantics, transfer, semantic evaluation and generation modules to agree on the format and contents of the semantic representations they exchange. For example, the developers of the transfer module need to know how the semantics of the different lemmata in the vocabulary is represented in the structures produced by the syntax-semantics module (synsem for short), i.e., which predicates and structures they have to map to the target language. Conversely, the semantics developers need to know which readings have to be distinguished by transfer in order to arrive at correct translations.

This need becomes even more urgent when, as in Verbmobil, there are several synsem modules (two for German, one for Japanese) which have to produce compatible output, and the modules are developed in parallel by partners at different sites.¹

As a frame for the exchange of semantic representations, a common format, the Verbmobil Interface Term (VIT for short), has been defined [BES96]. The VIT is the central data structure used at the interfaces between the language modules of Verbmobil. A VIT is a ten-place term with slots for a list of labeled semantic predicates, sortal and anaphoric information, scope relations, prosodic features, etc.

What is needed in addition to the definition of the VIT data structure is a definition of the VIT's contents: for each lemma in the vocabulary of the system, a definition of the semantic predicates and other types of information (e.g., sortal restrictions) it introduces in the VIT. For a verb like kommen, for example, we need to specify that it introduces a predicate kommen(l1,i1) together with an argument role arg1(l1,i1,i2) in the semantics slot and sort(i1,space_time) in the sorts slot.

If a source providing this kind of information to the developers of the separate modules is available, the modules delivering VITs (the two synsem modules) or processing VITs (especially the transfer module) that conform to this definition can be developed in parallel. It would also be desirable to use this information source directly in the construction of the linguistic knowledge bases of the synsem modules, to guarantee consistency between their output and the specifications.

To meet these goals, we have developed the Verbmobil Semantic Database, which we describe in the remainder of this paper.

4 Design and Implementation of the Database

The database is organized around a set of abstract semantic classes [BES96], which are used to classify the lemmata in the vocabulary. It is implemented using the lexicon formalism LeX4 [GH95].
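Before turning to the classes, the kommen example from section 3 can be made concrete with a small sketch. It is only an illustration: the `VIT` class below is a hypothetical stand-in that models just the two slots named in that example (semantics and sorts), not the full ten-place term, and the tuple encoding of predicates is an assumption made for readability.

```python
from dataclasses import dataclass, field


@dataclass
class VIT:
    """Hypothetical, partial stand-in for a VIT: only two slots are modelled."""
    semantics: list = field(default_factory=list)  # labeled semantic predicates
    sorts: list = field(default_factory=list)      # sortal information


def kommen_contribution(label, event, arg):
    """Information 'kommen' introduces, following the example in section 3."""
    return VIT(
        semantics=[("kommen", label, event), ("arg1", label, event, arg)],
        sorts=[("sort", event, "space_time")],
    )


def format_term(term):
    """Render a (functor, arg, ...) tuple as functor(arg,...)."""
    functor, *args = term
    return f"{functor}({','.join(args)})"


vit = kommen_contribution("l1", "i1", "i2")
semantics_text = [format_term(t) for t in vit.semantics]
sorts_text = [format_term(t) for t in vit.sorts]
print(semantics_text)
print(sorts_text)
```

A database entry for a lemma has to determine exactly this kind of contribution, so that producers and consumers of VITs can rely on the same predicates, arities and sorts.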
4.1 Semantic Classes

The semantic classes in use are originally based on a morpho-syntactic classification of the words in the vocabulary of the system, which has been refined to account for semantic properties. For each semantic class a representation scheme, called the predscheme, has been defined, which specifies the predicates, together with their arity and arguments, that appear in a VIT for instances of the class.

As an example, consider the class intransitive verb. An intransitive verb is represented as R(L,I), argX(L,I,I1).² That is, it introduces some relation R and one thematic role (I is the event variable, L is a label used to refer to the verb's semantic contribution, and I1 is the instance filling the role). The verb's relation and the thematic roles it assigns have to be defined for each verb in the database. Cf. table 1 for further examples of semantic classes together with their predschemes.

    Class            PredScheme                                Example
    transitive verb  R(L,I), argX(L,I,I1), argY(L,I,I2)        treffen
    common noun      R(L,I)                                    Termin
    det quant        R(L,I,H)                                  jeder
    demonstrative    demonstrative(L,I,L1)                     dieser
    wh question      whq(L,I,H), tloc(L2,I2,I1), time(L1,I1)   wann

    Table 1: A few examples of semantic classes

¹ In the following, we concentrate on the Semantic Database for German. The database we developed for the Japanese synsem module [Mor96] follows the same principles.

4.2 The Lexicon Formalism LeX4

The semantic database makes use of the lexicon formalism LeX4, developed in the course of the Verbmobil project [GH95]. LeX4 has been used since summer 1994 within Verbmobil's lexicon group. It is based on feature structures (permitting disjunction and negation) embedded in an inheritance hierarchy of classes.

In LeX4 the task of constructing a lexicon is split up into four parts: modelling the lexicon (i.e., its linguistic classes); data acquisition (which can be done at the same time by different contributors); definition of the application interface (data can be compiled into every format needed after being processed by the LeX4 machine); and efficient storage.

Modelling a lexicon involves defining classes, their appropriate features and the inheritance relations between classes. Examples of class definitions are given in section 4.3; appropriateness of features is dealt with in the remainder of this section.

Database entries, called bases, are instances of a class. Consequently, they assign values to those features inherited from their class that are not yet fully specified by the class definition.
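The predschemes of section 4.1 can be read as templates that a lemma fills with its own relation name and thematic roles. The following sketch illustrates that reading under stated assumptions: the dictionary layout, the function name `instantiate` and the `("R", ...)` / `("role", ...)` encoding are hypothetical and are not part of LeX4 or of the database itself.

```python
# Predschemes for three classes from section 4.1, encoded as (kind, arguments)
# pairs. "R" marks the slot for the lemma's relation, "role" a thematic role.
PREDSCHEMES = {
    "intransitive verb": [("R", "L,I"), ("role", "L,I,I1")],
    "transitive verb": [("R", "L,I"), ("role", "L,I,I1"), ("role", "L,I,I2")],
    "common noun": [("R", "L,I")],
}

# The thematic roles used in Verbmobil, according to the text.
THEMATIC_ROLES = ("arg1", "arg2", "arg3")


def instantiate(semclass, predname, roles=()):
    """Fill a class predscheme with a lemma's relation name and roles."""
    scheme = PREDSCHEMES[semclass]
    role_slots = [args for kind, args in scheme if kind == "role"]
    if len(roles) != len(role_slots):
        raise ValueError(f"{semclass} needs {len(role_slots)} role(s)")
    for role in roles:
        if role not in THEMATIC_ROLES:
            raise ValueError(f"unknown thematic role: {role}")
    remaining = iter(roles)
    predicates = []
    for kind, args in scheme:
        functor = predname if kind == "R" else next(remaining)
        predicates.append(f"{functor}({args})")
    return predicates


kommen_predicates = instantiate("intransitive verb", "kommen", ["arg1"])
print(kommen_predicates)
```

Keeping the scheme at the class level means that a lemma only has to supply what is idiosyncratic to it, namely the relation name and the choice of roles.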
4.3 Semantic Classes and their Representation in LeX4

The abstract semantic classes of section 4.1 have been modelled in the lexicon formalism LeX4 along the following lines.

² X stands for one of the values {1, 2, 3}, since arg1, arg2, arg3 are the thematic roles used in Verbmobil.

    semdb_c
        verb_c
            intransitive_c *
            transitive_c *
            ditransitive_c *
        common_noun_c *
        ...

    Figure 2: Part of the class hierarchy

First, a general superclass semdb_c is defined, from which all classes inherit features for the lemma, the main predicate's name, the part of speech, etc. The individual subclasses corresponding to the abstract semantic classes additionally introduce a specific predscheme for each predicate associated with words of this class, and features for sortal information, thematic roles, etc.

    class semdb_c :< top >:      % - Main class from which
                                 %   all classes inherit
        predname: top &          % - Name of the semantic predicate
        lemma:    top &          % - Lemma of the entry
        pos:      top.           % - Part of Speech

While the abstract semantic classes are not hierarchically organized, their modelling in LeX4 makes use of a hierarchy to capture generalizations. For example, we abstract over the properties all verb classes have in common and place them in an abstract verb class verb_c, from which all verb classes, e.g., intransitive_c, inherit; cf. figure 2 (classes corresponding to semantic classes are marked with an asterisk) and the definitions below.

    class verb_c :< semdb_c >:                  % - All verbal classes inherit this.
        sort_of_inst: top.                      % - Sort of eventuality.

    class intransitive_c :< verb_c >:           % - Intransitive verbs
        semclass:      intransitive_verb &      % - Semantic class
        predscheme:    'L,I' &                  % - PredScheme for PredName
        predscheme_a1: 'L,I,I1' &               % - PredScheme for the argument
        role_a1:       (arg1 \ arg2 \ arg3).    % - Thematic roles of arguments

4.4 Representation of Lemmata

A base for a lemma consists of its classification together with its idiosyncratic properties in terms of feature values; it inherits the feature values which are specified in the definition of the class. The idiosyncratic information includes predicate names, sortal restrictions, etc. Thus an entry inherits the predscheme from the class, while the concrete predicate name in the predscheme is defined in the entry itself.

    base 'kommen' :<< intransitive_c >>:    % - The entry inherits
                                            %   from `intransitive_c'.
        pos:          'VVFIN;VVINF' &       % - Further specifications.
        lemma:        'kommen' &
        predname:     'kommen' &
        sort_of_inst: space_time &
        role_a1:      'arg1'.

5 Application of the Semantic Database

The Semantic Database is currently being used for three purposes: creating the semantic lexica of the syntactic-semantic modules of Verbmobil, producing a table of lemmata with the predicates and other types of information they introduce in a VIT, and automatically checking the correctness of the generated interface terms.

To guarantee consistency between the output of the synsem module and the database content, the semantic lexicon of SynSemS3³ is generated from the semantic database. The entry for kommen, for example, is:

    sem_lex(cat, kommen) short_for
        intrans_verb_sem(cat, kommen, (space_time), [arg1]).

The verbs in the syntactic lexicon contain calls to the macro sem_lex/2, which are expanded in the semantic lexicon as shown above.⁴ The macro intrans_verb_sem defines the semantic properties of intransitive verbs [BGL+96].

Additionally, we generate a table of lemmata which is used by the transfer developers and as an information source for the automatic correctness check on VIT representations. In the table the example appears as follows:

    kommen  VVINF  intransitive_verb  kommen(L,I),arg1(L,I,I1)  I1/space_time

³ SynSemS3 is the syntactic-semantic module developed by Siemens AG (syntax), University of the Saarland and University of Stuttgart (semantics). The other synsem module, developed by IBM Germany, makes use of the table output of the database to create a semantic lexicon.

⁴ The first argument of sem_lex/2 ranges over entry nodes of the feature structures of the lexical entry used by the grammar formalism.
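The way an entry combines inherited and idiosyncratic features before it is exported can be sketched in a few lines. This is a minimal illustration and not the LeX4 machine: the dictionaries `CLASSES` and `BASES` and the functions `resolve` and `table_fields` are hypothetical names, features constrained only to `top` are simply left out, and the export is hard-coded for the intransitive case. How the part-of-speech column of the real table is derived is not stated in the text, so the sketch passes the stored value through unchanged.

```python
# Class hierarchy and entry, transcribed from sections 4.3 and 4.4.
CLASSES = {
    "semdb_c": {"parent": None, "features": {}},
    "verb_c": {"parent": "semdb_c", "features": {}},
    "intransitive_c": {
        "parent": "verb_c",
        "features": {
            "semclass": "intransitive_verb",
            "predscheme": "L,I",
            "predscheme_a1": "L,I,I1",
        },
    },
}

BASES = {
    "kommen": {
        "class": "intransitive_c",
        "features": {
            "pos": "VVFIN;VVINF",
            "lemma": "kommen",
            "predname": "kommen",
            "sort_of_inst": "space_time",
            "role_a1": "arg1",
        },
    },
}


def resolve(base_name):
    """Collect features from the superclasses down to the entry itself."""
    base = BASES[base_name]
    chain = []
    cls = base["class"]
    while cls is not None:
        chain.append(cls)
        cls = CLASSES[cls]["parent"]
    features = {}
    for cls in reversed(chain):  # most general class first
        features.update(CLASSES[cls]["features"])
    features.update(base["features"])  # the entry's own values come last
    return features


def table_fields(base_name):
    """Assemble the fields of one table row for an intransitive verb."""
    f = resolve(base_name)
    predicates = (
        f"{f['predname']}({f['predscheme']}),"
        f"{f['role_a1']}({f['predscheme_a1']})"
    )
    return (f["lemma"], f["pos"], f["semclass"], predicates, f["sort_of_inst"])


row = table_fields("kommen")
print(row)
```

Generating every output format from the same resolved entry is what keeps the semantic lexicon and the lemma table consistent with each other.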

6 Conclusion

The use of the semantic database has proven to be successful in dealing with about 2000 German and 300 Japanese lemmata for version 1.0 of the Research Prototype. It allows the partners responsible for the syntactic/semantic, transfer and semantic evaluation modules to develop their modules in parallel, relying on the interface specification and the content of the database.

References

[BES96] Johan Bos, Markus Egg, and Michael Schiehlen. Abstract Semantic Classes and Concrete VIT Representations. Verbmobil-Memo 101, Universität des Saarlandes, Computerlinguistik, Saarbrücken, 1996.

[BGL+96] Johan Bos, Björn Gambäck, Christian Lieske, Yoshiki Mori, Manfred Pinkal, and Karsten Worm. Compositional semantics in Verbmobil. In Proc. of the 15th COLING, Copenhagen, Denmark, 1996.

[DE96] Michael Dorna and Martin C. Emele. Semantic-based transfer. In Proc. of the 15th COLING, Copenhagen, Denmark, 1996.

[GH95] Gunter Gebhardi and Johannes Heinecke. Lexikonformalismus LeX4. Verbmobil Technisches Dokument 19, Humboldt-Universität zu Berlin, Computerlinguistik, Berlin, 1995.

[Mor96] Yoshiki Mori. Multiple discourse relations on the sentential level in Japanese. In Proc. of the 15th COLING, Copenhagen, Denmark, 1996.

[Wah93] Wolfgang Wahlster. Verbmobil: Translation of face-to-face dialogues. In Proceedings of the 3rd European Conference on Speech Communication and Technology, pages 29-38, Berlin, Germany, 1993.