Collocations of Nouns: How to Present Verb-noun Collocations in a Monolingual Dictionary


Sanni Nimb, The Danish Dictionary, University of Copenhagen

Abstract

The paper discusses how a monolingual corpus-based dictionary of Danish can present, in the noun entry, a certain group of lexical collocations, namely verb-noun collocations that are transparent in meaning. These collocations can be divided into two groups: verb phrases where the noun for inexplicable reasons selects a certain verb and synonyms of that verb are impossible, and verb phrases where the noun is just a typical object. One way of presenting these collocations is described and certain problems are discussed.

1. Introduction

The Danish Dictionary is a comprehensive monolingual dictionary of modern Danish to be published in 1999. The dictionary is mainly based on a 40-million-word corpus. One of the new features we introduce compared with other monolingual Danish dictionaries is information about the ability of words to combine with other words. One way of presenting such information is to list different types of typical collocations. Presenting typical collocations in a monolingual dictionary serves two purposes: it is very useful when language has to be produced, both for native speakers and for learners of the language, and it can supplement the semantic definition of the entry word.

The collocations I will concentrate on in this paper are verb-noun collocations that are to be mentioned without further explanation in the lexical entry of the noun. They are all semantically transparent and will therefore not be given a lexical definition of their own. With the corpus as the main source, we have a very good chance of finding, and passing on to the users of the dictionary, the different kinds of verbs that typically occur to the left of the noun we want to describe. We have at our disposal special computer tools which measure the mutual attraction between two words in the corpus.
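To make the idea of "mutual attraction" concrete, the following is a minimal sketch of how words to the left of a noun could be scored and ranked. The paper does not say which formula its tools use, so pointwise mutual information (PMI) is swapped in here as a common stand-in; the function name `left_collocates`, the window size and the toy token list are all assumptions made for illustration and are not taken from the dictionary's corpus or software.

```python
import math
from collections import Counter

def left_collocates(tokens, node, window=2):
    """Rank words found up to `window` positions to the left of `node` by PMI.

    PMI is an assumed stand-in for the unspecified measure in the paper.
    Returns a list of (word, score, co-occurrence count), best first.
    """
    total = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # set() so that a word repeated inside one window counts once
            for w in set(tokens[max(0, i - window):i]):
                pair_freq[w] += 1
    scores = []
    for w, joint in pair_freq.items():
        # log2 of observed co-occurrence over what independence would predict
        pmi = math.log2(joint * total / (word_freq[w] * word_freq[node]))
        scores.append((w, round(pmi, 2), joint))
    # Highest score first; ties broken alphabetically
    return sorted(scores, key=lambda s: (-s[1], s[0]))

# Toy token list, invented for illustration only.
toy = ("vi skal drage konsekvenserne . de kan ikke overskue konsekvenserne . "
       "vi skal tage konsekvenserne . vi kan ikke drage fordel af det .").split()
ranking = left_collocates(toy, "konsekvenserne")
for word, score, count in ranking:
    print(f"{word:10s} {score:6.2f} [{count}]")
```

Each output row mirrors the three pieces of information the statistical list provides: the typical word, its score, and the number of co-occurrences. On so small a sample the scores mean little; a measure of this kind only becomes informative on a large corpus.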
The probability of meeting exactly those two words together is calculated, and the result is a list of words ranked by their probability of co-occurring with the entry word. For example, a statistical analysis of the words to the left of the noun konsekvens ('consequence') tells us that we usually combine the word with a number of different adjectives and with the verbs overskue ('to survey'), drage ('to draw'), vurdere ('to estimate') and tage ('to take'). A similar analysis of the noun selvmord ('a suicide') shows that the verbs begå ('to commit') and forsøge ('to try') typically appear to the left of the noun. And for the noun gavn ('benefit') a mutual analysis shows that the verbs gøre ('to do'), have ('to have') and få ('to get') typically appear to the left of the noun. An example of such a list can be seen in Table 1.

(a)                                  (b)        (c)
uoverskuelige (unpredictable)        1242.39     38
vidtrækkende (far-reaching)          1152.73     26
miljømæssige (environmental)          801.40     43
yderste (utmost, absolute)            358.91     72
overskue (to survey)                  337.89     25
økologiske (ecological)               309.99     33
uheldige (unsuccessful)               286.25     27
samfundsmæssige (social)              257.39     19
negative (negative)                   219.37     31
alvorlige (serious)                   218.23     60
drage (to draw)                       167.59     25
hvilke (which)                        158.66    113
vurdere (to estimate)                 100.65     21
økonomiske (economic)                  79.98     82
mulige (possible)                      68.30     29
dens (its)                             27.91     29
politiske (political)                  26.98     33
taget (taken)                          26.68     62
tage (to take)                         23.08     87

Table 1. A mutual analysis of konsekvens ('consequence'), second and first place to the left: typical word (a), degree of probability (b), and total number of co-occurrences (c).

EURALEX '96 PROCEEDINGS: LEXICAL COMBINATORICS

Since The Danish Dictionary is corpus-based, it has been decided, as a fundamental principle, to present statistically based collocations in the order in which they appear in the statistical analysis, without further classification. For this purpose we have reserved a special element in our dictionary structure.

This principle of presenting collocations extracted from the corpus by statistical analysis, without further classification by the lexicographer, can, however, cause a variety of problems, especially with regard to the verbs. The statistical analysis will often result in a mixed list of verbs with very different relations to the noun. By simply listing the verbs in the order in which they appear on the statistical list, we run the risk of mixing very different types of information.

Some of the verb-noun collocations will be semantically predictable, for instance if the noun is just a typical object of the verb for semantic reasons (collocations like to build a house, to cure a disease etc.). Information in the dictionary on these free lexical combinations mainly serves to underline the semantic definition of the noun. Two examples of this type of collocation from the list in Table 1 are overskue/vurdere konsekvenserne ('to survey/to estimate the consequences').

Other verb-noun collocations found by the statistical analysis of the corpus will be more fixed collocations that are impossible for non-native speakers to predict and where the noun selects the verb for inexplicable syntactic reasons. These are collocations like take a drag (on a cigarette), pay attention and deliver a speech, where take, pay and deliver cannot be replaced by synonyms. The verbs in these fixed lexical combinations are characterized by losing their concrete semantic meaning and by contributing very little to the meaning of the phrase, acting almost like auxiliary verbs. Such verbs are called support verbs. Information on support verbs in the dictionary mainly serves to teach non-native speakers how to construct well-formed sentences with the noun. From the list in Table 1 we have the following examples of collocations that consist of support verb + noun: drage/tage konsekvenserne ('to draw/to take the consequences').
By simply listing the free lexical combinations and the more fixed support-verb constructions in the order in which they appear on the statistical list, we do not make the difference between them clear to the user.

Another problem caused by the principle of simply listing the verb-noun collocations arises with nouns that simultaneously select a verb to the left and an obligatory prepositional phrase to the right. For example, the noun gavn ('benefit') cannot stand alone in verb-noun collocations with the support verbs have ('to have') and få ('to get'); it requires a prepositional phrase beginning with af ('from'). Consequently, in order to present the verb-noun collocation få gavn ('get benefit'), we would need to mention an incomplete phrase like få gavn af ('to get benefit from'), which we do not find very satisfactory. To complete the phrase we would hope to find a frequent and typical head of the prepositional phrase in the corpus, but this is rarely possible. Since the statistical element in the dictionary must only contain words which frequently appear in the corpus, we are left with a presentational problem. The procedure adopted by The Danish Dictionary in order to solve these two problems is described in the next section.

2. How the Danish Dictionary classifies the different types of verb-noun collocations

As mentioned above, the default method in the Danish Dictionary is simply to list verb-noun collocations in the same element in the order in which they appear on the statistical list. However, to avoid mixing support verbs and "free" verbs in the cases where more than one support verb figures on the list, it has been decided to deviate from this default method by grouping the support verbs irrespective of their statistical order:

    konsekvens sb. følge; virkning
    typisk: overskue konsekvenserne, vurdere konsekvenserne, drage/tage konsekvensen

    ('consequence' n. result; effect
    typical: survey the consequences, estimate the consequences, draw/take the consequence)

Table 2. Example of a lexical entry.

Moreover, we have in two cases decided to move the support verbs out of the statistical element and present them in another element in the dictionary, reserved for formalized information on how the noun is construed with other words. Information in this element does not need to be based on a statistical analysis, but is meant to give more valency-like information on the entry noun, for instance certain prepositional phrases selected by it (e.g. a key to a door). In the cases where this element is already being used for this kind of information, we have decided to place support verbs here as well. The verb is only mentioned here when the noun often occurs with the verb as well as with the prepositional phrase, though the prepositional phrase does not have to be obligatory. For the noun konsekvens ('consequence'), which optionally selects a prepositional phrase to the right, af NGT/at.. ('of something/gerund..'), this means that we can mention the two support verbs, drage and tage ('to draw' and 'to take'), in a formalized way instead of describing the noun as in Table 2. The two verbs for which konsekvens is just a typical object, overskue and vurdere ('to survey' and 'to estimate'), will still be listed as good examples of language use in the statistical element.

    konsekvens sb. følge; virkning
    [drage/tage konsekvensen af NGT/at..];
    typisk: overskue konsekvenserne, vurdere konsekvenserne

    ('consequence' n. result; effect
    [draw/take the consequence of something/gerund];
    typical: survey the consequences, estimate the consequences)

Table 3. Example of a lexical entry.

For nouns selecting obligatory prepositional phrases this method also gives us the possibility of solving the presentational problem mentioned above. Verb, noun and prepositional phrase are all described in a formalized way in the construction element: [få gavn af NGT/at..] ('[to get the benefit from something/gerund..]').

The other case where we place a support verb in the construction element is when a noun can combine with only one support verb. This is often the case with nouns that are not very frequent in the corpus, and where the statistical analysis is not informative. An example is the noun helligbrøde ('sacrilege'), which appears only 24 times in the corpus and which selects only one support verb, begå ('to commit'). We will therefore place this verb-noun collocation in the construction element.

For all other nouns (those which do not select a prepositional phrase, but which select more than one support verb), we only distinguish between free lexical combinations and support verbs from the statistical list by grouping the latter in the statistical element. This means that if only one support verb figures on the statistical list, there will be no notable difference between the "free" verb and the support verb. The noun selvmord ('a suicide') is an example of this.
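The placement rules of this section can be condensed into a short decision procedure. The sketch below is only an illustration under stated assumptions: the class `Noun`, the function `place_verbs` and the representation of the two dictionary elements as plain lists are hypothetical, and the decision as to which verbs count as support verbs is taken as given, since in the dictionary it is made by the lexicographer. Placing the grouped support verbs after the free verbs follows the layout of Table 2 and is likewise an assumption.

```python
from dataclasses import dataclass

@dataclass
class Noun:
    lemma: str
    ranked_verbs: list      # verbs in the order of the statistical list
    support_verbs: list     # every support verb the noun accepts (lexicographer's judgement)
    preposition: str = ""   # valency-like preposition selected by the noun, "" if none

def place_verbs(noun):
    """Return (construction_element, statistical_element) as lists of verb groups."""
    support_on_list = [v for v in noun.ranked_verbs if v in noun.support_verbs]
    free = [v for v in noun.ranked_verbs if v not in noun.support_verbs]
    if len(noun.support_verbs) == 1:
        # Only one possible support verb: formalized presentation,
        # even when the statistical list is uninformative.
        only = noun.support_verbs[0]
        return [only], [v for v in noun.ranked_verbs if v != only]
    if noun.preposition and support_on_list:
        # The construction element is already needed for the prepositional
        # phrase, so the support verbs are placed there as well.
        return ["/".join(support_on_list)], free
    if len(support_on_list) > 1:
        # Several support verbs on the list: group them in the statistical
        # element (position after the free verbs is an assumption).
        return [], free + ["/".join(support_on_list)]
    # Default: keep the statistical order; a single support verb is not marked.
    return [], list(noun.ranked_verbs)

# Simplified inputs based on the examples discussed in the text.
konsekvens = Noun("konsekvens", ["overskue", "drage", "vurdere", "tage"], ["drage", "tage"], "af")
selvmord = Noun("selvmord", ["begå", "forsøge"], ["begå", "gøre"])
helligbrode = Noun("helligbrøde", [], ["begå"])

for noun in (konsekvens, selvmord, helligbrode):
    print(noun.lemma, place_verbs(noun))
```

For konsekvens the sketch reproduces the split shown in Table 3; without the preposition, the same noun would fall into the grouping case and give the order shown in Table 2.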
For selvmord, the statistical result from the mutual analysis tells us that both the support verb begå ('to commit') and the "free" verb forsøge ('to try') are very frequent immediately to the left of the noun. Since the noun has more than one possible support verb (gøre, 'to do', is also possible, just not very frequent) and does not select a valency-like prepositional phrase, we simply list the two typical verbs begå and forsøge from the statistical analysis as good examples of language use.

3. Problems regarding the presentation of verb-noun collocations

The presentation of verb-noun collocations described above is of course open to discussion. It might seem inconsistent that we treat verb-noun collocations differently depending on whether the noun selects prepositional phrases or more than one support verb. This treatment, however, is mainly due to practical circumstances. As long as the noun does not need to be described in a formalized way, because it does not select valency-bound prepositional phrases, we prefer not to complicate the lexicographer's analysis more than necessary and simply mention the results of the statistical analysis. The Danish Dictionary is mainly corpus-based and is not meant to be complete in its information on support verbs; that would require a much more detailed analysis and description of each noun.

The disadvantage is of course that when different types of verbs are simply listed in the same element, the user will not know whether the collocation mentioned is a model to be strictly followed or just an example of language use. We have therefore chosen to distinguish between the verbs in the lexical description of some nouns, because in these cases we already need to introduce a more formalized presentation of the noun in order to describe its valency. In these cases we want to underline that certain verbs also play a role in the construction of sentences with the entry word, in the hope that a more formalized presentation provides more precise guidance to the user. In the cases where only one support verb is possible, we also hope that the user will perceive the formalized pattern as a model to be followed when producing sentences.