Contemporary dictionaries

Contemporary dictionaries
- Algemeen Nederlands Woordenboek
- Frequency Dictionary of Dutch

Frequency Dictionary
- Published in 2014 by Routledge
- One of a series of frequency dictionaries
- Book and CD-ROM
- Written in English; Dutch words translated
- The 5,000 most frequent Dutch words in the Netherlands and Belgium

Frequency Dictionary
- Based on a corpus of ca. 290,000,000 words
- Spoken and written sources
- Literature, newspapers and web
- Example sentences automatically selected with Sketch Engine (GDEX: Good Dictionary EXamples)
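Automatic example selection can be pictured as ranking candidate corpus sentences by a readability score. The sketch below is only a toy heuristic in that spirit (sentence length, rare words, sentence completeness). It is not the GDEX formula used by Sketch Engine; the function names, weights and thresholds are all assumptions made for illustration.

```python
import re

def example_score(sentence, headword, common_words, min_len=6, max_len=20):
    """Toy score in [0, 1]; higher means a more dictionary-friendly example.

    The weights (0.4, 0.1, 0.3) are arbitrary illustration values.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    headword = headword.lower()
    if headword not in tokens:
        return 0.0  # the sentence does not illustrate the headword at all
    score = 1.0
    # Penalise sentences that are very short or very long.
    if not (min_len <= len(tokens) <= max_len):
        score -= 0.4
    # Penalise every word that is not in the list of common words.
    rare = [t for t in tokens if t != headword and t not in common_words]
    score -= 0.1 * len(rare)
    # Penalise fragments: no capital at the start or no final punctuation.
    stripped = sentence.strip()
    if not (stripped[:1].isupper() and stripped.endswith((".", "!", "?"))):
        score -= 0.3
    return max(score, 0.0)

def best_example(sentences, headword, common_words):
    """Return the candidate sentence with the highest toy score."""
    return max(sentences, key=lambda s: example_score(s, headword, common_words))
```

With a hypothetical list of common words, `best_example` prefers a complete, medium-length sentence containing the headword over a fragment or a sentence without it.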

Frequency lists
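A frequency list is, at its core, a ranking of words by how often they occur in the corpus. The following is a minimal sketch, assuming plain word forms and a simple letter-based tokenizer; a published frequency dictionary would typically count lemmas and weight sources, which this sketch does not attempt. The function name `frequency_list` is hypothetical.

```python
import re
from collections import Counter

def frequency_list(texts, top_n):
    """Return the top_n most frequent word forms as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and keep runs of letters only (no digits).
        counts.update(re.findall(r"[^\W\d_]+", text.lower()))
    return counts.most_common(top_n)

corpus = ["De fiets is nieuw.", "De hond ziet de fiets."]
print(frequency_list(corpus, 2))  # [('de', 3), ('fiets', 2)]
```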

Thematic boxes

Algemeen Nederlands Woordenboek (ANW)
A Dictionary of Contemporary Dutch
Tanneke Schoonheim: tanneke.schoonheim@inl.nl

The ANW
- synchronic scholarly dictionary of contemporary Dutch in Belgium and the Netherlands
- describes words from 1970 onwards
- only digitally available; no printed version
- basic words and neologisms
- semasiological and onomasiological
- many information categories; much more than just word meanings

Editing ANW articles
- Corpus in Sketch Engine
- Dictionary Writing System
- Online application
[Add screenshot]

ANW Corpus
- from 1970 onwards
- more than 100,000,000 tokens, regularly updated
- source material from the Netherlands, Belgium (and Surinam)
- newspapers, web material, literature
- knvb.nl, voedingscentrum.nl, dieren.startpagina.nl

Sketch Engine

ANW in Sketch Engine
- concordances
- word sketches
- special features

Concordance

Word Sketch

Special features

Dictionary Writing System
All the functionality needed to write dictionary entries

Dictionary Writing System
- corpus (Sketch Engine)
- entry list with content management system
- editorial guidelines and memos
- other dictionaries and secondary sources
- internet sources
- editor

Editor
- developed in-house
- written in Java
- entries stored in a MySQL database
- entries link to other entries using persistent identifiers
- builds the user interface from an XML schema

Linked to Sketch Engine

Copy examples

Entries
- partly edited entries (June 2014: ca. 14,000)
  - edited information on part of speech, spelling, abbreviation, pronunciation and use
  - links to concordances in corpus
  - links to information in other dictionaries
- fully edited entries (June 2014: ca. 15,000)
  - edited information on part of speech, spelling, abbreviation, pronunciation and use
  - edited information on word meanings, word combinations and collocations
  - edited example sentences

Partly edited entry

Fully edited entry

Part of Speech
noun; adjective; verb; adverb; preposition, etc.
noun:
- word type (appellative, proper name)
- gender (masculine, feminine, neuter or a combination)
- article (de, het or both)
- number (no singular, no plural, rare in singular, rare in plural)
- word class (personal name; abstract noun; collective noun, etc.)

Part of Speech
noun; adjective; verb; adverb; preposition, etc.
verb:
- function (auxiliary verb, intransitive verb)
- syntactic class (transitive, intransitive, reflexive, or a combination)
- inflection (weak, strong, irregular, or a combination)
- auxiliary verb (hebben, zijn or both)

Spelling
- official Dutch spelling, also for neologisms: 1 aprilgrap, e-fiets, facebooken
- hyphenation: 1 april.grap, e-fiets, face.boo.ken
- variants: 1 aprilgrap, eenaprilgrap

Pronunciation
- number of syllables
- position of the main stress
- way of pronunciation
- phonetic transcription
Example: cornedbeef
- 3 syllables; stress on 2nd syllable
- Dutch pronunciation
- [kɔrˈnɛtbif]

Morphology types
- simplex (hand, huis)
- derivation (handig, huisje)
- compound (handschoen, huissleutel)
- acronym (aids, NATO)
- blend (smirten, twitteratuur)
- shortening (appen < whatsappen)

Pragmatics
- language variety (Dutch in Belgium, Dutch in the Netherlands, Dutch in Surinam)
- style (formal, informal, vulgar, etc.)
- attitude (ironic, sarcastic, offensive, etc.)
- domain (law, politics, sport, etc.)
- frequency in the ANW corpus
- time (archaic, neologism, etc.)
- medium (spoken language, written language)

Definitions
- Analytical definitions
- Short definitions
- Semantic collocators
- Remarks

Definitions

Lexical relations
- hyperonymy/hyponymy: huis > gebouw; gebouw > huis
- synonymy: fiets, rijwiel
- antonymy: zwart, wit
- andronym/feminym: boer, boerin; Bulgaar, Bulgaarse

Semagram
- A semagram is a conceptual structure that describes a lexical concept on the basis of its characteristics
- invented by Fons Moerdijk, former editor-in-chief of the ANW
- presentation of word knowledge in a frame with slots and fillers

Semagram
- slots are conceptual elements naming characteristics and relations of words, e.g. colour, size, place, etc.
- fillers are the data in the slots, e.g. is yellow, is big, lives in a bird's nest, etc.
- part of the information can be encyclopaedic
- particularly useful for nouns, but also for verbs and adjectives

Why semagrams?
- There is often more relevant information on a word than fits in a definition without making it unreadable for the dictionary user
- The definition contains the prototypical lexical semantic information on the word; the semagram also contains other relevant information
- Semagrams are well suited to electronic dictionaries such as the ANW, in which it is easy to search for specific information
- The semagram helps to formulate the right definition
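The frame-with-slots-and-fillers idea, and the point that it makes searching for specific information easy, can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `Semagram`, the function `find_by_filler` and the headwords used are hypothetical and do not reflect the ANW's actual data model or contents.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Semagram:
    """A frame for one headword: slot name -> list of fillers."""
    headword: str
    slots: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, slot: str, filler: str) -> None:
        # Create the slot on first use, then append the filler to it.
        self.slots.setdefault(slot, []).append(filler)

    def fillers(self, slot: str) -> List[str]:
        # An unfilled slot simply yields an empty list.
        return self.slots.get(slot, [])

def find_by_filler(semagrams, slot, text):
    """Return headwords whose given slot has a filler containing text."""
    return [s.headword for s in semagrams
            if any(text in f for f in s.fillers(slot))]

# Hypothetical entries, using the slot and filler examples from the slides.
bird = Semagram("kanarie")
bird.add("colour", "is yellow")
bird.add("place", "lives in a bird's nest")
fruit = Semagram("citroen")
fruit.add("colour", "is yellow")

print(find_by_filler([bird, fruit], "colour", "yellow"))  # ['kanarie', 'citroen']
```

Because fillers sit in named slots, a query such as "all words whose colour is yellow" becomes a simple lookup, which would be much harder against free-text definitions.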

Combinations
Word combinations are well-known, rather conventional syntactic combinations of words. You understand a word combination because you know the meaning of the separate words that are part of it.
- to go to the cinema
- to make a decision
- to drink beer
- to smoke a cigarette

Cigarette as object of a verb
- een sigaret aansteken; een sigaret opsteken; een sigaret roken; een sigaret oproken; een sigaret inhaleren; een sigaret doven; een sigaret uitdoven; een sigaret uitdrukken; een sigaret uitduwen; een sigaret uitmaken; een sigaret draaien; een sigaret rollen; een sigaret aanbieden; een sigaret krijgen; een sigaret nemen; een sigaret presenteren; een sigaret bietsen
- sigaretten halen; sigaretten kopen; sigaretten verkopen; sigaretten smokkelen

Cigarette in other combinations
- Combinations with an adjective: een nieuwe sigaret; een verse sigaret; de laatste sigaret; zijn laatste sigaret; een Amerikaanse sigaret; een Egyptische sigaret; een Engelse sigaret; een Franse sigaret; een Turkse sigaret; Amerikaanse sigaretten; Egyptische sigaretten; Engelse sigaretten; Franse sigaretten; Turkse sigaretten; een dunne sigaret; een losse sigaret; een lichte sigaret; lichte sigaretten; de eeuwige sigaret; zijn eeuwige sigaret; gewone sigaretten; een halve sigaret
- Combinations with a noun: een pakje sigaretten; een slof sigaretten; een paar sigaretten

Collocations
- Fixed idiomatic word groups, often figurative
- Sometimes a few words, sometimes formula-like sentences
- The meaning of a collocation cannot (easily) be deduced from its separate parts
Examples:
- klein bier: 'nothing important; nothing to worry about'
- eerste viool: 'leading violinist in an orchestra'
- gele trui: 'jersey of the leader in the Tour de France'

Collocation gele trui

Word family
Word formations of which the headword is one of the elements.
- Derivations
- Compounds
- Others

Word family of azijn 'vinegar'

Examples
- Example sentences illustrate the meaning of an entry
- Taken from the corpus via Sketch Engine
- If necessary taken from the internet, e.g. for neologisms
- Inserted by the lexicographers
- Corrected by the lexicographic assistants
- Multiple examples per entry
- Multiple examples per meaning
- At least one example per collocation

Examples in Sketch Engine

Examples in the ANW editor

Hypermedia
Sometimes it is difficult to capture a concept in words. Images and sounds support the definition in the ANW.
- pictures
- sounds
- movie clips

Hypermedia

Etymology
- ANW is a synchronic dictionary.
- ANW lexicographers don't write new etymologies.
- Etymological information in the ANW:
  - link to www.etymologiebank.nl
  - neologisms: information on first appearance, reason for introduction, inventor of the word, motive for the word, etc.

Data analysis flow
1. Lexicographic assistants: check automatically compiled information, grammatical information, word family
2. Lexicographers: add word class, definition, word relations, combinations, collocations and examples
3. Lexicographic assistants: check data and examples, add multimedia
4. Editor-in-chief/project manager: proofreading
5. Lexicographers: add corrections
6. Lexicographic assistants: check multimedia
7. Editor-in-chief/project manager: final check
8. Go online

The online application http://anw.inl.nl

Technical information
- The application is written in Java.
- The user interface consists of HTML, CSS and JavaScript/ECMAScript.
- The application uses components of the Apache Software Foundation, e.g. Tomcat, Lucene, Xalan, Log4j and Velocity.
- The application uses MySQL as its database.
- A query language called FunQY was developed for the application.
- The application has been tested with Internet Explorer 6-8, Firefox 3 and Safari 4.

Word > Meaning

Meaning > Word

Features > Word

Find examples

Neologisms

Help and Information

Logfile analysis
- Logfiles from 12/2009
- Google Analytics from 4/2012

General and Technical details
- Period: 12/2009 to 3/2013
- Number of pageviews: ± 2,088,000
- Number of searches: ± 236,000
- Number of unique IP addresses: ± 591,000
- Number of sessions: ± 857,000
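The totals above can be turned into rough per-session averages with simple division. The sketch below uses only the approximate figures from this slide; the variable names are illustrative, and the results are averages over the whole period, so they say nothing about how the values are distributed across sessions.

```python
# Approximate totals for 12/2009 to 3/2013, as given on the slide.
pageviews = 2_088_000
searches = 236_000
unique_ips = 591_000
sessions = 857_000

# Average number of pages viewed in one session.
pageviews_per_session = pageviews / sessions
# Average number of searches in one session (well below one).
searches_per_session = searches / sessions
# Average number of sessions per unique IP address.
sessions_per_ip = sessions / unique_ips

print(round(pageviews_per_session, 2))  # 2.44
print(round(searches_per_session, 2))   # 0.28
print(round(sessions_per_ip, 2))        # 1.45
```

On average a session therefore contains between two and three pageviews, and only roughly one session in four includes a search.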

General and Technical details
Robots
- Bing!
- Google!

Used browsers
[Bar chart: share per browser for Opera, Firefox, Safari, Chrome and Internet Explorer; axis from 0% to 60%]

General and Technical details
Desktop versus mobile
[Chart: share of PCs and similar devices versus mobile devices; axis from 0% to 100%]

General and Technical details Referrers

General and Technical details
Users by country
[Bar chart: share per country for UA, CN, GB, FR, DE, US, BE and NL; axis from 0% to 70%]

General and Technical details
Pageviews per session
[Bar chart: share of sessions with 1 to 10 pageviews; axis from 0% to 70%]

General and Technical details
Entries viewed per session
[Bar chart: share of sessions with 1 to 7 entries viewed; axis from 0% to 70%]

General and Technical details
Searches per session
[Bar chart: share of sessions with 0 to 4 searches; axis from 0% to 100%]

General and Technical details
Don't look at help
50 : 1
- "I have a PhD, I don't need help."
- "I didn't spot the big, red, flashing Help link!"
- "I love curling up with hot cocoa and a thick manual!"
- "I was hired to lead, not to read!"
- "I'm just really stubborn."
- "Oooh, shiny!"

General and Technical details
When is the ANW used?

Query details
Searching the ANW

Query details
Word > Meaning: top 30, 2012
1. b*r
2. bi?r*
3. googelen
4. sterrenkind
5. proactief
6. ook niet
7. grexit
8. Stool
9. schermtijd
10. verstarring
11. y
12. Qaidastrijder
13. koe
14. Q
15. balen
16. q
17. opportuun
18. voorproefje
19. yammeraar
20. pandapunten
21. aardhommel
22. huis
23. hybridekameel
24. mogelijkheid
25. hond
26. lolbroekerij
27. boek
28. algemeen
29. waarheidsgetrouw
30. aap
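The two most frequent queries, b*r and bi?r*, are wildcard patterns. The sketch below shows how such patterns could be matched against a headword list, assuming the conventional reading in which `*` stands for any sequence of characters and `?` for exactly one character. That reading is an assumption; the slides do not specify the ANW's own wildcard semantics. The helper `match_headwords` and the short headword list are hypothetical, and the matching uses Python's standard `fnmatch` module rather than the ANW's query engine.

```python
from fnmatch import fnmatchcase

def match_headwords(pattern, headwords):
    """Return the headwords that match a wildcard pattern.

    '*' matches any run of characters, '?' matches exactly one character.
    Matching is case-sensitive.
    """
    return [w for w in headwords if fnmatchcase(w, pattern)]

headwords = ["bier", "biertje", "boer", "boek", "hond"]
print(match_headwords("b*r", headwords))    # ['bier', 'boer']
print(match_headwords("bi?r*", headwords))  # ['bier', 'biertje']
```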

Query details
Features > Words: top searches 2012
- Time: neologism
- Language variety: (mainly) in Belgium
- Pronunciation manner: German
- Origin: loanword

Conclusions after logfile analysis
- We should try to accommodate both older browsers and modern mobile devices.
- Security matters.
- Search engine optimization and strategic partnerships with popular sites are the most promising ways of increasing traffic.
- We only have a short time to hook our users. The interface should be self-explanatory and engaging.

Some publications on the ANW
- Tanneke Schoonheim and Rob Tempelaars (2010), 'Dutch Lexicography in Progress, The Algemeen Nederlands Woordenboek (ANW)'. In: Anne Dykstra and Tanneke Schoonheim (eds.), Proceedings of the XIV Euralex International Congress. Ljouwert. http://www.euralex.org/elx_proceedings/euralex2010/059_Euralex_2010_3_SCHOONHEIM TEMPELAARS_Dutch Lexicography in Progress_the Algemeen Nederlands Woordenboek_ANW.pdf

Some publications on the ANW
- Jan Niestadt (2009), 'De ANW-artikeleditor: software als strategie'. In: E. Beijk et al. (eds.), Fons verborum. Leiden/Amsterdam, pp. 215-222. www.inl.nl/images/stories/onderzoek_en_onderwijs/publicaties/fonsverborum2009/niestadt.pdf
- Carole Tiberius and Adam Kilgarriff (2009), 'The Sketch Engine for Dutch with the ANW corpus'. In: E. Beijk et al. (eds.), Fons verborum. Leiden/Amsterdam, pp. 237-255. www.inl.nl/images/stories/onderzoek_en_onderwijs/publicaties/fonsverborum2009/tiberius_kilgarriff.pdf