THE VERB ARGUMENT BROWSER

Bálint Sass (sass.balint@itk.ppke.hu)
Péter Pázmány Catholic University, Budapest, Hungary
11th International Conference on Text, Speech and Dialogue, 8-12 September 2008, Brno

PREVIEW
A corpus query tool for expressions such as:
  verb subcategorization frames
  institutionalized phrases
  light verb constructions
  idiomatic verbal expressions
  figures of speech
Common property: verb + arguments -> uniform framework
Motivation: to help in manually building lexical resources
Future work: apply the methodology to other languages

1 SENTENCE MODEL
2 VERBAL CONSTRUCTIONS AS COLLOCATIONS
3 USAGE & EXAMPLES
4 APPLICATIONS
5 GENERALIZATION

SENTENCE MODEL
Basic unit: simple sentence or clause.

  A lány váll-at von.
  the girl shoulder-ACC pull
  'The girl shrugs her shoulders.'

Clause = verb + set of arguments

  verb=von    NOM=lány   ACC=váll
  verb=shrug  SUBJ=girl  OBJ=shoulder

Positions are defined...
  syntactically: by word order (in English)
  morphologically: by case markers (in Hungarian)

SENTENCE MODEL
In Hungarian: 20 different case markers
In English: usually prepositions

  case marker  case        abbr.  English
  (none)       nominative  NOM    word order
  -t           accusative  ACC    word order
  -ban         inessive    INE    in-phrase
  -ról         delative    DEL    from-phrase 1
  -ból         elative     ELA    from-phrase 2
  ...

EXAMPLES

  Az emberek az időjárás-ról beszélnek.
  the people the weather-DEL talk
  'People talk about the weather.'

  verb=beszél  NOM=ember    DEL=időjárás
  verb=talk    SUBJ=people  ABOUT=weather

  Péter fél az ismeretlen-től.
  Peter fear the unknown-ABL
  'Peter is afraid of the unknown.'

  verb=fél   NOM=Péter   ABL=ismeretlen
  verb=fear  SUBJ=Peter  OF=unknown

FIXED AND FREE POSITIONS

  Hogy jöttek lét-re az első csillagok?
  how came existence-SUB the first stars
  'How did the first stars come into existence?'

  verb=jön   SUB=lét         NOM=csillagok
  verb=come  INTO=existence  SUBJ=stars

Fixed position: the word cannot be changed without changing the meaning.
Free position: the word can be changed without changing the meaning.

MULTI WORD VERBS

  lét-re jön
  existence-SUB come
  'come into existence'

Multi word verb: verb stem + fixed position(s)
  separate meaning
  own argument structure

  rész-t vesz -ban
  part-ACC take INE
  'take part in sth'

SENTENCE MODEL
Sentence = verb + set of arguments
Representation of arguments: position + lemma, e.g.

  verb=jön   SUB=lét         NOM=csillagok
  verb=come  INTO=existence  SUBJ=stars
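The model above can be sketched as a small data structure. This is only an illustration: the names `Clause`, `make_clause` and `matches` are hypothetical and not part of the tool described on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One clause in the sentence model: a verb lemma plus its arguments."""
    verb: str
    args: tuple  # sorted (position, lemma) pairs, e.g. (("NOM", "csillagok"), ("SUB", "lét"))


def make_clause(verb, **args):
    """Build a clause from a verb lemma and position=lemma keyword arguments."""
    return Clause(verb, tuple(sorted(args.items())))


def matches(clause, verb, fixed):
    """True if the clause has this verb and every fixed position holds the given lemma."""
    filled = dict(clause.args)
    return clause.verb == verb and all(filled.get(p) == lemma for p, lemma in fixed.items())


# The example from the slide: verb=jön SUB=lét NOM=csillagok
clause = make_clause("jön", SUB="lét", NOM="csillagok")
print(matches(clause, "jön", {"SUB": "lét"}))
```

Storing arguments as an unordered position-to-lemma mapping reflects the idea that a clause is a verb plus a set of arguments, independent of surface word order.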

CORPUS PREPARATION
Input: Hungarian National Corpus (POS-tagged and disambiguated)
  clause detection: regexps based on conjunction and punctuation patterns
  verb normalization: e.g. separated verbal prefixes attached
  noun phrase chunking
  case and lemma of the head of argument phrases
  representation according to the model
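As a rough illustration of the clause detection step, the toy sketch below splits a sentence at punctuation and before a conjunction. The conjunction list and the pattern are assumptions made for this example; the slides do not give the actual regular expressions, which would have to be far richer.

```python
import re

# Hypothetical, deliberately tiny conjunction list (and, but, that, because).
CONJUNCTIONS = ("és", "de", "hogy", "mert")

# A boundary is either a run of punctuation (consumed),
# or the whitespace right before a conjunction (the conjunction itself is kept).
BOUNDARY = re.compile(
    r"\s*[,;:.!?]+\s*|\s+(?=(?:%s)\s)" % "|".join(CONJUNCTIONS)
)


def split_clauses(sentence):
    """Split a sentence into rough clause strings at punctuation and conjunctions."""
    return [part for part in BOUNDARY.split(sentence) if part]


print(split_clauses("Péter fél, mert az emberek beszélnek."))
```

A pattern this simple will mis-split coordinated noun phrases and similar cases, which is one reason the real patterns need more context than shown here.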

VERBAL CONSTRUCTIONS AS COLLOCATIONS
We search for collocations in the space of these structures:

  verb=jön   SUB=lét         NOM=csillagok
  verb=come  INTO=existence  SUBJ=stars

With one position left open, the same structure becomes a query:

  verb=jön   SUB=lét         NOM=?
  verb=come  INTO=existence  SUBJ=?

IDEA
Apply an association measure taking...
  the lemma in one particular position as one unit,
  all other parts of the verb frame as the other unit of the collocation.

VERBAL CONSTRUCTIONS AS COLLOCATIONS
The Verb Argument Browser can answer the following typical research questions:
  What are the salient words which can appear in a free position of a given verb frame?
  What are the most important collocates of a given verb (or verb frame) in a particular morphosyntactic position?
Association measure: salience (adjusted mutual information)

  $$S(x, y) = \log_2 f(y) \cdot \log_2 \frac{N \, f(x, y)}{f(x) \, f(y)}$$
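A minimal sketch of the salience computation, following the formula as it appears on the slide. The reading of the symbols is an assumption: x is taken to be the verb frame, y the lemma in the examined position, f a corpus frequency and N the total number of clauses. The counts in the example are invented.

```python
import math


def salience(f_xy, f_x, f_y, n):
    """S(x, y) = log2 f(y) * log2( N * f(x, y) / (f(x) * f(y)) ).

    f_xy: co-occurrence count of frame x and lemma y (assumed reading)
    f_x:  count of frame x
    f_y:  count of lemma y in the examined position
    n:    total number of clauses (assumed meaning of N)
    """
    mutual_information = math.log2(n * f_xy / (f_x * f_y))
    return math.log2(f_y) * mutual_information


# Invented counts: log2(16) = 4 and log2(1024 * 8 / (64 * 16)) = log2(8) = 3
print(salience(f_xy=8, f_x=64, f_y=16, n=1024))  # → 12.0
```

The second factor is pointwise mutual information; the first factor is the frequency weighting that makes the measure "adjusted".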

VERBAL CONSTRUCTIONS AS COLLOCATIONS
Important property of the Verb Argument Browser:
It can treat not just a single word but a whole verb frame (a verb together with some arguments) as one unit in collocation extraction.
It can collect...
  salient subjects of a verb,
  salient objects of a given verb-subject pair,
  salient locatives of a given verb-subject-object triplet,
  ...
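Treating a whole verb frame as one unit can be sketched as follows: fix the verb and some positions, then rank the lemmas found in one free position by salience. This is an illustrative sketch, not the tool's implementation; the function name `salient_fillers`, the dictionary layout and the eight-clause toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def salient_fillers(clauses, verb, fixed, free):
    """Rank lemmas in the `free` position of clauses matching `verb` + `fixed`."""
    n = len(clauses)
    f_y = Counter()   # lemma counts in the free position, over the whole corpus
    f_xy = Counter()  # lemma counts in the free position, within the frame
    f_x = 0           # number of clauses matching the frame
    for clause in clauses:
        in_frame = clause["verb"] == verb and all(
            clause.get(pos) == lemma for pos, lemma in fixed.items()
        )
        if in_frame:
            f_x += 1
        lemma = clause.get(free)
        if lemma is not None:
            f_y[lemma] += 1
            if in_frame:
                f_xy[lemma] += 1
    scores = {
        y: math.log2(f_y[y]) * math.log2(n * f_xy[y] / (f_x * f_y[y]))
        for y in f_xy
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy corpus in the verb + position=lemma representation.
corpus = [
    {"verb": "vesz", "ILL": "figyelem", "ACC": "szempont"},
    {"verb": "vesz", "ILL": "figyelem", "ACC": "szempont"},
    {"verb": "vesz", "ILL": "figyelem", "ACC": "érdek"},
    {"verb": "véd", "ACC": "érdek"},
    {"verb": "kér", "ACC": "segítség"},
    {"verb": "kér", "ACC": "bocsánat"},
    {"verb": "ad", "ACC": "hír"},
    {"verb": "ad", "ACC": "hang"},
]

# Salient direct objects of the frame verb=vesz ILL=figyelem ACC=?
for lemma, score in salient_fillers(corpus, "vesz", {"ILL": "figyelem"}, "ACC"):
    print(lemma, round(score, 3))
```

Passing an empty `fixed` mapping reduces the query to the salient fillers of a bare verb, so the single-word case is just a special case of the frame case.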

USAGE
Hungarian National Corpus integrated (187 million running words)
Response times: a few seconds

Query:
  kér -t -tól
  ask ACC ABL
  'ask sb for sth'

  verb=kér  ABL=?    ACC=?
  verb=ask  INDIR=?  OBJ=?

Result (most salient direct objects):
  bocsánat  'forgiveness'
  segítség  'help'
  elnézés   also 'forgiveness'
  engedély  'permission'
  ...
For English? question, favour, ...

Query:
  vesz figyelem-ba -t
  take consideration-ILL ACC
  'take sth into consideration'

  verb=vesz  ILL=figyelem        ACC=?
  verb=take  INTO=consideration  OBJ=?

Result (most salient direct objects):
  szempont  'aspect'
  érdek     'interest'
  vélemény  'opinion'
  ...
For English? Probably the same.

Query:
  ad -t
  give ACC
  'give sth'

  verb=ad    ACC=?
  verb=give  OBJ=?

Result (most salient direct objects):
  hang  'voice'  'to give voice to sth'
  hír   'news'   'to give news' = to report
  igaz  'true'   'to give true to' = to take sb's side
  ...
-> multi word verbs

Query:
  üt
  strike NOM
  'sth strikes'

  verb=üt      NOM=?
  verb=strike  SUBJ=?

Result (some salient subjects):
  óra     'clock'  'The clock strikes twelve.'
  forint           '10 Ft strikes his palm.' = He receives 10 Ft.
  kő      'stone'  Üsse kő! 'Let a stone strike it!' = It does not matter.
  ...
-> multi word verbs, figures of speech

COLLECTING MWVS
Important property of the Verb Argument Browser:
When a specific position is investigated, the tool returns the constructions in which this position is fixed, if any such construction exists (e.g. light verb constructions, idiomatic verbal expressions, figures of speech).
  kick + OBJ -> bucket
  eat + OBJ -> some kinds of food
Verbal expressions with fixed position(s) are frequent. They are not to be ignored; they should be included in language models.

APPLICATIONS
  lexical database development for a Hungarian to English machine translation system (http://www.webforditas.hu)
  searching for MWVs to include them in the Hungarian WordNet
  lexicography
  language teaching

FUTURE WORK
We are planning to create a Hungarian verb frame frequency dictionary based on this tool.
If you specify a verb frame, the Verb Argument Browser tells you which lemmas are important in a chosen position.
QUESTION
How can all important constructions of a verb be collected automatically?

GENERALIZATION
The database can be anything that fits the model: a bigger unit which has positions, where these positions can be filled by particular items.
The methodology can also be used to investigate the argument structure of adjectives or nouns.
The sentence model is in essence language independent.
The methodology can be extended to other languages if a shallow-parsed, adequately processed corpus is available.

SUMMARY
Verb Argument Browser
  sentence model + collocation extraction
  important verbal constructions
  language independent methodology
  available for Hungarian: http://corpus.nytud.hu/vab (username: tsd; password: vab)
  ... other languages?
Contact: sass.balint@itk.ppke.hu
Thank you for your attention!