Semantic Word Sketches

Diana McCarthy, Adam Kilgarriff, Miloš Jakubíček, Siva Reddy
DTAL University of Cambridge, Lexical Computing, University of Edinburgh, Masaryk University
July 2015

Outline
1 The Sketch Engine
    Concordances
    Word Sketches
2
    Super Sense Tagger (sst)
    sst Supersenses
3
    In the Concordance
    Other Possibilities from sst Output
4
5

The Sketch Engine
concordances, word lists, collocations
word sketches: create and examine syntactic profiles and collocations of words
    input: automatic part-of-speech tags and a bespoke sketch grammar
automatic thesauruses: which other words have similar profiles?
sketch differences between words
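
The thesaurus idea ("which other words have similar profiles?") can be illustrated with a minimal sketch. It assumes each word's profile is a set of (relation, collocate) pairs and scores overlap with plain Jaccard similarity. Both the toy profiles and the choice of Jaccard are assumptions made for illustration; the slides do not state which similarity measure the Sketch Engine uses.

```python
# Toy profiles: each word maps to a set of (grammatical relation, collocate) pairs.
# The data is invented for illustration and does not come from a real corpus.
PROFILES = {
    "cat": {("subject_of", "purr"), ("subject_of", "sleep"), ("modifier", "black")},
    "dog": {("subject_of", "bark"), ("subject_of", "sleep"), ("modifier", "black")},
    "car": {("subject_of", "stop"), ("modifier", "black")},
}


def profile_similarity(a, b):
    """Jaccard overlap between the profiles of two words (an assumed measure)."""
    shared = PROFILES[a] & PROFILES[b]
    combined = PROFILES[a] | PROFILES[b]
    return len(shared) / len(combined) if combined else 0.0


def most_similar(word):
    """Rank all other words by how much of their profile they share with `word`."""
    others = [w for w in PROFILES if w != word]
    return sorted(others, key=lambda w: profile_similarity(word, w), reverse=True)


print(most_similar("cat"))  # ['dog', 'car']
```

Here "dog" ranks above "car" for "cat" because it shares two of four distinct profile entries rather than one of four.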

The Sketch Engine for viewing corpora

The Sketch Engine
Word Sketches: syntactic profiles

Sketch Grammars: Under the hood
Definitions:
    define(any_noun, N..) ...
Relations:
    =subject/subject_of
        2:any_noun rel_start? adv_aux_string_incl_be 1:verb_not_pp
        2:any_noun rel_start? adv_aux_string_incl_be aux_have adv_string 1:past_part
        1:past_part adv_string [word="by"] long_np
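
To make the subject relation more concrete, here is a rough Python sketch of what its first pattern does: find a noun (label 2), skip any adverbs, and pair it with the finite verb that follows (label 1). It is a deliberate simplification that assumes Penn Treebank tags and ignores the relative-clause and auxiliary macros; the function name and toy sentence are hypothetical, and this is not how the Sketch Engine evaluates its grammar.

```python
import re

# Toy sentence as (word, Penn Treebank tag) pairs, invented for illustration.
SENTENCE = [("the", "DT"), ("dog", "NN"), ("often", "RB"),
            ("barks", "VBZ"), ("loudly", "RB")]


def subject_pairs(tagged):
    """Return (verb, noun) pairs for: noun, optional adverbs, finite verb."""
    pairs = []
    for i, (noun, tag) in enumerate(tagged):
        if not re.fullmatch(r"N.*", tag):      # label 2: any noun tag
            continue
        j = i + 1
        while j < len(tagged) and tagged[j][1].startswith("RB"):
            j += 1                             # skip an adverb string
        # label 1: a verb that is not a participle (VB, VBD, VBP, VBZ)
        if j < len(tagged) and re.fullmatch(r"VB[DPZ]?", tagged[j][1]):
            pairs.append((tagged[j][0], noun))
    return pairs


print(subject_pairs(SENTENCE))  # [('barks', 'dog')]
```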

Semantic Class Tagging
aim to build word sketches on syntactic and semantic information
automatic superclass tagging technology
superclass: a coarse-grained semantic class that is applicable to multiple words (e.g. animal for cat, fly, hare, pig etc.)
allow search and analysis with these classes
and semantic word sketches: basic semantic frame with semantic preferences for arguments

Semantic Class Tagging
Super Sense Tagger (sst), Ciaramita and Altun (2006) (http://sourceforge.net/projects/supersensetag/)
semantic tags are WordNet (Fellbaum, 1998) lexicographer classes
supervised word sense disambiguation (i.e. it requires hand-labelled data for training) using a Hidden Markov Model (e.g. labels mouse as animal, artifact)
SemCor (Landes et al., 1998) used as training data
Named Entity Recognition, e.g. <RHM Technology Ltd.> organization
Multiword tagging using multiwords from WordNet, e.g. couch potato
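
Multiword and named-entity tagging means that one label can cover several tokens. The sketch below shows one common way to recover such spans, assuming the tagger output uses B-/I- prefixed tags with "O" for untagged tokens. That encoding, the label names and the function name are assumptions made for illustration, not a description of the actual sst output format.

```python
def merge_spans(tokens):
    """Group (word, tag) pairs into (phrase, label) spans.

    Assumes B-/I- prefixed tags; any other tag (e.g. "O") yields label None.
    """
    spans = []
    for word, tag in tokens:
        label = tag[2:] if tag.startswith(("B-", "I-")) else None
        if tag.startswith("I-") and spans and spans[-1][1] == label:
            # Continuation of the previous span: extend its phrase.
            spans[-1] = (spans[-1][0] + " " + word, label)
        else:
            spans.append((word, label))
    return spans


# Hypothetical tagged input built from the examples on the slide.
tagged = [("a", "O"), ("couch", "B-person"), ("potato", "I-person"),
          ("at", "O"), ("RHM", "B-organization"),
          ("Technology", "I-organization"), ("Ltd.", "I-organization")]
print(merge_spans(tagged))
```

Each multiword expression or named entity then behaves like a single token when word sketches are computed.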

sst WordNet Noun Classes (25)
act: acts or actions
object: natural objects (not man-made)
animal: animals
quantity: quantities and units of measure
artifact: man-made objects
phenomenon: natural phenomena
attribute: attributes of people and objects
plant: plants
food: food and drinks
...

sst WordNet Verb Classes (15)
body: grooming, dressing and bodily care
emotion: feeling
change: size, temperature change, intensifying
motion: walking, flying, swimming
cognition: thinking, judging, analyzing, doubting
perception: seeing, hearing, feeling
communication: telling, asking, ordering, singing
possession: buying, selling, owning
creation: sewing, baking, painting, performing
...

Experiments
just over 25% of the UKWaC (Ferraresi et al., 2008) sst-tagged with:
    part-of-speech tags (Penn TreeBank)
    supersenses (WordNet labels)
    Named Entity labels
    WordNet multiwords

Semantic Tags in the Concordance

Semantic Tags in the Word Sketch (selected)

Semantic Word Sketch Grammar
An example for the intransitive frame:
    =intransframe
    *COLLOC %(2.sense) *%(1.sense)-x
        2:any_noun rel_start? adv_aux_string_incl_be 1:verb_not_pp not_np_start
        2:any_noun rel_start? adv_aux_string_incl_be aux_have adv_string 1:past_part not_np_start
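
The effect of the intransitive frame rule is that the collocate is built from the supersenses of the matched subject noun and verb rather than from the words themselves. A minimal sketch of that aggregation follows, assuming the subject-verb matches and their supersenses are already available. The toy matches and the function name are hypothetical, and the frame is kept as a plain (noun class, verb class) pair rather than reproducing the label format of the rule.

```python
from collections import Counter

# (subject noun, noun supersense, verb, verb supersense); invented toy matches.
MATCHES = [
    ("bird", "animal", "fly", "motion"),
    ("plane", "artifact", "fly", "motion"),
    ("pig", "animal", "fly", "motion"),
    ("hare", "animal", "swim", "motion"),
]


def intransitive_frames(matches, verb):
    """Count (subject supersense, verb supersense) frames for one verb."""
    frames = Counter()
    for _noun, noun_sense, matched_verb, verb_sense in matches:
        if matched_verb == verb:
            frames[(noun_sense, verb_sense)] += 1
    return frames


print(intransitive_frames(MATCHES, "fly").most_common())
# [(('animal', 'motion'), 2), (('artifact', 'motion'), 1)]
```

Counting frames this way gives the semantic preference of a verb for its subject: in the toy data, "fly" prefers animal subjects over artifact subjects.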

MWEs: detected by sst

MWEs: Sketch Diff chip (green) vs chips (red)

Portion of Sketch Diff laugh (green) vs cry (red)

Semantic Word Lists: CQL + Word Frequency (Communication Verbs)
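
A semantic word list of this kind amounts to filtering tokens by supersense and part of speech, then counting lemmas. The sketch below does this in plain Python over a toy token list. The attribute names (lemma, tag, sense) are assumptions about how the annotation might be exposed and are not taken from the corpus configuration used in the experiments.

```python
from collections import Counter

# Toy annotated tokens, invented for illustration.
TOKENS = [
    {"lemma": "tell", "tag": "VBD", "sense": "communication"},
    {"lemma": "ask", "tag": "VBZ", "sense": "communication"},
    {"lemma": "tell", "tag": "VB", "sense": "communication"},
    {"lemma": "question", "tag": "NN", "sense": "communication"},
    {"lemma": "walk", "tag": "VBZ", "sense": "motion"},
    {"lemma": "sing", "tag": "VBG", "sense": "communication"},
    {"lemma": "ask", "tag": "VBD", "sense": "communication"},
    {"lemma": "tell", "tag": "VBZ", "sense": "communication"},
]


def semantic_word_list(tokens, sense, tag_prefix):
    """Frequency list of lemmas with the given supersense and tag prefix."""
    counts = Counter(
        t["lemma"] for t in tokens
        if t["sense"] == sense and t["tag"].startswith(tag_prefix)
    )
    return counts.most_common()


print(semantic_word_list(TOKENS, "communication", "V"))
# [('tell', 3), ('ask', 2), ('sing', 1)]
```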

Semantic Word Lists: FindX (communication verbs)

Comparing to FrameNet (Ruppenhofer et al., 2010)
FrameNet contains lots of useful information, e.g.
    FRAME employing
    Frame Elements: Employer, Employee, Position, Tasks, Compensation ...
    Definition: An Employer employs an Employee whose Position entails that the Employee perform certain Tasks in exchange for Compensation
    lots of other information
    lexical units: employ.v, commission.v, staff.n, employment.n
    precedes frame: firing
with corpus examples, e.g. "I employed him as Chief Gardener for ten years"
but manually produced, so low coverage
Semantic word sketches can provide additional information and high coverage

Summary
semantic tagging alongside part-of-speech for semantic word sketches
provide syntactic and semantic profiling for:
    semantic queries and word lists
    semantic and syntactic profiling in the word sketch
    comparing words by the profiles

Future Possibilities
try other semantic tagsets, taggers and tools
sketch grammar could be developed further
no identification of semantic roles as yet, in contrast to FrameNet (Ruppenhofer et al., 2010), PropBank (Palmer et al., 2005) and VerbNet (Kipper-Schuler, 2005)
Semantic word sketches could be used to provide selectional preferences and corpus information to such resources

Thank You

Ciaramita, M. and Altun, Y. (2006). Broad-coverage sense disambiguation and information extraction with a supersense sequence tagger. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 594-602, Sydney, Australia. Association for Computational Linguistics.
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA.
Ferraresi, A., Zanchetta, E., Baroni, M., and Bernardini, S. (2008). Introducing and evaluating ukWaC, a very large web-derived corpus of English. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco.
Kipper-Schuler, K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. PhD thesis, Computer and Information Science Dept., University of Pennsylvania, Philadelphia, PA.
Landes, S., Leacock, C., and Tengi, R. I. (1998). Building semantic concordances. In Fellbaum, C., editor, WordNet: An Electronic Lexical Database, pages 199-237. The MIT Press.
Palmer, M., Gildea, D., and Kingsbury, P. (2005). The Proposition Bank: A corpus annotated with semantic roles. Computational Linguistics, 31(1):71-106.
Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Johnson, C. R., and Scheffczyk, J. (2010). FrameNet II: Extended theory and practice. Technical report, International Computer Science Institute, Berkeley. http://framenet.icsi.berkeley.edu/.