The MEANING Multilingual Central Repository
J. Atserias, L. Villarejo, G. Rigau, E. Agirre, J. Carroll, B. Magnini, P. Vossen
January 27, 2004
http://www.lsi.upc.es/~nlp/meaning
Jordi Atserias TALP

Index
- The MEANING framework
- First Multilingual Central Repository
- First Uploading Process
  - SUMO
  - Selectional Preferences
  - Top Concept ontology
- First Porting Process
- The pasta example
- Conclusions and Future Work

The MEANING Project
- The Meaning project [Rigau et al. 02] (http://www.lsi.upc.es/~nlp/meaning/meaning.html)
- Interdependency between WSD and knowledge acquisition
- Exploiting a multilingual architecture based on EuroWordNet [Vossen 98]
- Three consecutive cycles of large-scale WSD and acquisition

MEANING Cycle (figure)

MCR Architecture (figure)

Content of the MCR
- ILI (WordNet 1.6)
  - Upgraded EuroWordNet Base Concepts
  - Upgraded EuroWordNet Top Concept ontology
  - MultiWordNet Domains
  - Suggested Upper Merged Ontology (SUMO)
- Local wordnets
  - English WordNet 1.5, 1.6, 1.7, 1.7.1
  - Basque, Catalan, Italian and Spanish wordnets
- Large collections of semantic preferences
  - Acquired from SemCor
  - Acquired from BNC
- Instances

Uploading
- Main task: transforming resources from WordNet 1.5 to WordNet 1.6, projecting them using the mapping [Daudé et al. 99] (http://www.lsi.upc.es/~nlp/tools/mapping.html)
- Recovering consistency (Base Concepts, Top Concept ontology)
- Cross-checking resources
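The projection step can be pictured with a minimal sketch. It assumes the mapping offers, for each WordNet 1.5 synset, a list of candidate WordNet 1.6 synsets with a confidence score; the function name, the data layout and the identifiers in the usage note are hypothetical and are not the actual format of the mapping files.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: wn1.5 synset id -> [(wn1.6 synset id, confidence), ...]
Mapping = Dict[str, List[Tuple[str, float]]]


def project(resource: Dict[str, str], mapping: Mapping) -> Tuple[Dict[str, str], List[str]]:
    """Project labels attached to wn1.5 synsets onto wn1.6 synsets.

    Returns the projected labels and the source synsets with no mapping,
    which are the cases left for the consistency-recovery step.
    """
    projected: Dict[str, str] = {}
    unmapped: List[str] = []
    for source_id, label in resource.items():
        candidates = mapping.get(source_id, [])
        if not candidates:
            unmapped.append(source_id)
            continue
        # Assumption: keep only the highest-confidence target synset.
        target_id, _ = max(candidates, key=lambda pair: pair[1])
        projected[target_id] = label
    return projected, unmapped
```

For instance, with made-up identifiers, `project({"old-1": "BaseConcept"}, {"old-1": [("new-7", 0.9), ("new-8", 0.1)]})` attaches the label to `new-7` and reports no unmapped synsets.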

Uploading SUMO
- The Suggested Upper Merged Ontology (SUMO) [Niles & Pease 01] is an upper ontology proposed as a starting point for the IEEE Standard Upper Ontology Working Group.
- SUMO provides definitions for general-purpose terms.
- Currently only the SUMO labels and the SUMO ontology hyperonym relations are loaded into the MCR.
- We plan to cross-check the Top Concept ontology expansion and the Domain ontology against the SUMO ontology.

Uploading Selectional Preferences
- 390,549 weighted Selectional Preferences (SPs)
- One set [McCarthy 01] of weighted SPs was obtained by computing probability distributions over the WordNet 1.6 noun hierarchy from parsing the BNC.
- The second set [Agirre & Martinez 02] was obtained from generalizations of grammatical relations extracted from SemCor.
- Example: pasta (money sense) has the following preferences as object:
    1.44 01576902-v {raise#4}
    0.45 01518840-v {take in#5, collect#2}
    0.23 01565625-v {earn#2, garner#1}
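As an illustration of how such weighted preferences might be held and queried, here is a small sketch. The weights, synset offsets and verb senses are copied from the pasta example above; the record type, field names and the `strongest` helper are assumptions made for this illustration, not the MCR storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SelectionalPreference:
    weight: float
    verb_synset: str               # WordNet 1.6 verb synset offset
    verb_senses: Tuple[str, ...]   # word#sense members of that synset
    relation: str                  # grammatical relation, e.g. "object"


# Values taken from the pasta (money sense) example on the slide.
PASTA_MONEY_PREFS = [
    SelectionalPreference(0.45, "01518840-v", ("take in#5", "collect#2"), "object"),
    SelectionalPreference(1.44, "01576902-v", ("raise#4",), "object"),
    SelectionalPreference(0.23, "01565625-v", ("earn#2", "garner#1"), "object"),
]


def strongest(prefs: List[SelectionalPreference], relation: str, k: int = 1) -> List[SelectionalPreference]:
    """Return the k highest-weighted preferences for one grammatical relation."""
    matching = [p for p in prefs if p.relation == relation]
    return sorted(matching, key=lambda p: p.weight, reverse=True)[:k]
```

Ranking by weight puts {raise#4} first for the object relation, matching the order in which the slide lists the three preferences.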

Uploading the Top Ontology
- The Top Concept ontology (TC) aimed to enforce uniformity and compatibility across the different wordnets.
- EuroWordNet only performed a complete validation of its consistency among the Base Concepts.
- The properties assigned to the Base Concepts were not explicitly available for the rest of the ILIs/synsets.
- Thus, we decided to perform an automatic expansion.

Uploading the Top Ontology
The Top Concept ontology has been uploaded in three steps:
1. Properties are assigned to WordNet 1.6 synsets through the mapping.
2. Properties are assigned to the remaining WordNet 1.6 Tops.
3. The properties are propagated top-down through the WordNet hierarchy. Incompatibilities between properties block the propagation.
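The top-down propagation with blocking can be sketched roughly as follows. This is an illustrative reading of the step, not the project's implementation: the function name, the data layout and the representation of incompatibilities as unordered pairs are all assumptions.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def propagate(
    hyponyms: Dict[str, List[str]],      # synset -> direct hyponyms
    seed: Dict[str, Set[str]],           # properties assigned before propagation
    incompatible: Set[FrozenSet[str]],   # unordered pairs of clashing properties
    tops: List[str],                     # top synsets of the hierarchy
) -> Tuple[Dict[str, Set[str]], List[Tuple[str, str, str]]]:
    """Propagate properties top-down; a property is not inherited by a
    synset that already holds an incompatible one."""
    props = {synset: set(p) for synset, p in seed.items()}
    blocked: List[Tuple[str, str, str]] = []
    stack = list(tops)
    while stack:
        parent = stack.pop()
        for child in hyponyms.get(parent, []):
            own = props.setdefault(child, set())
            for prop in sorted(props.get(parent, set())):
                clash = next(
                    (o for o in sorted(own) if frozenset((prop, o)) in incompatible),
                    None,
                )
                if clash is None:
                    own.add(prop)
                else:
                    # Record the blocked inheritance for later cross-checking.
                    blocked.append((child, prop, clash))
            stack.append(child)
    return props, blocked
```

The `blocked` list is the useful by-product: each entry names a synset, the property it could not inherit and the property that clashed with it, which is the kind of case cross-checking is meant to surface. On a hierarchy with multiple inheritance a synset is revisited once per parent, so the list may contain repeated entries.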

Expansion Problems
Problematic cases can be detected by cross-checking.
- WordNet hierarchy: the classification of WordNet is not always consistent with the Top Concept ontology.
- Multiple inheritance: a synset inherits incompatible attributes from its ancestors.
- Cross-checking resources with different granularities: Human, Creature, Animal, Hominid.

WordNet hierarchy
- Animal vs Plant: 00911639n phytoplankton#1 (SUMO.Plant+) and its direct descendant 00911809n planktonic algae#1 (SUMO.Alga).
- Liquid/Substance/Solid vs Object (body part, Object):
  - Liquid: 04195761n 105 {liquid body substance#1, bodily fluid#1, body fluid#1}
  - Substances: 4086329n 117 {body substance#1}, the substance of the body
  - Solid: 06672286n {covering#1, natural covering#1, cover#5}, any covering for the body or a body part


Different Granularity
- Human vs Animal: all the Hominids are considered Animal by the Semantic File, but Human by the Top Concept ontology (SUMO Hominid+).
- Human vs Creature: all the creatures (mainly the descendants of imaginary being#1, imaginary creature#1) are classified as Human by the Semantic File.
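A cross-check of this kind could be sketched as below, assuming each resource assigns one label per synset and that a hand-built table lists which label pairs are considered compatible. The function name, the table and the synset identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cross_check(
    semantic_file: Dict[str, str],       # synset -> Semantic File label
    top_concept: Dict[str, str],         # synset -> Top Concept label
    compatible: Set[Tuple[str, str]],    # (Semantic File, Top Concept) pairs accepted
) -> List[Tuple[str, str, str]]:
    """Return synsets whose labels in the two resources are not declared compatible."""
    conflicts: List[Tuple[str, str, str]] = []
    # Only synsets labelled by both resources can be compared.
    for synset in sorted(semantic_file.keys() & top_concept.keys()):
        pair = (semantic_file[synset], top_concept[synset])
        if pair not in compatible:
            conflicts.append((synset, pair[0], pair[1]))
    return conflicts
```

With a table that accepts only `("animal", "Animal")` and `("person", "Human")`, a made-up hominid synset labelled `animal` by the Semantic File and `Human` by the Top Concept ontology is reported as a conflict, mirroring the Human vs Animal case above.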

First Porting
- All the knowledge in the MCR has been ported directly to the local wordnets.
- No extra semantic knowledge has been inferred in this process.
- All wordnets gain some knowledge:
  - Spanish/Catalan/Basque EuroWordNet gained Domains, SP
  - Italian MultiWordNet gained TC, EuroWordNet relations, SP
  - English WordNet gained EuroWordNet relations, (Domains, TC)
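One way to picture direct porting without inference is the sketch below. It assumes relations are stored between ILI records and that a local wordnet gains a relation only when both endpoints are linked to one of its synsets and the relation is not already present. The names and the data layout are hypothetical, and the actual porting procedure may differ.

```python
from typing import Set, Tuple

Relation = Tuple[str, str, str]  # (source ILI, relation name, target ILI)


def port_relations(
    mcr_relations: Set[Relation],
    local_ilis: Set[str],
    local_relations: Set[Relation],
) -> Set[Relation]:
    """Return the relations a local wordnet gains from the repository.

    Nothing is inferred: a relation is copied only if both of its
    endpoints exist locally and the wordnet does not already hold it.
    """
    gained: Set[Relation] = set()
    for source, name, target in mcr_relations:
        if source in local_ilis and target in local_ilis:
            if (source, name, target) not in local_relations:
                gained.add((source, name, target))
    return gained
```

Under this reading, the size of the returned set would play the role of a gain figure for that wordnet and relation type, and an empty set would mean the wordnet gained nothing.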

Porting Results I/II

                           Spanish           English           Italian
Relations              UPLOAD    PORT0   UPLOAD    PORT0   UPLOAD    PORT0
be in state             1,302        =    1,300       +2      364       +2
causes                    240        =      224      +19      117      +15
near antonym            7,444        =    7,449     +221    3,266        =
near synonym           10,965        =   21,858      +19    4,887      +54
role                      106        =        0     +106        0      +46
role agent                516        =        0     +516        0     +227
role instrument           291        =        0     +291        0     +151
role location              83        =        0      +83        0      +39
role patient                6        =        0       +6        0       +3
xpos fuzzynym              37        =        0      +37        0      +23
xpos near synonym         319        =        0     +319        0     +181
Other relations        31,644        =   29,120   +2,627    9,541      +22
Total                  53,272        =   59,951   +4,246   18,175     +763

PORT0: main figures for Spanish, English and Italian

Porting Results II/II

                              Spanish             English             Italian
Relations                 UPLOAD     PORT0    UPLOAD     PORT0    UPLOAD     PORT0
role agent-semcor              0   +52,394    69,840         =         0   +41,910
role agent-bnc                 0   +67,109    95,065         =         0   +40,853
role patient-semcor            0   +80,378   110,102         =         0   +41,910
role patient-bnc               0   +79,443   115,102         =         0   +50,264
Role                           0  +279,324   390,109         =         0  +174,937
Instances                      0    +1,599         0    +2,198       791         =
Proper Nouns               1,806         =    17,842         =     2,161         =
Base Concepts              1,169         =     1,535         =         0      +935
Domains Links                  0   +55,239   109,621         =    35,174         =
Domains Synsets                0   +48,053    96,067         =    30,607         =
Top Ontology Links         3,438         =         0    +4,148         0    +2,544
Top Ontology Synsets       1,290         =         0    +1,554         0      +946

PORT0: main figures for Spanish, English and Italian

The pasta Example
- Domain: chemistry-pure science
- Semantic File: 27-Substance
- SUMO: Substance-SelfConnectedObject-Object-Physical-Entity
- Top Concept ontology:
  - Natural-Origin-1stOrderEntity
  - Substance-Form-1stOrderEntity
- pasta#n#7 10541786-n paste#1
  gloss: any mixture of a soft and malleable consistency

The pasta Example
- Domain: money-economy-soc.science
- Semantic File: 21-MONEY
- SUMO: CurrencyMeasure-ConstantQuantity-PhysicalQuantity-Quantity-Abstract-Entity
- Top Concept ontology:
  - Artifact-Origin-1stOrderEntity
  - Function-1stOrderEntity
  - MoneyRepresentation-Representation-Function-1stOrderEntity
- pasta#n#6 09640280-n dough#2,bread#2,loot#2,...
  gloss: informal terms for money

The pasta Example
- Domain: gastronomy-alimentation-applied science
- Semantic File: 13-FOOD
- SUMO: Food-...
- Top Concept ontology:
  - Comestible-Function-1stOrderEntity
  - Substance-Form-1stOrderEntity
- pasta#n#4 05886080-n spread#5,paste#3
  gloss: a tasty mixture to be spread on bread or crackers
- pasta#n#3 05739733-n pasta#1,alimentary paste#1
  gloss: shaped and dried dough made from flour and water & sometimes egg
- pasta#n#2 05671439-n pie crust#1,pie shell#1
  gloss: pastry used to hold pie fillings
- pasta#n#1 05671312-n pastry#1,pastry dough#1
  gloss: a dough of flour and water and shortening
- pasta#n#5 05889686-n dough#1
  gloss: a dough of flour and water and shortenings

Conclusions
- MCR v0 integrates, in a EuroWordNet framework (upgraded Base Concepts, Top Concept ontology and MultiWordNet Domains), five local wordnets (with four English WordNet versions), with hundreds of thousands of new semantic relations, instances and properties fully expanded.
- All wordnets gain some kind of knowledge from the other wordnets (porting process).
- We intend the MCR to be a natural multilingual large-scale knowledge resource.
- A full range of new possibilities appears for improving both acquisition and WSD tasks in the next two Meaning rounds.

Future Work
- Upload more resources (WordNet 2.0, Extended WordNet), maybe including language-dependent data such as syntactic information, subcategorization frames, diathesis alternations...
- Porting process: investigate inference mechanisms to infer new explicit relations and knowledge (regular polysemy, nominalizations, etc.).
- Investigate and check the correctness of the semantic knowledge ported across languages.

Thanks for your attention
http://www.lsi.upc.es/~nlp/meaning
This research has been partially funded by the Spanish Research Department (HERMES TIC2000-0335-C03-02) and by the European Commission (MEANING IST-2001-34460).

References

[Agirre & Martinez 02] E. Agirre and D. Martinez. Integrating selectional preferences in WordNet. In Proceedings of the First International WordNet Conference, Mysore, India, 21-25 January 2002.

[Daudé et al. 99] J. Daudé, L. Padró, and G. Rigau. Mapping Multilingual Hierarchies Using Relaxation Labeling. In Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC 99), Maryland, US, 1999.

[McCarthy 01] D. McCarthy. Lexical Acquisition at the Syntax-Semantics Interface: Diathesis Alternations, Subcategorization Frames and Selectional Preferences. Unpublished PhD thesis, University of Sussex, 2001.

[Niles & Pease 01] I. Niles and A. Pease. Towards a standard upper ontology. In Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), pages 17-19. Chris Welty and Barry Smith, eds, 2001.

[Rigau et al. 02] G. Rigau, B. Magnini, E. Agirre, P. Vossen, and J. Carroll. Meaning: A roadmap to knowledge technologies. In Proceedings of COLING Workshop A Roadmap for Computational Linguistics, Taipei, Taiwan, 2002.

[Vossen 98] P. Vossen, editor. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, 1998.