Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications

ACL/EACL-97 Workshop Proceedings
July 12th 1997, Madrid

Editors
Piek Vossen (Chair)
Geert Adriaens
Nicoletta Calzolari
Antonio Sanfilippo
Yorick Wilks

Organized under the auspices of the Language Engineering section of the European Commission, Directorate General XIII, Luxembourg, by the projects EuroWordNet (LE2 4003), Sparkle (LE 1 2111) and Ecran

1997, Association for Computational Linguistics

Order additional copies from:
ACL
P.O. Box 6090
Somerset, NJ 08875, USA
+1-908-873-3898
acl@bellcore.com

Preface

In recent years the development of high-quality, general-purpose language resources has been the focus of many research groups. More recently, the corpus-based extraction of such resources has also attracted wider interest. EuroWordNet, Sparkle and Ecran try to package some of this know-how and expertise into state-of-the-art tools and resources that can be applied directly in NLP-based services.

In the EuroWordNet project a multilingual database is being developed with wordnets for four European languages, linked to the existing Princeton WordNet (version 1.5). Such a database can be used in multilingual retrieval applications, but it can also be seen as a starting point for automatic-translation aids, inferencing systems, and information extraction systems. Sparkle and Ecran both address, in parallel, the creation of language resources and technologies for real-world NLP applications. This objective is pursued through the development of software tools in the areas of shallow parsing and lexical acquisition. These tools are used to induce linguistic knowledge from text corpora and are progressively enriched by the information acquired.

In all three projects the current limits of linguistic technology are being explored for their practical benefits. Whereas EuroWordNet aims at broadening and extending the Princeton WordNet into a generic multilingual resource that is the first of its kind, Sparkle and Ecran aim at the dynamic anchoring of resources and information to the data and corpora that are of interest to a user. The availability of these resources and tools is essential for the new generation of applications and products dealing with information in electronic form. The projects have finished their specification phase and are in the process of generating results.

In this workshop we want to discuss the scope and formats of semantic resources and information acquisition tools with scholars in the field and with researchers from commercial R&D departments who have experience in developing and using them. The main themes of the workshop are:

- compatibility and standards of multilingual semantic resources and lexical acquisition tools;
- the validation of multilingual semantic resources and lexical acquisition tools;
- performance of semantic resources and lexical acquisition tools in NLP tasks;
- partial or phrasal parsing of text;
- linking text with lexical databases: sense-differentiation, sense-tagging and sense-disambiguation tasks, and domain-differentiation of text and lexical resources.

The first three papers in the proceedings address issues related to the building and checking of lexical semantic resources. The remaining papers deal mainly either with the application of lexical semantic resources in various NLP tasks, ranging from information retrieval to semantic tagging and information extraction, or with the extraction of information from text corpora in order to build such resources.

ORGANIZING COMMITTEE

Piek Vossen, Computer Centrum Letteren, University of Amsterdam, e-mail: Piek.Vossen@let.uva.nl
Cintha Harjadi, Computer Centrum Letteren, University of Amsterdam, e-mail: Cintha.Harjadi@let.uva.nl
Horacio Rodriguez, Universitat Politecnica de Catalunya, e-mail: Horacio@lsi.upc.es

PROGRAM COMMITTEE

Piek Vossen, University of Amsterdam, The Netherlands, e-mail: Piek.Vossen@let.uva.nl
Nicoletta Calzolari, Istituto di Linguistica Computazionale del CNR, Italy, e-mail: glottolo@vm.cnuce.cnr.it
Antonio Sanfilippo, Sharp Laboratories, UK, e-mail: Antonio.Sanfilippo@sharp.co.uk
Geert Adriaens, Novell Linguistic Development, Belgium, e-mail: Geert_Adriaens@novell.com
Yorick Wilks, University of Sheffield, UK, e-mail: yorick@dcs.shef.ac.uk

CONTENTS

Vossen, P., Diez-Orzas, P. & Peters, W.
    Multilingual Design of EuroWordNet
Hamp, B. & Feldweg, H.
    GermaNet - a Lexical-Semantic Net for German
Tokunaga, T., Fujii, A., Iwayama, M., Sakurai, N. & Tanaka, H.
    Extending a thesaurus by classifying words
Fischer, D.
    Formal redundancy and consistency checking rules for the lexical database WordNet 1.5
Artale, A., Magnini, B. & Strapparava, C.
    Lexical Discrimination with the Italian Version of WordNet
Gomez-Hidalgo, J.M. & de Buenaga Rodriguez, M.
    Integrating a Lexical Database and a Training Collection for Text Categorization
Fujii, A., Hasegawa, T., Tokunaga, T. & Tanaka, H.
    Integration of Hand-Crafted and Statistical Resources in Measuring Word Similarity
McCarthy, D.
    Word Sense Disambiguation for Acquisition of Selectional Preferences
Chai, J. & Biermann, A.W.
    The Use of Lexical Semantics in Information Extraction
Ait-Mokhtar, S. & Chanod, J.P.
    Subject and Object Dependency Extraction Using Finite-State Transducers
Segond, F., Schiller, A., Grefenstette, G. & Chanod, J.P.
    An Experiment in Semantic Tagging using Hidden Markov Model Tagging
Sanfilippo, A.
    Using Semantic Similarity to Acquire Co-occurrence Restrictions from Corpora
Federici, S., Montemagni, S. & Pirrelli, V.
    Inferring Semantic Similarity from Distributional Evidence: an Analogy-based Approach to Word Sense Disambiguation