eLexicography, Terminology, and Global Content Management
Gerhard Budin, University of Vienna, Austrian Academy of Sciences

Lexicograffiti, European Parliament, Luxembourg, 23rd of January, 2012

Overview
- My background, my perspectives
- First Part: Trends analysis: eLexicography, Terminology, and Global Content Management
- Summary of Terminology Management principles
- Second Part: Examples and Case Studies
- Conclusions

My research background

My perspectives
- Socio-cognitive orientation: process-, user- and task-oriented, functional approach
- Computational orientation: data modelling, corpus-based perspectives
- Media orientation: multi- and cross-media, single-sourcing, re-usability of content
- Management perspectives: knowledge management; content management; terminology management; workflow management; quality management
- Empirical approach: inspired by best practices

First Part: Trends analysis: eLexicography
Concurrent paradigm shifts, from traditional lexicography to eLexicography:
- corpus-driven analysis of linguistic usage
- dictionaries as information and reference tools
- on-the-fly generation of tailor-made and task-oriented lexical content

First Part: Trends analysis: Terminology, and Global Content Management
From top-down terminology work to terminology-based content management:
- domain-specific or multi-domain corpus-based term extraction, and enrichment of linguistic and cognitive resources
- cross-media, multi-functional, multi-format, and single-sourcing-oriented generation of global content resources, based on terminology management principles and methods

Some Terminology Management principles and methods
- What is terminology?
- What is terminology management?
- Who is doing multilingual terminology management and why?
- How to identify best practices?
- Which criteria can we use?

Terminology and its functions
What is a terminology?
- A structured set of concepts and terms of a specific subject field in a specific language
- "Terminology" as an abstract noun denotes the subject field of terminology studies, terminology work, etc.

Terminology and its functions
Functions:
- Communication and discourse (mutual understanding)
- Information (data storage, logistics)
- Cognition (concept formation, creative thinking and naming the world)
- Knowledge (dissemination, learning, storage, coding, etc.)
Areas of use:
- Professional work in science
- Technology
- Industry
- Business
- Trade
- Public and social affairs
- Culture
- Sports
- Language policies
- Language development

What is Terminology Management?
- A broad concept, covering a wide range of practical activities for manipulating terminological information for specific purposes
- Operationalizes theoretical principles of terminology as methodologies
- A type of Information Management
- A key to Language Management, Knowledge Management, Content Management

Types of Terminology Management
- Descriptive Terminology Management: aims at documenting terminological diversity, for research purposes or to create a sound basis for decisions to be taken
- Translation-Oriented Terminology Management: comparative approach, documenting cross-cultural differences in terminological structures in source and target languages
- Corpus-driven Terminology Management: usage-oriented, term extraction from real-life discourse, computational terminology paradigm
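The corpus-driven approach can be illustrated with a minimal sketch. It assumes a toy two-sentence corpus and a naive frequency criterion; the function name, the stopword list, and the thresholds are hypothetical and are not taken from the talk. Real term extraction tools apply much richer linguistic and statistical filtering.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real extractor would use a language-specific resource.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "on", "by", "with"}


def extract_term_candidates(texts, max_len=3, min_freq=2):
    """Return n-grams (1..max_len words) that occur at least min_freq times
    and neither start nor end with a stopword."""
    counts = Counter()
    for text in texts:
        # Lowercase word tokens, keeping hyphenated words together.
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


# Made-up sample sentences, for illustration only.
corpus = [
    "Risk assessment is the first step of risk management.",
    "Flood risk assessment supports risk management in the region.",
]
candidates = extract_term_candidates(corpus)
```

In this toy run, multi-word candidates such as "risk assessment" and "risk management" pass the frequency threshold, while one-off sequences are dropped. A terminologist would still have to validate each candidate against the concept system.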

Types of Terminology Management
Prescriptive Terminology Management: a normative approach as part of language planning and of technical and scientific standardization. Decisions are taken on the basis of existing information sources, terminology collections, etc., aiming at reducing terminological complexity and diversity.
- Standardizing terminological information (engineering, natural sciences, medicine, etc.)
- Standardizing the methods of terminology creation and of terminology management (term formation, quality management, process management, terminography)

Approaches (to all types of TM)
- Ad-hoc approach: aiming at instant problem solving, as part of other processes (translation, technical documentation, technical standardization, etc.)
- Text-oriented approach
- Systematic approach: consistent application of work methods, systematic problem solving, interaction of workflows
- Domain knowledge-oriented approach

Operational Principles
- Concept-orientation of any kind of terminological activity
  - Difference from lexicography (although mixed forms exist and are applied as well)
  - Knowledge-oriented (conceptual structures are knowledge structures)
- Systematic and adaptive approach
  - Applying work methodologies in a consistent way
  - Methods to be adapted and fine-tuned to specific purposes, application environments, projects, sociocultural traditions, economic constraints, existing workflows, etc.

Operational Principles
- The sources of terminological information are always documented (in line with scientific traditions, as a prerequisite for decision support in workflows, and for subsequent reuse of terminological resources)
- Management aspects (workflows and project management, economic and legal aspects, training and staff motivation, marketing, exploitation, dissemination)
- Cooperative approach
- Purpose-driven selection and application of tools
- Re-usability, open standards

Second Part: Examples and Case Studies
From our own research and development (more details in the afternoon session if desired):
- I. Multilingual glossary, dictionary, termbase, and ontology for global risk management communication
- II. The LISE project: quality management for legal and administrative terminologies

I. The Making of a Multilingual Glossary on Risk Management

Motivations and Methods: Terminologies for Risk Communication
The role of LSP lexicography in domain communication:
- Increase the transparency of terms
- Help negotiate a common understanding of terms in intra-disciplinary, inter-disciplinary, and transcultural discourse
- Help increase the consistency of risk discourse and increase understanding in target audiences
- Reduce unnecessary synonyms, disambiguate polysemes, help separate homonyms
- Help create risk terminologies in many languages
- Support knowledge sharing and knowledge transfer in cooperative work environments
- Support translation work

The Domains of Risk Management
Multidisciplinary, diverse, and fragmented, or transdisciplinary, overlapping, converging, integrated, and complementary?
The need for mediating between different approaches, cultures, and discourses:
- Technological, engineering, research, science
- Administration, legislation, monitoring
- Social, sociological, political, cultural
- Domain approaches (financial, ecological, chemical, safety, geographical, planning and forecast, health, etc.)

WIN Project: WP Human Language Interoperability
Objectives: designed to support international risk management and risk communication processes
Achieved results (with ongoing work):
- Large parallel corpora collection with risk-related texts and lexical resources (fr, en, de, es, ro, fi, hu, ru)
- Multilingual index with conceptual structure
- Bibliography and codes of sources
- Risk ontology
- Multilingual online terminology database

The global risk communication scenario
- Thesaurus building
- Creating multilingual terminology databases
- Creating multilingual text corpora
- Lexicographical glossary
- Semantic enrichment (conceptual links, frame semantics)
- Collection of relevant knowledge organization systems
- Annotation of resources
- Mark-up of resources (TBX)
- Ontology building
- Communication design

Integrated Workflows
- Termbase: export XML
- Domain models: meta-models -> patterns
- Text corpus: term extraction (comparative testing: ProTerm, MultiTerm Extract, MultiCorpora), aligning with termbase
- Convert to RDF
- Ontology: import -> editor
- Mappings (TBX/TMF, XML, RDF, OWL, UML, comma delimited, RDB)
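The "export XML, convert to RDF" step can be sketched as follows. This is a minimal illustration, not the project's actual mapping: it assumes a simplified TBX-style fragment (termEntry, langSet, tig, term), a hypothetical base namespace, and a mapping to SKOS labels chosen here for illustration, with the first term per language treated as the preferred label. The sample terms are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/risk/"  # hypothetical namespace for concept URIs

# Simplified TBX-style sample; real TBX files carry a header and more metadata.
SAMPLE = """<body>
  <termEntry id="c1">
    <langSet xml:lang="en"><tig><term>flood</term></tig><tig><term>inundation</term></tig></langSet>
    <langSet xml:lang="de"><tig><term>Hochwasser</term></tig></langSet>
  </termEntry>
</body>"""


def termbase_to_ntriples(xml_text):
    """Map each concept entry to N-Triples lines using SKOS labels (assumed mapping)."""
    root = ET.fromstring(xml_text)
    triples = []
    for entry in root.iter("termEntry"):
        subject = f"<{BASE}{entry.get('id')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            for index, term in enumerate(lang_set.iter("term")):
                # Assumption: the first term listed per language is the preferred one.
                prop = "prefLabel" if index == 0 else "altLabel"
                literal = term.text.replace("\\", "\\\\").replace('"', '\\"')
                triples.append(f'{subject} <{SKOS}{prop}> "{literal}"@{lang} .')
    return triples


triples = termbase_to_ntriples(SAMPLE)
```

Because the entry is concept-oriented, one concept URI carries the labels of all languages, which is what makes the later ontology import straightforward.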

The Glossary
The glossary is used by risk managers and civil engineers, but also by teachers, students, translators, journalists, etc. The purpose of such multilingual conceptual glossaries is to improve domain communication and to facilitate mutual understanding across linguistic boundaries. The multilingual glossary presented here includes 8 languages: English and French as main pivot languages, as well as German, Spanish, Romanian, Finnish, Hungarian, and Russian. It comprises about 230 central concepts of risk management, with about 400 definitions and about 1400 terms representing these concepts in each language (including synonyms and hyperonyms), and it indicates the conceptual relations between the entries.

Thematic macro-structure of the glossary:
- Risk assessment and technology assessment
- Public perception of risk, planning, alarm
- Risk events, equipment and operations, general terms
- Fire: events, equipment and operations
- Floods: events, equipment and operations
- Oil spills: events, equipment and operations
Each glossary entry follows a micro-structure with these elements:
- A concept number for a theme from the macro-structure
- The terms in 8 languages, with grammatical information
- The definitions in each language and their sources
- Related terms and expressions
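The micro-structure described above can be sketched as a small data model. This is one possible reading, not the glossary's actual schema: the class and field names, the concept number format, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    grammar: str = ""  # grammatical information, e.g. part of speech or gender


@dataclass
class Definition:
    text: str
    source: str  # every definition keeps a reference to its source


@dataclass
class GlossaryEntry:
    concept_number: str  # hypothetical format, e.g. "4.1"
    theme: str           # theme from the macro-structure
    terms: dict = field(default_factory=dict)        # language code -> list of Term
    definitions: dict = field(default_factory=dict)  # language code -> list of Definition
    related: list = field(default_factory=list)      # related terms and expressions

    def add_term(self, lang, text, grammar=""):
        self.terms.setdefault(lang, []).append(Term(text, grammar))

    def missing_languages(self, required):
        """Languages from `required` that have no term yet for this concept."""
        return [lang for lang in required if not self.terms.get(lang)]


# The eight glossary languages, as ISO 639-1 codes.
LANGUAGES = ["en", "fr", "de", "es", "ro", "fi", "hu", "ru"]

# Made-up sample entry, for illustration only.
entry = GlossaryEntry(concept_number="4.1", theme="Floods")
entry.add_term("en", "flood", "noun")
entry.add_term("de", "Hochwasser", "noun, neuter")
entry.definitions["en"] = [Definition("placeholder definition text", "placeholder source")]
gaps = entry.missing_languages(LANGUAGES)
```

Keying the entry on the concept rather than on a headword is the design choice that separates this structure from a conventional dictionary entry: synonyms and translations are all attached to the same concept number, and coverage gaps per language are easy to report.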

II. LISE: Legal Language Interoperability Services

Main objectives of the project
- Help terminology managers in public institutions improve the quality of their terminological resources
- The web-based, interactive terminology service is workflow-oriented and provides input and feedback from best practices in legal terminology management
- Facilitate data expansion and terminological enrichment: adding more data such as definitions and concept relations; language expansion; re-purposing of data; working towards communicative goals such as transparency, clarity, precision, uniqueness, etc.
- Harmonisation work is supported upon request by, and in cooperation with, data owners
The project has received funding from the European Community (ICT-PSP 4th call) under Grant Agreement No. 270917.

Technological approach
Service definitions:
- The LISE Web Application design follows the SOA (service-oriented architecture) approach
- All LISE Services follow W3C definitions and the Web Services Description Language (WSDL)
Platforms:
- LISE Services are designed to be platform-independent

WP 3: Legal Terminologies and Workflows
Tasks:
- Analysis of existing workflows of terminology work
- Linking terminology workflows to test scenarios of the LISE web services
- Preparing a best practice guideline for optimised terminology management workflows
Deliverables:
- D3.1 Report: Analysis of Existing Terminology Workflows
- D3.2 Report: Workflow Management for LISE
- D3.3 Guidelines for Collaborative Terminology Work

Conclusions
- Do not separate or de-contextualize terminological resources from their natural habitats, so that their full potential remains available in further use (e.g. to explore the real meaning of a term in a particular occurrence)
- Interoperability is still an abstract notion in most cases: TBX needs better tools embedded in workflows
- Towards an ecosystem of terminological web services
- Human dimension: pragmatic interoperability is still missing -> mutual understanding of people -> socio-cognitive dimension to be focused on in R&D
- Integrative approaches in the language industry

Thank you for your attention! Questions & Answers
More information (publications, web links, resources, etc.) is available in the afternoon session and upon request later on (by e-mail).
Gerhard Budin
Center for Translation Studies, University of Vienna
Institute of Corpus Linguistics and Text Technology, Austrian Academy of Sciences
gerhard.budin@univie.ac.at