CLARIN-PL Research User-driven Language Technology Infrastructure

Size: px
Start display at page:

Download "CLARIN-PL Research User-driven Language Technology Infrastructure"

Transcription

1 Research User-driven Language Technology Infrastructure Maciej Piasecki Wrocław University of Technology G4.19 Research Group

2 Basic Notions Language Technology (LT) language resources and tools robust in terms of quality and coverage multipurpose component based Language Technology Infrastructure a software framework (architecture or platform) for combining language tools with language resources into processing chains (or pipelines) the defined processing chains are next applied to language data sources interoperability, also with the external systems

3 LT in Humanities and Social Sciences: Barriers Physical language tools and resources are not accessible in Internet Informational descriptions are not available or there is no means for searching Technological lack of commonly accepted standards for LT, lack of a common platform, varieties of technological solutions, insufficient users computers Related to knowledge the use of LT requires programming skills or knowledge from the area of natural language engineering Legal licences for language resources and tools (LRTs) limit their applications

4 CLARIN Support for Humanities & Social Sciences CLARIN is ERIC type consortium of 11 countries (Austria, Bulgaria, Czech Republic, Denmark, Estonia, Germany, Lithuania, The Netherlands, Poland, Portugal, Sweden) and The Dutch Language Union 1 observer: Norway Focus area: Supporting research in Humanities and Social Sciences Users: researchers, PhD students, students and scientific institutions CLARIN Mission To significantly lower the barriers for the use of Language Technology in Humanities & Social Sciences (H&SS) To facilitate or enable research methods based on automated analysis of text and speech resources

5 CLARIN Offer Integration of different LT components into one interoperable system Common, flexible meta-data standard (CMDI) Central searching for resources (Virtual Language Observatory) One sign on and one login into the distributed infrastructure Decreased Physical and Informational Barriers Common standards: promoting, co-ordinating, harmonising Web Services for Language Tools and Resources Decreased Technological Barrier Installation-free, access via Web Applications Decreased Knowledge Barrier Common licences and promotion of the open access Decreased Legal Barrier

6 CLARIN: Portal

7 CLARIN: Virtual Language Observatory

8 CLARIN: Federated Content Search Searching Corpora

9 LTI Development Paradigms Bottom-up a collected offer approach based on linking together the already existing Language Resources and Tools focused on accessibility, technical interoperability and processing chains Top-down following on user-centred design paradigm research applications for H&SS are a starting point Bi-directional linking of Language Resources and Tools combined with the development of research applications

10 Bi-directional LTI Development Idea development of the necessary elements a distributed network infrastructure basic LT processing chain combined with user-centred approach to the development of research applications Top-down part close co-operation with key users from the H&SS domain a metaphor of the Agile-like light weight software designing method with emphasis to prototyping amendments to the shape of the technical basis: LRTs, standards, inspirations, identification of the further user needs, next iterations

11 : the Consortium Polish scientific consortium Wrocław University of Technology, G4.19 Research Group Institute of Computer Science, Polish Academy of Science Polish-Japanese Institute of Information Technology, Chair of Multimedia University of Łódź, PELCRA group at Chair of English Language and Applied Linguistics Institute of Slavic Studies, Polish Academy of Science Wrocław University Goal: implementation of the Polish part of the CLARIN ERIC LTI Follows the bi-directional approach to LTI development

12 : Mission Starting point Several publicly available language resources and tools for Polish, But still many were lacking Deeper technological barrier: restricted applications Pillars: Language Technology Centre the Polish node of the CLARIN distributed infrastructure Complete set of the basic Language Resources & Tools for Polish Research applications for H&SS first set for key users and selected H&SS sub-domains.

13 Language Technology Centre Location in Wrocław University of Technology based on modified D-Space system from Lindat (Czech CLARIN) One sign-on, one login (a member of the Pioneer.id Federation) Advanced repository system for language resources Persistent Identifiers for resources and tools Rich CMDI meta-data CLARIN wide visibility in the central search Interface for Federated Content Search depositing service for researchers from H&SS application for the Data Seal of Approval Adherence to all CLARIN specifications about standards and protocols Web Services for LRTs: the basic processing chain of Polish Prototype system for flexible composition of the natural language processing chains support for developers SOAP & REST interfaces Web Applications for LRTs Knowledge Sharing: expertise and support for the users

14 : Language Resources 1. Polish Morphological Dictionary 2. Polish Speech Corpora 3. Annotated Polish Corpora 4. Bilingual Corpora 5. Polish Historical Corpus 6. Semantic lexicon Wordnet for Polish formal description of lexical meanings 7. Dictionary of Multiword Expressions 8. Bilingual semantic lexicon 9. Lexicon of Proper Names 10.Syntactic-semantic Valency Dictionary 11.Robust syntactic-semantic grammar

15 : Language Resources 1. Polish Morphological Dictionary 2. Polish Speech Corpora 3. Annotated Polish Corpora 4. Bilingual Corpora 5. Polish Historical Corpus 6. Semantic lexicon plwordnet 3.0 formal description of lexical meanings 7. Dictionary of Multiword Expressions 8. Bilingual semantic lexicon 9. Lexicon of Proper Names 10.Syntactic-semantic Valency Dictionary: 11.Robust syntactic-semantic grammar

16 : Language Resources Starting point a set of large resources a huge National Corpus of Polish (1 billion tokens) plwordnet 2.1 a very large wordnet for Polish Korpus Politechniki Wrocławskiej an open Polish corpus with rich annotation Expanded resources plwordnet 3.0 a huge semantic lexicon of Polish a comprehensive description of the Polish lexico-semantic system (~ lemmas, ~ senses) fully mapped to English Princeton WordNet described formally by mapping to an ontology Dictionary of multiword expressions described syntactically NELexicon 2.0 a huge lexicon of Polish Proper Names (2.5 mln)

17 : Language Resources for Polish Expanded resources Conversational corpus (following PELCRA and NKJP) A large semantic valency lexicon for Polish predicative lexical units Newly built resources Transcribed training-testing Polish speech corpus Bi-lingual corpora: Polish-English, Polish-Bulgarian-Russian, Polish-Lithuanian Polish historical corpus (for the years ) Corpora annotated for: meta-data, anaphora, time expressions, spatial expressions, semantic relations and situations

18 plwordnet 2.2 in

19 plwordnet 2.2 in

20 : Language Tools for Polish Systems for searching corpora, especially Polish corpora Spokes for conversational and bilingual corpora Poliqarp 2.0 for richly annotated Historical corpora [New] Text mining (information extraction) Recognition and classification of Proper Names Recognition of anaphoric links Recognition and classification of time expressions and spatial expressions [New] Situation recognition [New] Extraction of multiword expressions (collocations) A generic set of morpho-syntactic tools for Polish that can be adapted to a domain specified by the user [New]

21 : Language Tools for Polish Word Sense Disambiguation based on plwordnet Shallow semantic parser [New] Deep syntactic-semantic parser [New] Tools for the extraction of the semantic-pragmatic information from documents and collections of documents, e.g. keywords [New], semantic relations between text fragments and text summaries

22 Basic Language Tools for Polish 1. Segmentation into tokens and sentences 2. Morphological analysis 3. Morphological guessing of unknown words (both without context and context sensitive) 4. Morpho-syntactic tagging 5. Word Sense Disambiguation 6. Chunker and shallow syntactic parser 7. Named Entity Recognition and disambiguation 8. Co-reference and anaphora resolution 9. Temporal expression recognition 10. Semantic relation recognition 11. Event recognition 12. Shallow semantic parser 13. Deep syntactic parser with disambiguated output: dependency and constituent 14. Deep semantic parser

23 Basic Language Tools for Polish 1. Segmentation into tokens and sentences 2. Morphological analysis 3. Morphological guessing of unknown words (both without context and context sensitive) 4. Morpho-syntactic tagging 5. Word Sense Disambiguation 6. Chunker and shallow syntactic parser 7. Named Entity Recognition and disambiguation 8. Co-reference and anaphora resolution 9. Temporal expression recognition 10. Semantic relation recognition 11. Event recognition 12. Shallow semantic parser 13. Deep syntactic parser with disambiguated output: dependency and constituent 14. Deep semantic parser

24 Basic Language Tools for Polish 1. Segmentation into tokens and sentences 2. Morphological analysis 3. Morphological guessing of unknown words (both without context and context sensitive) 4. Morpho-syntactic tagging 5. Word Sense Disambiguation 6. Chunker and shallow syntactic parser 7. Named Entity Recognition and disambiguation 8. Co-reference and anaphora resolution 9. Temporal expression recognition 10. Semantic relation recognition 11. Event recognition 12. Shallow semantic parser 13. Deep syntactic parser with disambiguated output: dependency and constituent 14. Deep semantic parser

25 : Processing Chain for Polish

26 : Recognition and classification of Proper Names

27 Bi-directional - Top-down Part: First Applications Approaching users already active, interested, working on large textual and speech resources, covering a maximal variety of research areas, e.g. linguistics, literary studies, psychology, political studies and sociology matching the available language tools for Polish the first set of several prototype application illustrating possibilities and facilitating identification of the needs First applications Spokes searching corpora of conversational data A system for collecting Polish text corpora from the Web A open textometric and stylometric system focused on Polish Semantic text classification for sociology Literary Map

28 Spokes (University of Łódź)

29 System for Collecting Polish Text Corpora from the Web Requests from the users revealed gaps in the available technology existing corpus building systems were too sensitive to text encoding errors found in the web not designed for informal corpora like blogs A system for collecting Polish text corpora from the Web had to be constructed: based on tools from the Masaryk University in Brno to detect texts including larger number of errors (by morphological analysis) supports semi-automated extraction of texts from blogs, posts on forums, etc. integrated with tools for processing

30 Open Textometric and Stylometric System System designed for characteristic features of Polish like rich inflection, weakly constrained word order Based on several existing components including Stylo (Eder & Rybicki) Enabling the use of features defined on any level of the linguistic structure: from the level of word forms up to the level of the semantic-pragmatic structures. Available as Web Application and a Web Service Stylometric techniques appear to be applicable in many tasks of H&SS sociology (characteristic features that are for different subgroups), political studies (similarity and differences between political parties), literary studies

31 Semantic Text Classification for Sociology Users: Collegium Civitas, Warsaw Goal Support for large scale analysis of the source materials Automatically annotate documents and text fragments with pre-defined semantic categories Definition of categories by examples Automated semantic grouping of documents and text fragments Support for Corpus building Manual annotation of the learning sub-corpus Automated annotation process Statistical analysis of the results

32 GeTClasS Generalised Text Classification for Sociology

33 Literary Map Users: Digital Humanities Centre of The Institute of Literary Research (Polish Academy of Sciences) Goal Support for using maps in the literary criticism Tool for the identification of all geographical names in the literary text (or a corpus) and mapping them onto a geographical map Tasks 1.Identification and semantic classification of the referring language expressions 2.Disambiguation of the referents 3.Mapping the referents onto a map (geo-location) 4.Recognition of the semantic relations and statistical analysis

34 Literary Map

35 Conclusions Application of LT to the research in Humanities & Social Sciences seem to be much more challenging than in commercial systems! LT for Polish achieved a stage in which valuable support can be provided for research applications Bi-directional approach combines development of the basic, universal set of language tools and resources with inspirations from the research applications

36 Thank you very much for your attention! Supported by the Polish Ministry of Science and Higher Education []

37 Bi-directional: bottom-up part PALC 2014 Łódź LRTs and LRT chains can be useful if the required tools and resources exist, and, they are robust! What is the minimal set of LRTs? What kind of LRTs can be called robust? automated applications in H&SS seem to require high quality of language tools and mostly large coverage of resource BLARK The Basic Language Resource Kit the minimal set of language resources that is necessary to do any precompetitive research and education at all (Krauwer, 2003) and also basic processing chains possible reference point to compare LRTs for different languages

The CESAR Project: Enabling LRT for 70M+ Speakers

The CESAR Project: Enabling LRT for 70M+ Speakers The CESAR Project: Enabling LRT for 70M+ Speakers Marko Tadić University of Zagreb, Faculty of Humanities and Social Sciences Zagreb, Croatia marko.tadic@ffzg.hr META-FORUM 2011 Budapest, Hungary, 2011-06-28

More information

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments

Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,

More information

AQUA: An Ontology-Driven Question Answering System

AQUA: An Ontology-Driven Question Answering System AQUA: An Ontology-Driven Question Answering System Maria Vargas-Vera, Enrico Motta and John Domingue Knowledge Media Institute (KMI) The Open University, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom.

More information

Applications of memory-based natural language processing

Applications of memory-based natural language processing Applications of memory-based natural language processing Antal van den Bosch and Roser Morante ILK Research Group Tilburg University Prague, June 24, 2007 Current ILK members Principal investigator: Antal

More information

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17.

Semi-supervised methods of text processing, and an application to medical concept extraction. Yacine Jernite Text-as-Data series September 17. Semi-supervised methods of text processing, and an application to medical concept extraction Yacine Jernite Text-as-Data series September 17. 2015 What do we want from text? 1. Extract information 2. Link

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY

MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY MULTILINGUAL INFORMATION ACCESS IN DIGITAL LIBRARY Chen, Hsin-Hsi Department of Computer Science and Information Engineering National Taiwan University Taipei, Taiwan E-mail: hh_chen@csie.ntu.edu.tw Abstract

More information

Developing a TT-MCTAG for German with an RCG-based Parser

Developing a TT-MCTAG for German with an RCG-based Parser Developing a TT-MCTAG for German with an RCG-based Parser Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, Johannes Dellert University of Tübingen, Germany CNRS-LORIA, France LREC 2008,

More information

Modeling full form lexica for Arabic

Modeling full form lexica for Arabic Modeling full form lexica for Arabic Susanne Alt Amine Akrout Atilf-CNRS Laurent Romary Loria-CNRS Objectives Presentation of the current standardization activity in the domain of lexical data modeling

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

DICTE PLATFORM: AN INPUT TO COLLABORATION AND KNOWLEDGE SHARING

DICTE PLATFORM: AN INPUT TO COLLABORATION AND KNOWLEDGE SHARING DICTE PLATFORM: AN INPUT TO COLLABORATION AND KNOWLEDGE SHARING Annalisa Terracina, Stefano Beco ElsagDatamat Spa Via Laurentina, 760, 00143 Rome, Italy Adrian Grenham, Iain Le Duc SciSys Ltd Methuen Park

More information

National Academies STEM Workforce Summit

National Academies STEM Workforce Summit National Academies STEM Workforce Summit September 21-22, 2015 Irwin Kirsch Director, Center for Global Assessment PIAAC and Policy Research ETS Policy Research using PIAAC data America s Skills Challenge:

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS

AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS AUTHORING E-LEARNING CONTENT TRENDS AND SOLUTIONS Danail Dochev 1, Radoslav Pavlov 2 1 Institute of Information Technologies Bulgarian Academy of Sciences Bulgaria, Sofia 1113, Acad. Bonchev str., Bl.

More information

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation

Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation Intension, Attitude, and Tense Annotation in a High-Fidelity Semantic Representation Gene Kim and Lenhart Schubert Presented by: Gene Kim April 2017 Project Overview Project: Annotate a large, topically

More information

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la

Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Grzegorz Chrupa la Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing Grzegorz Chrupa la A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.)

More information

Introduction to Text Mining

Introduction to Text Mining Prelude Overview Introduction to Text Mining Tutorial at EDBT 06 René Witte Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe, Germany http://rene-witte.net

More information

Development of the First LRs for Macedonian: Current Projects

Development of the First LRs for Macedonian: Current Projects Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk

More information

1. Introduction. 2. The OMBI database editor

1. Introduction. 2. The OMBI database editor OMBI bilingual lexical resources: Arabic-Dutch / Dutch-Arabic Carole Tiberius, Anna Aalstein, Instituut voor Nederlandse Lexicologie Jan Hoogland, Nederlands Instituut in Marokko (NIMAR) In this paper

More information

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries

PIRLS. International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries Ina V.S. Mullis Michael O. Martin Eugenio J. Gonzalez PIRLS International Achievement in the Processes of Reading Comprehension Results from PIRLS 2001 in 35 Countries International Study Center International

More information

Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities

Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities Simon Clematide, Isabel Meraner, Noah Bubenhofer, Martin Volk Institute of Computational Linguistics

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Target Language Preposition Selection an Experiment with Transformation-Based Learning and Aligned Bilingual Data Ebba Gustavii Department of Linguistics and Philology, Uppsala University, Sweden ebbag@stp.ling.uu.se

More information

Multilingual Sentiment and Subjectivity Analysis

Multilingual Sentiment and Subjectivity Analysis Multilingual Sentiment and Subjectivity Analysis Carmen Banea and Rada Mihalcea Department of Computer Science University of North Texas rada@cs.unt.edu, carmen.banea@gmail.com Janyce Wiebe Department

More information

Using Semantic Relations to Refine Coreference Decisions

Using Semantic Relations to Refine Coreference Decisions Using Semantic Relations to Refine Coreference Decisions Heng Ji David Westbrook Ralph Grishman Department of Computer Science New York University New York, NY, 10003, USA hengji@cs.nyu.edu westbroo@cs.nyu.edu

More information

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data Maja Popović and Hermann Ney Lehrstuhl für Informatik VI, Computer

More information

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition

Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Semantic Inference at the Lexical-Syntactic Level for Textual Entailment Recognition Roy Bar-Haim,Ido Dagan, Iddo Greental, Idan Szpektor and Moshe Friedman Computer Science Department, Bar-Ilan University,

More information

CREATING SHARABLE LEARNING OBJECTS FROM EXISTING DIGITAL COURSE CONTENT

CREATING SHARABLE LEARNING OBJECTS FROM EXISTING DIGITAL COURSE CONTENT CREATING SHARABLE LEARNING OBJECTS FROM EXISTING DIGITAL COURSE CONTENT Rajendra G. Singh Margaret Bernard Ross Gardler rajsingh@tstt.net.tt mbernard@fsa.uwi.tt rgardler@saafe.org Department of Mathematics

More information

Impact of Educational Reforms to International Cooperation CASE: Finland

Impact of Educational Reforms to International Cooperation CASE: Finland Impact of Educational Reforms to International Cooperation CASE: Finland February 11, 2016 10 th Seminar on Cooperation between Russian and Finnish Institutions of Higher Education Tiina Vihma-Purovaara

More information

The recognition, evaluation and accreditation of European Postgraduate Programmes.

The recognition, evaluation and accreditation of European Postgraduate Programmes. 1 The recognition, evaluation and accreditation of European Postgraduate Programmes. Sue Lawrence and Nol Reverda Introduction The validation of awards and courses within higher education has traditionally,

More information

SOCRATES PROGRAMME GUIDELINES FOR APPLICANTS

SOCRATES PROGRAMME GUIDELINES FOR APPLICANTS SOCRATES PROGRAMME GUIDELINES FOR APPLICANTS The present document contains a description of the financial support available under all parts of the Community action programme in the field of education,

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282) B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory

More information

THE VERB ARGUMENT BROWSER

THE VERB ARGUMENT BROWSER THE VERB ARGUMENT BROWSER Bálint Sass sass.balint@itk.ppke.hu Péter Pázmány Catholic University, Budapest, Hungary 11 th International Conference on Text, Speech and Dialog 8-12 September 2008, Brno PREVIEW

More information

Europeana Creative. Bringing Cultural Heritage Institutions and Creative Industries Europeana Day, April 11, 2014 Zagreb

Europeana Creative. Bringing Cultural Heritage Institutions and Creative Industries Europeana Day, April 11, 2014 Zagreb Europeana Creative Bringing Cultural Heritage Institutions and Creative Industries Together @ecreativeeu Europeana Day, April 11, 2014 Zagreb What is Europeana Creative? Europeana Creative in a Nutshell

More information

Vocabulary Usage and Intelligibility in Learner Language

Vocabulary Usage and Intelligibility in Learner Language Vocabulary Usage and Intelligibility in Learner Language Emi Izumi, 1 Kiyotaka Uchimoto 1 and Hitoshi Isahara 1 1. Introduction In verbal communication, the primary purpose of which is to convey and understand

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

An Introduction to the Minimalist Program

An Introduction to the Minimalist Program An Introduction to the Minimalist Program Luke Smith University of Arizona Summer 2016 Some findings of traditional syntax Human languages vary greatly, but digging deeper, they all have distinct commonalities:

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

Automated Identification of Domain Preferences of Collocations

Automated Identification of Domain Preferences of Collocations Automated Identification of Domain Preferences of Collocations Jelena Kallas 1, Vit Suchomel 2, Maria Khokhlova 3 1 Institute of the Estonian Language, Estonia 2 Masaryk University, Czech Republic 3 St.

More information

The stages of event extraction

The stages of event extraction The stages of event extraction David Ahn Intelligent Systems Lab Amsterdam University of Amsterdam ahn@science.uva.nl Abstract Event detection and recognition is a complex task consisting of multiple sub-tasks

More information

ehealth Governance Initiative: Joint Action JA-EHGov & Thematic Network SEHGovIA DELIVERABLE Version: 2.4 Date:

ehealth Governance Initiative: Joint Action JA-EHGov & Thematic Network SEHGovIA DELIVERABLE Version: 2.4 Date: ehealth Governance Initiative: Joint Action JA-EHGov & Thematic Network SEHGovIA DELIVERABLE JA D4.1.1 Strategy & Policy Alignment Documents I WP4 (JA) - Policy Development and Strategy Alignment Version:

More information

GALICIAN TEACHERS PERCEPTIONS ON THE USABILITY AND USEFULNESS OF THE ODS PORTAL

GALICIAN TEACHERS PERCEPTIONS ON THE USABILITY AND USEFULNESS OF THE ODS PORTAL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia GALICIAN TEACHERS PERCEPTIONS ON THE USABILITY AND USEFULNESS OF THE ODS PORTAL SONIA VALLADARES-RODRIGUEZ

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

The Survey of Adult Skills (PIAAC) provides a picture of adults proficiency in three key information-processing skills:

The Survey of Adult Skills (PIAAC) provides a picture of adults proficiency in three key information-processing skills: SPAIN Key issues The gap between the skills proficiency of the youngest and oldest adults in Spain is the second largest in the survey. About one in four adults in Spain scores at the lowest levels in

More information

A Framework for Customizable Generation of Hypertext Presentations

A Framework for Customizable Generation of Hypertext Presentations A Framework for Customizable Generation of Hypertext Presentations Benoit Lavoie and Owen Rambow CoGenTex, Inc. 840 Hanshaw Road, Ithaca, NY 14850, USA benoit, owen~cogentex, com Abstract In this paper,

More information

EOSC Governance Development Forum 4 May 2017 Per Öster

EOSC Governance Development Forum 4 May 2017 Per Öster EOSC Governance Development Forum 4 May 2017 Per Öster per.oster@csc.fi Governance Development Forum Enable stakeholders to contribute to the governance development A platform for information, dialogue,

More information

Introduction Research Teaching Cooperation Faculties. University of Oulu

Introduction Research Teaching Cooperation Faculties. University of Oulu University of Oulu Founded in 1958 faculties 1 000 students 2900 employees Total funding EUR 22 million Among the largest universities in Finland with an exceptionally wide scientific base Three universities

More information

SEMAFOR: Frame Argument Resolution with Log-Linear Models

SEMAFOR: Frame Argument Resolution with Log-Linear Models SEMAFOR: Frame Argument Resolution with Log-Linear Models Desai Chen or, The Case of the Missing Arguments Nathan Schneider SemEval July 16, 2010 Dipanjan Das School of Computer Science Carnegie Mellon

More information

Postprint.

Postprint. http://www.diva-portal.org Postprint This is the accepted version of a paper presented at CLEF 2013 Conference and Labs of the Evaluation Forum Information Access Evaluation meets Multilinguality, Multimodality,

More information

Requirements-Gathering Collaborative Networks in Distributed Software Projects

Requirements-Gathering Collaborative Networks in Distributed Software Projects Requirements-Gathering Collaborative Networks in Distributed Software Projects Paula Laurent and Jane Cleland-Huang Systems and Requirements Engineering Center DePaul University {plaurent, jhuang}@cs.depaul.edu

More information

A Bayesian Learning Approach to Concept-Based Document Classification

A Bayesian Learning Approach to Concept-Based Document Classification Databases and Information Systems Group (AG5) Max-Planck-Institute for Computer Science Saarbrücken, Germany A Bayesian Learning Approach to Concept-Based Document Classification by Georgiana Ifrim Supervisors

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

A Grammar for Battle Management Language

A Grammar for Battle Management Language Bastian Haarmann 1 Dr. Ulrich Schade 1 Dr. Michael R. Hieb 2 1 Fraunhofer Institute for Communication, Information Processing and Ergonomics 2 George Mason University bastian.haarmann@fkie.fraunhofer.de

More information

arxiv: v1 [cs.cl] 2 Apr 2017

arxiv: v1 [cs.cl] 2 Apr 2017 Word-Alignment-Based Segment-Level Machine Translation Evaluation using Word Embeddings Junki Matsuo and Mamoru Komachi Graduate School of System Design, Tokyo Metropolitan University, Japan matsuo-junki@ed.tmu.ac.jp,

More information

Executive summary (in English)

Executive summary (in English) Executive summary (in English) Project description The project "Open Educational Resources in institutional repositories has been carried out in collaboration between Göteborg university, University of

More information

Open Discovery Space: Unique Resources just a click away! Andy Galloway

Open Discovery Space: Unique Resources just a click away! Andy Galloway Open Discovery Space: Unique Resources just a click away! Andy Galloway Open Discovery Space Unique Resources just a click away! The European Reference Framework sets out eight key competences: 1. Communication

More information

Challenges for Higher Education in Europe: Socio-economic and Political Transformations

Challenges for Higher Education in Europe: Socio-economic and Political Transformations Challenges for Higher Education in Europe: Socio-economic and Political Transformations Steinhardt Institute NYU 15 June, 2017 Peter Maassen US governance of higher education EU governance of higher

More information

Learning Methods in Multilingual Speech Recognition

Learning Methods in Multilingual Speech Recognition Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex

More information

PROCESS USE CASES: USE CASES IDENTIFICATION

PROCESS USE CASES: USE CASES IDENTIFICATION International Conference on Enterprise Information Systems, ICEIS 2007, Volume EIS June 12-16, 2007, Funchal, Portugal. PROCESS USE CASES: USE CASES IDENTIFICATION Pedro Valente, Paulo N. M. Sampaio Distributed

More information

Tailoring i EW-MFA (Economy-Wide Material Flow Accounting/Analysis) information and indicators

Tailoring i EW-MFA (Economy-Wide Material Flow Accounting/Analysis) information and indicators Tailoring i EW-MFA (Economy-Wide Material Flow Accounting/Analysis) information and indicators to developing Asia: increasing research capacity and stimulating policy demand for resource productivity Chika

More information

Distant Supervised Relation Extraction with Wikipedia and Freebase

Distant Supervised Relation Extraction with Wikipedia and Freebase Distant Supervised Relation Extraction with Wikipedia and Freebase Marcel Ackermann TU Darmstadt ackermann@tk.informatik.tu-darmstadt.de Abstract In this paper we discuss a new approach to extract relational

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

The Discourse Anaphoric Properties of Connectives

The Discourse Anaphoric Properties of Connectives The Discourse Anaphoric Properties of Connectives Cassandre Creswell, Kate Forbes, Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi Λ, Bonnie Webber y Λ University of Pennsylvania 3401 Walnut Street Philadelphia,

More information

Prediction of Maximal Projection for Semantic Role Labeling

Prediction of Maximal Projection for Semantic Role Labeling Prediction of Maximal Projection for Semantic Role Labeling Weiwei Sun, Zhifang Sui Institute of Computational Linguistics Peking University Beijing, 100871, China {ws, szf}@pku.edu.cn Haifeng Wang Toshiba

More information

BYLINE [Heng Ji, Computer Science Department, New York University,

BYLINE [Heng Ji, Computer Science Department, New York University, INFORMATION EXTRACTION BYLINE [Heng Ji, Computer Science Department, New York University, hengji@cs.nyu.edu] SYNONYMS NONE DEFINITION Information Extraction (IE) is a task of extracting pre-specified types

More information

Extended Similarity Test for the Evaluation of Semantic Similarity Functions

Extended Similarity Test for the Evaluation of Semantic Similarity Functions Extended Similarity Test for the Evaluation of Semantic Similarity Functions Maciej Piasecki 1, Stanisław Szpakowicz 2,3, Bartosz Broda 1 1 Institute of Applied Informatics, Wrocław University of Technology,

More information

Language Independent Passage Retrieval for Question Answering

Language Independent Passage Retrieval for Question Answering Language Independent Passage Retrieval for Question Answering José Manuel Gómez-Soriano 1, Manuel Montes-y-Gómez 2, Emilio Sanchis-Arnal 1, Luis Villaseñor-Pineda 2, Paolo Rosso 1 1 Polytechnic University

More information

English Language and Applied Linguistics. Module Descriptions 2017/18

English Language and Applied Linguistics. Module Descriptions 2017/18 English Language and Applied Linguistics Module Descriptions 2017/18 Level I (i.e. 2 nd Yr.) Modules Please be aware that all modules are subject to availability. If you have any questions about the modules,

More information

MODERNISATION OF HIGHER EDUCATION PROGRAMMES IN THE FRAMEWORK OF BOLOGNA: ECTS AND THE TUNING APPROACH

MODERNISATION OF HIGHER EDUCATION PROGRAMMES IN THE FRAMEWORK OF BOLOGNA: ECTS AND THE TUNING APPROACH EUROPEAN CREDIT TRANSFER AND ACCUMULATION SYSTEM (ECTS): Priorities and challenges for Lithuanian Higher Education Vilnius 27 April 2011 MODERNISATION OF HIGHER EDUCATION PROGRAMMES IN THE FRAMEWORK OF

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization

LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization LQVSumm: A Corpus of Linguistic Quality Violations in Multi-Document Summarization Annemarie Friedrich, Marina Valeeva and Alexis Palmer COMPUTATIONAL LINGUISTICS & PHONETICS SAARLAND UNIVERSITY, GERMANY

More information

UCEAS: User-centred Evaluations of Adaptive Systems

UCEAS: User-centred Evaluations of Adaptive Systems UCEAS: User-centred Evaluations of Adaptive Systems Catherine Mulwa, Séamus Lawless, Mary Sharp, Vincent Wade Knowledge and Data Engineering Group School of Computer Science and Statistics Trinity College,

More information

Accurate Unlexicalized Parsing for Modern Hebrew

Accurate Unlexicalized Parsing for Modern Hebrew Accurate Unlexicalized Parsing for Modern Hebrew Reut Tsarfaty and Khalil Sima an Institute for Logic, Language and Computation, University of Amsterdam Plantage Muidergracht 24, 1018TV Amsterdam, The

More information

2.1 The Theory of Semantic Fields

2.1 The Theory of Semantic Fields 2 Semantic Domains In this chapter we define the concept of Semantic Domain, recently introduced in Computational Linguistics [56] and successfully exploited in NLP [29]. This notion is inspired by the

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

LING 329 : MORPHOLOGY

LING 329 : MORPHOLOGY LING 329 : MORPHOLOGY TTh 10:30 11:50 AM, Physics 121 Course Syllabus Spring 2013 Matt Pearson Office: Vollum 313 Email: pearsonm@reed.edu Phone: 7618 (off campus: 503-517-7618) Office hrs: Mon 1:30 2:30,

More information

Evaluation of Learning Management System software. Part II of LMS Evaluation

Evaluation of Learning Management System software. Part II of LMS Evaluation Version DRAFT 1.0 Evaluation of Learning Management System software Author: Richard Wyles Date: 1 August 2003 Part II of LMS Evaluation Open Source e-learning Environment and Community Platform Project

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS

DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS DEVELOPMENT OF A MULTILINGUAL PARALLEL CORPUS AND A PART-OF-SPEECH TAGGER FOR AFRIKAANS Julia Tmshkina Centre for Text Techitology, North-West University, 253 Potchefstroom, South Africa 2025770@puk.ac.za

More information

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]

Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Towards a MWE-driven A* parsing with LTAGs [WG2,WG3] Jakub Waszczuk, Agata Savary To cite this version: Jakub Waszczuk, Agata Savary. Towards a MWE-driven A* parsing with LTAGs [WG2,WG3]. PARSEME 6th general

More information

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016 AGENDA Advanced Learning Theories Alejandra J. Magana, Ph.D. admagana@purdue.edu Introduction to Learning Theories Role of Learning Theories and Frameworks Learning Design Research Design Dual Coding Theory

More information

Twenty years of TIMSS in England. NFER Education Briefings. What is TIMSS?

Twenty years of TIMSS in England. NFER Education Briefings. What is TIMSS? NFER Education Briefings Twenty years of TIMSS in England What is TIMSS? The Trends in International Mathematics and Science Study (TIMSS) is a worldwide research project run by the IEA 1. It takes place

More information

Some Principles of Automated Natural Language Information Extraction

Some Principles of Automated Natural Language Information Extraction Some Principles of Automated Natural Language Information Extraction Gregers Koch Department of Computer Science, Copenhagen University DIKU, Universitetsparken 1, DK-2100 Copenhagen, Denmark Abstract

More information

Department of Education and Skills. Memorandum

Department of Education and Skills. Memorandum Department of Education and Skills Memorandum Irish Students Performance in PISA 2012 1. Background 1.1. What is PISA? The Programme for International Student Assessment (PISA) is a project of the Organisation

More information

Treebank mining with GrETEL. Liesbeth Augustinus Frank Van Eynde

Treebank mining with GrETEL. Liesbeth Augustinus Frank Van Eynde Treebank mining with GrETEL Liesbeth Augustinus Frank Van Eynde GrETEL tutorial - 27 March, 2015 GrETEL Greedy Extraction of Trees for Empirical Linguistics Search engine for treebanks GrETEL Greedy Extraction

More information

Universiteit Leiden ICT in Business

Universiteit Leiden ICT in Business Universiteit Leiden ICT in Business Ranking of Multi-Word Terms Name: Ricardo R.M. Blikman Student-no: s1184164 Internal report number: 2012-11 Date: 07/03/2013 1st supervisor: Prof. Dr. J.N. Kok 2nd supervisor:

More information

ScienceDirect. Malayalam question answering system

ScienceDirect. Malayalam question answering system Available online at www.sciencedirect.com ScienceDirect Procedia Technology 24 (2016 ) 1388 1392 International Conference on Emerging Trends in Engineering, Science and Technology (ICETEST - 2015) Malayalam

More information

A Neural Network GUI Tested on Text-To-Phoneme Mapping

A Neural Network GUI Tested on Text-To-Phoneme Mapping A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis

More information

The Smart/Empire TIPSTER IR System

The Smart/Empire TIPSTER IR System The Smart/Empire TIPSTER IR System Chris Buckley, Janet Walz Sabir Research, Gaithersburg, MD chrisb,walz@sabir.com Claire Cardie, Scott Mardis, Mandar Mitra, David Pierce, Kiri Wagstaff Department of

More information

Ministry of Education, Republic of Palau Executive Summary

Ministry of Education, Republic of Palau Executive Summary Ministry of Education, Republic of Palau Executive Summary Student Consultant, Jasmine Han Community Partner, Edwel Ongrung I. Background Information The Ministry of Education is one of the eight ministries

More information

An Open Framework for Integrated Qualification Management Portals

An Open Framework for Integrated Qualification Management Portals An Open Framework for Integrated Qualification Management Portals Michael Fuchs, Claudio Muscogiuri, Claudia Niederée, Matthias Hemmje FhG IPSI D-64293 Darmstadt, Germany {fuchs,musco,niederee,hemmje}@ipsi.fhg.de

More information

Emergency Management Games and Test Case Utility:

Emergency Management Games and Test Case Utility: IST Project N 027568 IRRIIS Project Rome Workshop, 18-19 October 2006 Emergency Management Games and Test Case Utility: a Synthetic Methodological Socio-Cognitive Perspective Adam Maria Gadomski, ENEA

More information

Update on Soar-based language processing

Update on Soar-based language processing Update on Soar-based language processing Deryle Lonsdale (and the rest of the BYU NL-Soar Research Group) BYU Linguistics lonz@byu.edu Soar 2006 1 NL-Soar Soar 2006 2 NL-Soar developments Discourse/robotic

More information

Clumps and collection description in the information environment in the UK with particular reference to Scotland

Clumps and collection description in the information environment in the UK with particular reference to Scotland Clumps and collection description in the information environment in the UK with particular reference to Scotland Gordon Dunsire, Gordon Dunsire (g.dunsire@strath.ac) is Deputy Director, at the Centre for

More information

Interactive Corpus Annotation of Anaphor Using NLP Algorithms

Interactive Corpus Annotation of Anaphor Using NLP Algorithms Interactive Corpus Annotation of Anaphor Using NLP Algorithms Catherine Smith 1 and Matthew Brook O Donnell 1 1. Introduction Pronouns occur with a relatively high frequency in all forms English discourse.

More information

Summary and policy recommendations

Summary and policy recommendations Skills Beyond School Synthesis Report OECD 2014 Summary and policy recommendations The hidden world of professional education and training Post-secondary vocational education and training plays an under-recognised

More information

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Trend Survey on Japanese Natural Language Processing Studies over the Last Decade Masaki Murata, Koji Ichii, Qing Ma,, Tamotsu Shirado, Toshiyuki Kanamaru,, and Hitoshi Isahara National Institute of Information

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Chamilo 2.0: A Second Generation Open Source E-learning and Collaboration Platform

Chamilo 2.0: A Second Generation Open Source E-learning and Collaboration Platform Chamilo 2.0: A Second Generation Open Source E-learning and Collaboration Platform doi:10.3991/ijac.v3i3.1364 Jean-Marie Maes University College Ghent, Ghent, Belgium Abstract Dokeos used to be one of

More information