OCLC Research and Europeana

Similar documents
Europeana Creative. Bringing Cultural Heritage Institutions and Creative Industries Europeana Day, April 11, 2014 Zagreb

Protocols for building an Organic Chemical Ontology

OPEN ACCESS TO SCIENTIFIC RESULTS AND DATA. EUROPEAN UNION'S EFFORTS THROUGH OPENAIRE AND OPENAIREPLUS FP7 PROJECTS: CYPRIOT PARTICIPATION

Memorandum. COMPNET memo. Introduction. References.

OVERVIEW Getty Center Richard Meier Robert Irwin J. Paul Getty Museum Getty Research Institute Getty Conservation Institute Getty Foundation

Davidson College Library Strategic Plan

THE ST. OLAF COLLEGE LIBRARIES FRAMEWORK FOR THE FUTURE

Seminar - Organic Computing

EOSC Governance Development Forum 4 May 2017 Per Öster

SME Academia cooperation in research projects in Research for the Benefit of SMEs within FP7 Capacities programme

AGENDA LEARNING THEORIES LEARNING THEORIES. Advanced Learning Theories 2/22/2016

Institutional repository policies: best practices for encouraging self-archiving

The Ohio State University Library System Improvement Request,

Development of a Library 2.0 service model for an African library

CWIS 23,3. Nikolaos Avouris Human Computer Interaction Group, University of Patras, Patras, Greece

On the Open Access Strategy of the Max Planck Society

ATENEA UPC AND THE NEW "Activity Stream" or "WALL" FEATURE Jesus Alcober 1, Oriol Sánchez 2, Javier Otero 3, Ramon Martí 4

DICE - Final Report. Project Information Project Acronym DICE Project Title

Designing e-learning materials with learning objects

The Future of Consortia among Indian Libraries - FORSA Consortium as Forerunner?

Shared Mental Models

Summary BEACON Project IST-FP

Study in Berlin at the HTW

ODS Portal Share educational resources in communities Upload your educational content!

Evaluation of Usage Patterns for Web-based Educational Systems using Web Mining

M-Learning. Hauptseminar E-Learning Sommersemester Michael Kellerer LFE Medieninformatik

May 23, sead-data.net

Use of Online Information Resources for Knowledge Organisation in Library and Information Centres: A Case Study of CUSAT

Ontological spine, localization and multilingual access

Researcher Development Assessment A: Knowledge and intellectual abilities

Knowledge Sharing Workshop, Tiel The Netherlands, 20 September 2016

LIBRARY AND RECORDS AND ARCHIVES SERVICES STRATEGIC PLAN 2016 to 2020

Investment in e- journals, use and research outcomes

Blended E-learning in the Architectural Design Studio

Orientation Workshop on Outcome Based Accreditation. May 21st, 2016

Teaching Colorado's Heritage with Digital Sources Case Overview

Scientific information management policies and information literacy schemes in Greek higher education institutions and libraries

Aronson, E., Wilson, T. D., & Akert, R. M. (2010). Social psychology (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Use of CIM in AEP Enterprise Architecture. Randy Lowe Director, Enterprise Architecture October 24, 2012

Skillsoft Acquires SumTotal: Frequently Asked Questions. October 2014

The development and promotion of Electronic Theses and Dissertations (ETDs) within the UK

P. Belsis, C. Sgouropoulou, K. Sfikas, G. Pantziou, C. Skourlas, J. Varnas

EDIT 576 (2 credits) Mobile Learning and Applications Fall Semester 2015 August 31 October 18, 2015 Fully Online Course

PROCESS USE CASES: USE CASES IDENTIFICATION

OCR LEVEL 3 CAMBRIDGE TECHNICAL

Finding Translations in Scanned Book Collections

Resource Package. Community Action Day

Ontologies vs. classification systems

Regional Bureau for Education in Africa (BREDA)

Jessica Gardner (Principal Investigator) with James Green (Project Manager) Date 13 October 2008 Filename CHARTER Project Plan.

Knowledge based expert systems DHANANJAY KALBANDE

Computer Science PhD Program Evaluation Proposal Based on Domain and Non-Domain Characteristics

Senior Research Fellow, Intelligent Mobility Design Centre

Clumps and collection description in the information environment in the UK with particular reference to Scotland

Texas Woman's University Libraries

EDIT 576 DL1 (2 credits) Mobile Learning and Applications Fall Semester 2014 August 25 October 12, 2014 Fully Online Course

EUROPEAN UNIVERSITIES LOOKING FORWARD WITH CONFIDENCE PRAGUE DECLARATION 2009

European Cooperation in the field of Scientific and Technical Research - COST - Brussels, 24 May 2013 COST 024/13

Open Sharing, Global Benefits The OpenCourseWare Consortium

Linking Task: Identifying authors and book titles in verbose queries

Personal Tutoring at Staffordshire University

A LIBRARY STRATEGY FOR SUTTON 2015 TO 2019

Building shared services more bang for your buck

Running Head: STUDENT CENTRIC INTEGRATED TECHNOLOGY

BEYOND THE BLEND. Getting Learning & Development Right. By Charles Jennings

Australian Journal of Basic and Applied Sciences

On the Combined Behavior of Autonomous Resource Management Agents

Library Consortia: Advantages and Disadvantages

e-portfolios in Australian education and training 2008 National Symposium Report

USER ADAPTATION IN E-LEARNING ENVIRONMENTS

Change Your Life. Change The World.

National Survey of Student Engagement (NSSE) Temple University 2016 Results

3. Improving Weather and Emergency Management Messaging: The Tulsa Weather Message Experiment. Arizona State University

OCW Global Conference 2009 MONTERREY, MEXICO BY GARY W. MATKIN DEAN, CONTINUING EDUCATION LARRY COOPERMAN DIRECTOR, UC IRVINE OCW

INSPIRE A NEW GENERATION OF LIFELONG LEARNERS

Graduate Program in Education

eportfolios in Education - Learning Tools or Means of Assessment?

Probability estimates in a scenario tree

City University of Hong Kong Course Syllabus. offered by Department of Architecture and Civil Engineering with effect from Semester A 2017/18

Dr Padraig Walsh. Presentation to CHEA International Seminar, Washington DC, 26 January 2012

Marie Skłodowska-Curie Actions in H2020

E-Learning Using Open Source Software in African Universities

Strategic Planning for Retaining Women in Undergraduate Computing

Python Machine Learning

EPA RESOURCE KIT: EPA RESEARCH Report Series No. 131 BRIDGING THE GAP BETWEEN SCIENCE AND POLICY

Chamilo 2.0: A Second Generation Open Source E-learning and Collaboration Platform

1. Programme title and designation International Management N/A

Top US Tech Talent for the Top China Tech Company

Director, Intelligent Mobility Design Centre

ICDE SCOP Lillehammer, Norway June Open Educational Resources: Deliberations of a Community of Interest

CREATING SHARABLE LEARNING OBJECTS FROM EXISTING DIGITAL COURSE CONTENT

Educational system gaps in Romania. Roberta Mihaela Stanef *, Alina Magdalena Manole

Beyond the Blend: Optimizing the Use of your Learning Technologies. Bryan Chapman, Chapman Alliance

Rachel Edmondson Adult Learner Analyst Jaci Leonard, UIC Analyst

Knowledge for the Future Developments in Higher Education and Research in the Netherlands

Big Fish. Big Fish The Book. Big Fish. The Shooting Script. The Movie

Improving the impact of development projects in Sub-Saharan Africa through increased UK/Brazil cooperation and partnerships Held in Brasilia

The College Board Redesigned SAT Grade 12

EUA Quality Culture: Implementing Bologna Reforms

Transcription:

OCLC Research and Europeana
Utrecht, 2 October 2012
Shenghui Wang, Research Scientist, OCLC
Valentine Charles, Interoperability Specialist, Europeana

OCLC Research is one of the world's leading centers devoted exclusively to the challenges facing libraries and archives in a rapidly changing information technology environment. Our mission is to expand knowledge that advances OCLC's public purposes of furthering access to the world's information and reducing library costs. Since 1978, we have carried out research and made technological advances that enhance the value of library services and improve the productivity of librarians and library users.

OCLC Research: three roles
1. To act as a community resource for shared research and development (R&D)
2. To provide advanced development and technical support within OCLC itself
3. To enhance OCLC's engagement with members and to mobilize the community around shared concerns
http://www.oclc.org/research.html

OCLC Research 3 constituencies

OCLC Research process (diagram): from shared uncertainties to community solutions. Stages: perform research; develop architecture and standards; create consensus; build community; convene experts; identify best practice; build prototypes; develop and deploy; transfer technology; produce outcomes.

OCLC Research work agenda
1. Research Information Management: opportunities for libraries in support of the research process and outputs
2. Mobilizing Unique Materials: describe, disclose, discover, deliver effectively
3. Metadata Support and Management: new models and workflows for network-level services
4. Infrastructure and Standards Support: support new architectures and their adoption
5. System-wide Organization: cooperative models of acquiring and managing collections
6. User behavior studies & Synthesis
Goals: define future research library services; revitalize our value proposition; transform our current operating practices and processes; implement systemic change.

OCLC Research Library Partnership
- 156 partners as of January 2012
- 50% of ARL
- 63% of RLUK
- 25 of the top 30 in the World University Rankings

Strengths and weaknesses of OCLC Research in Europe
Strengths:
- 50 experts dedicated to innovation for the library community globally
- Applied, hands-on research
- Little overhead
- No political or commercial agenda
- Results are shared openly
Weaknesses:
- European partners are in the minority; cultural and language differences
- The OCLC Research Library Partnership (ORLP) is weak on the continent; little awareness
- Image problem (OCLC seen as a vendor; strong association with metadata)
- OCLC's IPR regime for metadata needs clarification

Positioning OCLC Research in Europe: develop a strategy
- ORLP: too few members in Europe, so no impactful cooperation opportunities yet
- Opt for strategic cooperation with influential consortia: The European Library, Europeana, Open Planets Foundation (OPF)
- Make use of the networking strength of existing associations in Europe: LIBER

Positioning OCLC Research in Europe: develop a strategy
- Encourage European partners to participate in ongoing OCLC Research activities
- Engage with existing networks in areas where OCLC Research can help make a difference

Outline of a European research programme
Three collaboration areas:
1. With Europeana: innovation pilots
2. With OPF: Preservation Health Check pilot
3. With national libraries: develop strategies for the scalable and sustainable management of digital collections

Collaboration areas, leading to:
1. Metadata quality services (deduplication, enrichment, intelligent clustering, named-entity recognition (NER) and automatic tagging)
2. Health check services (quality assessment, risk assessment)
3. Good practices for the scalable and sustainable management of digital collections and infrastructures
4. Usage data analysis (web site traffic, added value of aggregations, hard data on real user behaviour)

A short introduction to Europeana
- Europeana is a service that aggregates data from the cultural heritage sector in Europe: libraries, museums, archives and audio-visual archives. http://www.europeana.eu/
- It provides a portal for users to access that data: metadata, previews and links to the source
- It will make the metadata freely available for anyone to re-use under the Creative Commons Zero (CC0) public domain dedication
- It enriches data and provides tools: link to data from other sites, embed on Wikipedia, API
- It makes data available as Linked Open Data: http://data.europeana.eu/

Context of the collaboration between OCLC and Europeana
In Europeana:
- R&D is driven by EU-funded projects
- Aggregating metadata from heterogeneous collections leads to data quality challenges
OCLC Research has extensive experience and provides expertise in metadata quality management.
The collaboration serves open-ended research objectives.

Innovation pilot 1
Connect as many Europeana objects (books, paintings, etc.) as possible to resources of the Virtual International Authority File (VIAF).
Europeana currently enriches resources that represent places, time periods, concepts and persons with selected vocabularies and datasets.
http://viaf.org/viaf/60351476

Innovation pilot 1
The Europeana case is quite different from many library-focused ones:
- Persons are referred to in the simple ESE (Europeana Semantic Elements) metadata
- There is no indirect linking, for example via a reference to an authority number used at a national library
The project would improve the enrichment process.
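Because persons appear as name strings in the ESE metadata, with no authority number to follow, linking them to VIAF comes down to matching those strings against authority name forms. The sketch below shows one simple way to do that, assuming a hypothetical in-memory table `AUTHORITY` with made-up VIAF-style identifiers (`viaf:0001`, `viaf:0002`); real identifiers and name forms would come from VIAF itself. It folds accents and case, ignores token order so that an inverted form such as "Rijn, Rembrandt van" matches "Rembrandt van Rijn", and declines to link when a key is ambiguous.

```python
import re
import unicodedata
from typing import Dict, List, Optional, Set

# Hypothetical authority table: made-up VIAF-style IDs mapped to name forms.
# Real identifiers and variants would be taken from VIAF, not from this dict.
AUTHORITY: Dict[str, List[str]] = {
    "viaf:0001": ["Rembrandt Harmensz. van Rijn", "Rijn, Rembrandt van"],
    "viaf:0002": ["Vermeer, Johannes", "Jan Vermeer"],
}


def name_key(name: str) -> str:
    """Normalise a personal name into an order-independent matching key."""
    # Decompose accented characters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep alphabetic tokens only; sorting them makes "Surname, Forename"
    # and "Forename Surname" produce the same key.
    tokens = re.findall(r"[a-z]+", folded.lower())
    return " ".join(sorted(tokens))


def build_index(authority: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalised name form to the authority IDs that use it."""
    index: Dict[str, Set[str]] = {}
    for auth_id, forms in authority.items():
        for form in forms:
            index.setdefault(name_key(form), set()).add(auth_id)
    return index


def link_creator(creator: str, index: Dict[str, Set[str]]) -> Optional[str]:
    """Return the authority ID for a dc:creator string, or None when there is
    no match or the key is shared by several authorities (ambiguous)."""
    candidates = index.get(name_key(creator), set())
    return next(iter(candidates)) if len(candidates) == 1 else None


if __name__ == "__main__":
    idx = build_index(AUTHORITY)
    for creator in ["Rembrandt van Rijn", "Johannes Vérmeer", "Anonymous"]:
        print(creator, "->", link_creator(creator, idx))
```

Refusing ambiguous matches trades recall for precision, which seems the safer default when enrichments are published to every re-user of the data.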

Innovation pilot 2
Connect related Europeana records:
- Detect duplicates or near-duplicates
- Identify and create semantic links between objects that are related, for example translated copies of the same publication, a painting and a photograph of that painting, different editions of one book, or a collection of letters that belong to the same person
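Detecting near-duplicates across millions of records requires avoiding a comparison of every pair. A common technique for this is MinHash with locality-sensitive hashing (LSH) banding: each record's text becomes a set of character shingles, is summarised by a short signature, and only records whose signatures collide in some band are compared. The sketch below is a minimal illustration, assuming each record has been reduced to a single text string (for example title plus description); the values of `NUM_PERM`, `BANDS` and the 0.8 threshold are illustrative choices, not parameters from the pilot.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

NUM_PERM = 64  # signature length (number of simulated hash functions)
BANDS = 16     # LSH bands; NUM_PERM // BANDS = 4 rows per band


def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of the text after lower-casing and dropping punctuation."""
    norm = " ".join(re.findall(r"\w+", text.lower()))
    return {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}


def _hash(seed: int, shingle: str) -> int:
    """A seeded 64-bit hash; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(shingle_set: Set[str]) -> List[int]:
    """MinHash signature: the minimum hash value for each seed."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(NUM_PERM)]


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """The fraction of agreeing positions estimates the shingle-set Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def near_duplicates(records: Dict[str, str], threshold: float = 0.8) -> Set[Tuple[str, str]]:
    """Return pairs of record IDs whose estimated similarity reaches the threshold."""
    sigs = {rid: minhash(shingles(text)) for rid, text in records.items()}
    rows = NUM_PERM // BANDS
    buckets: Dict[tuple, List[str]] = defaultdict(list)
    for rid, sig in sigs.items():
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(rid)
    # Only records sharing at least one band bucket become candidate pairs.
    candidates: Set[Tuple[str, str]] = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return {p for p in candidates
            if estimated_jaccard(sigs[p[0]], sigs[p[1]]) >= threshold}


if __name__ == "__main__":
    recs = {
        "a": "Portrait of a young woman with a pearl earring, oil on canvas",
        "b": "Portrait of a young woman with a pearl earring - oil on canvas",
        "c": "Letter from Vincent van Gogh to his brother Theo",
    }
    print(near_duplicates(recs))
```

Here records "a" and "b" differ only in punctuation, so they normalise to the same shingles and form the only reported pair. More bands with fewer rows each make collisions more likely, raising recall at the cost of more candidate comparisons.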

Current situation in Europeana
A related-items feature already exists, based on the enrichment fields what, who, where and when, and on similarities in metadata fields such as dc:title and dc:description. However, the enrichment process would need to be improved to make the relations more explicit.
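A feature that ranks related items from overlapping field values can be pictured as a weighted sum of per-field similarities. The sketch below is one such scorer, assuming token-set Jaccard similarity per field and a hypothetical weight table `FIELD_WEIGHTS`. The text does not say how Europeana actually combines the what/who/where/when and dc:title signals, so both the measure and the weights are assumptions.

```python
import re
from typing import Dict, List, Set

# Hypothetical weights; Europeana's actual weighting is not described in the text.
FIELD_WEIGHTS: Dict[str, float] = {
    "who": 0.3, "what": 0.2, "where": 0.15, "when": 0.15, "dc:title": 0.2,
}


def tokens(value: str) -> Set[str]:
    """Lower-cased word tokens of a field value."""
    return set(re.findall(r"\w+", value.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Weighted sum of per-field token overlaps; missing fields count as empty."""
    return sum(w * jaccard(tokens(r1.get(f, "")), tokens(r2.get(f, "")))
               for f, w in FIELD_WEIGHTS.items())


def related_items(target: Dict[str, str],
                  candidates: Dict[str, Dict[str, str]],
                  top_n: int = 3) -> List[str]:
    """IDs of the top-scoring candidates, highest first, excluding zero scores."""
    scored = [(relatedness(target, rec), rid) for rid, rec in candidates.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [rid for score, rid in scored[:top_n] if score > 0]


if __name__ == "__main__":
    target = {"who": "Rembrandt", "what": "portrait", "dc:title": "Self-portrait"}
    cands = {
        "x": {"who": "Rembrandt van Rijn", "what": "portrait", "dc:title": "Portrait of a man"},
        "y": {"who": "Vermeer", "what": "landscape", "dc:title": "View of Delft"},
    }
    print(related_items(target, cands))
```

A scorer like this finds that two records are related but not how, which is the gap the slide points to: making the relations explicit needs typed links rather than a single similarity score.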

OCLC Research: two-step approach
1. Rough clustering of millions of records into small clusters
   - Clustering 1 million records takes less than one minute
   - Using min-hashes, compression-based similarity measures and parallel computing
   - Using different similarity thresholds for a hierarchical view of objects
2. Categorising clusters and identifying specific semantic links within clusters
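The slide names compression-based similarity and multiple thresholds without giving details. Below is a minimal sketch of both ideas, assuming normalised compression distance (NCD) computed with zlib as the compression-based measure and single-linkage grouping as the clustering rule. The slide says neither which measure nor which linkage OCLC used, so both are assumptions. Unlike the approach described, this sketch compares every pair of records, so it illustrates only the measure and the threshold hierarchy, not the reported scale.

```python
import zlib
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List


def clen(data: bytes) -> int:
    """Compressed length, used as an approximation of information content."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalised compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    bx, by = x.encode(), y.encode()
    cx, cy = clen(bx), clen(by)
    return (clen(bx + by) - min(cx, cy)) / max(cx, cy)


def hierarchical_clusters(records: Dict[str, str],
                          thresholds: Iterable[float]) -> Dict[float, List[List[str]]]:
    """Single-linkage clusters at each distance threshold.

    Raising the threshold only adds links, so each level merges clusters from
    the level below; together the levels form a nested, hierarchical view.
    """
    ids = sorted(records)
    dist = {(a, b): ncd(records[a], records[b]) for a, b in combinations(ids, 2)}
    levels: Dict[float, List[List[str]]] = {}
    for t in sorted(thresholds):
        parent = {i: i for i in ids}  # union-find forest for this level

        def find(i: str) -> str:
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for (a, b), d in dist.items():
            if d <= t:
                parent[find(a)] = find(b)
        groups: Dict[str, List[str]] = defaultdict(list)
        for i in ids:
            groups[find(i)].append(i)
        levels[t] = sorted(groups.values())
    return levels


if __name__ == "__main__":
    night_watch = "Rembrandt van Rijn, The Night Watch, oil on canvas, 1642"
    recs = {
        "r1": night_watch,
        "r2": night_watch,  # the same record delivered twice
        "r3": "Letter from Vincent van Gogh to Theo, Arles, 1888",
    }
    print(hierarchical_clusters(recs, [0.3, 0.5]))
```

NCD is not exactly symmetric, because compressing x followed by y can differ slightly from y followed by x. The sketch sidesteps this by computing each pair once, in sorted-ID order.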

Analysis of the results
A selection of clusters has been analysed:
- Selection of examples
- Formulation of hypotheses about how the clusters were generated
- Comparison of the clusters with the similar items found in the Europeana portal
- Categorisation of the clusters

Clusters overview

Categories of clusters: same objects/duplicates
Clusters of the same objects that have been either:
- provided more than once to Europeana, within the same dataset or via two different channels, or
- duplicated during the Europeana ingestion process (a quality issue)

Categories of clusters: parts of one Cultural Heritage Object (CHO)
Clusters of objects that are structurally composed of other objects or parts.

Categories of clusters: views of the same CHO
Clusters of objects which have multiple representations. Each representation offers a different view of the CHO. In most cases the metadata is the same, so it would be possible to attach all these views to the same record.
Derivative works

Categories of clusters: thematic clusters
These clusters are often too small to be considered a complete collection. They share some metadata that relates them to a similar topic, location or event. Depending on the focus, that is, on how we define the CHO, they could be considered different views of the same CHO.
Collections

Findings
On the clusters:
- Clusters are generally good but are limited to close relationships
On the data used for the research:
- There are quality issues in the data
- Standards are interpreted differently by providers despite the presence of guidelines
- The creation of digital objects is not always in line with the creation of descriptive metadata
- The logical structure of cultural heritage objects is not always reflected in the metadata

Next steps (1)
- Re-use the categories to find ways of automating the detection of such categories:
  - some cluster categories may be deduced from common metadata values in given fields
  - patterns might exist for each type of category
- Categorise the clusters in terms of FRBR entities and relations (for example, a manifestation of an expression)
- Experiment with visualization methods
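As an illustration of deducing cluster categories from common metadata values, the sketch below applies a few hand-written rules to a cluster of records. The rules, the thresholds and the flat dictionary layout are all hypothetical; they mirror the categories described earlier (duplicates, views, parts, thematic) but are not the pilot's method. Field names follow Dublin Core and EDM (`edm:isShownBy` links a record to its digital object), and each record is assumed to carry that field.

```python
import os
from typing import Dict, List

DESCRIPTIVE_FIELDS = ("dc:title", "dc:creator", "dc:description")


def shared(cluster: List[Dict[str, str]], field: str) -> bool:
    """True if every record has the field and all values are identical."""
    values = {rec.get(field) for rec in cluster}
    return len(values) == 1 and None not in values


def categorise(cluster: List[Dict[str, str]]) -> str:
    """Assign a cluster category using hypothetical rules over shared field values."""
    if all(shared(cluster, f) for f in DESCRIPTIVE_FIELDS):
        # Same description: either the same digital object delivered more than
        # once, or several representations (views) of one object.
        if shared(cluster, "edm:isShownBy"):
            return "duplicates"
        return "views of the same CHO"
    titles = [rec.get("dc:title", "") for rec in cluster]
    prefix = os.path.commonprefix(titles)
    # A long common title prefix ("Codex X, folio 1r" / "Codex X, folio 1v")
    # suggests parts of one object.
    if prefix and len(prefix) >= 0.5 * min(len(t) for t in titles):
        return "parts of one CHO"
    if shared(cluster, "dc:subject") or shared(cluster, "dc:coverage"):
        return "thematic"
    return "uncategorised"


if __name__ == "__main__":
    base = {"dc:title": "The Night Watch", "dc:creator": "Rembrandt",
            "dc:description": "Oil on canvas"}
    print(categorise([{**base, "edm:isShownBy": "http://example.org/a.jpg"},
                      {**base, "edm:isShownBy": "http://example.org/b.jpg"}]))
```

Rules like these are cheap and transparent, which suits a first pass. Clusters they leave uncategorised are natural candidates for the pattern mining and FRBR-based categorisation listed above.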

Next steps (2)
Apply the types of relations available in EDM to the types of clusters found during the experiment:
- dc:subject and edm:isRepresentationOf for "aboutness" links (the Mona Lisa and a historical picture of the Mona Lisa)
- edm:realizes, which is quite FRBR-related (an item of Gutenberg's edition realizes the Bible)
- edm:isSimilarTo (covering both true similarity and cases of derivation) and its sub-properties edm:isDerivativeOf (for real derivation cases such as re-working or extension), edm:incorporates (for inclusion or re-use) and edm:isSuccessorOf (for "sequels")
- more general links (dc:relation), the general part-whole relation (dcterms:hasPart), citation (dcterms:references) and direct versioning links (dcterms:hasVersion)
Findings from the pilot could feed into best-practice guides for content providers and thereby improve the quality of the whole Europeana dataset.
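Once a cluster's relation type is known, publishing it as Linked Open Data means emitting one triple per link with the chosen property. The sketch below serialises links as N-Triples using the EDM, Dublin Core and DCMI Terms namespaces. The short relation-kind labels (`"derivation"`, `"part-whole"`, ...) and the `example.org` URIs are hypothetical. Unknown kinds fall back to the general dc:relation, and choosing which record is the subject (for example, the whole for dcterms:hasPart) is left to the caller.

```python
from typing import Iterable, Tuple

EDM = "http://www.europeana.eu/schemas/edm/"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical labels for the relation kinds listed above, mapped to properties.
LINK_PROPERTIES = {
    "aboutness": EDM + "isRepresentationOf",
    "realization": EDM + "realizes",
    "similarity": EDM + "isSimilarTo",
    "derivation": EDM + "isDerivativeOf",
    "incorporation": EDM + "incorporates",
    "sequel": EDM + "isSuccessorOf",
    "part-whole": DCTERMS + "hasPart",
    "citation": DCTERMS + "references",
    "version": DCTERMS + "hasVersion",
}


def to_ntriples(links: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (subject URI, relation kind, object URI) tuples as N-Triples lines."""
    lines = []
    for subj, kind, obj in links:
        prop = LINK_PROPERTIES.get(kind, DC + "relation")  # general fallback link
        lines.append(f"<{subj}> <{prop}> <{obj}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples([
        ("http://example.org/item/1", "derivation", "http://example.org/item/2"),
        ("http://example.org/item/3", "thematic", "http://example.org/item/4"),
    ]))
```

Falling back to dc:relation keeps weakly understood clusters linkable without asserting a more specific relation than the evidence supports.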

Everyone is happy: mutual benefits
- OCLC: internal data (Digital Gateway, WorldCat, etc.), data services for third parties
- Methods: clustering and enrichment innovation
- Results: Europeana Data Model, new browsing experiences

What can we do for you?
Titia van der Werf, Senior Program Officer, titia.vanderwerf@oclc.org
Shenghui Wang, Research Scientist, shenghui.wang@oclc.org
Rob Koopman, Innovation Lab Architect, rob.koopman@oclc.org

Thank you!
Valentine Charles: valentine.charles@kb.nl
Shenghui Wang: shenghui.wang@oclc.org