The TEXT-TO-ONTO Ontology Learning Environment

Alexander Maedche and Steffen Staab
Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany
{maedche,staab}@aifb.uni-karlsruhe.de
http://www.aifb.uni-karlsruhe.de/wbs

Abstract

Ontologies have become an important means for structuring information and information systems and, hence, important in knowledge engineering as well as in software engineering. However, there remains the problem of engineering large and adequate ontologies within short time frames in order to keep costs low. For this purpose, we present the TEXT-TO-ONTO Ontology Learning Environment, which is based on a general architecture for discovering conceptual structures and engineering ontologies from text. Our Ontology Learning Environment supports both the acquisition of conceptual structures and the mapping of linguistic resources to the acquired structures.

1 Introduction

Ontologies¹ have shown their usefulness in application areas such as intelligent information integration, information brokering and natural-language processing, to name but a few. However, their widespread usage is still hindered by ontology engineering being rather time-consuming and, hence, expensive. Our system TEXT-TO-ONTO tries to overcome this knowledge acquisition bottleneck through learning and discovering conceptual structures from texts.

Natural language texts exhibit morphological, syntactic, semantic, pragmatic and conceptual constraints that interact in order to convey a particular meaning to the reader. Thus, the text transports information to the reader, and the reader embeds this information into his background knowledge. Through understanding the text, data is associated with conceptual structures, and new conceptual structures are learned from the interacting constraints given through language. TEXT-TO-ONTO exploits the interacting constraints on the various language levels (from morphology to pragmatics and background knowledge) in order to discover new concepts and stipulate relationships between concepts. The system follows a balanced cooperation approach described in [4], i.e. each modeling task can be done by the user or by a learning tool of the system. This balanced interaction of system and user contributes to preparing background knowledge, enhancing the domain knowledge (the ontology), and inspecting the learned knowledge.

¹ We restrict our attention in this paper to domain ontologies that describe a particular small model of the world as relevant to applications, in contrast to top-level ontologies and representational ontologies that aim at the description of generally applicable conceptual structures and meta-structures, respectively, and that are mostly based on philosophical and logical points of view rather than focused on applications.

2 TEXT-TO-ONTO Ontology Learning Environment

The process of semi-automatic ontology learning from text is embedded in an architecture that comprises several core features, described in the following as a kind of pipeline (cf. the overall schema in Figure 1). Nevertheless, the reader should bear in mind that the overall development of ontologies remains a cyclic process (cf. [1]). In fact, we provide a broad set of interactions such that the engineer may start with primitive methods first. These methods require very little or even no background knowledge, but they may also be restricted to returning only simple hints, like term frequencies. While the knowledge model matures during the semi-automatic learning process, the engineer may turn towards more advanced and more knowledge-intensive algorithms, such as our mechanism for discovering generalized non-taxonomic relations described in [2].

[Figure 1. Architecture of the Ontology Learning Environment. Components shown: Text & Processing Management; Text Processing Server (stemming, POS tagging, chunk parsing, information extraction, ...); Lexical DB and domain lexicon; Learning & Discovering Algorithms; Evaluation; OntoEdit Ontology Modeling Environment and the ontology.]

A comprehensive architecture lays the foundation for acquiring domain ontologies and linguistic resources ([3]). The main components of the architecture are (i) the Text & Processing Management, (ii) the Text Processing Server, (iii) a Lexical Database and Domain Lexicon, (iv) a Learning Module, and (v) the Ontology Engineering Environment OntoEdit.

Text & Processing Management Component. The ontology engineer uses the Text & Processing Management Component to select domain texts exploited in the further discovery process.
She chooses among a set of text (pre-)processing methods available on the Text Processing Server and among a set of algorithms available at the Learning & Discovering component. The former module returns text that is annotated with XML, and this XML-tagged text is fed to the Learning & Discovering component.

Text Processing Server. The Text Processing Server may comprise a broad set of different methods. In our case, it contains a shallow text processor based on the core system SMES (Saarbrücken Message Extraction System) [5]. SMES is a system that performs syntactic analysis on natural language documents. In general, the Text Processing Server is organized in modules, such as a tokenizer, morphological and lexical processing, and chunk parsing, that use lexical resources to produce mixed syntactic/semantic information. The results of text processing are stored as annotations in XML-tagged text.

[Figure 2. The TEXT-TO-ONTO Ontology Learning Environment]

Lexical DB & Domain Lexicon. Syntactic processing relies on lexical knowledge. In our system, SMES accesses a lexical database with more than 120,000 stem entries and more than 12,000 subcategorization frames that are used for lexical analysis and chunk parsing. The domain-specific part of the lexicon (abbreviated as domain lexicon; cf. the lower left part of Figure 2) associates word stems with concepts available in the concept taxonomy. Hence, it links syntactic information with semantic knowledge that may be further refined in the ontology.
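As a rough illustration of how a domain lexicon can link word stems to concepts and yield simple term-frequency hints, consider the following minimal Python sketch. The lexicon entries, the function name `concept_frequencies`, and the pre-stemmed input list are illustrative assumptions; the sketch does not reproduce SMES or the actual lexicon format.

```python
from collections import Counter

# Hypothetical domain lexicon: word stem -> concept in the taxonomy.
DOMAIN_LEXICON = {
    "hotel": "Hotel",
    "guest house": "GuestHouse",
    "youth hostel": "YouthHostel",
    "cost": "Costs",
}


def concept_frequencies(stems):
    """Count how often each concept is referenced by a list of stems.

    Stems missing from the lexicon are counted separately; frequent
    unknown stems might serve as hints for concepts not yet modeled.
    """
    concepts = Counter()
    unknown = Counter()
    for stem in stems:
        concept = DOMAIN_LEXICON.get(stem)
        if concept is None:
            unknown[stem] += 1
        else:
            concepts[concept] += 1
    return concepts, unknown


# Assumed output of a stemming step on a small text.
stems = ["cost", "youth hostel", "cost", "hotel", "sauna", "sauna"]
concepts, unknown = concept_frequencies(stems)
print(concepts.most_common())
print(unknown.most_common())
```

Keeping unknown stems apart from mapped ones is a design choice of this sketch: it separates evidence about existing concepts from hints that the engineer may want to inspect when extending the taxonomy.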

Learning & Discovering component. The Learning & Discovering component applies various discovery methods to the annotated texts, e.g. term extraction methods for concept acquisition. Our scenario for discovering non-taxonomic relations uses the learning algorithm for discovering generalized association rules described in [2]. Conceptual structures that exist at learning time (e.g. a concept taxonomy) may be incorporated into the learning algorithms as background knowledge. The evaluation of the applied algorithms, as described in [2], is performed in a submodule on the basis of the results of the learning algorithm.

Ontology Engineering Environment. The Ontology Engineering Environment ONTOEDIT, which is a submodule of the Ontology Learning Environment TEXT-TO-ONTO, supports the ontology engineer in semi-automatically adding newly discovered conceptual structures to the ontology. A comprehensive description of the ontology engineering system ONTOEDIT and the underlying methodology is given in [8,9]. The screenshot depicted in Figure 2 shows on the left side the object-model backbone of an ontology. In addition to core capabilities for structuring the ontology, the engineering environment provides some additional features for the purpose of documentation, maintenance, and ontology exchange. OntoEdit internally stores modeled ontologies using an XML serialization.

3 Discovering Non-Taxonomic Conceptual Relations from Text using TEXT-TO-ONTO

In [2] we describe our approach for discovering non-taxonomic conceptual relations from text in order to facilitate ontology engineering. Building on the user-modeled taxonomic part of the ontology, our approach analyzes domain-specific texts. It uses shallow text processing methods to identify linguistically related pairs of words, which are mapped to concepts using the domain lexicon. An algorithm for discovering generalized association rules [6] analyzes statistical information about the linguistic output.
In doing so, it uses the background knowledge from the taxonomy in order to propose relations at the appropriate level of abstraction. For instance, the linguistic processing may find that the word costs frequently co-occurs with each of the words hotel, guest house, and youth hostel in sentences such as (1).

(1) Costs at the youth hostel amount to $20 per night.

From this statistical linguistic data our approach derives correlations at the conceptual level, viz. between the concept Costs and the concepts Hotel, Guest House, and Youth Hostel. The learning algorithm determines support and confidence measures for the relationships between these three pairs, as well as for relationships at higher levels of abstraction, such as between Accommodation and Costs. In a final step, the algorithm determines the level of abstraction most suited to describe the conceptual relationships by pruning apparently less adequate ones. Here, the relation between Accommodation and Costs may be proposed for inclusion in the ontology. Results of the learning algorithm are visualized as a graph, such as the one depicted on the right side of Figure 2.
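The role of the taxonomy in computing support and confidence can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of [2] or [6]: the taxonomy fragment, the toy transactions, and the function names are hypothetical, and the final pruning step is omitted. Each transaction is extended with the ancestors of its concepts, so that co-occurrences are also counted at higher levels of abstraction.

```python
from itertools import combinations

# Hypothetical taxonomy fragment: concept -> parent concept.
PARENT = {
    "Hotel": "Accommodation",
    "GuestHouse": "Accommodation",
    "YouthHostel": "Accommodation",
}


def ancestors(concept):
    """Return all ancestors of a concept in the taxonomy."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def extend(transaction):
    """Add the ancestors of every concept to a transaction."""
    extended = set(transaction)
    for concept in transaction:
        extended.update(ancestors(concept))
    return extended


def pair_measures(transactions):
    """Map each ordered concept pair (x, y) to (support, confidence)."""
    extended = [extend(t) for t in transactions]
    n = len(extended)
    count = {}
    pair_count = {}
    for t in extended:
        for c in t:
            count[c] = count.get(c, 0) + 1
        for a, b in combinations(sorted(t), 2):
            # Skip pairs in which one concept is an ancestor of the other.
            if a in ancestors(b) or b in ancestors(a):
                continue
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
            pair_count[(b, a)] = pair_count.get((b, a), 0) + 1
    return {
        (x, y): (k / n, k / count[x])
        for (x, y), k in pair_count.items()
    }


# Toy data: each transaction holds the concepts of one related word pair.
transactions = [
    {"Costs", "Hotel"},
    {"Costs", "GuestHouse"},
    {"Costs", "YouthHostel"},
    {"Costs", "Hotel"},
]
measures = pair_measures(transactions)
for pair in [("Accommodation", "Costs"), ("Costs", "Hotel")]:
    print(pair, measures[pair])
```

On this toy data, the pair (Accommodation, Costs) reaches a support of 1.0, whereas (Costs, Hotel) reaches only 0.5. This is the kind of signal that would favor proposing the relation at the more abstract level.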

4 Conclusion

We have presented an approach to, and an implemented system for, learning ontologies from text. The core idea of this approach is to support the knowledge engineer using a balanced cooperative modeling paradigm. We emphasize that we do not consider fully automatic ontology acquisition from text to be realistic, so we support the knowledge engineer as much as possible with graphical user interfaces and visualization of discovered conceptual structures. The system has been evaluated and applied for building domain ontologies in the tourism domain [7] and the insurance domain.

References

1. A. Maedche, H.-P. Schnurr, S. Staab, and R. Studer. Representation language-neutral modeling of ontologies. In U. Frank, editor, Proceedings of the German Workshop Modellierung 2000. Koblenz, Germany, April 5-7, 2000. Fölbach-Verlag, 2000.
2. A. Maedche and S. Staab. Discovering conceptual relations from text. In W. Horn (ed.): ECAI 2000. Proceedings of the 14th European Conference on Artificial Intelligence. IOS Press, Amsterdam, 2000.
3. A. Maedche and S. Staab. Semi-automatic engineering of ontologies from text. In Proceedings of the 12th International Conference on Software and Knowledge Engineering. Chicago, USA, July 5-7, 2000. KSI, 2000.
4. K. Morik. Balanced cooperative modeling. Machine Learning, 11:217-235, 1993.
5. G. Neumann, R. Backofen, J. Baur, M. Becker, and C. Braun. An information extraction core system for real world German text processing. In ANLP '97 Proceedings of the Conference on Applied Natural Language Processing, pages 208-215, Washington, USA, 1997.
6. R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. of VLDB '95, pages 407-419, 1995.
7. S. Staab, C. Braun, I. Bruder, A. Düsterhöft, A. Heuer, M. Klettke, G. Neumann, B. Prager, J. Pretzel, H.-P. Schnurr, R. Studer, H. Uszkoreit, and B. Wrenger. GETESS - searching the web exploiting German texts. In CIA '99 Proceedings of the 3rd Workshop on Cooperative Information Agents, LNAI 1652, pages 113-124, Berlin, 1999. Springer.
8. S. Staab and A. Maedche. Axioms are Objects, too - Ontology Engineering beyond the modeling of Concepts and Relations. Technical Report 400, Institute AIFB, Karlsruhe University, 2000.
9. S. Staab and A. Maedche. Ontology engineering beyond the modeling of concepts and relations. In A. Gomez-Perez (ed.): Proceedings of the ECAI 2000 Workshop on Application of Ontologies and Problem-Solving Methods. IOS Press, Amsterdam, 2000.