Available online at www.sciencedirect.com

ScienceDirect

Procedia Technology 11 (2013) 1163-1169

The 4th International Conference on Electrical Engineering and Informatics (ICEEI 2013)

Generating Mind Map from Indonesian Text using Natural Language Processing Tools

Athia Saelan*, Ayu Purwarianti

Bandung Institute of Technology, Jl. Ganesha no. 10, Bandung 40132, Indonesia

Abstract

People usually make a mind map by drawing each object and its relations to other objects from scratch. This research aims to make the process easier by generating a mind map from text (here, Indonesian text) and providing a mind map editor to manipulate the set of objects and relations. To build such a tool, we employ available Indonesian NLP (Natural Language Processing) tools. Three components are needed: a semantic net generator, mind map visualization, and an interaction handler. In the semantic net generator, the first order logic (FOL) produced by the semantic analyzer is converted into a semantic net, which is represented by a list of objects and a list of relations. The resulting semantic net is then visualized using a combination of the radial and layering drawing methods. Interaction is available for editing objects and relations. The tool was evaluated with two experiments: testing the semantic net generation and testing the resulting visualization. The semantic net generation was evaluated using valid input text, while the visualization was evaluated with a user acceptance test. The results show that although the semantic net generation (from FOL) is correct, the semantic analyzer for Indonesian text as a whole still has low accuracy, especially for complex sentences. In the user acceptance test, the automatic generation still produces unimportant objects, which should be corrected through the interaction.

© 2013 The Authors. Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
Selection and peer-review under responsibility of the Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia.

Keywords: mind map; Indonesian; semantic; visualization; first order logic

* Corresponding author. E-mail address: athiasaelan@yahoo.co.id

2212-0173
doi: 10.1016/j.protcy.2013.12.309

1. Introduction

Nowadays, information is easy to collect, especially with the development of internet technology. People can obtain the information they need in seconds. Moreover, books are now available in digital form. Unfortunately, most information is available as text, which is not easy for the human mind to grasp. A picture representing the same information, on the other hand, is easy to grasp. The mind map is a concept built on this view, and it is effective in helping people understand and remember information. By combining pictures and colors, a mind map can represent facts and thoughts in a way that is compatible with how the brain works [1]. Even though people understand the benefits of mind maps, few people use them, because drawing one is troublesome and time-consuming, especially when the text is long.

2. Related Works

This section describes two related mind map generators for English: M2Gen [2] and Actor-based Mind-Maps Assembler [3].

2.1. M2Gen [2]

M2Gen has three main components: natural language processing, mind map conversion, and a mind map view manager. The architecture is shown in Fig. 1. The morphological analysis breaks each word into its lemma and affixes and assigns its POS tag. For parsing, the grammar is written in CFG form and the algorithm is top-down chart parsing. Not every parse tree is used by the semantic analyzer, so the syntax analysis filters the parse result down to the needed information. The next process is the semantic analyzer, which includes discourse analysis, word sense disambiguation, and text meaning representation. The resulting semantic model is then converted into a mind map figure. An example of a generated mind map is shown in Fig. 2.

Fig. 1. Process Architecture in M2Gen [2]

Fig. 2. Mind Map Generated by M2Gen [2]

2.2. Actor-based Mind-Maps Assembler [3]

This research assumes that the main concept of a text is the actor. The subject and object of a sentence become concepts, while the predicate becomes the relation between concepts. Adjectives become sub-concepts. The whole process is shown in Fig. 3. Preprocessing is the component that obtains the syntactic and semantic information from an input text. The result is then processed by pronoun resolution to find the referent of each pronoun. The next process extracts the subject-verb-object triple from each sentence, along with the relations between sentences. The result is represented as a semantic network. Finally, the co-reference resolution component joins identical concepts across sentences. An example mind map is shown in Fig. 4.

Fig. 3. Process Architecture in Actor-based Mind-Maps Assembler [3]

Fig. 4. Mind Map Produced by Actor-based Mind-Maps Assembler [3]

3. Indonesian Mind Map Generator

3.1. The Semantic Representation

We first applied the mind map representations of the previous research [2][3] to the following Indonesian text:

Kartini lahir di Jepara. Jepara berada di Jawa Tengah. Kartini lahir pada tanggal 21 April 1879. Beliau adalah tokoh. Kartini mendirikan sekolah perempuan pada tahun 1913. Sekolah itu bernama Sekolah Kartini. Kartini menulis surat-surat.

(In English: Kartini was born in Jepara. Jepara is located in Central Java. Kartini was born on 21 April 1879. She is a notable figure. Kartini founded a girls' school in 1913. The school was named Sekolah Kartini. Kartini wrote letters.)

In M2Gen, all words become nodes, while in Actor-based Mind-Maps Assembler, nouns become nodes and verbs become relations. The mind map representation in the style of M2Gen is shown in Fig. 5, and the representation in the style of Actor-based Mind-Maps Assembler is shown in Fig. 6. The two methods are basically similar, but the first is more flexible because it more easily represents a sentence with no object or a sentence with many objects. For example, for the sentences Kartini lahir di Jepara and Kartini lahir pada tanggal 21 April 1879, the second method gives two branches while the first method gives one branch. We selected the semantic text representation proposed by the first method, which can be seen as a type of semantic network. As the internal data structure for this representation, we chose a list of objects and a list of object relations. For example, the objects are Kartini, lahir, and Jepara, and the relations are Kartini-lahir and lahir-Jepara.

3.2. Alternative Methods to Generate the Semantic Representation

The important considerations here are how well a process fits the needed semantic representation and the availability of Indonesian Natural Language Processing (NLP) tools. M2Gen uses the full chain of NLP tools, ranging from morphological analysis to semantic analysis. Actor-based Mind-Maps Assembler relies on syntactic analysis. Another alternative is named entity extraction combined with relation extraction. There are therefore three alternative methods: semantic analysis, syntactic analysis, and named entity and relation extraction.
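The list-based representation chosen in Section 3.1 can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the class name `SemanticNet` and its methods are hypothetical, and storing each relation as an ordered pair of object names is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticNet:
    """Semantic network stored as a list of objects and a list of relations.

    Hypothetical structure for illustration; names are not from the paper.
    """
    objects: list[str] = field(default_factory=list)
    relations: list[tuple[str, str]] = field(default_factory=list)

    def add_object(self, name: str) -> None:
        # Reusing an existing object merges repeated words into one node.
        if name not in self.objects:
            self.objects.append(name)

    def add_relation(self, source: str, target: str) -> None:
        self.add_object(source)
        self.add_object(target)
        if (source, target) not in self.relations:
            self.relations.append((source, target))


net = SemanticNet()
# "Kartini lahir di Jepara"
net.add_relation("Kartini", "lahir")
net.add_relation("lahir", "Jepara")
# "Kartini lahir pada tanggal 21 April 1879" extends the same "lahir" branch.
net.add_relation("Kartini", "lahir")
net.add_relation("lahir", "21 April 1879")
```

Because repeated objects and relations are stored once, the two sentences about Kartini's birth share a single branch, which is the behavior described for the first method above.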

Fig. 5. Mind Map for the Indonesian Sentences as in M2Gen

Fig. 6. Mind Map for the Indonesian Sentences as in Actor-based Mind-Maps Assembler

3.2.1. Syntactical Analysis

There are two common syntactic analysis (parsing) methods: constituency-based and dependency-based. Constituency-based parsing produces a list of phrases, while dependency-based parsing produces a list of word relations. We therefore concluded that dependency-based parsing is more appropriate for the mind map generator, because a dependency parse tree is easier to transform into the list of objects and the list of object relations. To evaluate this further, we analyzed the parse tree for each of the sample sentences mentioned before. Unfortunately, the available Indonesian dependency parser does not include reference resolution, which makes it difficult to identify the relations between sentences.

3.2.2. Semantic Analysis

In M2Gen, the semantic analysis produces a semantic network. For the Indonesian language, the available semantic analysis produces FOL [4]. This semantic analysis already includes anaphora resolution. The FOL produced for the sample Indonesian sentences is shown in Table 1. The FOL can then be transformed into a semantic network by defining mapping rules between the terms in the FOL and the objects and relations in the semantic network.

Table 1. Examples of FOL for Indonesian Sentences.

Sentence: Kartini lahir di Jepara.
FOL: ∃x ∃y ∃b event(x,lahir) ^ actor(x,y) ^ location(x,b) ^ place(b,jepara) ^ object(y,kartini)

Sentence: Jepara berada di Jawa Tengah.
FOL: ∃x ∃y ∃b event(x,berada) ^ actor(x,y) ^ location(x,b) ^ place(b,jawa+tengah) ^ object(y,jepara)

3.2.3. Named Entity and Relation Extraction

The idea is to employ a named entity recognizer (NER) and relation extraction (RE). The objects in the semantic representation are the named entities produced by the NER, and the relations between objects are the relations produced by RE. The relations can be extracted from the syntactic parse tree or learned with a machine learning approach that identifies the relation between named entities. The main problem here is that no Indonesian relation extraction tool is available.

3.3. Selected Method to Generate the Semantic Representation

Based on appropriateness and component availability, we chose semantic analysis as the approach. The available semantic analysis pipeline consists of the Indonesian POS Tagger [5], the Indonesian PCFG Parser [6], and the Indonesian FOL Semantic Analyzer [4]. As an example for the further process, the sentence Kartini lahir di Jepara is transformed into the FOL ∃x ∃y ∃z event(x,lahir) ^ actor(x,y) ^ location(x,z) ^ place(z,jepara) ^ object(y,kartini), which is illustrated in Fig. 7.
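The mapping from FOL terms to objects and relations mentioned in Section 3.2.2 can be sketched as follows. The paper does not list its mapping rules, so the rules here are assumptions: `event`, `place`, and `object` are treated as binding a variable to a word, and `actor` and `location` are treated as linking an event variable to an argument variable. The function name `fol_to_semantic_net` is hypothetical.

```python
import re

# Assumed rule sets (not taken from the paper).
BINDING_PREDICATES = {"event", "place", "object"}  # pred(variable, word)
ROLE_PREDICATES = {"actor", "location"}            # pred(event_var, argument_var)


def fol_to_semantic_net(fol: str):
    """Turn a conjunction of binary predicates into (objects, relations)."""
    # Each match is (predicate, first argument, second argument).
    terms = re.findall(r"(\w+)\(\s*([\w+]+)\s*,\s*([\w+]+)\s*\)", fol)

    # Step 1: record which word each variable stands for.
    values = {}
    for pred, var, word in terms:
        if pred.lower() in BINDING_PREDICATES:
            values[var.lower()] = word

    # Step 2: replace variables in role predicates with their words.
    objects, relations = [], []

    def add(name):
        if name not in objects:
            objects.append(name)

    for pred, event_var, arg_var in terms:
        name = pred.lower()
        if name not in ROLE_PREDICATES:
            continue
        event = values[event_var.lower()]
        arg = values[arg_var.lower()]
        # Assumption: the actor points to the event, other roles follow it.
        pair = (arg, event) if name == "actor" else (event, arg)
        add(pair[0])
        add(pair[1])
        relations.append(pair)
    return objects, relations


fol = ("event(x,lahir) ^ actor(x,y) ^ location(x,z) "
       "^ place(z,jepara) ^ object(y,kartini)")
objects, relations = fol_to_semantic_net(fol)
```

For the sample sentence this yields the objects kartini, lahir, and jepara with the relations kartini-lahir and lahir-jepara, matching the representation described in Section 3.1. Joining sentences would then amount to merging the per-sentence lists on identical object names.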

The text representation produced by the FOL semantic analysis must be processed further to obtain the representation needed for the mind map visualization. The needed representation is illustrated in Fig. 8. The FOL is transformed into objects and object relations by replacing each variable with its bound value. To capture the relations between sentences, the semantic network for one sentence is then joined with the semantic networks for the other sentences. Unfortunately, the available Indonesian NLP tools still give incorrect results, for several reasons:

1. Errors from the Indonesian POS Tagger, which were repaired by extending the lexicon and corpus.
2. Errors from the Indonesian PCFG Parser, which mostly fails on complex sentences. Fixing these errors requires enhancing the Indonesian corpus, which needs further analysis.
3. Errors from the Indonesian FOL Semantic Analyzer, which should be fixed by adding semantic rules.

Fig. 7. Illustration of FOL Result for Kartini lahir di Jepara

Fig. 8. Illustration of Required Representation for Kartini lahir di Jepara

Fig. 9. Example of Kartini lahir di Jepara Processed by the Word Order Based Process

To handle the incomplete processing caused by incorrect results from the Indonesian NLP tools, we added a fallback process based on word order. For example, if the Indonesian NLP tools fail to process the sentence Kartini lahir di Jepara, the sentence is processed based on its word order, giving a result such as the one shown in Fig. 9.

3.4. Mind Map Visualization

Mind map visualization should follow rules such as drawing the main idea in the center of the figure, with branches connected to the center. This suits the radial drawing method, in which the root at the drawing center holds the main idea and the branches are the entities related to it. The method still has to be modified, because the mind map structure derived from text is not always a tree. An example graph of entities derived from the earlier Indonesian sentences is shown in Fig. 10. The radial drawing method is a variation of the layering drawing method [7], so a hierarchical approach that uses a layering system can be adapted into a radial one, as illustrated in Fig. 11. The entity in the center, or first layer, is the main idea of the mind map. Here the main idea is the entity with the most relations. The relations of the main idea are placed on the second layer, the other entities related to the main idea are placed on the third layer, and so on. Once every entity has a layer position, the layer position is converted into a distance from the center.

3.5. Interaction in the Mind Map Editor

A mind map is usually produced manually from a person's own thinking, and a mind map produced by automatic generation may contain unwanted results. The mind map generator should therefore be complemented by an editor. Our analysis identified several things that the editor should handle:

1. Position of entity
2. Content of entity and relation
3. Structure, including the addition or deletion of an entity or relation
4. Color
5. Image addition
6. Size
7. Curvature of relation line

Fig. 10. Graph of Entities of Kartini lahir di Jepara and Kartini tinggal di Jepara

Fig. 11. Radial Tree Drawing as Variation of Layering Tree Drawing

4. Experiment

4.1. Experimental Aim and Data

The experiments have two aims: (1) to test the FOL-to-semantic-network transformation; (2) to evaluate the generated mind map figures. For the first experiment, we used two types of text: the original text taken from a primary school book and a modified text (non-complex sentences). Both texts were given as input to the mind map generator and the resulting semantic networks were evaluated. Ten texts were taken from a 5th grade primary school social studies book. For the second experiment, 5 respondents were surveyed on two types of drawing result: the original output of the automatic mind map generator and the modified mind map figure.

4.2. Experimental Result

4.2.1. Experiment to Evaluate the FOL-Semantic Network Transformation

The original text consists of 34 sentences, while the modified text, in which complex sentences were rewritten as simple sentences, consists of 59 sentences. Even after this simplification, not all texts could be processed, because of the limited rules and training data available in the Indonesian NLP tools. For the original text, only 16 of the 34 sentences could be processed, an accuracy of around 47%. For the modified text, 46 of the 59 sentences could be processed, an accuracy of 77%. The errors are mainly caused by the syntactic parser, while the transformation itself caused an error in only 1 sentence across both texts.

4.2.2. Experiment to Evaluate Mind Map Drawing Result

We asked 5 respondents to evaluate the legibility of the resulting mind map figures. There are two types of figure: the original one produced automatically by the system, and the modified one. For the original drawing, 48% of respondents said that it is readable and easy to understand. For the modified drawing, 96% of respondents said that it is readable and easy to understand. The presence of unimportant words in the drawing makes it difficult to understand. Other reasons for the difficulty relate to the color and the focus on the main idea.
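As a companion to the visualization evaluated above, the layer assignment described in Section 3.4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' drawing algorithm: layers are assigned by breadth-first search from the node with the most relations, nodes in a layer are spread evenly around a circle, and the function name `radial_layout` and the `layer_gap` parameter are hypothetical.

```python
import math
from collections import deque


def radial_layout(relations, layer_gap=100.0):
    """Return (layer, positions) with the most-connected node at the center."""
    # Build an undirected adjacency list from the relation pairs.
    neighbours = {}
    for a, b in relations:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    # Main idea: the node with the most relations (first one wins a tie).
    root = max(neighbours, key=lambda n: len(neighbours[n]))

    # Breadth-first search assigns layers, starting at layer 1 for the center.
    # It visits each node once, so it also works when the graph is not a tree.
    layer = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in layer:
                layer[nxt] = layer[node] + 1
                queue.append(nxt)

    # Convert each layer position into a distance from the center.
    by_layer = {}
    for node, depth in layer.items():
        by_layer.setdefault(depth, []).append(node)
    positions = {}
    for depth, nodes in by_layer.items():
        radius = (depth - 1) * layer_gap
        for i, node in enumerate(nodes):
            angle = 2 * math.pi * i / len(nodes)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return layer, positions


relations = [
    ("Kartini", "lahir"), ("lahir", "Jepara"),
    ("Kartini", "mendirikan"), ("mendirikan", "sekolah"),
    ("Kartini", "menulis"), ("menulis", "surat"),
]
layer, positions = radial_layout(relations)
```

In this example Kartini has the most relations and lands in the first layer at the center, the verbs form the second layer, and the remaining entities form the third. Spreading each layer evenly ignores where a node's parent sits, so a real layout would likely need an extra step to keep branches together.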

5. Conclusion

Our mind map generator consists of three components: a semantic net generator, mind map visualization, and an interaction handler. For the semantic network generator, based on the availability of Indonesian NLP tools, we chose to use the available FOL semantic analyzer and added a transformation module that converts its output into a semantic network representation. For the mind map visualization, we used the radial drawing approach. The node with the most relations is chosen as the root and placed at the drawing center. All relations and other objects are connected to the drawing center, and the structure is then mapped into a radial structure. For the interaction, we defined several things that the mind map editor should handle. In the experiments, we evaluated the semantic network generator and the mind map visualization. For the semantic network generator, the accuracy achieved was 77%, with 1 incorrect sentence (among 59 sentences) caused by the semantic network transformation. For the mind map visualization, 48% of respondents agreed that the original automatic drawing is readable and understandable, and this increased to 96% of respondents for the modified drawing. The main reason for the illegibility is the presence of unimportant nodes in the drawing.

References

[1] Buzan, T. Buku Pintar Mind Map. Jakarta: PT. Gramedia Pustaka Utama; 2005.
[2] Abdeen, M., El-Sahan, R., Ismaeil, A., El-Harouny, S., Shalaby, M., Yagoub, M. C. E. Direct Automatic Generation of Mind Maps from Text with M2Gen. In Proceedings of IEEE Toronto International Conference Science and Technology for Humanity, Toronto, Canada; 2009. p. 95-99.
[3] Brucks, C., Schommer, C. Assembling Actor-based Mind-Maps from Text Streams. Master Thesis, University of Luxembourg, Department of Computer Science and Communication; 2008.
[4] Ferdian, F., Purwarianti, A. Implementation of Semantic Analyzer in Indonesian Text-Understanding Evaluation System. In Proceedings of IEEE International Conference on Computational Intelligence and Cybernetics, Bali; 2012.
[5] Wicaksono, A. F., Purwarianti, A. HMM based Part of Speech Tagger for Bahasa Indonesia. In Fourth International MALINDO Workshop, Jakarta; 2010.
[6] Afif, I. Studi Perbandingan Kinerja Algoritma CYK dan Algoritma Earley pada Pengurai Kalimat Menggunakan Probablistic Context Free Grammar Bahasa Indonesia Sederhana. Final Project of Undergraduate, Bandung Institute of Technology; 2011.
[7] Di Battista, G., Eades, P., Tamassia, R., Tollis, I. G. Graph Drawing: Algorithms for the Visualization of Graphs. Prentice Hall; 1999.