IJCNLP The 6th Workshop on Asian Language Resources (ALR 6)

IJCNLP 2008
The 6th Workshop on Asian Language Resources (ALR 6)
Proceedings of the Workshop
11-12 January 2008
Indian School of Business, Hyderabad, India

©2008 Asian Federation of Natural Language Processing

Sponsor: Special Coordination Funds for Promoting Science and Technology, Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

Preface

This volume contains the papers presented at the Sixth Workshop on Asian Language Resources, held on 11-12 January 2008 in conjunction with the Third International Joint Conference on Natural Language Processing (IJCNLP 2008).

Language resources have played an essential role in empirical approaches to natural language processing (NLP) for the last two decades. Earlier concerted efforts to construct language resources, particularly in the US and Europe, laid a solid foundation for pioneering NLP research in those two communities. By comparison, the availability and accessibility of language resources remain very limited for most Asian languages, with only a few exceptions. Asian languages are also more diverse in their character sets, grammatical properties and cultural backgrounds. Motivated by this situation, we have organised a series of workshops on Asian language resources since 2001. The series has helped to stimulate NLP research in Asia, particularly the building and use of corpora of various types and languages.

This sixth workshop received 31 submissions covering 13 languages. Paper selection was more competitive than in the previous five workshops: the program committee selected 10 regular papers, 3 short papers and 8 resource reports for presentation. The workshop comprises two parts: technical sessions, and a session devoted to reporting activities related to language resources in several languages. The resource report session is followed by an open discussion on collaboration in building, standardising and exchanging language resources in Asia. We hope this workshop further accelerates the already thriving NLP research in Asia.

Chu-Ren Huang and Mikami Yoshiki, Workshop Co-chairs
Hasida Kôiti and Tokunaga Takenobu, Program Co-chairs

Organiser

Workshop Chairs
Huang, Chu-Ren (Academia Sinica)
Mikami, Yoshiki (Nagaoka University of Technology)

Program Chairs
Hasida, Kôiti (National Institute of Advanced Industrial Science and Technology)
Tokunaga, Takenobu (Tokyo Institute of Technology)

Program Committee
Bhattacharyya, Pushpak (IIT, Bombay)
Fang, Alex Chengyu (City University of Hong Kong)
Riza, Hammam (IPTEKnet BPPT)
Hasida, Kôiti (National Institute of Advanced Industrial Science and Technology)
He, Tingting (Huazhong Normal University)
Huang, Chu-Ren (Academia Sinica)
Hussain, Sarmad (National University of Computer & Emerging Sciences)
Itahashi, Shuichi (National Institute of Informatics)
Lu, Qin (Hong Kong Polytechnic University)
Luong, Chi Mai (National Center for Sciences and Technologies of Vietnam)
Mikami, Yoshiki (Nagaoka University of Technology)
Nandasara, Shakrange Turrance (University of Colombo, School of Computing)
Nguyen, Thi Minh Huyen (Hanoi University of Sciences)
Oo, Thein (Myanmar Computer Federation)
Rau, Victoria (Providence University)
Rim, Hae-Chang (Korea University)
Roxas, Rachel Edita O (De La Salle University, Manila)
Shirai, Kiyoaki (Japan Advanced Institute of Science and Technology)
Sornlertlamvanich, Virach (Thai Computational Linguistics Laboratory, NICT)
Sui, Zhifang (Peking University)
Tokunaga, Takenobu (Tokyo Institute of Technology)
Vikas, Om (Indian Institute of Information Technology and Management)
Zhao, Jun (Chinese Academy of Sciences)

This workshop is supported by Special Coordination Funds for Promoting Science and Technology, Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

Workshop Program

11-12 January 2008
Indian School of Business, Hyderabad, India

Day 1 (11 January)

9:00  Registration
9:20  Opening
9:30  Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems
      Asif Ekbal and Sivaji Bandyopadhyay
9:55  Gazetteer Preparation for Named Entity Recognition in Indian Languages
      Sujan Kumar Saha, Sudeshna Sarkar and Pabitra Mitra
10:20 Preliminary Chinese Term Classification for Ontology Construction
      Gaoying Cui, Qin Lu and Wenjie Li
10:45 Break
11:05 Technical Terminology in Asian Languages: Different Approaches to Adopting Engineering Terms
      Makiko Matsuda, Tomoe Takahashi, Hiroki Goto, Yoshikazu Hayase, Robin Lee Nagano and Yoshiki Mikami
11:30 Selection of XML tag set for Myanmar National Corpus
      Wunna Ko Ko and Thin Zar Phyo
11:55 Myanmar Word Segmentation using Syllable level Longest Matching
      Hla Hla Htay and Kavi Narayana Murthy
12:20 Lunch
13:50 The Link Structure of Language Communities and its Implication for Language-specific Crawling
      Rizza Caminero and Yoshiki Mikami
14:15 A Multilingual Multimedia Indian Sign Language Dictionary Tool
      Tirthankar Dasgupta, Sambit Shukla, Sandeep Kumar, Synny Diwakar and Anupam Basu
14:40 A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
      Deniz Zeyrek and Bonnie Webber
15:05 Towards an Annotated Corpus of Discourse Relations in Hindi
      Rashmi Prasad, Samar Husain, Dipti Sharma and Aravind Joshi
15:30 Break
15:50 A Semantic Study on Yami Ontology in Traditional Songs
      Yin-Sheng Tai, D. Victoria Rau and Meng-Chien Yang
16:05 Assessment and Development of POS Tag Set for Telugu
      Rama Sree R.J., Uma Maheswara Rao G and Madhu Murthy K.V.
16:20 Designing a Common POS-Tagset Framework for Indian Languages
      Sankaran Baskaran, Kalika Bali, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Girish Nath Jha, Rajendran S, Saravanan K, Sobha L and Subbarao K V

Day 2 (12 January)

9:00  Resources Report on Languages of Indonesia
      Hammam Riza
9:15  Confirmed Language Resource for Answering How Type Questions Developed by Using Mails Posted to a Mailing List
      Ryo Nishimura, Yasuhiko Watanabe and Yoshihiro Okada
9:30  Corpus building for Mongolian language
      Purev Jaimai and Odbayar Chimeddorj
9:45  Resources for Urdu Language Processing
      Sarmad Hussain
10:00 Balanced Corpus of Contemporary Written Japanese
      Kikuo Maekawa
10:15 Break
10:35 A Basic Framework to Build a Test Collection for the Vietnamese Text Catergorization
      Viet Hoang-Anh, Thu Dinh-Thi-Phuong and Thang Huynh-Quyet
10:50 Enhanced Tools for Online Collaborative Language Resource Development
      Virach Sornlertlamvanich, Thatsanee Charoenporn, Suphanut Thayaboon, Chumpol Mokarat and Hitoshi Isahara
11:05 Japanese Effort Toward Sharing Text and Speech Corpora
      Shuichi Itahashi and Kôiti Hasida
11:20 Open Discussion
12:20 Closing

Table of Contents

Regular papers

Development of Bengali Named Entity Tagged Corpus and its Use in NER Systems
Asif Ekbal and Sivaji Bandyopadhyay .... 1

Gazetteer Preparation for Named Entity Recognition in Indian Languages
Sujan Kumar Saha, Sudeshna Sarkar and Pabitra Mitra .... 9

Preliminary Chinese Term Classification for Ontology Construction
Gaoying Cui, Qin Lu and Wenjie Li .... 17

Technical Terminology in Asian Languages: Different Approaches to Adopting Engineering Terms
Makiko Matsuda, Tomoe Takahashi, Hiroki Goto, Yoshikazu Hayase, Robin Lee Nagano and Yoshiki Mikami .... 25

Selection of XML tag set for Myanmar National Corpus
Wunna Ko Ko and Thin Zar Phyo .... 33

Myanmar Word Segmentation using Syllable level Longest Matching
Hla Hla Htay and Kavi Narayana Murthy .... 41

The Link Structure of Language Communities and its Implication for Language-specific Crawling
Rizza Caminero and Yoshiki Mikami .... 49

A Multilingual Multimedia Indian Sign Language Dictionary Tool
Tirthankar Dasgupta, Sambit Shukla, Sandeep Kumar, Synny Diwakar and Anupam Basu .... 57

A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
Deniz Zeyrek and Bonnie Webber .... 65

Towards an Annotated Corpus of Discourse Relations in Hindi
Rashmi Prasad, Samar Husain, Dipti Sharma and Aravind Joshi .... 73

Short papers

A Semantic Study on Yami Ontology in Traditional Songs
Yin-Sheng Tai, D. Victoria Rau and Meng-Chien Yang .... 81

Assessment and Development of POS Tag Set for Telugu
Rama Sree R.J., Uma Maheswara Rao G and Madhu Murthy K.V. .... 85

Designing a Common POS-Tagset Framework for Indian Languages
Sankaran Baskaran, Kalika Bali, Tanmoy Bhattacharya, Pushpak Bhattacharyya, Girish Nath Jha, Rajendran S, Saravanan K, Sobha L and Subbarao K V .... 89

Resource reports

Resources Report on Languages of Indonesia
Hammam Riza .... 93

Confirmed Language Resource for Answering How Type Questions Developed by Using Mails Posted to a Mailing List
Ryo Nishimura, Yasuhiko Watanabe and Yoshihiro Okada .... 95

Corpus building for Mongolian language
Purev Jaimai and Odbayar Chimeddorj .... 97

Resources for Urdu Language Processing
Sarmad Hussain .... 99

Balanced Corpus of Contemporary Written Japanese
Kikuo Maekawa .... 101

A Basic Framework to Build a Test Collection for the Vietnamese Text Catergorization
Viet Hoang-Anh, Thu Dinh-Thi-Phuong and Thang Huynh-Quyet .... 103

Enhanced Tools for Online Collaborative Language Resource Development
Virach Sornlertlamvanich, Thatsanee Charoenporn, Suphanut Thayaboon, Chumpol Mokarat and Hitoshi Isahara .... 105

Japanese Effort Toward Sharing Text and Speech Corpora
Shuichi Itahashi and Kôiti Hasida .... 107