Error Correcting Romaji-kana Conversion for Japanese Language Education

Seiji Kasahara (1), Mamoru Komachi (1), Masaaki Nagata (2) and Yuuji Matsumoto (1)

(1) Nara Institute of Science and Technology
(2) NTT Communication Science Laboratories

We present an approach that helps Japanese editors on a language-learning SNS correct learners' sentences written in roman characters by converting them into kana. Our system detects foreign words and converts only the Japanese words, even when they contain spelling errors. Experimental results show that our system achieves about 10 points higher conversion accuracy than a traditional input method. Error analysis reveals tendencies in the errors learners make: for example, learners tend to confuse vowels, and they make errors caused by the nature of their native language.

1. … 2009 … 133 … 365 … 50 … SNS … (footnote 1: http://www.jpf.go.jp/j/about/press/dl/0542.pdf)

© 2011 Information Processing Society of Japan

2. … 7) … n-gram … 2) … 3) …

4. Lang-8 (footnote 1: http://lang-8.com/), a language-learning SNS: … 75,000 … 925,588 … 93.4% … 763,971 … 10,000 …

Table 1 gives sentences from Lang-8. Sentence 7 contains "OK desu". In sentence 8 the learner wrote hanashimasu for hanasemasu and made for mada; in sentence 9, no for ha; in sentence 10, americagen for amerikajin (america for amerika, gen for jin). Particle spellings on Lang-8: ha/wa (は), wo/o (を), he/e (へ).
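To make the basic romaji-to-kana step concrete, here is a minimal sketch of greedy longest-match conversion. It is our illustration, not the authors' converter: the syllable table is a small hypothetical subset, and treating a standalone token wa/o/e (or ha/wo/he) as the particle は/を/へ is a simplifying assumption based on the ha/wa, wo/o, he/e variation above.

```python
# Minimal greedy longest-match romaji -> hiragana converter (illustrative sketch).
# The syllable table is a small hypothetical subset, not a complete romanization table.
KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "ga": "が", "gi": "ぎ", "gu": "ぐ", "ge": "げ", "go": "ご",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "za": "ざ", "ji": "じ", "zu": "ず", "ze": "ぜ", "zo": "ぞ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "da": "だ", "de": "で", "do": "ど",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "ba": "ば", "bi": "び", "bu": "ぶ", "be": "べ", "bo": "ぼ",
    "pa": "ぱ", "pi": "ぴ", "pu": "ぷ", "pe": "ぺ", "po": "ぽ",
    "ma": "ま", "mi": "み", "mu": "む", "me": "め", "mo": "も",
    "ya": "や", "yu": "ゆ", "yo": "よ",
    "ra": "ら", "ri": "り", "ru": "る", "re": "れ", "ro": "ろ",
    "wa": "わ", "wo": "を",
    "sha": "しゃ", "shu": "しゅ", "sho": "しょ",
    "ja": "じゃ", "ju": "じゅ", "jo": "じょ",
}
# Assumption: a whitespace-separated token that is exactly one of these is a particle.
PARTICLES = {"ha": "は", "wa": "は", "wo": "を", "o": "を", "he": "へ", "e": "へ"}
MAX_LEN = max(len(key) for key in KANA)


def word_to_kana(word: str) -> str:
    """Convert one romaji word to hiragana, leaving unknown letters untouched."""
    out: list[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        # A doubled consonant other than n (e.g. the "kk" in "pakku") becomes small tsu.
        if i + 1 < len(word) and word[i + 1] == ch and ch not in "aiueon":
            out.append("っ")
            i += 1
            continue
        # Try the longest table entry first.
        for length in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in KANA:
                out.append(KANA[chunk])
                i += length
                break
        else:
            # No entry matched: a lone "n" is syllabic ん, anything else is kept as-is.
            out.append("ん" if ch == "n" else ch)
            i += 1
    return "".join(out)


def sentence_to_kana(sentence: str) -> str:
    """Convert a space-separated romaji sentence token by token."""
    tokens = sentence.lower().split()
    return " ".join(PARTICLES.get(tok) or word_to_kana(tok) for tok in tokens)
```

For example, `sentence_to_kana("watashi wa amerikajin desu")` returns `"わたし は あめりかじん です"`, while a misspelling such as `word_to_kana("packu")` leaves the stray letter behind (`"ぱcく"`), which is exactly the kind of input that needs spelling correction before conversion.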

Table 1: Examples from Lang-8 (learner's sentence → correction)
1. Onaka ga itai desu! → Onaka ga itai desu!
2. suki ni narimasu. → suki ni narimasu. perfect!
3. Isogashikatta. → Isogashikatta.
4. gakko wa omoshiroi desu. → gakko wa omoshiroi desu.
5. Tokyo ni irutoki, Meiji-jingu mo ni ikimashita. → Tokyo ni irutoki, Meiji-jingu ni mo ikimashita.
6. Noh ni mimashita. → Nihonjin no tomodachi ga Noh wo misetekuremashita.
7. Konnichiwa! → OK desu
8. nihongo ga sukoshi hanashimasu demo made jouzu ja arimasen. → nihongo ga sukoshi hanasemasu demo mada jouzu ja arimasen.
9. Chichi no atama ga ii desu. → Chichi ha atama ga ii desu.
10. watashi wa americagen desu. → watashi wa amerikajin desu.

5.1 … 155 287 … WordNet 2.1 (http://wordnet.princeton.edu/) … IPADic 2.7.0 … 1991 … CaboCha 0.53 (http://chasen.org/~taku/software/cabocha/) … 243,663 …

5.2 … unigram … IPADic …

5.2.1 … n-gram … packu … 163 … kau … pakku … chikau … 4) … SimString (http://www.chokkan.org/software/simstring/) …

5.2.2 … n-gram … 5-gram … 1991 … kakasi 2.3.4 (http://kakasi.namazu.org/) …
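Reference 4) and the SimString library retrieve dictionary entries whose character n-gram similarity to a query exceeds a threshold. The sketch below shows only that similarity criterion: cosine similarity over padded character-bigram sets, computed by brute force over a toy dictionary made of the forms listed with packu in 5.2.1. It does not use SimString's API or its fast retrieval algorithm, and the bigram size, the `$` padding and the 0.5 threshold are our own assumptions.

```python
import math


def char_ngrams(s: str, n: int = 2) -> set[str]:
    """Character n-grams of s, padded with '$' so word boundaries count."""
    padded = "$" * (n - 1) + s + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a: set[str], b: set[str]) -> float:
    """Set cosine similarity: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def approximate_lookup(query: str, dictionary: list[str],
                       threshold: float = 0.5, n: int = 2) -> list[tuple[str, float]]:
    """Return (entry, score) pairs with score >= threshold, best first (brute force)."""
    q = char_ngrams(query, n)
    scored = [(w, cosine(q, char_ngrams(w, n))) for w in dictionary]
    hits = [(w, s) for w, s in scored if s >= threshold]
    return sorted(hits, key=lambda ws: ws[1], reverse=True)
```

With this setup, `approximate_lookup("packu", ["pakku", "kau", "chikau"])` keeps only pakku: it shares 4 of its 6 bigrams with packu (score 2/3), whereas kau (about 0.20) and chikau (about 0.15) share only the final `u$`.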

Examples:
yorushiku onegia shimasu. → yoroshiku onegai shimasu.
Muscle musical wo mietai. → Muscle musical wo mitai. (Muscle musical)
Gorofu ga daisuki desu → gorufu ga daisuki desu

Language model: … Lang-8 … SRILM 1.5.12 (http://www-speech.sri.com/projects/srilm/) … Witten-Bell smoothing.

5.3 ca, ci, cu, ce, co → ka, shi, ku, se, ko; m → n; … kinyuu … n …

6.1

$$\mathrm{Recall} = \frac{N_t}{N_w}, \qquad \mathrm{Precision} = \frac{N_t}{N_e}$$

6.2 Table 3 (Anthy 7900, http://anthy.sourceforge.jp/):
Anthy: 74.5 / 66.7 / 69.7
…: 84.5 / 76.6 / 77.3
…: 85.0 / 78.1 / 78.6

6.3 … Lang-8 … Lang-8 … 500 … 2 …

6.4 Table 3: 85.0% versus 74.5% for Anthy, about 10 points … 84.5% … 4 … 77.3% …
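Section 5.3 lists rewrites from c + vowel to ka, shi, ku, se, ko and from m to n. A minimal sketch of applying them as regular-expression preprocessing before conversion follows. The surviving text does not say in which contexts m → n applies, so restricting it to positions before b, m or p (where Hepburn romanization writes syllabic n as m, as in shimbun) is our assumption.

```python
import re

# Rewrites listed in section 5.3: c + vowel -> ka, shi, ku, se, ko.
C_VOWEL = {"ca": "ka", "ci": "shi", "cu": "ku", "ce": "se", "co": "ko"}


def normalize_romaji(text: str) -> str:
    """Apply the c + vowel and m -> n rewrites before romaji-kana conversion (sketch)."""
    text = text.lower()
    # Only c followed by a vowel is rewritten, so "ch" (as in "chikau") is untouched.
    text = re.sub(r"c[aiueo]", lambda match: C_VOWEL[match.group(0)], text)
    # Assumption: rewrite m as n only before b, m or p (Hepburn "shimbun" -> "shinbun").
    text = re.sub(r"m(?=[bmp])", "n", text)
    return text
```

For instance, `normalize_romaji("Cocoa")` gives `"kokoa"` and `normalize_romaji("shimbun")` gives `"shinbun"`, while `"chikau"` passes through unchanged. Running the rewrites as a separate pass keeps the kana table itself free of learner-specific spellings.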

Table 5: learner's spelling → correction
1. domou → doumo
2. Yorushiko onegai shimasu → yoroshiku onegai shimasu
3. Merrii kurisamasu, mina-san → merii kurisumasu minasan
4. domo arigato guzaimasu → doumo arigatou gozaimasu
5. nihongo ga scoshi wakarimasu (s) → nihongo ga sukoshi wakarimasu
6. hajimimashtei (sh) → hajimemashite
7. donna eigaosaiking mimashitaka → donna eiga wo saikin mimashitaka
8. Horandajin desu → orandajin desu
9. Nihon go wa totemo musugashi desu → nihon go wa totemo muzukashii desu

Table 6: learner's spelling → correction
1. Soshite, kurama wo durivu wo shimasu, → Soshite, kuruma wo doraibu wo shimasu
2. boku wa nagai ichi-nichi no renshou o shimasu → boku wa nagai ichi nichi no renshuu o shimasu
3. Terebi gamu wo asobitai desu → terebi geemu wo asobitai desu

Table 7: shuutmatsu (t) → shuumatsu; do-yoobi → doyoubi; packu (c) → pakku; 4; durivu → doraibu; 3; prutugarogo (p) → porutogarugo; 3; musugashi → muzukashii; 3

6.4 (continued) … 78.6% … 4 … 76.6% … 78.1% …

7. … 3 …

7.1 … renhuu … renshou … renshou … n-gram …

7.2 (Table 7) … muzukashii … musugashi …
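The abstract reports that learners tend to confuse vowels, and the tables above pair learner spellings with corrections (kurama/kuruma, kurisamasu/kurisumasu, guzaimasu/gozaimasu). One way to tally such confusions, offered as our own illustration rather than the authors' procedure, is to align each pair by Levenshtein edit distance and keep the substitutions in which both characters are vowels. Preferring the diagonal on ties during traceback is an arbitrary choice.

```python
VOWELS = set("aiueo")


def align(src: str, tgt: str) -> list[tuple[str, str, str]]:
    """Levenshtein alignment of two strings as (operation, src_char, tgt_char) tuples."""
    n, m = len(src), len(tgt)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # delete from src
                             dist[i][j - 1] + 1,          # insert into src
                             dist[i - 1][j - 1] + cost)   # keep or substitute
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    ops: list[tuple[str, str, str]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append(("keep" if cost == 0 else "sub", src[i - 1], tgt[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", tgt[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def vowel_substitutions(learner: str, correct: str) -> list[tuple[str, str]]:
    """Vowel-for-vowel substitutions that turn the learner's spelling into the correction."""
    return [(a, b) for op, a, b in align(learner, correct)
            if op == "sub" and a in VOWELS and b in VOWELS]
```

Applied to kurama/kuruma this yields a single a → u substitution, while musugashi/muzukashii yields none, because its differences (s/z, g/k and the extra i) involve consonants or an insertion.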

Table 8: denwabangou / denwa bangou; Meiji-jingu / meiji jinguu; nouryokushiken / nouryoku shiken

Table 5 … 1–4 … doumo / domou … 5, 6 … su / shi … 7 … n / ng … 8, 9 … 5) …

7.3 (Table 8) … nouryokushiken / nouryoku shiken … IPADic … nouryokushiken … Lang-8 …

8. … SNS … 10 …

References
1) Zheng Chen and Kai-Fu Lee. A New Statistical Approach to Chinese Pinyin Input. In Proceedings of ACL, pp. 241–247, 2000.
2) Yo Ehara and Kumiko Tanaka-Ishii. Multilingual Text Entry using Automatic Language Detection. In Proceedings of IJCNLP, pp. 441–448, 2008.
3) Tomoya Mizumoto, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners. In Proceedings of IJCNLP, 2011.
4) Naoaki Okazaki and Jun'ichi Tsujii. Simple and Efficient Algorithm for Approximate Dictionary Matching. In Proceedings of COLING, pp. 851–859, 2010.
5) Kumiko Tanaka-Ishii, Yusuke Inutsuka, and Masato Takeichi. Japanese Input System with Digits: Can Japanese Be Input Only with Consonants? In Proceedings of HLT, pp. 211–218, 2001.
6) Yabin Zheng, Chen Li, and Maosong Sun. CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method. In Proceedings of IJCAI, pp. 2551–2556, 2011.
7) … N-gram …, Vol. 40, No. 6, pp. 2690–2698, 1999.