Error Correcting Romaji-kana Conversion for Japanese Language Education

Seiji Kasahara,1 Mamoru Komachi,1 Masaaki Nagata2 and Yuuji Matsumoto1

We present an approach to help Japanese editors on language learning SNS correct learners' sentences written in roman characters by converting them into kana. Our system detects foreign words and converts only Japanese words, even if they contain spelling errors. Experimental results show that our system achieves about 10 points higher conversion accuracy than a traditional input method. Error analysis reveals tendencies in the errors learners make. For example, learners tend to confuse vowels and to make errors that stem from the nature of their native language.

1 Nara Institute of Science and Technology
2 NTT Communication Science Laboratories

1.
2009 133 365 1 50 SNS

1 http://www.jpf.go.jp/j/about/press/dl/0542.pdf

© 2011 Information Processing Society of Japan
2 3 4 SNS Lang-8 5 6 7 8

2.
7) n-gram ?) 2) 3)

3.

4.
Lang-8 SNS Lang-8 1 75,000 925,588 93.4% 763,971 10,000 1 Lang-8 Lang-8
1 2 3 4 5 6 7 OK desu 8 9 10
8: hanasemasu (written as hanashimasu), mada (written as made)
9: ha (written as no)
10: amerikajin (written as americagen); amerika / america, jin / gen
Lang-8: ha / wa, wo / o, he / e

5.

1 http://lang-8.com/
1 Onaka ga itai desu! -> Onaka ga itai desu!
2 suki ni narimasu. -> suki ni narimasu.perfect!
3 Isogashikatta. -> Isogashikatta.
4 gakko wa omoshiroi desu. -> gakko wa omoshiroi desu.
5 Tokyo ni irutoki, Meiji-jingu mo ni ikimashita. -> Tokyo ni irutoki, Meiji-jingu ni mo ikimashita.
6 Noh ni mimashita. -> Nihonjin no tomodachi ga Noh wo misetekuremashita.
7 Konnichiwa! -> OK desu
8 nihongo ga sukoshi hanashimasu demo made jouzu ja arimasen. -> nihongo ga sukoshi hanasemasu demo mada jouzu ja arimasen.
9 Chichi no atama ga ii desu. -> Chichi ha atama ga ii desu.
10 watashi wa americagen desu. -> watashi wa amerikajin desu.

1 Lang-8

5.1
1 155 287 WordNet 2.1 2 IPADic 2.7.0 1991 CaboCha 0.53 3 243,663

5.2
uni-gram IPADic

5.2.1
4 n-gram n 1 packu 163 kau pakku chikau 4) 5

5.2.2
n-gram 5-gram 1991 kakasi

1 kakasi 2.3.4. http://kakasi.namazu.org/
2 http://wordnet.princeton.edu/
3 http://chasen.org/~taku/software/cabocha/
4
5 http://www.chokkan.org/software/simstring/
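Section 5.2.1 retrieves dictionary candidates for a possibly misspelled romaji word such as packu by approximate matching over character n-grams, citing reference 4) and SimString. The idea can be sketched in plain Python as follows. This is only an illustration, not the SimString API and not necessarily the authors' configuration: the bigram size, the set-based cosine measure, the 0.5 threshold, the boundary marks and the three-word dictionary are all assumptions made for the example.

```python
from __future__ import annotations

from math import sqrt


def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of `word`.

    Boundary marks (^ and $) are an assumption of this sketch; they let the
    first and last characters contribute their own n-grams.
    """
    padded = "^" * (n - 1) + word + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a: set[str], b: set[str]) -> float:
    """Set-based cosine similarity: |a & b| / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def candidates(query: str, dictionary: list[str],
               threshold: float = 0.5, n: int = 2) -> list[str]:
    """Return dictionary words whose similarity to `query` reaches `threshold`,
    most similar first (ties broken alphabetically)."""
    q = char_ngrams(query, n)
    scored = [(cosine(q, char_ngrams(w, n)), w) for w in dictionary]
    kept = [(s, w) for s, w in scored if s >= threshold]
    return [w for s, w in sorted(kept, key=lambda sw: (-sw[0], sw[1]))]


# Hypothetical toy dictionary built from the words mentioned in Section 5.2.1.
toy_dictionary = ["kau", "pakku", "chikau"]
strict = candidates("packu", toy_dictionary)                 # threshold 0.5
loose = candidates("packu", toy_dictionary, threshold=0.1)   # keeps weak matches too
```

A linear scan like this is only practical for small dictionaries. A dedicated approximate-matching library is the natural choice at the scale of a full morphological dictionary, and a lower threshold trades more candidates for more noise.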
2
yorushiku onegia shimasu. -> yoroshiku onegai shimasu.
Muscle musical wo mietai. -> Muscle musical wo mitai.
Muscle musical
Gorofu ga daisuki desu -> gorufu ga daisuki desu

Lang-8 SRILM 1.5.12 1 Witten-Bell

5.3
ca, ci, cu, ce, co -> ka, shi, ku, se, ko
m n kinyuu n

6.

6.1
\[
\mathrm{Recall} = \frac{N_t}{N_w}, \qquad \mathrm{Precision} = \frac{N_t}{N_e}
\]
$N_t$ $N_w$ $N_e$

6.2
Anthy   74.5   66.7   69.7
        84.5   76.6   77.3
        85.0   78.1   78.6

3 Anthy 7900 2 Anthy

6.3
Lang-8 Lang-8 500 2

6.4
3 85.0% Anthy 74.5% 10 84.5% 4 77.3%

1 http://www-speech.sri.com/projects/srilm/
2 http://anthy.sourceforge.jp/
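The two ratios in Section 6.1 can be computed directly once the three counts are known. The sketch below makes the arithmetic concrete. The readings of the symbols are assumptions, since their definitions are not given here: N_t is taken as the number of correctly converted words, N_w as the number of words in the reference and N_e as the number of words in the system output. The helper `count_words` and its bag-of-words matching are likewise hypothetical and may differ from the matching used in the evaluation.

```python
from collections import Counter


def recall_precision(n_t: int, n_w: int, n_e: int) -> tuple:
    """Return (Recall, Precision) with Recall = N_t / N_w and Precision = N_t / N_e."""
    if n_w <= 0 or n_e <= 0:
        raise ValueError("N_w and N_e must be positive")
    return n_t / n_w, n_t / n_e


def count_words(reference: list, output: list) -> tuple:
    """Return (N_t, N_w, N_e) for one sentence.

    ASSUMPTION: a word counts as correct if it occurs in both the reference
    and the system output (bag-of-words matching, ignoring word order).
    """
    ref, out = Counter(reference), Counter(output)
    n_t = sum((ref & out).values())  # multiset intersection
    return n_t, sum(ref.values()), sum(out.values())


# Illustrative sentence pair: one of three output words is left uncorrected.
reference = "yoroshiku onegai shimasu".split()
output = "yorushiku onegai shimasu".split()

n_t, n_w, n_e = count_words(reference, output)
recall, precision = recall_precision(n_t, n_w, n_e)
```

With these definitions, recall falls when reference words are missing from the output, while precision falls when the output contains words that are not in the reference.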
1 domou -> doumo
2 Yorushiko onegai shimasu -> yoroshiku onegai shimasu
3 Merrii kurisamasu, mina-san -> merii kurisumasu minasan
4 domo arigato guzaimasu -> doumo arigatou gozaimasu
5 nihongo ga scoshi wakarimasu [s] -> nihongo ga sukoshi wakarimasu
6 hajimimashtei [sh] -> hajimemashite
7 donna eigaosaiking mimashitaka -> donna eiga wo saikin mimashitaka
8 Horandajin desu -> orandajin desu
9 Nihon go wa totemo musugashi desu -> nihon go wa totemo muzukashii desu

5
1 Soshite, kurama wo durivu wo shimasu, -> Soshite, kuruma wo doraibu wo shimasu
2 boku wa nagai ichi-nichi no renshou o shimasu -> boku wa nagai ichi nichi no renshuu o shimasu
3 Terebi gamu wo asobitai desu -> terebi geemu wo asobitai desu

6
shuutmatsu t shuumatsu
do-yoobi doyoubi
packu c pakku 4
durivu doraibu 3
prutugarogo p porutogarugo 3
musugashi muzukashii 3

7 78.6% 4 76.6% 78.1%

7.
3

7.1
?? renhuu renshou renshou n-gram

7.2
7 muzukashii musugashi
denwabangou -> denwa bangou
Meiji-jingu -> meiji jinguu
nouryokushiken -> nouryoku shiken
8

5 1 2 3 4 doumo domou 5 6 su shi 7 n ng 8 9 5)

7.3
8 nouryokushiken nouryoku shiken IPADic nouryokushiken Lang-8

8.
SNS 10

References
1) Zheng Chen and Kai-Fu Lee. A New Statistical Approach to Chinese Pinyin Input. In Proceedings of ACL, pp. 241-247, 2000.
2) Yo Ehara and Kumiko Tanaka-Ishii. Multilingual Text Entry using Automatic Language Detection. In Proceedings of IJCNLP, pp. 441-448, 2008.
3) Tomoya Mizumoto, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. Mining Revision Log of Language Learning SNS for Automated Japanese Error Correction of Second Language Learners. In Proceedings of IJCNLP, 2011.
4) Naoaki Okazaki and Jun'ichi Tsujii. Simple and Efficient Algorithm for Approximate Dictionary Matching. In Proceedings of COLING, pp. 851-859, 2010.
5) Kumiko Tanaka-Ishii, Yusuke Inutsuka, and Masato Takeichi. Japanese input system with digits: Can Japanese be input only with consonants? In Proceedings of HLT, pp. 211-218, 2001.
6) Yabin Zheng, Chen Li, and Maosong Sun. CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method. In Proceedings of IJCAI, pp. 2551-2556, 2011.
7) N-gram. Vol. 40, No. 6, pp. 2690-2698, 1999.