Automatic translation in Chinese and English based on mixed strategy

Advanced Materials Research Vols. 760-762 (Optoelectronics Engineering and Information Technologies in Industry), pp. 1942-1946
Online: 2013-09-18. ISSN: 1662-8985. doi:10.4028/www.scientific.net/AMR.760-762.1942
© 2013 Trans Tech Publications, Switzerland

Heng Chen (1,a) and Yongsheng Yu (2,b)
(1) School of Foreign Languages, Wuhan Institute of Technology, Wuhan City, 430070, China
(2) School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan City, 430074, China
(a) chenchrisheng@foxmail.com, (b) yuyosh@hotmail.com

Keywords: automatic translation; mixed strategy; regulation identification; combination of words; knowledge database

Abstract. Translation between English and Chinese is an essential part of language teaching, and automatic translation is a focus of language research. In this paper, we analyze the traditional translation system and the intelligent translation system and derive their working models. We then discuss the different granularities of English-Chinese translation and their influence on translation, and propose a sentence construction model and a Chinese-English translation model. The analysis shows the advantages of phrase-based automatic translation. Then, with the aim of assisting English language teaching, we work out an automatic translation framework with a mixed strategy based on phrase collection, judgment, statistics, and functional-word connection. The framework covers building a translation knowledge database by analyzing translated document resources, identifying phrase and sentence models, and translating automatically on the basis of the knowledge database.

Introduction
Because translation is so important, we keep looking for efficient translation tools. Nowadays machine translation has taken over much of the translation work once done by people. From the linguistic perspective, the translation process combines understanding and expression.
Understanding precedes expression, and expression requires a careful choice of words. Wittgenstein holds that a word, as a component of language, has no meaning apart from its use: the meaning of a word lies in its usage, so the same word has different meanings in different contexts. Besides, many distinguished Chinese translators emphasize "resemblance in spirit" rather than "resemblance in form". Human translation is therefore a complex, high-level thinking process, whereas computer intelligence remains at the low level of logical reasoning. Naturally, we can only ask machine or computer-aided translation to stay close to the original meaning on the basis of suitable word correspondences.

Translation methods can be classified into two categories: rule-based machine translation and corpus-based machine translation. Rule-based methods include literal translation, conversion-based translation, and translation through a special intermediate language. Corpus-based methods include example-based and statistics-based translation, among others. [1] and [2] clarify the concept and definition of machine translation and analyze the prospects of its application, while [3] gives a comprehensive introduction to early computer-aided translation. From the perspective of research methods, [4] proposes dictionary-based computer-aided translation, whereas [5] approaches the subject from a linguistic perspective. [6] analyzes English and Chinese translation through semiotics. [7] and [8] are instrumental in establishing corpora. [10] studies computer-aided translation based on the process of linguistic thinking. All these studies open up new horizons for computer-aided translation, but current achievements cannot yet satisfy the public. Automatic translation systems built on different translation methods have different features.
In a literary translation agency, people pay close attention to the integrity and consistency of the whole document. For assisting English language teaching, however, the accuracy of sentence translation attracts more attention. It is therefore important to consider different methods for different applications. Building on existing studies, this paper examines the main design of machine translation systems, compares the advantages and disadvantages of rule-based and corpus-based machine translation, discusses the difficulties and structure of automatic translation systems, then proposes an automatic translation model based on mixed strategies and provides a framework for the specific application domain of assisting English language teaching.

The analysis of the translation model
There are many methods of automatic translation, which can be grouped into two categories: rule-based machine translation and corpus-based machine translation. For Chinese-English translation software, the working models can be classified into the traditional translation model and the intelligent translation model.

The traditional translation model of automatic translation
Traditional translation tools are word-based: their principle is to incorporate a dictionary into the translation software, so that the tools can provide convenient functions such as spoken pronunciation and a choice among several dictionaries. The traditional translation model is illustrated in Fig. 1, with the dictionary as the database, and retrieval is similar to consulting a dictionary. This model has some problems: the sources of its database are limited and the software's availability is poor. Some software upgrades its traditional e-dictionary and enlarges its database to raise word-to-word translation to the level of phrase-to-phrase translation. A typical early representative is illustrated in Fig. 1: besides dictionaries, it also collects collocations.

Fig. 1. The traditional translation model.
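The dictionary-style retrieval of the traditional model can be sketched as below. This is a minimal illustration, not the software the paper describes: the toy tables `WORD_DICT` and `COLLOCATIONS` are hypothetical, and the input is assumed to be already segmented into words. A bare word returns every dictionary sense, leaving the choice to the user, while an adjacent pair found in the collocation table returns a single phrase-level translation.

```python
# Hypothetical toy tables; a real tool would load these from its database.
WORD_DICT = {
    "打": ["hit", "beat", "play", "make (a call)"],
    "电话": ["telephone", "phone call"],
    "篮球": ["basketball"],
}

# Hypothetical collocation table: adjacent source words -> one phrase translation.
COLLOCATIONS = {
    ("打", "电话"): "make a phone call",
}


def lookup(words):
    """Word-based retrieval with a collocation check for adjacent word pairs.

    Returns (source unit, candidate translations) tuples. Plain words get every
    dictionary sense; recognised collocations get their single phrase entry."""
    result = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in COLLOCATIONS:
            result.append(("".join(pair), [COLLOCATIONS[pair]]))
            i += 2
        else:
            # Unknown words come back with an empty candidate list.
            result.append((words[i], WORD_DICT.get(words[i], [])))
            i += 1
    return result


print(lookup(["打", "电话"]))
print(lookup(["打", "篮球"]))
```

The first call yields one phrase-level candidate, `make a phone call`; the second falls back to word-level lookup, so the user must still pick among the four senses of 打.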
The intelligent translation model
The traditional translation model is word-based, and Chinese and English words stand in many-to-many mapping relationships, so using such a tool is like consulting a dictionary: you have to choose the best translation yourself. The intelligent translation model is based either on the rules for forming sentences or phrases, so that its translation is not a word-to-word mapping but works above word granularity, or on translation frequency, from which the user can choose a good translation. A translation model based on statistics from analyzed literature shows the different frequencies of candidate translations, so that the translator can choose the well-accepted one.

For rule-based machine translation, many commercial systems rest on rule sets. First, the source language is analyzed into a source internal representation identified by the rule sets. Next, this source internal representation is translated into the internal representation of the target language. Finally, it is transformed into the target language and shown to the user. Other methods use a unique third language: for every language pair, the source material is first translated into the third language, which is then translated into the target language. This method extends well and makes translation among many languages easy, but the unique third language is difficult to design.

Fig. 2 shows the working process of the statistical translation model: it analyzes the titles, keywords, and abstracts of every scientific paper and then stores the analysis results in its statistical database. This method does not judge the translations; it only shows their statistical frequency.
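The frequency-ranking idea behind Fig. 2 can be sketched in a few lines. This is a hedged illustration under stated assumptions: `RECORDS` is a hypothetical set of bilingual keyword lists harvested from paper front matter, and the Chinese and English keywords are assumed to be listed in the same order. The sketch counts how often each English rendering accompanies each Chinese term and returns the ranking without judging it, as the model described above does.

```python
from collections import Counter, defaultdict

# Hypothetical bilingual keyword records from scientific papers.
# Assumption: Chinese and English keywords appear in the same order.
RECORDS = [
    (["机器翻译", "语料库"], ["machine translation", "corpus"]),
    (["机器翻译", "知识库"], ["automatic translation", "knowledge base"]),
    (["机器翻译"], ["machine translation"]),
    (["知识库"], ["knowledge database"]),
]


def build_statistics(records):
    """Count how often each English rendering appears for each Chinese term.
    zip() pairs keywords by position and stops at the shorter list."""
    stats = defaultdict(Counter)
    for zh_terms, en_terms in records:
        for zh, en in zip(zh_terms, en_terms):
            stats[zh][en] += 1
    return stats


def query(stats, term):
    """Return every recorded translation with its frequency, most frequent first.
    Ties keep first-seen order. No judgment is made about correctness."""
    return stats[term].most_common() if term in stats else []


stats = build_statistics(RECORDS)
print(query(stats, "机器翻译"))  # [('machine translation', 2), ('automatic translation', 1)]
```

The tie between `knowledge base` and `knowledge database` for 知识库 illustrates the weakness noted in the text: the user sees both, with nothing to choose between them.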
The statistical translation tool presents all statistical translation results to users so that they can choose the one they prefer, which helps a great deal in academic translation. But it does not make any judgment on the translation itself, so users are sometimes confused. Besides, its power cannot be fully exerted when the database holds few English translations for some terms.

Fig. 2. The statistical translation model.

Granularity of automatic translation
If we take a complete sentence as the unit of study, the sentence model can be expressed as:

$$P_{\text{Sentence}} = F[V, S, C, T] \tag{1}$$

In Eq. 1, $V$ represents the vocabulary, $S$ the construction model, $C$ the context, and $T$ the emotional elements of the sentence. Each part reflects a different aspect of the sentence, and the four parts together constitute a complete sentence. Based on Eq. 1, the process of translation between Chinese and English can be expressed as:

$$F_{\text{Chinese}}[V, S, C, T] \Rightarrow F_{\text{English}}[V, S, C, T] \tag{2}$$

Eq. 2 shows that translation is a mapping and replacement between the parts $V$, $S$, $C$, $T$ of the Chinese sentence and the corresponding parts of the English sentence.

First, we analyze the vocabulary. Different automatic translation systems work at different granularities: a single word, a phrase, or a short sentence can all be viewed as vocabulary. Words usually stand in many-to-many mapping relationships: one Chinese word has many related English synonyms, and vice versa. But these synonyms actually differ in ways that cannot be identified at word granularity.

$$V_{\text{Chinese}}[w_{C1}, w_{C2}, \dots, w_{Cn}] \Rightarrow V_{\text{English}}[\{w_{E1}^{1}, w_{E1}^{2}, \dots, w_{E1}^{p_1}\}, \{w_{E2}^{1}, w_{E2}^{2}, \dots, w_{E2}^{p_2}\}, \dots, \{w_{En}^{1}, w_{En}^{2}, \dots, w_{En}^{p_n}\}] \tag{3}$$

In Eq. 3, a Chinese sentence has $n$ basic words, and each word has many related English synonyms: $w_{C1}$ has $p_1$ translation correspondents, and so on up to $w_{Cn}$, which has $p_n$ translation correspondents. Such many-to-many mapping makes translation at word granularity difficult.
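To make the difficulty of Eq. 3 concrete: if each of the $n$ words is rendered independently, the number of candidate word-by-word English renderings is $p_1 p_2 \cdots p_n$. The sketch below computes this under that independence assumption (it ignores word order and context), using made-up candidate counts. It also shows that a unit whose translation is recorded one-to-one contributes a factor of 1.

```python
from math import prod


def candidate_count(synonym_counts):
    """Number of word-by-word renderings when the i-th source unit has
    synonym_counts[i] candidates (p_i in Eq. 3), assuming independent choices."""
    return prod(synonym_counts)


# Hypothetical four-word sentence with 4, 2, 3 and 5 candidates per word.
word_level = [4, 2, 3, 5]
print(candidate_count(word_level))  # 120

# If the first two words form one word group recorded with a single
# translation, they become one unit with exactly one candidate.
group_level = [1, 3, 5]
print(candidate_count(group_level))  # 15
```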
If we instead take the word group as the translation unit, the number of items in the mapping domain is significantly reduced. Taking phrases, word combinations, or short sentences as translation units transforms Eq. 3 into Eq. 4:

$$V_{\text{Chinese}}[w_{C1}, w_{C2}, \dots, w_{Ck}, \dots, w_{Ct}] \Rightarrow V_{\text{English}}[\{w_{E1}\}, \{w_{E2}\}, \dots, \{w_{Ek}\}, \dots, \{w_{Et}^{1}, w_{Et}^{2}, \dots, w_{Et}^{r_t}\}] \tag{4}$$

Here the sentence is divided into $t$ translation units; a unit recorded in the knowledge database, such as $w_{Ck}$, maps to a single English translation, while a unit such as $w_{Ct}$ may still have $r_t$ candidates. So if we take larger parts as translation units and record these one-to-one mapping pairs in the knowledge database, we can improve the automatic translation system.

Next, we study the construction model. A sentence is composed of different sentence elements. Each language has a series of basic sentence elements, and these elements have similar representations in different languages.

$$S_{\text{Chinese}}[D_1, D_2, \dots, D_i, l] \Rightarrow S_{\text{English}}[D'_1, D'_2, \dots, D'_j, L] \tag{5}$$

In Eq. 5, $D_1$ to $D_i$ and $D'_1$ to $D'_j$ represent the sentence elements in Chinese and English, and $l$ and $L$ represent the conjunction models. Collecting structural translation material and recording it in the knowledge database has greatly improved automatic translation. For the last two items, $C$ and $T$, the context relationship and the emotion of a sentence depend on translating the whole document, where difficulties still remain.

Framework of automatic translation based on mixed strategies
From the above discussion, we can see that the essence of automatic translation is mapping and conversion in a pragmatic context. Currently, some commercial translation systems are rule-based and some are corpus-based. Both approaches have advantages and disadvantages, so commercial systems have their limitations. A translation model is also tied to its range of application. For assisting English language teaching, the focus is the accurate and well-chosen selection of words, word groups, and sentence constructions, whereas commercial translation pays more attention to translating whole documents. At sentence granularity, construction by conjunctions is very important. The key to automatic translation lies in phrases and in the grammatical relationships between functional words and the other words of a sentence, so collecting, organizing, and identifying fixed phrases together with their translations is crucial.

Phrases resemble well-acknowledged, fixed, conventional word groups, and such word groups are useful in automatic translation. New words and new usages appear endlessly, so it is not workable to add them to the database by hand each time; the computer has to handle this work. Therefore, setting a processing mode and rules for identifying word groups is vital for automatic identification and analysis in translation tools. For conjunctions, it is very important to identify their relationship with other words within the sentence, that is, how they connect words, phrases, clauses, or sentences.
These conjunction relationships with other words should be recorded in the knowledge database so that they can be matched during sentence translation. Based on the above discussion, and with the purpose of assisting English language teaching, the framework of an intelligent machine translation system based on mixed strategies should include two parts: an analysis module and a translation module. The analysis process is illustrated in Fig. 3.

Fig. 3. The scheme of automatic translation based on mixed strategy.

A. Introduce the resources. The source resources include dictionaries, scientific literature, translation masterpieces, etc. Dictionaries, as a basis, are necessary for translation. Scientific literature usually has Chinese and English titles, keywords, and abstracts, which are an important source for the translation knowledge database. The more abundant the resources, the richer the content of the translation database, which improves computer-aided translation.

B. Analyze the phrases. In this phase, sentences are analyzed and separated into words, phrases, and word groups. Keywords in scientific literature can be collected directly, while examples in dictionaries, abstracts in scientific literature, and translation masterpieces all have to be analyzed comparatively, in both English and Chinese, by the translation system. For every pair of corresponding Chinese and English sentences, we should find every Chinese-English word correspondence and check the content words and their neighbors to see whether collocations or fixed or conventional word groups exist. Word groups should extend from two-word combinations to all possible combinations, so that every possible word group is identified.
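The candidate-generation part of step B can be sketched as follows. This covers only the monolingual side: it assumes sentences are already segmented, interprets "all possible combinations" as contiguous word groups (adjacent words), and leaves out the bilingual alignment the paper also requires. The function names are hypothetical. The resulting counts are the kind of statistics that a threshold, such as the one described in step D, would then filter.

```python
from collections import Counter


def word_groups(words, min_len=2):
    """Yield every contiguous word group of min_len words or more,
    from two-word combinations up to the whole sentence."""
    n = len(words)
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            yield tuple(words[start:start + length])


def count_word_groups(sentences):
    """Count how often each candidate word group occurs across segmented sentences."""
    counts = Counter()
    for words in sentences:
        counts.update(word_groups(words))
    return counts


corpus = [["make", "a", "phone", "call"], ["make", "a", "decision"]]
print(count_word_groups(corpus).most_common(1))  # [(('make', 'a'), 2)]
```

A sentence of $n$ words yields $n(n-1)/2$ contiguous groups, so exhaustive enumeration stays manageable at sentence length.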

C. Analyze the sentence. The analysis of functional words is vital for translating sentences, so functional words should be extracted and their possible relationships with other words analyzed according to what has been recorded in the database. After these analyses of functional and content words against the database, the system can provide the translation of the sentence.

D. Intelligent identification. The key to setting up a translation database lies in the criteria for identifying fixed usages and judging correct sentences. Predetermined thresholds are clearly unreasonable: in actual application there are so many different word combinations that a unified judgment standard cannot be set. Here we adopt an intelligent algorithm: after listing the translation results statistically, the system sets and adjusts the threshold based on historical records.

The translation module works as follows. Sentences are analyzed one by one. The words in each sentence are grouped into different combinations, and longer word groups take priority when matching against the database. If no match is found, a word group is split into shorter word groups or combinations until every word group finds its translation in the database. At the same time, functional words, especially conjunctions, are analyzed according to their recorded relationships with content words in the database. After all these steps, the automatic translation system can offer the translation of the sentence.

Conclusions
With the development of computer science, some influential automatic translation systems have emerged. This paper studies their working processes.
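The matching loop of the translation module can be sketched as a greedy longest-match-first search. This is a sketch under assumptions, not the authors' implementation: `PHRASE_TABLE` is a hypothetical stand-in for the knowledge database, input is assumed pre-segmented, functional-word analysis is omitted, and the bracketed flag for a word with no entry is an addition not in the paper (which assumes every group is eventually found).

```python
# Hypothetical knowledge-database entries: source word group -> translation.
PHRASE_TABLE = {
    ("机器", "翻译", "系统"): "machine translation system",
    ("机器", "翻译"): "machine translation",
    ("机器",): "machine",
    ("翻译",): "translation",
    ("系统",): "system",
}


def translate(words, phrase_table):
    """Greedy longest-match-first translation of a segmented sentence.

    At each position, try the longest remaining word group first; if it is
    not in phrase_table, shorten it by one word and retry."""
    output = []
    i = 0
    while i < len(words):
        for length in range(len(words) - i, 0, -1):
            group = tuple(words[i:i + length])
            if group in phrase_table:
                output.append(phrase_table[group])
                i += length
                break
        else:
            # Not even the single word is recorded: keep it and flag it.
            output.append(f"[{words[i]}]")
            i += 1
    return output


print(translate(["机器", "翻译", "系统"], PHRASE_TABLE))  # ['machine translation system']
print(translate(["机器", "翻译", "质量"], PHRASE_TABLE))  # ['machine translation', '[质量]']
```

Greedy matching mirrors the stated priority for longer word groups and is simple to run, but it can miss a better overall segmentation that a search over all possible splits would find.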
We then discuss the granularity of the automatic translation unit, from single words, phrases, and word groups to short sentences, and present a sentence model based on vocabulary, construction model, context, and the emotional elements of the sentence. Through these studies, we put forward the framework of an automatic translation system based on mixed strategy for the specific application of assisting English language teaching.

References
[1] M. Nagao. Machine Translation: How Far Can It Go? London: Oxford University Press, 1986.
[2] W. J. Hutchins et al. An Introduction to Machine Translation. US: Academic Press, 1992.
[3] W. J. Hutchins. Machine Translation: Past, Present, Future. London: Ellis Horwood Limited, 1986.
[4] M. Shuttleworth, M. Cowie. Dictionary of Translation Studies. Manchester: St. Jerome Publishing, 1997.
[5] G. Yule. The Study of Language. Beijing: Foreign Language Teaching & Research Press, 2000.
[6] R. Al-Kahnjii, S. El-Shiyab, R. Hussein. On the Use of Compensatory Strategies in Simultaneous Interpretation. Journal des traducteurs, 45(3), 2000, pp. 544-557.
[7] F. Och, H. Ney. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), 2003, pp. 19-51.
[8] P. Brown, S. Chen, S. Della Pietra, et al. Automatic Speech Recognition in Machine Aided Translation. Computer Speech and Language, 8(3), 1994, pp. 177-187.
[9] B. Esselink. A Practical Guide to Localization. Amsterdam: John Benjamins Pub Co., 2000.
[10] Z. Feng. Computer Processing of Natural Languages. Shanghai: Shanghai Foreign Language & Education Press, 1996.