Dr. Ana Julia Perrotti-Garcia, DLM, FFLCH, USP São Paulo (Brazil)

drajulia@gmail.com, www.scientiavinces.com/ana
Thursday, July 25, 2013, 12:00 PM - 1:00 PM EDT

The use of customised corpora to improve translation accuracy
Ana Julia Perrotti-Garcia, DLM, FFLCH, USP, São Paulo (Brazil)
ATA; Prof. Tony Berber-Sardinha; Naomi de Moraes

- Overview of basic concepts: corpus
- Corpora made available by others
- Corpus management tools and programs
- Customised corpora
- Steps in building your customised corpora
- Practical examples
- Q&A / more practical examples

Source text → choices → target text

Glossaries, books, encyclopedias. The goal is a text in harmony with its target reader: natural, precise, accurate, with a native-like style.

The main uses of a corpus are:
- Reference book publishing: dictionaries, grammar books, teaching materials, usage guides, thesauri.
- Linguistic research: raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics, etc.
- Natural language processing: programs that understand natural language, spell checking, word lists, etc.
- English language teaching: design of syllabuses and materials, classroom reference, independent learner research.
- Reference material for translators.

Reference material for translators
A corpus is a systematised collection of selected texts that can be analysed with the help of specific computer programs.

The corpus also allows you to limit searches by frequency and to compare the frequency of words, phrases, and grammatical constructions in at least two main ways:
- By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic texts, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorials, or scientific journals.
- Over time: comparisons between different years, from 1990 to the present.
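To make the genre comparison concrete, here is a minimal Python sketch, assuming a tiny, invented corpus split into two sections (the section labels and sentences are hypothetical, not real corpus data). Sections differ in size, so raw counts can mislead; the sketch therefore normalises each count to occurrences per million tokens. This is one common convention, not necessarily the one any particular online corpus interface uses.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: section label -> list of texts (invented sentences).
CORPUS = {
    "spoken": ["I work at home. Work is work.", "Did the plan work?"],
    "academic": ["The work of Smith (1990) shows this.", "Further work is needed."],
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lower-case the text and return its word tokens (digits and punctuation dropped)."""
    return TOKEN_RE.findall(text.lower())


def per_million(corpus, word):
    """Frequency of `word` per million tokens in each section of `corpus`."""
    word = word.lower()
    rates = {}
    for section, texts in corpus.items():
        tokens = [tok for text in texts for tok in tokenize(text)]
        counts = Counter(tokens)
        # Normalise so that sections of different sizes can be compared fairly.
        rates[section] = counts[word] / len(tokens) * 1_000_000 if tokens else 0.0
    return rates


print(per_million(CORPUS, "WORK"))
```

With these toy sentences, WORK is proportionally more frequent in the spoken section than in the academic one. The same per-million rate works for comparing years instead of genres.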

Search word: WORK. Form: CHART. Oral:

Keyword in context (KWIC) display: words are classified by grammatical class (nouns are highlighted in light blue, prepositions in yellow, etc.).
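A keyword-in-context display can be approximated in a few lines. The sketch below uses a hypothetical function, `kwic`, with a context width of 30 characters chosen arbitrarily. It lines up every whole-word occurrence of the search word in a single column. It does not reproduce the grammatical-class colouring, which would require a part-of-speech tagger.

```python
import re


def kwic(text, keyword, width=30):
    """Return one concordance line per whole-word occurrence of `keyword`."""
    # Collapse line breaks and repeated spaces so each context stays on one line.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        start, end = match.span()
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        # Right-justify the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


for line in kwic("Hard work pays. They work late. Homework is not work.", "work"):
    print(line)
```

Note that "Homework" is not listed, because the pattern only matches WORK as a whole word.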

British National Corpus (BNC), http://www.natcorp.ox.ac.uk/: 100 million words from a wide range of sources; current British English; spoken and written language.

Corpus: BNC. Search word: WORK

American National Corpus (ANC), http://americannationalcorpus.org/: it includes texts of all genres and transcripts of spoken data produced from 1990 onward. It will contain a core corpus of at least 100 million words (22 million words of American English already released). It has also released an "Open" portion of the full ANC, consisting of approximately 15 million words and freely available for download.

COMPARA, http://www.linguateca.pt/compara/, is a bidirectional parallel corpus of English and Portuguese. In other words, it is a database of original and translated texts in these two languages, linked together sentence by sentence.

In the results, the search word is shown in bold, sentences are shown in full, and the results are bilingual and aligned.
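The bilingual, aligned display can be sketched as follows, assuming the corpus is already stored as a list of (English, Portuguese) sentence pairs. The pairs and the function name `parallel_search` are hypothetical. Double asterisks stand in for the bold type of the web interface.

```python
import re

# Hypothetical sentence-aligned pairs: (English, Portuguese).
PAIRS = [
    ("She opened the door.", "Ela abriu a porta."),
    ("The door was locked.", "A porta estava trancada."),
    ("He smiled.", "Ele sorriu."),
]


def parallel_search(pairs, word, side=0):
    """Return every aligned pair whose sentence on `side` (0 or 1) contains `word`.

    The matched word is wrapped in ** ** to stand in for bold type.
    """
    pattern = re.compile(r"\b(" + re.escape(word) + r")\b", re.IGNORECASE)
    hits = []
    for pair in pairs:
        if pattern.search(pair[side]):
            marked = list(pair)
            marked[side] = pattern.sub(r"**\1**", pair[side])
            hits.append(tuple(marked))
    return hits


for en, pt in parallel_search(PAIRS, "door"):
    print(f"{en}  |  {pt}")
```

Because the corpus is bidirectional, the same function can search the Portuguese side by passing `side=1`.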

Europarl, http://www.statmt.org/europarl/
Downloads:
- Source release (text files with preprocessing tools and sentence aligner): 1.3 GB
- Tools (preprocessing tools and sentence aligner only): 8.6 KB
Parallel corpora (size, period covered):
- Bulgarian-English: 23 MB, 01/2007-12/2010
- Czech-English: 43 MB, 01/2007-12/2010
- Danish-English: 164 MB, 04/1996-12/2010
- German-English: 172 MB, 04/1996-12/2010
- Greek-English: 125 MB, 04/1996-12/2010
- Spanish-English: 170 MB, 04/1996-12/2010
- Estonian-English: 41 MB, 01/2007-12/2010
- Finnish-English: 163 MB, 01/1997-12/2010
- French-English: 177 MB, 04/1996-12/2010
- Hungarian-English: 43 MB, 01/2007-12/2010
- Italian-English: 172 MB, 04/1996-12/2010
- Lithuanian-English: 41 MB, 01/2007-12/2010
- Latvian-English: 41 MB, 01/2007-12/2010
- Dutch-English: 174 MB, 04/1996-12/2010
- Polish-English: 42 MB, 01/2007-12/2010
- Portuguese-English: 173 MB, 04/1996-12/2010
- Romanian-English: 21 MB, 01/2007-12/2010
- Slovak-English: 43 MB, 01/2007-12/2010
- Slovene-English: 40 MB, 01/2007-12/2010
- Swedish-English: 155 MB, 01/1997-12/2010
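Suppose a downloaded parallel corpus is stored as two plain-text files, one sentence per line, with line n of one file aligned to line n of the other. In that case, the files can be paired up with a sketch like the one below. The one-sentence-per-line layout is an assumption, so check it against the release you actually download. `read_aligned` is a hypothetical helper and is not part of the Europarl tools.

```python
from pathlib import Path


def read_aligned(src_path, tgt_path, encoding="utf-8"):
    """Pair line n of the source file with line n of the target file.

    Assumes one sentence per line in both files (to be verified for
    the actual download).
    """
    src = Path(src_path).read_text(encoding=encoding).splitlines()
    tgt = Path(tgt_path).read_text(encoding=encoding).splitlines()
    if len(src) != len(tgt):
        # Different line counts mean the files are not aligned line by line.
        raise ValueError(f"{len(src)} source lines vs {len(tgt)} target lines")
    # Skip pairs where either side is empty (e.g. a sentence with no counterpart).
    return [(s, t) for s, t in zip(src, tgt) if s.strip() and t.strip()]
```

The mismatch check matters because `zip` on its own would silently stop at the shorter file and hide a misalignment.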

Register, linguistic variants, types of documents, genre, target reader

Practical comparison of the results obtained

- US English, for patients
- US English, NEJM: for specialists
- India's National Magazine: interview with a Japanese MD

Step by step...

Synchronic / diachronic; syntopic / diatopic; synstratic / diastratic; synphasic / diaphasic

Corpus management tools: WordSmith Tools (www.lexically.net/wordsmith/); http://www.antlab.sci.waseda.ac.jp/

Research.

The results are homogeneous. There are no suspicious terms, because all the phrases come from pre-selected texts. The translator saves time, and the results are more coherent, accurate and precise.

- To analyse different translation options
- To document occurrences

Which option is better, this one or that one? E.g.: gum disease or gingival disease?
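A first, rough way to weigh two candidate translations is to count how often each one occurs in your customised corpus. The sketch below assumes the corpus texts are already loaded as Python strings; `compare_options` and the sample sentences are hypothetical. A higher count is evidence, not a verdict. The occurrences still need to be read in context, because the preferred term may differ between texts written for patients and texts written for specialists.

```python
import re


def compare_options(texts, options):
    """Count case-insensitive, whole-phrase occurrences of each candidate term."""
    counts = {}
    for option in options:
        # Join the words with \s+ so a phrase split by a line break still matches.
        words = [re.escape(w) for w in option.split()]
        pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
        counts[option] = sum(len(pattern.findall(text)) for text in texts)
    return counts


# Hypothetical customised corpus, already loaded as strings.
texts = [
    "Gum disease is common. Treat gum\ndisease early.",
    "Gingival disease was noted.",
]
print(compare_options(texts, ["gum disease", "gingival disease"]))
```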

Partial results of the search for the word disease (INOINO = text suppressed for privacy reasons).

"I've never seen this translated as such; are you sure it is correct?"

Word list sorted by frequency

Word list sorted by word

Word list sorted by word end
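The three word-list orders can be sketched as follows, assuming the corpus is plain text. Sorting by word end means sorting on the reversed spelling, so words that share an ending (such as -al or -ing) cluster together. The helper names and the sample sentence are hypothetical; they illustrate the idea and are not how any particular corpus tool works internally.

```python
import re
from collections import Counter


def word_list(text):
    """Return (word, frequency) pairs for every distinct lower-cased word."""
    # [^\W\d_]+ matches runs of letters, including accented ones such as "ç".
    return list(Counter(re.findall(r"[^\W\d_]+", text.lower())).items())


def by_frequency(items):
    """Most frequent first; ties broken alphabetically."""
    return sorted(items, key=lambda wf: (-wf[1], wf[0]))


def by_word(items):
    """Alphabetical (code-point) order."""
    return sorted(items, key=lambda wf: wf[0])


def by_word_end(items):
    """Sort on the reversed spelling, so words with the same ending cluster."""
    return sorted(items, key=lambda wf: wf[0][::-1])


sample = "Dental caries and gingival disease; gingival bleeding and caries."
items = word_list(sample)
print(by_frequency(items))
print(by_word_end(items))
```

In the word-end order, "dental" and "gingival" end up next to each other because both end in -al.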

Our professional life as translators has good moments, but we also have difficult ones. It is important to face these issues with creativity, using technology and science to help us. I do hope this presentation will help you find new solutions for your translation challenges.

Ana Julia Perrotti-Garcia, drajulia@gmail.com, www.benvindos.com.br/drajulia