Corpus linguistics as a research method


18th Seminar, 12 April 2016
Institute for Research in Digital Culture and Humanities, Open University of Hong Kong
Althea Ha, Centre for Applied English Studies, The University of Hong Kong

Outline
- What a corpus is
- Why we use corpora in linguistic research
- Different types of corpora
- Considerations when using/building a corpus
- Text analytical tools
- A corpus-based lexical study: Academic Word List (Coxhead, 2000)
- What corpus linguistics is

What a corpus is
A corpus is "a collection of pieces of language text in electronic form" (Sinclair, 2004, p. 19). Text is "natural language used for communication, whether it is realised in speech or writing" (Biber & Conrad, 2009, p. 5).

Why corpora in linguistic research
Corpora are used to study different aspects of linguistics, such as:
- lexis, registers and genres
- discipline-specific vocabulary
- written vs. spoken language
- academic writing
- new constructions in English
- new words and phrases
- new senses attached to existing words

But how? Intuition? Existing prescriptive rules? A number of authentic texts?

"A corpus is a more reliable guide to language use than native speaker intuition is" (Hunston, 2002, p. 20).
"It is hard to imagine any area of vocabulary research into acquisition, processing, pedagogy, or assessment where the insights available from corpus analysis would not be valuable" (Schmitt, 2010, p. 307).

Corpora and EAP are perfect companions: corpora bring evidence of typical patterning and salient features to the study of academic discourse, providing data which represent a speaker's experience of language in a restricted domain (Hyland, 2009, p. 317).

Corpora can be processed electronically in text analytical tools, which provide:
- useful statistical information, such as the number of word types, frequencies and co-occurrences
- comparisons between corpora
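To make this kind of statistical information concrete, here is a minimal Python sketch that counts tokens (running words), types (distinct word forms) and a frequency list for a small corpus. It is not how WordSmith Tools or Range work internally; the regular-expression tokeniser and the two sample sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into word tokens.

    Assumption: a word is a run of letters, optionally joined by an internal
    apostrophe or hyphen. Real tools apply their own tokenisation rules.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def corpus_profile(texts):
    """Return the token count, type count and frequency list of a corpus."""
    frequency = Counter()
    for text in texts:
        frequency.update(tokenize(text))
    return {
        "tokens": sum(frequency.values()),  # running words
        "types": len(frequency),            # distinct word forms
        "frequency": frequency,             # word form -> occurrences
    }


# Two invented sentences standing in for a corpus.
corpus = [
    "The data were analysed and the analysis was reported.",
    "The results of the analysis support the hypothesis.",
]
profile = corpus_profile(corpus)
print(profile["tokens"], profile["types"])  # 17 12
print(profile["frequency"].most_common(2))  # [('the', 5), ('analysis', 2)]
```

Comparing two corpora with such counts normally requires normalising frequencies (for example, per million words), because corpora differ in size.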

Different types of corpora
- general English
- spoken English
- national varieties of English
- academic English
- languages other than English
- parallel
- monolingual
(Schmitt, 2010)

Major corpora available to date
- British National Corpus (BNC) (100 million tokens)
- Collins WordBanks Online (WordBanks) (553 million tokens)
- Longman Corpus Network (330 million tokens)
- Cambridge English Corpus (1.5 billion tokens)
- Cambridge and Nottingham Corpus of Discourse in English (CANCODE)
- Cambridge and Nottingham Spoken Business English (CANBEC)
- Cambridge Corpus of Business English
- Cambridge Corpus of Financial English

Considerations in using/building a corpus
Matching corpus data with the research purpose is the most crucial consideration in the design of a corpus (Koester, 2010; McEnery & Hardie, 2012).

- Purpose of your research
- Criteria of the target corpus
- Survey of existing corpora
- Existing or self-built corpus?
- Categories of texts in the existing corpus
- Structure and size of the self-built corpus

Sinclair's (2004) criteria
- Mode
- Type
- Domain
- Language(s)/language varieties
- Location
- Date
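One way to put these criteria to work when building or sampling a corpus is to record them as metadata for every text and then select texts by criterion. The sketch below assumes a Python dataclass with one field per criterion; the field names, the year-only date and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Metadata for one corpus text, one field per Sinclair (2004) criterion.

    Field names and example values are illustrative assumptions.
    """
    mode: str        # e.g. "written" or "spoken"
    text_type: str   # e.g. "journal article", "lecture"
    domain: str      # e.g. "academic", "press"
    language: str    # language or language variety, e.g. "British English"
    location: str    # where the text was produced
    date: int        # year of production (a simplification)


def select(records, **criteria):
    """Return the records whose fields match every criterion given."""
    return [record for record in records
            if all(getattr(record, field) == value
                   for field, value in criteria.items())]


records = [
    TextRecord("written", "journal article", "academic", "British English", "UK", 2012),
    TextRecord("spoken", "lecture", "academic", "British English", "UK", 2014),
    TextRecord("written", "news report", "press", "American English", "USA", 2013),
]
academic_written = select(records, mode="written", domain="academic")
print([r.text_type for r in academic_written])  # ['journal article']
```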

What corpus to use
Survey of existing corpora:
- Accessible?
- Free or at a cost?
- Categorisation of texts
- Read descriptions and guidelines
- Built-in tools on the website

Sinclair's (2004) steps towards a representative corpus
- Structural criteria
- Principal corpus components; available text types for each component
- Rank the text types in terms of importance
- Estimate the size of the corpus: number of text types, number of texts, total number of words
- Compare the self-built corpus with the original planned corpus
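The size-estimation and comparison steps come down to simple arithmetic. In this sketch, the plan format (number of texts and expected words per text for each text type) and all the numbers are hypothetical; the calculation is an illustration, not a procedure taken from Sinclair (2004).

```python
def estimate_size(plan):
    """Estimate the size of a planned corpus.

    `plan` maps each text type, listed in order of importance, to a pair
    (number_of_texts, expected_words_per_text).
    Returns (number of text types, number of texts, total number of words).
    """
    n_text_types = len(plan)
    n_texts = sum(n for n, _ in plan.values())
    n_words = sum(n * words for n, words in plan.values())
    return n_text_types, n_texts, n_words


def shortfall(plan, collected):
    """Compare the built corpus with the plan: texts still missing per text type."""
    return {text_type: max(0, n - collected.get(text_type, 0))
            for text_type, (n, _) in plan.items()}


# Hypothetical design for a small academic corpus.
plan = {
    "research article": (100, 6_000),
    "textbook chapter": (40, 8_000),
    "lecture transcript": (60, 9_000),
}
print(estimate_size(plan))  # (3, 200, 1460000)

collected = {"research article": 95, "textbook chapter": 40}
print(shortfall(plan, collected))
# {'research article': 5, 'textbook chapter': 0, 'lecture transcript': 60}
```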

Caveat
There is no perfect corpus: even a huge and carefully designed corpus will never have exactly the same characteristics as the language itself, so it should be as representative as possible (Sinclair, 2004).

Text analytical tools
Existing web-based corpora: built-in tools on the website
- search concordances of a word
- search collocates of a word
- search and compare the frequency of words and phrases in different genres
Self-built corpora:
- WordSmith Tools (Scott, 2012)
- Range (Heatley, Nation, & Coxhead, 2002)
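Concordance and collocate searches can be sketched in a few lines of Python. The code below prints key-word-in-context (KWIC) lines for a node word and counts the words that occur within a fixed span of it. The tokeniser, the span of three words and the sample sentence are assumptions; dedicated tools such as WordSmith Tools also offer statistical association measures for ranking collocates, which this sketch leaves out in favour of raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased runs of letters (with internal ' or -) are words.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def concordance(tokens, node, width=3):
    """KWIC lines showing `width` tokens either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")  # align the node word
    return lines


def collocates(tokens, node, span=3):
    """Raw counts of words within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts


tokens = tokenize("We conducted an analysis of the data. "
                  "A detailed analysis of variance followed.")
lines = concordance(tokens, "analysis")
for line in lines:
    print(line)
counts = collocates(tokens, "analysis")
print(counts["of"], counts["data"])  # 2 2
```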

- Web-based mega corpus: Wordbanks at http://www.collins.co.uk/page/wordbanks+online
- Web-based text analytical tool: VocabProfilers at http://www.lextutor.ca/vp/eng/
- Desktop application: WordSmith Tools 6.0 (Scott, 2012)

[Screenshots of text analytical tools, including VocabProfilers]

Academic Word List (Coxhead, 2000)
Research purpose: to develop and evaluate a new academic word list.
Factors considered in building the Academic Corpus:
- Representation
- Organization
- Size
- Word selection

Representation: not only textbooks, but also a range of academic texts:
- 158 journal articles (print)
- 51 edited journal articles (online)
- 43 complete university textbooks or course books
- 42 texts from the Learned and Scientific section of the Wellington Corpus of Written English (Bauer, 1993)
- etc.

Organization: 4 disciplines (arts, commerce, law, science) and 28 subject areas.

Size: 3.5 million running words, so as to identify 100 occurrences of a word family. Coxhead referred to data from the Brown Corpus (Francis & Kucera, 1982).
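The reasoning behind corpus size is proportional: the larger the corpus, the more occurrences a word family of a given relative frequency will have. The sketch below only illustrates that arithmetic with frequencies normalised per million words; it does not reproduce the Brown Corpus figures Coxhead actually used, which the slide does not give.

```python
def per_million(count, corpus_words):
    """Normalise a raw count to a frequency per million running words."""
    return count * 1_000_000 / corpus_words


def expected_count(rate_per_million, corpus_words):
    """Occurrences expected in a corpus for an item with a given rate per million."""
    return rate_per_million * corpus_words / 1_000_000


# 100 occurrences in 3.5 million words is about 28.6 occurrences per million...
rate = per_million(100, 3_500_000)
print(round(rate, 1))  # 28.6
# ...so the same family would be expected only about 29 times in a
# one-million-word corpus such as the Brown Corpus.
print(round(expected_count(rate, 1_000_000)))  # 29
```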

[Image from Coxhead, 2000, p. 220]

Word selection: what a word is
- morphologically different words (e.g. -s and -ed)
- word types
- word families: "[a] word family was defined as a stem plus all closely related affixed forms" (Coxhead, 2000, p. 218)

Taking analyse as an example:
- regular inflections: analysed, analysing, analyses
- derivations: analyser, analysers, analysis, analyst, analysts, analytic, analytical, analytically
- American spelling: analyze, analyzed, analyzes, analyzing
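Counting by word family means mapping every member form to a headword before counting. This sketch hard-codes only the analyse family listed above; the dictionary layout and the function names are assumptions, and a complete list would map every family in the same way.

```python
from collections import Counter

# Hypothetical word-family list containing only the "analyse" family above.
WORD_FAMILIES = {
    "analyse": [
        "analyse", "analysed", "analysing", "analyses",          # stem + regular inflections
        "analyser", "analysers", "analysis", "analyst", "analysts",
        "analytic", "analytical", "analytically",                # derivations
        "analyze", "analyzed", "analyzes", "analyzing",          # American spellings
    ],
}


def build_lookup(families):
    """Map each member form to the headword of its family."""
    return {form: head for head, forms in families.items() for form in forms}


def family_frequencies(tokens, lookup):
    """Count occurrences per word family; tokens outside every family are skipped."""
    return Counter(lookup[token] for token in tokens if token in lookup)


lookup = build_lookup(WORD_FAMILIES)
tokens = "the analyst analysed the data using statistical analysis".split()
print(family_frequencies(tokens, lookup))  # Counter({'analyse': 3})
```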

Methods: Range (Heatley & Nation, 1996)
Criteria for including a word family:
- Specialised occurrence: excluding the 2,000 most frequent words
- Range: occurs at least 10 times in each discipline, and in 15 or more subject areas (out of 28)
- Frequency: occurs at least 100 times in the Academic Corpus
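The three selection criteria can be expressed as a single filter over per-family counts. This is a reimplementation of the thresholds as stated above, not the Range program Coxhead used; the data layout and all counts below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

DISCIPLINES = ("arts", "commerce", "law", "science")


@dataclass
class FamilyStats:
    """Hypothetical counts for one word family in an academic corpus."""
    headword: str
    in_top_2000: bool               # among the 2,000 most frequent words?
    per_discipline: Dict[str, int]  # discipline -> occurrences
    subject_areas: int              # how many of the 28 subject areas it occurs in

    @property
    def total(self):
        return sum(self.per_discipline.values())


def meets_criteria(s):
    """Apply specialised occurrence, range and frequency criteria together."""
    specialised = not s.in_top_2000
    wide_range = (all(s.per_discipline.get(d, 0) >= 10 for d in DISCIPLINES)
                  and s.subject_areas >= 15)
    frequent = s.total >= 100
    return specialised and wide_range and frequent


candidates = [
    FamilyStats("analyse", False, {"arts": 300, "commerce": 250, "law": 100, "science": 500}, 27),
    FamilyStats("government", True, {"arts": 400, "commerce": 300, "law": 600, "science": 50}, 26),
    FamilyStats("tort", False, {"arts": 0, "commerce": 5, "law": 400, "science": 0}, 4),
    FamilyStats("sporadic", False, {"arts": 12, "commerce": 15, "law": 20, "science": 11}, 16),
]
selected = [s.headword for s in candidates if meets_criteria(s)]
print(selected)  # ['analyse']
```

The last candidate passes the range criterion but fails on frequency (58 occurrences in total), which shows why the criteria are applied together.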

Results: 570 word families
- 12% word coverage for commerce
- 9.3% word coverage for arts
- 9.4% word coverage for law
- 9.1% word coverage for science
- Average: 10% word coverage for academic texts
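Coverage here means the percentage of running words in a text that belong to the list. A minimal sketch, assuming lower-cased, whitespace-tokenised text and a word list given as a set of member forms; the five-word set is a tiny hypothetical stand-in for the full list.

```python
def coverage(tokens, word_list):
    """Percentage of running words (tokens) that belong to the word list."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in word_list)
    return 100 * covered / len(tokens)


# Hypothetical stand-in: a real check would use every member form of all families.
word_list = {"analysis", "data", "research", "method", "significant"}
tokens = "the research method was significant for the analysis of data".split()
print(coverage(tokens, word_list))  # 50.0
```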

Corpus linguistics
"Corpus linguistics is a research approach that has developed over the past several decades to support empirical investigations of language variation and use, resulting in research findings that have much greater generalizability and validity than would otherwise be feasible" (Biber, Reppen, & Friginal, 2010, p. 548).

"Corpus linguistics involves dealing with some set of machine-readable texts which is deemed an appropriate basis on which to study a specific set of research questions" (McEnery & Hardie, 2012, p. 1). It is "a methodological approach rather than a model of language" (Biber, Conrad, & Reppen, 1998, p. 4).

- it is empirical, analyzing the actual patterns of use in natural texts;
- it utilizes a large and principled collection of natural texts, known as a "corpus", as the basis for analysis;
- it makes extensive use of computers for analysis, employing both automatic and interactive techniques;
- it depends on both quantitative and qualitative analytical techniques.
(Biber, Conrad, & Reppen, 1998, p. 4)

List of References
Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal of Lexicography, 6(3), 1-27.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use. Cambridge: Cambridge University Press.
Biber, D. E., Reppen, R., & Friginal, E. (2010). Research in corpus linguistics. In R. B. Kaplan (Ed.), The Oxford Handbook of Applied Linguistics (pp. 548-570). New York: Oxford University Press.
Cheng, W. (2012). Exploring corpus linguistics: Language in action. New York: Routledge.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213-238.
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). Range and Frequency programmes. http://www.victoria.ac.nz/lals/about/staff/paul-nation
Hyland, K. (2009). Corpora and EAP: Specificity in disciplinary discourses. In Goźdź-Roszkowski (Ed.), Explorations across languages and corpora (pp. 317-334). Frankfurt am Main: Peter Lang.
Koester, A. (2010). Building small specialised corpora. In A. O'Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 66-79). London: Routledge.
McEnery, T., & Hardie, A. (2012). Corpus linguistics. Cambridge: Cambridge University Press.
Schmitt, N. (2010). Researching vocabulary: A vocabulary research manual. Basingstoke: Palgrave Macmillan.
Scott, M. (2012). WordSmith tools version 6. Liverpool: Lexical Analysis Software Ltd.
Sinclair, J. M. (2004). Corpus and text: Basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice. Oxford: Oxbow Books. Retrieved from http://ota.ahds.ac.uk/documents/creating/dlc/chapter1.htm#section3

Thank you for your attention.