1 Searching an existing corpus

Size: px
Start display at page:

Download "1 Searching an existing corpus"

Transcription

1 Advanced Research Methods 2010 Mick O Donnell Week 2: Annotating, Searching and Analysing Corpora This week we will explore the use of linguistic corpora, both in terms of using a corpus you collect yourself, and also using large online corpora. 1 Searching an existing corpus We will start with how to use large existing corpora, and what to do with them. One of the classic corpora of this kind is the British National Corpora, consisting of 100 million words of various registers, collected in the period Fortunately, Mark Davies at Brigham Young University in the States has made a search interface to this corpus available over the web. Link: The webpage for this site is shown in Figure 1.. Figure 1: The BYU BNC Search Interface 1

2 1.1 A Basic Search To get started: a) Under Display, select Chart b) In the words space, put however c) Press the Search button. On the right, you should now see a bar chart, which shows how many occurrences of however were found per million words over the 5 genres included in the BNC. Note that the occurrence is far higher in Academic writing than any other genre. It is lowest in the spoken section of the corpus. Click on any of the bars in the bar chart, and to the right will be shown details about the hits in this corpus, and the corpus size. 1.2 Wildcard Search Now, change the search string to the* (note the * ) and change the Display option to List, and press Search. The table on the right now displays the most frequent 100 words which start with the, showing how often the word occurs in the corpus. Note the word the occurs much more frequently than any other word: it is the most common word in the English language. To verify this, change the search string to simply * and press search again. What is the second most common token in the corpus? What is the second most common word? What is the most common open-class word? Note that the * can appear anywhere in a search string. Try searching for con*ert * will match any number of characters (0 or more). If you want to match just 1 character, use? instead. Try searching for con?ert. See the difference in the results? 1.3 POS Search Rather than searching for words themselves, we can also search for word classes. a) Click on the POS List label in the search space. b) From the list, select verb.be. This will insert [vb*] in the search string space, which is the search pattern for all inflections of the verb to be. c) From the POS list, select verb.en. This will insert [v?n*] in the search string space, which is the pattern for all past participle verbs. d) Press Search. You will be told that there are too many hits for both search terms to show you the results. So let s be more specific. Put I in front of the search string, e.g., I [vb*] [v?n*] Now, search. Then switch to Chart mode. What can you say about the contexts in which passive clauses with a first person subject are used? Why is Fiction as strong as Spoken on this feature? Do you think this is due to the passive structure? or first person Subject? How would you verify this hypothesis? Other online search sites: Cobuild Collins: 2

3 2 Means of analyzing an unannotated corpus This section summarises the various ways you can analyse an unannotated corpus (whether your own texts or an existing one). 2.1 Simple Search counts The kinds of searches we saw above return hit counts (how many were found) and more useful, hit rate (hits per million words). By comparing the hit rates in two different subcorpora (e.g., spoken vs. written), we can the corpora differ. 2.2 Concordance A concordance is a list of all occurrences of a each word in a corpus. For example, see Figure 2. Figure 2: An example Concordance This concordance is shown in KWIC format ( Key Word In Context ). Used for: By showing you all occurrences of a word in context, one can explore the grammatical contexts for the word to appear, or the different meanings it can carry. Tools: You can perform concordance studies on your own texts. Some tools: MicroConcord: Wordsmith Tools ( may have automatic POS tagging. Not free. Concordance: UAM CorpusTool (free, automatic POS tagging for English) Concordancing with UAM CorpusTool ( allows concordance matching of your own corpus of texts. The texts do not need to be part of speech tagged: it uses a large lexicon of English to identify the part of speech of each word on the fly. Below is a diagram of a search for again any be verb followed immediately by a participle verb. 3

4 The search language is as follows (these operations are mostly available in other systems): CONCORDANCE PATTERNS: Literal token: the, however. Wildcard token: ca*, *ed, bro*en. # matches any single word. LEXICON BASED MATCHING (currently only matches any noun matches any verb matches any verb which is classified as matches noun classified as human-noun ca*@noun matches nouns starting with ca. *ing@mental-verb matches mental-verb verbs ending with ing. break% matches all inflections of root break red% matches red, reds (noun), redder (adj), reddest red%noun matches red, reds These instructions can be found under the Help window. A full list of word classes can be found by selecting Show Wordclass network from the Misc menu at the top of the screen. Examples: Find all passive clauses: be% Find modalised Find negated never 4

5 2.3 Frequency Lists Concordance tools like Wordsmith will provide you with a list of the words in your corpus, sorted for overall frequency (most frequent first), as a count, and as a percentage of all words. Used for: One can, for instance, work out what percentage of the whole corpus is accounted for by the first 1000 words. Such lists are useful for teaching, as they can highlight the words which need to be taught in a particular genre or field (e.g., what are the words most important for an engineer to know?) Tools: Wordsmith 2.4 Keywords The top words in any frequency list for English will be words such as the, of and a. A more informative listing works out how important each word is for a particular corpus, when compared with a more general corpus. For instance, the keywords from a corpus split over three fields are shown below. The words are ordered in terms of their specialness for this corpus (relative frequency in this corpus when compared to the relative frequency in the general corpus). A value of 100 indicates the word appears 100 times more in this corpus than in other corpora. Military Economics Crime economy companies stock tax cuts 85.0 profits 80.0 investment 75.0 billion 75.0 returns 70.0 sales 70.0 earnings 65.0 investors 65.0 jobs 65.0 package 65.0 assets 65.0 prices 60.0 bill 60.0 corporate 60.0 stocks markets 55.0 budget 55.0 finance 50.0 volatility 50.0 reforms 45.0 commercial 40.0 temporary 40.0 cent analysts growth troops weapons engine mountains smoke 90.0 gulf 85.0 enemy 85.0 aircraft 80.0 force 70.0 civilians 70.0 civilian 70.0 guys 65.0 military squadron 60.0 suicide 55.0 tanks 55.0 soldier 55.0 jungle 55.0 altitude 55.0 strikes 55.0 trees 55.0 lieutenant 55.0 withdrawal 55.0 missile 55.0 bomber 50.0 invasion 50.0 combat 50.0 rounds 50.0 missions 45.0 crime detective 50.0 police disappearance 40.0 criminal court justice driver boy victims 18.6 family child car lived officers legal children kids 9.3 mercy 9.3 investigators 9.3 woman 9.01 murder 8.52 boys 8.52 age 7.77 victim 6.64 street 6.27 body 6.22 incident 5.98 Used for: even more than raw frequency lists, a keyword list highlights the words which are important for teaching a particular genre/register. A word like claims might occur frequently in an economic corpus, but it is a word the language learner will meet in lots of other contexts. Tools: WordSmith, UAM CorpusTool 5

6 2.5 Collocations You might be interested in what words collocate (occur with) a particular word. For instance, what are the collocations of base? A good tool for this (30 day free trial) is called Sketch Engine ( (see Figure 2) Used For: very useful for learning the uses of a word, and also if one is building a dictionary. Figure 2: A Sketch Engine description of challenge 6

7 2.6 N-Grams (multi-word phrases, lexical bundles) Rather than looking at single-words, n-gram analysis looks for sequences of words which are common in the corpus. For instance, a list of the frequent 3-grams (sequence of 3 words) that occur in a small corpus of introductions to academic papers are shown below: in terms of 12 a set of 11 in this paper 10 the performance of 7 of the two 7 be able to 7 a number of 7 the design of 7 which can be 7 the problem of 6 ad hoc networks 6 we believe that 6 of this paper 6 terms of a 5 in section 4 5 some of the 5 in order to 5 large number of 5 that can be 5 ad hoc network 5 According to Biber (e.g., Biber and Barbieri 2007), as the corpus grows to a reasonable size (millions of words), the kinds of phrases that raise to the top don t contain lexical content as such (e.g., ad hoc networks ). Rather, they are phrases which are used to frame such meanings. We see here: in terms of, a set of, etc. Used for: while keywords tell us which words we should teach in a text, n-grams can tell us which phrasings are usefully taught. For instance, assuming we were teaching students how to write introduction sections to academic papers, we collect a corpus of such texts and produce the key n-grams for various lengths. From such a corpus, we can pick up frequent phrases such as this paper reports on or this paper/article is organized as follows. Tools: Wordsmith, UAM CorpusTool 3 Annotating your own corpus 3.1 Annotation Software Where search for words and POS is not enough, you can annotate your own text files with linguistic (or other) categories, and use your annotated corpus to perform statistical studies. You need to provide your text (or audio or video), and load them up in the annotation tool. Some annotation tools include: Text Annotation: UAM CorpusTool ( allows you to annotate multiple texts at multiple linguistic levels MMAX-2 ( annotate single text files. Worse graphical interface than CorpusTool, but allows you to annotate reference chains. Knowtator ( easy to use. Not a lot of features. Audio Annotation: Transcriber ( EXMARaLDA ( Video Annotation: Anvil ( 7

8 3.2 Annotating with UAM CorpusTool UAM CorpusTool allows you to annotate multiple texts at multiple linguistic levels, you provide the coding scheme at each level, such as that below: To annotate a text, you simply select some text, and then choose the features to assign to this segment from those presented below. See the CorpusTool manual for instructions on manual annotation of your corpus (under the Help menu at the top of the screen). 8

9 The tool also allows you to search your annotated corpus for tagged segments: 4 Statistical Analysis of an annotated corpus See the CorpusTool manual for instructions on performing statistical analyses of your annotated corpus (under the Help menu at the top of the screen). 5 References Biber, Douglas and Federica Barbieri 2007 Lexical bundles in university spoken and written registers. English for Specific Purposes 26 (2007)

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova

More information

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics

Web as Corpus. Corpus Linguistics. Web as Corpus 1 / 1. Corpus Linguistics. Web as Corpus. web.pl 3 / 1. Sketch Engine. Corpus Linguistics (L615) Markus Dickinson Department of Linguistics, Indiana University Spring 2013 The web provides new opportunities for gathering data Viable source of disposable corpora, built ad hoc for specific purposes

More information

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis

Linguistic Variation across Sports Category of Press Reportage from British Newspapers: a Diachronic Multidimensional Analysis International Journal of Arts Humanities and Social Sciences (IJAHSS) Volume 1 Issue 1 ǁ August 216. www.ijahss.com Linguistic Variation across Sports Category of Press Reportage from British Newspapers:

More information

The following information has been adapted from A guide to using AntConc.

The following information has been adapted from A guide to using AntConc. 1 7. Practical application of genre analysis in the classroom In this part of the workshop, we are going to analyse some of the texts from the discipline that you teach. Before we begin, we need to get

More information

Context Free Grammars. Many slides from Michael Collins

Context Free Grammars. Many slides from Michael Collins Context Free Grammars Many slides from Michael Collins Overview I An introduction to the parsing problem I Context free grammars I A brief(!) sketch of the syntax of English I Examples of ambiguous structures

More information

DegreeWorks Advisor Reference Guide

DegreeWorks Advisor Reference Guide DegreeWorks Advisor Reference Guide Table of Contents 1. DegreeWorks Basics... 2 Overview... 2 Application Features... 3 Getting Started... 4 DegreeWorks Basics FAQs... 10 2. What-If Audits... 12 Overview...

More information

Linking Task: Identifying authors and book titles in verbose queries

Linking Task: Identifying authors and book titles in verbose queries Linking Task: Identifying authors and book titles in verbose queries Anaïs Ollagnier, Sébastien Fournier, and Patrice Bellot Aix-Marseille University, CNRS, ENSAM, University of Toulon, LSIS UMR 7296,

More information

Corpus Linguistics (L615)

Corpus Linguistics (L615) (L615) Basics of Markus Dickinson Department of, Indiana University Spring 2013 1 / 23 : the extent to which a sample includes the full range of variability in a population distinguishes corpora from archives

More information

Procedia - Social and Behavioral Sciences 154 ( 2014 )

Procedia - Social and Behavioral Sciences 154 ( 2014 ) Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 154 ( 2014 ) 263 267 THE XXV ANNUAL INTERNATIONAL ACADEMIC CONFERENCE, LANGUAGE AND CULTURE, 20-22 October

More information

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities

Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Enhancing Unlexicalized Parsing Performance using a Wide Coverage Lexicon, Fuzzy Tag-set Mapping, and EM-HMM-based Lexical Probabilities Yoav Goldberg Reut Tsarfaty Meni Adler Michael Elhadad Ben Gurion

More information

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)

SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) SINGLE DOCUMENT AUTOMATIC TEXT SUMMARIZATION USING TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF) Hans Christian 1 ; Mikhael Pramodana Agus 2 ; Derwin Suhartono 3 1,2,3 Computer Science Department,

More information

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar

EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar Chung-Chi Huang Mei-Hua Chen Shih-Ting Huang Jason S. Chang Institute of Information Systems and Applications, National Tsing Hua University,

More information

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio

Stefan Engelberg (IDS Mannheim), Workshop Corpora in Lexical Research, Bucharest, Nov [Folie 1] 6.1 Type-token ratio Content 1. Empirical linguistics 2. Text corpora and corpus linguistics 3. Concordances 4. Application I: The German progressive 5. Part-of-speech tagging 6. Fequency analysis 7. Application II: Compounds

More information

The Internet as a Normative Corpus: Grammar Checking with a Search Engine

The Internet as a Normative Corpus: Grammar Checking with a Search Engine The Internet as a Normative Corpus: Grammar Checking with a Search Engine Jonas Sjöbergh KTH Nada SE-100 44 Stockholm, Sweden jsh@nada.kth.se Abstract In this paper some methods using the Internet as a

More information

The Roaring 20s. History. igcse Examination Technique. Paper 2. International Organisations. September 2015 onwards

The Roaring 20s. History. igcse Examination Technique. Paper 2. International Organisations. September 2015 onwards History The Roaring 20s igcse Examination Technique Paper 2 International Organisations September 2015 onwards 1 Assessment Overview Paper 2 50% of total igcse marks 90 minutes Historical investigation

More information

Outreach Connect User Manual

Outreach Connect User Manual Outreach Connect A Product of CAA Software, Inc. Outreach Connect User Manual Church Growth Strategies Through Sunday School, Care Groups, & Outreach Involving Members, Guests, & Prospects PREPARED FOR:

More information

WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company

WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company WiggleWorks Software Manual PDF0049 (PDF) Houghton Mifflin Harcourt Publishing Company Table of Contents Welcome to WiggleWorks... 3 Program Materials... 3 WiggleWorks Teacher Software... 4 Logging In...

More information

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases

2/15/13. POS Tagging Problem. Part-of-Speech Tagging. Example English Part-of-Speech Tagsets. More Details of the Problem. Typical Problem Cases POS Tagging Problem Part-of-Speech Tagging L545 Spring 203 Given a sentence W Wn and a tagset of lexical categories, find the most likely tag T..Tn for each word in the sentence Example Secretariat/P is/vbz

More information

Lexical Collocations (Verb + Noun) Across Written Academic Genres In English

Lexical Collocations (Verb + Noun) Across Written Academic Genres In English Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 182 ( 2015 ) 433 440 4th WORLD CONFERENCE ON EDUCATIONAL TECHNOLOGY RESEARCHES, WCETR- 2014 Lexical Collocations

More information

Project in the framework of the AIM-WEST project Annotation of MWEs for translation

Project in the framework of the AIM-WEST project Annotation of MWEs for translation Project in the framework of the AIM-WEST project Annotation of MWEs for translation 1 Agnès Tutin LIDILEM/LIG Université Grenoble Alpes 30 october 2014 Outline 2 Why annotate MWEs in corpora? A first experiment

More information

MOODLE 2.0 GLOSSARY TUTORIALS

MOODLE 2.0 GLOSSARY TUTORIALS BEGINNING TUTORIALS SECTION 1 TUTORIAL OVERVIEW MOODLE 2.0 GLOSSARY TUTORIALS The glossary activity module enables participants to create and maintain a list of definitions, like a dictionary, or to collect

More information

The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach

The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach BILINGUAL LEARNERS DICTIONARIES The development of a new learner s dictionary for Modern Standard Arabic: the linguistic corpus approach Mark VAN MOL, Leuven, Belgium Abstract This paper reports on the

More information

Advanced Grammar in Use

Advanced Grammar in Use Advanced Grammar in Use A self-study reference and practice book for advanced learners of English Third Edition with answers and CD-ROM cambridge university press cambridge, new york, melbourne, madrid,

More information

Let's Learn English Lesson Plan

Let's Learn English Lesson Plan Let's Learn English Lesson Plan Introduction: Let's Learn English lesson plans are based on the CALLA approach. See the end of each lesson for more information and resources on teaching with the CALLA

More information

Storytelling Made Simple

Storytelling Made Simple Storytelling Made Simple Storybird is a Web tool that allows adults and children to create stories online (independently or collaboratively) then share them with the world or select individuals. Teacher

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Name: Class: _ Date: _ Test Review Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Members of a high school club sold hamburgers at a baseball game to

More information

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence.

Chunk Parsing for Base Noun Phrases using Regular Expressions. Let s first let the variable s0 be the sentence tree of the first sentence. NLP Lab Session Week 8 October 15, 2014 Noun Phrase Chunking and WordNet in NLTK Getting Started In this lab session, we will work together through a series of small examples using the IDLE window and

More information

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight.

Derivational: Inflectional: In a fit of rage the soldiers attacked them both that week, but lost the fight. Final Exam (120 points) Click on the yellow balloons below to see the answers I. Short Answer (32pts) 1. (6) The sentence The kinder teachers made sure that the students comprehended the testable material

More information

Using dialogue context to improve parsing performance in dialogue systems

Using dialogue context to improve parsing performance in dialogue systems Using dialogue context to improve parsing performance in dialogue systems Ivan Meza-Ruiz and Oliver Lemon School of Informatics, Edinburgh University 2 Buccleuch Place, Edinburgh I.V.Meza-Ruiz@sms.ed.ac.uk,

More information

Natural Language Processing. George Konidaris

Natural Language Processing. George Konidaris Natural Language Processing George Konidaris gdk@cs.brown.edu Fall 2017 Natural Language Processing Understanding spoken/written sentences in a natural language. Major area of research in AI. Why? Humans

More information

Variation of English passives used by Swedes

Variation of English passives used by Swedes School of Language and Literature G3, Bachelor s course English Linguistics Course code: 2EN10E Supervisor: Mikko Laitinen Credits: 15 Examiner: Ibolya Maricic Date: 18 January, 2014 Variation of English

More information

INSTRUCTOR USER MANUAL/HELP SECTION

INSTRUCTOR USER MANUAL/HELP SECTION Criterion INSTRUCTOR USER MANUAL/HELP SECTION ngcriterion Criterion Online Writing Evaluation June 2013 Chrystal Anderson REVISED SEPTEMBER 2014 ANNA LITZ Criterion User Manual TABLE OF CONTENTS 1.0 INTRODUCTION...3

More information

Teacher: Mlle PERCHE Maeva High School: Lycée Charles Poncet, Cluses (74) Level: Seconde i.e year old students

Teacher: Mlle PERCHE Maeva High School: Lycée Charles Poncet, Cluses (74) Level: Seconde i.e year old students I. GENERAL OVERVIEW OF THE PROJECT 2 A) TITLE 2 B) CULTURAL LEARNING AIM 2 C) TASKS 2 D) LINGUISTICS LEARNING AIMS 2 II. GROUP WORK N 1: ROUND ROBIN GROUP WORK 2 A) INTRODUCTION 2 B) TASK BASED PLANNING

More information

Once your credentials are accepted, you should get a pop-window (make sure that your browser is set to allow popups) that looks like this:

Once your credentials are accepted, you should get a pop-window (make sure that your browser is set to allow popups) that looks like this: SCAIT IN ARIES GUIDE Accessing SCAIT The link to SCAIT is found on the Administrative Applications and Resources page, which you can find via the CSU homepage under Resources or click here: https://aar.is.colostate.edu/

More information

Introduction. Beáta B. Megyesi. Uppsala University Department of Linguistics and Philology Introduction 1(48)

Introduction. Beáta B. Megyesi. Uppsala University Department of Linguistics and Philology Introduction 1(48) Introduction Beáta B. Megyesi Uppsala University Department of Linguistics and Philology beata.megyesi@lingfil.uu.se Introduction 1(48) Course content Credits: 7.5 ECTS Subject: Computational linguistics

More information

Create Quiz Questions

Create Quiz Questions You can create quiz questions within Moodle. Questions are created from the Question bank screen. You will also be able to categorize questions and add them to the quiz body. You can crate multiple-choice,

More information

Tour. English Discoveries Online

Tour. English Discoveries Online Techno-Ware Tour Of English Discoveries Online Online www.englishdiscoveries.com http://ed242us.engdis.com/technotms Guided Tour of English Discoveries Online Background: English Discoveries Online is

More information

Beyond the Blend: Optimizing the Use of your Learning Technologies. Bryan Chapman, Chapman Alliance

Beyond the Blend: Optimizing the Use of your Learning Technologies. Bryan Chapman, Chapman Alliance 901 Beyond the Blend: Optimizing the Use of your Learning Technologies Bryan Chapman, Chapman Alliance Power Blend Beyond the Blend: Optimizing the Use of Your Learning Infrastructure Facilitator: Bryan

More information

CORPUS ANALYSIS CORPUS ANALYSIS QUANTITATIVE ANALYSIS

CORPUS ANALYSIS CORPUS ANALYSIS QUANTITATIVE ANALYSIS CORPUS ANALYSIS Antonella Serra CORPUS ANALYSIS ITINEARIES ON LINE: SARDINIA, CAPRI AND CORSICA TOTAL NUMBER OF WORD TOKENS 13.260 TOTAL NUMBER OF WORD TYPES 3188 QUANTITATIVE ANALYSIS THE MOST SIGNIFICATIVE

More information

Development of the First LRs for Macedonian: Current Projects

Development of the First LRs for Macedonian: Current Projects Development of the First LRs for Macedonian: Current Projects Ruska Ivanovska-Naskova Faculty of Philology- University St. Cyril and Methodius Bul. Krste Petkov Misirkov bb, 1000 Skopje, Macedonia rivanovska@flf.ukim.edu.mk

More information

Interactive Corpus Annotation of Anaphor Using NLP Algorithms

Interactive Corpus Annotation of Anaphor Using NLP Algorithms Interactive Corpus Annotation of Anaphor Using NLP Algorithms Catherine Smith 1 and Matthew Brook O Donnell 1 1. Introduction Pronouns occur with a relatively high frequency in all forms English discourse.

More information

Formulaic Language and Fluency: ESL Teaching Applications

Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language Terminology Formulaic sequence One such item Formulaic language Non-count noun referring to these items Phraseology The study

More information

National Literacy and Numeracy Framework for years 3/4

National Literacy and Numeracy Framework for years 3/4 1. Oracy National Literacy and Numeracy Framework for years 3/4 Speaking Listening Collaboration and discussion Year 3 - Explain information and ideas using relevant vocabulary - Organise what they say

More information

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand

Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand 1 Introduction Possessive have and (have) got in New Zealand English Heidi Quinn, University of Canterbury, New Zealand heidi.quinn@canterbury.ac.nz NWAV 33, Ann Arbor 1 October 24 This paper looks at

More information

Creating a Test in Eduphoria! Aware

Creating a Test in Eduphoria! Aware in Eduphoria! Aware Login to Eduphoria using CHROME!!! 1. LCS Intranet > Portals > Eduphoria From home: LakeCounty.SchoolObjects.com 2. Login with your full email address. First time login password default

More information

STUDENT MOODLE ORIENTATION

STUDENT MOODLE ORIENTATION BAKER UNIVERSITY SCHOOL OF PROFESSIONAL AND GRADUATE STUDIES STUDENT MOODLE ORIENTATION TABLE OF CONTENTS Introduction to Moodle... 2 Online Aptitude Assessment... 2 Moodle Icons... 6 Logging In... 8 Page

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

Introduction to Moodle

Introduction to Moodle Center for Excellence in Teaching and Learning Mr. Philip Daoud Introduction to Moodle Beginner s guide Center for Excellence in Teaching and Learning / Teaching Resource This manual is part of a serious

More information

A Case Study: News Classification Based on Term Frequency

A Case Study: News Classification Based on Term Frequency A Case Study: News Classification Based on Term Frequency Petr Kroha Faculty of Computer Science University of Technology 09107 Chemnitz Germany kroha@informatik.tu-chemnitz.de Ricardo Baeza-Yates Center

More information

Memory-based grammatical error correction

Memory-based grammatical error correction Memory-based grammatical error correction Antal van den Bosch Peter Berck Radboud University Nijmegen Tilburg University P.O. Box 9103 P.O. Box 90153 NL-6500 HD Nijmegen, The Netherlands NL-5000 LE Tilburg,

More information

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections

Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Tyler Perrachione LING 451-0 Proseminar in Sound Structure Prof. A. Bradlow 17 March 2006 Intra-talker Variation: Audience Design Factors Affecting Lexical Selections Abstract Although the acoustic and

More information

Bigrams in registers, domains, and varieties: a bigram gravity approach to the homogeneity of corpora

Bigrams in registers, domains, and varieties: a bigram gravity approach to the homogeneity of corpora Bigrams in registers, domains, and varieties: a bigram gravity approach to the homogeneity of corpora Stefan Th. Gries Department of Linguistics University of California, Santa Barbara stgries@linguistics.ucsb.edu

More information

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class

1/20 idea. We ll spend an extra hour on 1/21. based on assigned readings. so you ll be ready to discuss them in class If we cancel class 1/20 idea We ll spend an extra hour on 1/21 I ll give you a brief writing problem for 1/21 based on assigned readings Jot down your thoughts based on your reading so you ll be ready

More information

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form

Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form Orthographic Form 1 Improved Effects of Word-Retrieval Treatments Subsequent to Addition of the Orthographic Form The development and testing of word-retrieval treatments for aphasia has generally focused

More information

Course Law Enforcement II. Unit I Careers in Law Enforcement

Course Law Enforcement II. Unit I Careers in Law Enforcement Course Law Enforcement II Unit I Careers in Law Enforcement Essential Question How does communication affect the role of the public safety professional? TEKS 130.294(c) (1)(A)(B)(C) Prior Student Learning

More information

Longman English Interactive

Longman English Interactive Longman English Interactive Level 3 Orientation Quick Start 2 Microphone for Speaking Activities 2 Course Navigation 3 Course Home Page 3 Course Overview 4 Course Outline 5 Navigating the Course Page 6

More information

Dublin City Schools Mathematics Graded Course of Study GRADE 4

Dublin City Schools Mathematics Graded Course of Study GRADE 4 I. Content Standard: Number, Number Sense and Operations Standard Students demonstrate number sense, including an understanding of number systems and reasonable estimates using paper and pencil, technology-supported

More information

CHANCERY SMS 5.0 STUDENT SCHEDULING

CHANCERY SMS 5.0 STUDENT SCHEDULING CHANCERY SMS 5.0 STUDENT SCHEDULING PARTICIPANT WORKBOOK VERSION: 06/04 CSL - 12148 Student Scheduling Chancery SMS 5.0 : Student Scheduling... 1 Course Objectives... 1 Course Agenda... 1 Topic 1: Overview

More information

Word Sense Disambiguation

Word Sense Disambiguation Word Sense Disambiguation D. De Cao R. Basili Corso di Web Mining e Retrieval a.a. 2008-9 May 21, 2009 Excerpt of the R. Mihalcea and T. Pedersen AAAI 2005 Tutorial, at: http://www.d.umn.edu/ tpederse/tutorials/advances-in-wsd-aaai-2005.ppt

More information

2014 State Residency Conference Frequently Asked Questions FAQ Categories

2014 State Residency Conference Frequently Asked Questions FAQ Categories 2014 State Residency Conference Frequently Asked Questions FAQ Categories Deadline... 2 The Five Year Rule... 3 Statutory Grace Period... 4 Immigration... 5 Active Duty Military... 7 Spouse Benefit...

More information

Case study Norway case 1

Case study Norway case 1 Case study Norway case 1 School : B (primary school) Theme: Science microorganisms Dates of lessons: March 26-27 th 2015 Age of students: 10-11 (grade 5) Data sources: Pre- and post-interview with 1 teacher

More information

Reading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5-

Reading Grammar Section and Lesson Writing Chapter and Lesson Identify a purpose for reading W1-LO; W2- LO; W3- LO; W4- LO; W5- New York Grade 7 Core Performance Indicators Grades 7 8: common to all four ELA standards Throughout grades 7 and 8, students demonstrate the following core performance indicators in the key ideas of reading,

More information

BULATS A2 WORDLIST 2

BULATS A2 WORDLIST 2 BULATS A2 WORDLIST 2 INTRODUCTION TO THE BULATS A2 WORDLIST 2 The BULATS A2 WORDLIST 21 is a list of approximately 750 words to help candidates aiming at an A2 pass in the Cambridge BULATS exam. It is

More information

Alberta Police Cognitive Ability Test (APCAT) General Information

Alberta Police Cognitive Ability Test (APCAT) General Information Alberta Police Cognitive Ability Test (APCAT) General Information 1. What does the APCAT measure? The APCAT test measures one s potential to successfully complete police recruit training and to perform

More information

Appendix L: Online Testing Highlights and Script

Appendix L: Online Testing Highlights and Script Online Testing Highlights and Script for Fall 2017 Ohio s State Tests Administrations Test administrators must use this document when administering Ohio s State Tests online. It includes step-by-step directions,

More information

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1

Basic Parsing with Context-Free Grammars. Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Basic Parsing with Context-Free Grammars Some slides adapted from Julia Hirschberg and Dan Jurafsky 1 Announcements HW 2 to go out today. Next Tuesday most important for background to assignment Sign up

More information

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE

LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE LEXICAL COHESION ANALYSIS OF THE ARTICLE WHAT IS A GOOD RESEARCH PROJECT? BY BRIAN PALTRIDGE A JOURNAL ARTICLE Submitted in partial fulfillment of the requirements for the degree of Sarjana Sastra (S.S.)

More information

Review in ICAME Journal, Volume 38, 2014, DOI: /icame

Review in ICAME Journal, Volume 38, 2014, DOI: /icame Review in ICAME Journal, Volume 38, 2014, DOI: 10.2478/icame-2014-0012 Gaëtanelle Gilquin and Sylvie De Cock (eds.). Errors and disfluencies in spoken corpora. Amsterdam: John Benjamins. 2013. 172 pp.

More information

Setting Up Tuition Controls, Criteria, Equations, and Waivers

Setting Up Tuition Controls, Criteria, Equations, and Waivers Setting Up Tuition Controls, Criteria, Equations, and Waivers Understanding Tuition Controls, Criteria, Equations, and Waivers Controls, criteria, and waivers determine when the system calculates tuition

More information

David Erwin Ritter Associate Professor of Accounting MBA Coordinator Texas A&M University Central Texas

David Erwin Ritter Associate Professor of Accounting MBA Coordinator Texas A&M University Central Texas David Erwin Ritter Associate Professor of Accounting MBA Coordinator Texas A&M University Central Texas Education Doctor of Business Administration (1986) Juris Doctor (1996) Master of Business Administration

More information

Houghton Mifflin Online Assessment System Walkthrough Guide

Houghton Mifflin Online Assessment System Walkthrough Guide Houghton Mifflin Online Assessment System Walkthrough Guide Page 1 Copyright 2007 by Houghton Mifflin Company. All Rights Reserved. No part of this document may be reproduced or transmitted in any form

More information

Lyman, M. D. (2011). Criminal investigation: The art and the science (6th ed.). Upper Saddle River, NJ: Prentice Hall.

Lyman, M. D. (2011). Criminal investigation: The art and the science (6th ed.). Upper Saddle River, NJ: Prentice Hall. Course Syllabus Course Description Presents a study of the development of the investigative procedures and techniques from early practices to modern-day forensic science capabilities with an emphasis on

More information

EMPOWER Self-Service Portal Student User Manual

EMPOWER Self-Service Portal Student User Manual EMPOWER Self-Service Portal Student User Manual by Hasanna Tyus 1 Registrar 1 Adapted from the OASIS Student User Manual, July 2013, Benedictine College. 1 Table of Contents 1. Introduction... 3 2. Accessing

More information

ELD CELDT 5 EDGE Level C Curriculum Guide LANGUAGE DEVELOPMENT VOCABULARY COMMON WRITING PROJECT. ToolKit

ELD CELDT 5 EDGE Level C Curriculum Guide LANGUAGE DEVELOPMENT VOCABULARY COMMON WRITING PROJECT. ToolKit Unit 1 Language Development Express Ideas and Opinions Ask for and Give Information Engage in Discussion ELD CELDT 5 EDGE Level C Curriculum Guide 20132014 Sentences Reflective Essay August 12 th September

More information

Developing Grammar in Context

Developing Grammar in Context Developing Grammar in Context intermediate with answers Mark Nettle and Diana Hopkins PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE The Pitt Building, Trumpington Street, Cambridge, United

More information

MORE THAN A LINGUISTIC REFERENCE: THE INFLUENCE OF CORPUS TECHNOLOGY ON L2 ACADEMIC WRITING

MORE THAN A LINGUISTIC REFERENCE: THE INFLUENCE OF CORPUS TECHNOLOGY ON L2 ACADEMIC WRITING Language Learning & Technology http://llt.msu.edu/vol12num2/yoon/ June 2008, Volume 12, Number 2 pp. 31-48 MORE THAN A LINGUISTIC REFERENCE: THE INFLUENCE OF CORPUS TECHNOLOGY ON L2 ACADEMIC WRITING Hyunsook

More information

Basic German: CD/Book Package (LL(R) Complete Basic Courses) By Living Language

Basic German: CD/Book Package (LL(R) Complete Basic Courses) By Living Language Basic German: CD/Book Package (LL(R) Complete Basic Courses) By Living Language If searching for the book by Living Language Basic German: CD/Book Package (LL(R) Complete Basic Courses) in pdf format,

More information

SCT Banner Student Fee Assessment Training Workbook October 2005 Release 7.2

SCT Banner Student Fee Assessment Training Workbook October 2005 Release 7.2 SCT HIGHER EDUCATION SCT Banner Student Fee Assessment Training Workbook October 2005 Release 7.2 Confidential Business Information --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

More information

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4

University of Waterloo School of Accountancy. AFM 102: Introductory Management Accounting. Fall Term 2004: Section 4 University of Waterloo School of Accountancy AFM 102: Introductory Management Accounting Fall Term 2004: Section 4 Instructor: Alan Webb Office: HH 289A / BFG 2120 B (after October 1) Phone: 888-4567 ext.

More information

Twitter Sentiment Classification on Sanders Data using Hybrid Approach

Twitter Sentiment Classification on Sanders Data using Hybrid Approach IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 4, Ver. I (July Aug. 2015), PP 118-123 www.iosrjournals.org Twitter Sentiment Classification on Sanders

More information

SEPERAC MEE QUICK REVIEW OUTLINE

SEPERAC MEE QUICK REVIEW OUTLINE SEPERAC MEE QUICK REVIEW OUTLINE 206 MEE QUESTIONS WITH ISSUES AND SHORT ANSWERS BASED ON 2002-2016 MEE EXAMS DATE RELEASED: NOVEMBER 11, 2016 This outline contains every released MEE question from 2002

More information

Literature and the Language Arts Experiencing Literature

Literature and the Language Arts Experiencing Literature Correlation of Literature and the Language Arts Experiencing Literature Grade 9 2 nd edition to the Nebraska Reading/Writing Standards EMC/Paradigm Publishing 875 Montreal Way St. Paul, Minnesota 55102

More information

The Haymarket Disaster and the Knights of Labor

The Haymarket Disaster and the Knights of Labor St. Cloud State University therepository at St. Cloud State Curriculum Unit on the Gilded Age in the United States American History Lesson Plans 1-8-2016 The Haymarket Disaster and the Knights of Labor

More information

Lower and Upper Secondary

Lower and Upper Secondary Lower and Upper Secondary Type of Course Age Group Content Duration Target General English Lower secondary Grammar work, reading and comprehension skills, speech and drama. Using Multi-Media CD - Rom 7

More information

Managerial Decision Making

Managerial Decision Making Course Business Managerial Decision Making Session 4 Conditional Probability & Bayesian Updating Surveys in the future... attempt to participate is the important thing Work-load goals Average 6-7 hours,

More information

Minimalism is the name of the predominant approach in generative linguistics today. It was first

Minimalism is the name of the predominant approach in generative linguistics today. It was first Minimalism Minimalism is the name of the predominant approach in generative linguistics today. It was first introduced by Chomsky in his work The Minimalist Program (1995) and has seen several developments

More information

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282)

AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC PP. VI, 282) B. PALTRIDGE, DISCOURSE ANALYSIS: AN INTRODUCTION (2 ND ED.) (LONDON, BLOOMSBURY ACADEMIC. 2012. PP. VI, 282) Review by Glenda Shopen _ This book is a revised edition of the author s 2006 introductory

More information

SOCIAL STUDIES GRADE 1. Clear Learning Targets Office of Teaching and Learning Curriculum Division FAMILIES NOW AND LONG AGO, NEAR AND FAR

SOCIAL STUDIES GRADE 1. Clear Learning Targets Office of Teaching and Learning Curriculum Division FAMILIES NOW AND LONG AGO, NEAR AND FAR SOCIAL STUDIES FAMILIES NOW AND LONG AGO, NEAR AND FAR GRADE 1 Clear Learning Targets 2015-2016 Aligned with Ohio s Learning Standards for Social Studies Office of Teaching and Learning Curriculum Division

More information

Words come in categories

Words come in categories Nouns Words come in categories D: A grammatical category is a class of expressions which share a common set of grammatical properties (a.k.a. word class or part of speech). Words come in categories Open

More information

OCR for Arabic using SIFT Descriptors With Online Failure Prediction

OCR for Arabic using SIFT Descriptors With Online Failure Prediction OCR for Arabic using SIFT Descriptors With Online Failure Prediction Andrey Stolyarenko, Nachum Dershowitz The Blavatnik School of Computer Science Tel Aviv University Tel Aviv, Israel Email: stloyare@tau.ac.il,

More information

We re Listening Results Dashboard How To Guide

We re Listening Results Dashboard How To Guide We re Listening Results Dashboard How To Guide Contents Page 1. Introduction 3 2. Finding your way around 3 3. Dashboard Options 3 4. Landing Page Dashboard 4 5. Question Breakdown Dashboard 5 6. Key Drivers

More information

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS

BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Daffodil International University Institutional Repository DIU Journal of Science and Technology Volume 8, Issue 1, January 2013 2013-01 BANGLA TO ENGLISH TEXT CONVERSION USING OPENNLP TOOLS Uddin, Sk.

More information

What the National Curriculum requires in reading at Y5 and Y6

What the National Curriculum requires in reading at Y5 and Y6 What the National Curriculum requires in reading at Y5 and Y6 Word reading apply their growing knowledge of root words, prefixes and suffixes (morphology and etymology), as listed in Appendix 1 of the

More information

PowerTeacher Gradebook User Guide PowerSchool Student Information System

PowerTeacher Gradebook User Guide PowerSchool Student Information System PowerSchool Student Information System Document Properties Copyright Owner Copyright 2007 Pearson Education, Inc. or its affiliates. All rights reserved. This document is the property of Pearson Education,

More information

Coordinating by looking back? Past experience as enabler of coordination in extreme environment

Coordinating by looking back? Past experience as enabler of coordination in extreme environment Coordinating by looking back? Past experience as enabler of coordination in extreme environment Cécile Godé Research Center of the French Air Force Associate researcher GREDEG UMR 6227 CNRS UNSA Research

More information

SCT Banner Financial Aid Needs Analysis Training Workbook January 2005 Release 7

SCT Banner Financial Aid Needs Analysis Training Workbook January 2005 Release 7 SCT HIGHER EDUCATION SCT Banner Financial Aid Needs Analysis Training Workbook January 2005 Release 7 Confidential Business Information --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

More information

SECTION 12 E-Learning (CBT) Delivery Module

SECTION 12 E-Learning (CBT) Delivery Module SECTION 12 E-Learning (CBT) Delivery Module Linking a CBT package (file or URL) to an item of Set Training 2 Linking an active Redkite Question Master assessment 2 to the end of a CBT package Removing

More information

L1 and L2 acquisition. Holger Diessel

L1 and L2 acquisition. Holger Diessel L1 and L2 acquisition Holger Diessel Schedule Comparing L1 and L2 acquisition The role of the native language in L2 acquisition The critical period hypothesis [student presentation] Non-linguistic factors

More information

Beginning Blackboard. Getting Started. The Control Panel. 1. Accessing Blackboard:

Beginning Blackboard. Getting Started. The Control Panel. 1. Accessing Blackboard: Beginning Blackboard Contact Information Blackboard System Administrator: Paul Edminster, Webmaster Developer x3842 or Edminster@its.gonzaga.edu Blackboard Training and Support: Erik Blackerby x3856 or

More information

Moodle 2 Assignments. LATTC Faculty Technology Training Tutorial

Moodle 2 Assignments. LATTC Faculty Technology Training Tutorial LATTC Faculty Technology Training Tutorial Moodle 2 Assignments This tutorial begins with the instructor already logged into Moodle 2. http://moodle.lattc.edu/ Faculty login id is same as email login id.

More information