A Corpus of Dutch Aphasic Speech: Sketching the Design and Performing a Pilot Study. E. N. Westerhout November 10, 2005


Abstract

In this thesis, a pilot study for the development of a corpus of Dutch aphasic speech (CoDAS) is presented. Given the lack of resources of this kind, not only for Dutch but also for other languages, CoDAS will be able to set standards and will contribute to future research in this area. A corpus of Dutch aphasic speech should fulfill at least three requirements. First, it should encode a plausible sample of contemporary Dutch as spoken by aphasic patients. That is, it should include speech representing different types of aphasia as well as various communication settings. Secondly, the speech fragments should be documented with the relevant metadata, which should include information about the speaker and the aphasia. Thirdly, the corpus should be enriched with various kinds of linguistic information. Given the special character of the speech contained in CoDAS, we cannot simply carry over the design and the annotation protocols of existing corpora, such as SDC or CHILDES. However, they have been taken as a starting point. In our pilot study, we have established the basic requirements with respect to text types, metadata, and annotation levels that CoDAS should fulfill. In this respect, we have investigated whether and how the procedures and protocols for annotation and transcription used for the SDC should be adapted in order to annotate and transcribe aphasic speech properly. In particular, suggestions for improving the existing protocols have been given for the orthographic transcription and the part-of-speech tagging. The phonetic transcription procedure used within the SDC, on the other hand, can be adopted without major modifications.

Acknowledgements

First of all, I would like to thank my supervisors, Dr. Paola Monachesi and Dr. Esther Janse, for their valuable guidance and encouragement during the writing process. I would also like to thank the six clients of the Afasiecentrum in Capelle aan den IJssel who participated in the pilot study and Mia Verschaeve, Speech and Language Therapist in this center. My vote of thanks also goes to the Speech and Language Therapist Janneke Wolters, who conducted the AATs on the patients. To the members of the ILK group in Tilburg, especially Dr. Erwin Marsi and Dr. Antal van den Bosch: thank you for performing the automatic transcriptions needed within the pilot study. Last but not least, I would like to thank Anne Marie van de Zande. We started to work on this project together. Unfortunately, our subjects proved to be too different, so we decided to work on our own projects. Nevertheless, we met several times in Utrecht to work together. Working together is definitely more fun than working alone.

Contents

Abstract
Acknowledgements
1 Introduction: Motivation; Overview; Part 1: Aphasia and Corpora; Part 2: Corpus Design; Part 3: The Pilot Study; Conclusions and suggestions for future research
2 Aphasia: Causes; Varieties; Patients; The Akense Afasie Test (AAT) ('Aachen Aphasia Test'); The six patients; Data used for the pilot study
3 Corpora: Characteristics; Types; General issues; Levels of annotation and transcription; Three important corpora; Brown Corpus; Lancaster-Oslo-Bergen Corpus; British National Corpus
4 Relevant corpora for this project: CHILDES; Three components; Design issues; Metadata; Levels of annotation and transcription; The Spoken Dutch Corpus; Design issues; Text types; Metadata; Levels of annotation and transcription
5 Corpus design: Purpose; Permissions; Text types; Deviating from the Spoken Dutch Corpus; CoDAS; Metadata; Levels of annotation and transcription
6 Orthographic Transcription: Criteria for guidelines; The EAGLES guidelines; Spelling guidelines; Unidentifiable material; The CHILDES project; Spelling guidelines corresponding to the EAGLES guidelines; Complementary spelling guidelines; Unidentifiable material; Transcription of aphasic speech in CHILDES; The Spoken Dutch Corpus; Spelling guidelines corresponding to the EAGLES guidelines; Complementary spelling guidelines; Unidentifiable material; Orthographic transcription of the non-fluent speech; Problematic issues; Transcription of the problematic issues
7 Phonetic Transcription: The Spoken Dutch Corpus; Phonetic transcription files; The symbol set; Automatic generation of phonetic transcriptions; TreeTalk; Verification and correction; Phonetic transcription of the non-fluent speech
8 Lemmatization and part-of-speech tagging: EAGLES guidelines; CHILDES; The Spoken Dutch Corpus; Lemmatization; Part-of-speech tagging; Tagging the non-fluent speech; Performance of the Memory-Based Tagger (MBT); Improving the performance of the Memory-Based Tagger
Conclusions
Future research
Appendices
A Metadata of the SDC and CoDAS: A.1 Metadata about the recordings; A.2 Metadata about the participants; A.2.1 Metadata of the SDC; A.2.2 Complementary metadata for CoDAS
B Orthographic transcriptions: B.1 Patient; B.2 Patient
C The SDC symbol set
D EAGLES recommended subcategories and values
E Tagset of the Spoken Dutch Corpus: E.1 Obligatory; E.2 Recommended
F Part-of-Speech tagging: F.1 Patient; F.2 Patient

Chapter 1 Introduction

1.1 Motivation

In 2004, the Spoken Dutch Corpus (SDC) ('Corpus Gesproken Nederlands') project finished (Oostdijk et al., 2002). This project aimed at the construction of a database of contemporary standard Dutch as spoken by adults in the Netherlands and Flanders. However, the content of the corpus is restricted, because it only contains speech from adults with intact speech abilities. Speech from persons with aphasia or other speech and language disorders has not been included. The SDC project gave rise to the question whether it would be interesting to develop another corpus of spoken Dutch, namely a specialized corpus containing only Dutch aphasic speech. In this thesis we argue that it would indeed be interesting and useful to have a Corpus of Dutch Aphasic Speech (CoDAS). Because of the special character of the speech contained in such a corpus, its design will differ from the design of the SDC. One of the purposes of this thesis is to sketch a design for a Corpus of Dutch Aphasic Speech.

Just as the design of the corpus differs from that of the SDC, the annotation of the aphasic speech will also have to be performed in a different way, because aphasic speech differs from normal speech in several respects. Therefore, a pilot study has been carried out to investigate the changes that should be made in order to make it possible to annotate and tag the speech of aphasics. For the purpose of the pilot study, we have focused on the speech of non-fluent aphasics. The orthographic transcription and part-of-speech tagging of this kind of speech have been examined thoroughly. We have also taken the phonetic transcription into consideration. So, the second goal of the thesis is to investigate which problems with respect to annotation and transcription should be tackled when a Corpus of Dutch Aphasic Speech is developed.

In summary, the thesis is intended to serve as a preparatory study for the setup of a Corpus of Dutch Aphasic Speech and focuses on two aspects. First, corpus design issues are considered. Second, the thesis investigates how the annotation and transcription of such a corpus should be performed. To that end, a pilot study was carried out in which non-fluent aphasic speech was examined.

1.2 Overview

The thesis can be divided into three parts. The first part contains three introductory chapters providing background information about the language impairment aphasia, corpora in general, and two corpora that are relevant for this project. Part 2 is about the design requirements that should be met when a Corpus of Dutch Aphasic Speech is designed. The pilot study is the topic of the third part, which comprises Chapters 6, 7, and 8. In the pilot study, we focus on three annotation levels, namely the orthographic transcription, the phonetic transcription, and the part-of-speech tagging.

Part 1: Aphasia and Corpora

Chapter 2 focuses on the language impairment aphasia. The three main causes of aphasia are stroke, trauma, and tumors. Depending on the location and the size of the damage, different aspects of speech can be disturbed. Therefore, different types of aphasia are distinguished. The last section of this chapter focuses on the patients involved in the pilot study. Within this section we discuss the Akense Afasie Test ('Aachen Aphasia Test') (AAT), the test we used to diagnose the aphasic patients involved in the pilot study, and the scores of the patients on this test. For the pilot study, we made use of the first component of the AAT (the spontaneous language sample).

Corpora are an essential tool for linguistic research. Chapter 3 is about corpora and discusses some of their general characteristics. It proceeds by introducing different distinctions made to typify corpora (e.g. written vs. spoken, synchronic vs. diachronic). Before a corpus can be developed, several design issues have to be dealt with. These issues are also covered in this chapter. Thereafter, the different annotation and transcription levels are discussed. The chapter ends with some examples of existing corpora.

In Chapter 4, two corpora that are relevant for this project are discussed. These two corpora are CHILDES (MacWhinney, 2000a,b) and the Spoken Dutch Corpus (Oostdijk et al., 2002). The CHILDES system is of particular interest for this project, because the kind of speech contained in this system also deviates from normal speech. The second corpus we used, the SDC, is relevant for this project because it contains only Dutch speech and is accompanied by extensive, detailed protocols for the transcription of Dutch speech.

Part 2: Corpus Design

Chapter 5 is about the design of a Corpus of Dutch Aphasic Speech. The design of a corpus depends heavily on its purpose. Therefore, the chapter starts by formulating the purpose a Corpus of Dutch Aphasic Speech would serve. The chapter then proceeds with several other relevant aspects of corpus design, such as obtaining permissions for using speech, the text types that should be included, which metadata about the patients are relevant, and at which levels the speech should be annotated and transcribed. Obtaining permissions for using aphasic speech and making it available to other researchers could be problematic. A committee has to grant these permissions. Even if the use of the speech transcripts is allowed, privacy considerations may make it impossible to use the speech recordings.

Part 3: The Pilot Study

The first level of transcription that has been examined is the orthographic transcription (Chapter 6). An orthographic transcription is a verbatim record of what was actually said, using the standard spelling conventions. The chapter starts with a comparison of a general set of guidelines (EAGLES), the CHILDES guidelines, and the guidelines developed for the SDC. Thereafter, it proceeds with discussing the orthographic transcription of six speech fragments according to the guidelines given in the SDC protocol. Some points require special attention, because they are typical of aphasic speech.

Chapter 7 focuses on the phonetic transcription of the aphasic speech. A phonetic transcription provides information on how words are pronounced. For the phonetic transcription of the aphasic speech, the same procedure has been followed as for the phonetic transcription of the SDC. The transcription process was performed automatically, using a grapheme-to-phoneme conversion program.

In Chapter 8, the part-of-speech tagging (POS tagging) of the aphasic speech is discussed. The task of part-of-speech tagging is to assign a part-of-speech label to each word of a text. Like the chapter on the orthographic transcription, this chapter starts with a comparison of the EAGLES guidelines, the method used in CHILDES, and the way the POS tagging has been carried out within the SDC project. The orthographic transcriptions of the aphasic speech were tagged automatically using one of the taggers that was used for the tagging of the SDC. The performance of this tagger on the aphasic speech is discussed in the final section of the chapter.

Conclusions and suggestions for future research

The thesis ends with conclusions and suggestions for future research. Based on the findings of the pilot study, several issues for future research are mentioned.

Chapter 2 Aphasia

The abilities to understand and produce spoken and written language are located in multiple areas of the brain (in most cases in the left hemisphere). When one of these areas or the connection between them is damaged, language production and comprehension become impaired. This language impairment is called aphasia, a word derived from the Greek words a (not) and phasis (to speak). Aphasia is a language disorder; the intellect of aphasia patients is not damaged. In this chapter, the main causes of aphasia are discussed (Section 2.1). Thereafter, the most common varieties of aphasia are discussed (Section 2.2). Section 2.3 is about the characteristics of the patients involved in this pilot study.

2.1 Causes

In the Netherlands, about 30,000 people suffer from aphasia. In 85% of the cases, the cause of aphasia is a CVA (stroke). Other causes are traumatic brain injuries (12%) and brain tumors (3%) (Davidse and Mackenbach, 1984).

CVA

CVA is short for cerebrovascular accident, also referred to as a stroke. A stroke is caused by a lack of blood supply to the brain due to an occlusion (90% of the cases) or by hemorrhage (10% of the cases). Depending on the area of the brain that is damaged, a CVA can cause coma, paralysis (reversible or irreversible), speech problems (aphasia), visual disturbances, and dementia (Wikipedia, 2005b).

Traumatic brain injury

A traumatic brain injury (TBI) is an injury to the brain caused by a severe blow to the head or by being shaken violently. Half of all TBIs are due to transportation accidents involving automobiles, motorcycles, bicycles, and pedestrians. Disabilities resulting from a TBI depend upon the severity of the injury, the location of the injury, and the age and general health of the patient. Some common disabilities include problems with cognition (thinking, memory, and reasoning), sensory processing (sight, hearing, touch, taste, and smell), communication (expression and understanding), and behavior or mental health (depression, anxiety, personality changes, aggression, acting out, and social inappropriateness). Language and communication problems are common disabilities in TBI patients. Some may experience aphasia; others may have difficulty with the more subtle aspects of communication, such as body language and emotional, non-verbal signals (Wikipedia, 2005c).

Brain tumor

A brain tumor is a mass of unnecessary cells growing in the brain. Brain tumors are divided into benign and malignant tumors. These descriptions refer to the degree of malignancy or aggressiveness of a brain tumor. A benign brain tumor consists of very slow growing cells, usually has distinct borders, and rarely spreads. A malignant brain tumor is usually rapid growing, invasive, and life-threatening; these brain tumors are often called brain cancer. The time point of symptom onset in the course of the disease correlates in many cases with the nature of the tumor (benign or malignant). Depending on the tumor location and the damage it may have caused to surrounding brain structures, any type of focal neurologic symptoms can occur, such as personality changes, cognitive and behavioral impairment, hemiparesis, and aphasia (American Brain Tumor Association, 2004; Wikipedia, 2005a).

2.2 Varieties

Language impairments differ depending on the location and size of the damage. The brain can be divided down the middle lengthwise into two halves called the cerebral hemispheres. One of these two is the dominant hemisphere for a certain task. The dominant hemisphere is more involved than the other hemisphere in governing certain body functions, such as controlling the arm and leg used preferentially in skilled movements. For most individuals, the left hemisphere is dominant for language. Approximately 70 percent of all individuals with damage to the left hemisphere experience some type of aphasia, whereas only 1 percent of persons with right hemispheric lesions experience this (Akmajian et al., 2001).

Each hemisphere is divided into four lobes, namely the frontal lobe, the parietal lobe, the temporal lobe, and the occipital lobe. Broca's area is located in the frontal lobe and Wernicke's area is situated in the temporal lobe of the dominant hemisphere, in the so-called perisylvian speech area. Besides Broca's area and Wernicke's area, this zone contains the supramarginal gyrus, the angular gyrus, and the arcuate fasciculus (Love and Webb, 1996). Figure 2.1 shows where Broca's area, Wernicke's area, and the arcuate fasciculus are situated in the brain. It also shows the primary motor cortex (which controls movements of, among others, the speech muscles), the primary auditory cortex (responsible for processing auditory information), and the primary visual cortex (responsible for processing visual information).

Figure 2.1: Language areas in the brain

For each of the aphasia varieties, the main speech characteristics are mentioned (Love and Webb, 1996; Dharmaperwira-Prins and Maas, 2002; Blauw-van Mourik and Koning-Haanstra, 1990).

Broca's aphasia (expressive aphasia, motor aphasia)

Broca's aphasia is associated with damage to Broca's area in the brain (red area in Figure 2.1). It is characterized by non-fluent speech containing many pauses. The speech typically has a telegraphic nature, because of the deletion of function words and disturbances in word order. Only the main content words are present; vital connecting words are missing. Repetition of words and phrases is impaired. Patients with Broca's aphasia also have phonological problems: they reduce sound clusters in words. Another characteristic of Broca's aphasia is that the patients encounter word finding difficulties. For example, when a patient is asked what his wife's name is, he might not be able to come up with it. In addition to having impaired speech, people with Broca's aphasia also encounter writing difficulties. Writing is part of expressive language, so damage to Broca's area of the brain affects it. Writing can be additionally impaired because of weakness on the right side of the body. People with Broca's aphasia have relatively good comprehension; it is mainly their expressive language that is impaired. Most Broca's aphasics are painfully aware of their own mistakes. The two fragments below illustrate the difficulty Broca's aphasics encounter in speaking (Akmajian et al., 2001).

Examiner: Tell me, what did you do before you retired?
Aphasic: Uh, uh, uh, puh, par, partender, no.
Examiner: Carpenter?
Aphasic: (shaking head yes) Carpenter, tuh, tuh, tenty [20] year.

Examiner: Tell me about this picture.
Aphasic: Boy... cook... cookie... took... cookie.

Wernicke's aphasia (receptive aphasia, sensory aphasia)

Wernicke's aphasia is associated with damage to Wernicke's area in the brain (yellow area in Figure 2.1). It is a fluent aphasia characterized by difficulty in understanding language as well as difficulty in repeating language. The speech is fluent, but paraphasic: parts of words are omitted, words are used incorrectly, neologisms are used, and incorrect phonemes are substituted for correct phonemes. The content of what these patients say ranges from mildly inappropriate to complete nonsense. Phrase length is normal and the syntactic structures of the sentences are mostly acceptable. Reading ability is generally disturbed, and although writing ability is often retained, what is written may be abnormal. Patients with Wernicke's aphasia may not always be aware of their language difficulties. Akmajian et al. (2001) illustrated Wernicke's aphasia with the examples below.

Examiner: Do you like it here in Kansas City?
Aphasic: Yes, I am.

Examiner: I'd like to have you tell more about your problem.
Aphasic: Yes, I ugh can't hill all of my way. I can't talk all of the things I do, and part of the part I can go allright, but I can't tell from the other people. I usually most of my things. I know what can I talk and know what they are but I can't always come back even though I know they should be in, and I know should something eely I should know what I'm doing...

Conduction aphasia (associative aphasia)

Conduction aphasia is often associated with damage to the connection between the areas of Broca and Wernicke, the arcuate fasciculus (purple in Figure 2.1), or in the left temporal lobe of the auditory association area. The areas themselves are still intact. Patients with conduction aphasia are unable to repeat words, sentences, and phrases. Speech is fluent and paraphasic, just as in Wernicke's aphasia. Auditory comprehension and reading comprehension are fairly good, just as in Broca's aphasia. Although patients with conduction aphasia are able to understand spoken language, they have word finding difficulties during the production of speech. The impact of this condition on reading and writing varies. In most cases, oral reading is paraphasic whereas silent reading is adequate. Spelling is poor, characterized by omissions, reversals, and substitutions of letters and words. Most patients with conduction aphasia are aware of their language problems.

Global aphasia (total aphasia)

Global aphasia is associated with damage to both Broca's and Wernicke's area. The symptoms of global aphasia are those of severe Broca's aphasia and Wernicke's aphasia combined: there is an almost total reduction of all aspects of spoken and written language, in both production and comprehension. Improvement may occur in one or both areas (expressive and receptive) over time with rehabilitation.

Transcortical aphasia

It is also possible that the site of the lesion is situated outside the perisylvian speech area. In that case, the language areas become isolated and cannot be reached. Depending on which language area is isolated, three transcortical aphasia types can be distinguished, namely transcortical motor aphasia, transcortical sensory aphasia, and mixed transcortical aphasia.

Damage to the area around Broca's area is associated with transcortical motor aphasia. When a patient suffers from transcortical motor aphasia, the paths between Broca's area and the other language areas are cut off. This variety resembles Broca's aphasia, except for the ability to repeat: this ability remains intact in transcortical motor aphasia.

Just as transcortical motor aphasia resembles Broca's aphasia, transcortical sensory aphasia resembles Wernicke's aphasia. Here the area around Wernicke's area is damaged. The difference with Wernicke's aphasia is that the ability to repeat remains intact in transcortical sensory aphasia.

Transcortical mixed aphasia, also called isolation of the speech area, involves simple repetition. The only ability that is intact is the ability to repeat: patients echo what is said but can neither produce speech spontaneously nor understand it. This variety resembles global aphasia.

Anomic aphasia (amnes(t)ic aphasia, nominal aphasia)

The main characteristic of anomic aphasia is that the patient has word finding difficulties. The speech is relatively fluent and grammatical and comprehension is good. The only deficit is trouble finding appropriate words. Anomic aphasia can be the residue of another aphasia type from which the patient has largely recovered, but can also exist as an aphasia type in its own right. Within anomic aphasia different types can be distinguished, depending on the place that is impaired (e.g. word production anomia, word selection anomia, semantic anomia). To illustrate what kind of problems anomic aphasics encounter, the following examples are given (Akmajian et al., 2001).

Examiner: Who is the president of the United States?
Aphasic: I can't say his name. I know the man, but I can't come out and say... I'm very sorry, I just can't come out and say. I just can't write it to me now.

Examiner: Can you tell me a girl's name?
Aphasic: Of a girl's name, by mean, by which weight, I mean how old or young?

Examiner: On what do we sleep?
Aphasic: Of the week, er, of the night, oh from about 10:00, about 11:00 o'clock at night until about uh 7:00 in the morning.

2.3 Patients

Within the pilot study, speech material of six aphasic patients has been considered. These patients were classified by their speech pathologist as being Broca's aphasics. However, we performed an aphasia test on them, which showed that they were not all pure Broca's aphasics. The first part of this section is about the test used to diagnose the patients. Thereafter, we proceed with discussing the speech characteristics of the patients involved in the pilot study.

The Akense Afasie Test (AAT) ('Aachen Aphasia Test')

The AAT consists of six subtests, each testing the performance on one particular component of language. The six subtests involve a spontaneous speech sample, a token test, a repetition test, a written language test, a naming test, and a language comprehension test. The test is used to diagnose aphasic patients and to determine the severity and type of aphasia (Graetz et al., 1992).

Spontaneous language sample

The spontaneous language sample consists of a conversation between a speech therapist and an aphasia patient. There are five standard topics that are discussed during the conversation (e.g. profession, family, hobbies). By means of the conversation, six elements are judged, namely (1) Communicative behaviour (COM), (2) Articulation and prosody (ART), (3) Formulaic language (AUT), (4) Semantics (SEM), (5) Phonology (FON), and (6) Syntax (SYN). For each of these points the patient can get a score between 0 (very impaired) and 5 ((almost) intact).

Token test

The token test starts with a pretest, in which the examiner tests whether the condition of the patient is good enough to perform this test. Thereafter, the test itself starts; the wrong responses are counted to determine the score. The test consists of five parts, each of them containing ten questions. The difficulty level increases over the parts. If a patient has two or fewer correct answers in one part, the remaining parts are skipped and the patient gets 10 points for each of these parts.

Repetition

The repetition test consists of five parts, also with increasing difficulty. The five parts are: (1) Sounds, (2) One-syllable words, (3) Multisyllable words, (4) Morphologically complex words, and (5) Sentences. The judgements are based on the percentages of phonemes or words that are correct, the number of times the speech therapist has to repeat the stimulus, and the number of resumptions.

Written language

The written language component consists of three parts. In the first part, words and phrases have to be read aloud by the patient. When this part is finished, the patient has to compose words and phrases from blocks containing one or more syllables or words. In the last part of this subtest, the patient has to write down words and phrases to dictation.

Naming

The naming task involves pictures that have to be named. In the first part, these pictures show objects such as a table, a cigar, and a candle. In the second part, ten colours have to be named. The third part again contains pictures of objects, but now objects named by compound nouns, such as vacuum cleaner, screw-driver, and sailing boat. In the last part, the pictures show situations, such as a boy playing with a dog, two men quarreling, and a man fishing a boot out of the water. In this case, the patient has to say in one sentence what the picture is about.

Comprehension

The goal of the last component of the AAT is to test language comprehension. Comprehension is tested in four subtests of ten questions. In each question, the patient is shown four pictures. In the first and second part, the speech therapist reads words and sentences, respectively. The patient has to match the word or sentence heard with the picture that fits it best. In the third and fourth part, the patients have to read the words and sentences themselves and match them with the best fitting picture.

The six patients

For our pilot study, we used speech samples of six patients: three men and three women, with an average age of 54. The patients had been visiting an aphasia center for some years already; the time post onset was between three and four years. To obtain speech data, we made use of the Dutch version of the Aachen Aphasia Test (AAT) (Graetz et al., 1992). A qualified Speech and Language Pathologist conducted the test. Initially, we did not know exactly which parts of the test would be needed for the pilot study. However, because we also wanted to have an indication of the severity of the aphasia, we decided to conduct the whole test. The test data were automatically processed by the computer program AATP, which has been used to classify the patients.

The results show that only one of the six patients was a pure Broca's aphasic (patient 4). Four of the patients could not be classified as any of the types. However, because the pilot study is only intended for investigating possible problems and looking for requirements that must be fulfilled by a full corpus, it was not problematic that the severity and type of aphasia differed among patients. The speech of all patients had a non-fluent character. The most important score for determining the fluency of a patient is the sixth score within the spontaneous language sample. This is the score that gives information on the syntactic structures of the sentences. The score can vary between 0 (very heavy syntactic disorders) and 5 (no syntactic disorders). For our patients, the score on syntactic structures was 1 or 2. A score of 2 indicates that the sentences are short and usually syntactically incomplete, and that many inflected forms and function words are missing. A score of 1 indicates that the patient uses almost no inflected forms or function words and produces sentences of 1 or 2 words.

The results on the test of the six patients involved in the pilot study are shown in Table 2.1. The figures in this table are the raw scores on the test and the percentages for the various aphasia varieties.

Table 2.1: The scores on the AAT of the patients involved in the pilot study
Rows (one column per patient): Test scores: Spontaneous speech sample (communicative behaviour, articulation and prosody, automated language, semantic structure, phonematic structure, syntactic structure), Token test, Repetition, Written language, Naming, Comprehension. AATP scores: Percentage Aphasia, Percentage Broca, Percentage Wernicke, Percentage Amnestic.
Aphasia type (patients 1 to 6): ?, ?, ?, Broca, Amnestic, ?

For the pilot study we restricted ourselves to non-fluent speech. We decided to use a comparable sample, because this makes it possible to draw better conclusions. If all kinds of speech had been represented within the pilot study, it might have been more difficult to see whether problematic issues occur more than once or whether they are typical of only one patient. A full Corpus of Dutch Aphasic Speech should contain speech samples from patients representing all different types of aphasia.
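The six spontaneous-language ratings, and the observation that all pilot patients scored 1 or 2 on syntactic structure, can be captured in a small record. The Python sketch below is illustrative only: the class name `SpontaneousSpeechRatings`, the helper `is_pilot_candidate`, and the idea of using a syntax score of 1 or 2 as a selection rule are assumptions made for this example and are not part of the AAT or of the AATP program. It also assumes that every scale runs from 0 to 5, the range given for the syntactic structure score.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SpontaneousSpeechRatings:
    """Hypothetical container for the six AAT spontaneous-language ratings."""
    com: int  # communicative behaviour
    art: int  # articulation and prosody
    aut: int  # formulaic (automated) language
    sem: int  # semantics
    fon: int  # phonology
    syn: int  # syntax

    def __post_init__(self):
        # Assumption: each scale is an integer from 0 (very impaired) to 5 (intact).
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not 0 <= value <= 5:
                raise ValueError(
                    f"{f.name.upper()} must be an integer from 0 to 5, got {value!r}"
                )


def is_pilot_candidate(ratings):
    """Assumed rule: a syntax score of 1 or 2, as found for all six pilot patients."""
    return ratings.syn in (1, 2)


# Made-up ratings for illustration; these are not scores from Table 2.1.
example = SpontaneousSpeechRatings(com=3, art=4, aut=4, sem=3, fon=4, syn=2)
print(is_pilot_candidate(example))
```

Keeping the six ratings together with the speaker metadata would make it straightforward to select comparable samples when a full corpus covers more aphasia types.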

Data used for the pilot study

Only the first component of the AAT, the spontaneous speech sample, was used for the pilot study. The samples contain between 300 and 500 words spoken by the aphasic patient. Although the conversation is not completely spontaneous, because the topics are already determined, we nevertheless used these samples. It is rather difficult to obtain completely spontaneous speech from aphasic patients, because they do not speak as much as people without speech impairments do. The content of their utterances is usually very informative and many of the patients only speak when it is necessary. Therefore, the interview is a good alternative for collecting spontaneous speech samples. The other components of the AAT were not used for the pilot study, but were necessary for determining the severity and type of aphasia the patients have.

Chapter 3

Corpora

A lot of linguistic research is done by means of a corpus. The word corpus is derived from the Latin word corpus, meaning body, and refers to a collection of texts. A corpus can be divided into subcorpora. A subcorpus has all the properties of a corpus but is part of a larger corpus. Corpora and subcorpora are divided into components. A component is not necessarily an adequate sample of a language, and in that way it is distinct from a corpus and a subcorpus. It is a collection of pieces of language that are selected and ordered according to a set of linguistic criteria that serve to characterize its linguistic homogeneity. Whereas a corpus may illustrate heterogeneity, and a subcorpus also to some extent, a component illustrates a particular type of language (Sinclair, 1996).

This chapter first discusses some of the characteristics of a corpus (Section 3.1). It then discusses some general distinctions that can be drawn to describe a corpus (Section 3.2). Section 3.3 covers the general issues that have to be considered before a corpus can be developed. Section 3.4 is about the different transcription and annotation levels that can be added to enrich a corpus. In Section 3.5, some important corpora are discussed, such as the British National Corpus and the Brown Corpus.

3.1 Characteristics

According to Sinclair (1996), a corpus is assumed to have certain standard properties. Unless stated otherwise, these characteristics are attributed to anything called a corpus. A corpus which has one or more non-default values for these characteristics is called a special corpus: its title should specify its deviations from the assumptions. The four characteristics given by Sinclair (1996) are:

Quantity = large
The default value of quantity is large. A corpus is assumed to contain a large number of words. The whole point of assembling a corpus is to gather data in quantity. It has to be stressed here that any corpus, however big, is always a minuscule sample of all the speech and writing produced by all the users of a language. The minimum size is not exactly specified, but some examples show that important existing corpora are very large. For instance, the British National Corpus consists of 100 million words collected from samples of written and spoken British English, and the Spoken Dutch Corpus comprises 10 million words of contemporary standard Dutch spoken by adults living in the Netherlands and Flanders.

Quality = authentic
The default value for quality is authentic. All the material is gathered from the genuine communications of people going about their normal business. Corpora of the language of children, geriatrics, non-native speakers, users of extreme dialects, and very specialized areas of communication should be designated special corpora because of the unrepresentative nature of the language involved.

Simplicity = plain text
The default value of simplicity is plain text. This means that the user can expect an unbroken string of ASCII characters, with any mark-up clearly identified and separable from the text. Nowadays the texts of most corpora are stored in XML format. This markup language has been carefully designed and does not impose any additional linguistic information on the text. Largely, its role in relation to text representation is to preserve in linear coding some features which would otherwise be lost.

Documented = yes
The default value for documented is yes. This means that full details about the constituents of a component are kept separately from the component itself. Corpus users seem to prefer to keep the documentation of texts in a separate place from the texts themselves, and to include only a minimal header that contains a reference to the documentation. For the management of corpora this practice allows the effective separation of plain text from annotation with only a small amount of programming effort.

According to McEnery and Wilson (1996), a corpus used in corpus linguistics has four characteristics. First, the corpus is a representative sample of a language variety. Second, the term corpus implies a body of text of finite size. This is not always the case, since there also exist so-called monitor corpora to which texts can be added later, but the majority of the existing corpora are finite in size. The third characteristic is that the corpus should be machine-readable. Advantages of machine-readable corpora over written or spoken formats are that they can be searched and manipulated at speed and that it is easier to enrich the corpus with extra information, such as part-of-speech tags. The last characteristic is that a corpus should constitute a standard reference for the language variety that it represents. Therefore, the corpus should be available to other researchers.

3.2 Types

Corpora can be subdivided according to different criteria. Some general distinctions are discussed in this section, such as the distinctions between written and spoken corpora and between synchronic and diachronic corpora.
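The distinctions treated in this section can be read as independent dimensions along which a single corpus is classified. The following Python sketch illustrates that reading. The class and field names are hypothetical, and the one example record uses only the classifications that this chapter gives for the Spoken Dutch Corpus.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    GENERAL = "general"
    SPECIALIZED = "specialized"


class Medium(Enum):
    WRITTEN = "written"
    SPOKEN = "spoken"


class TimeSpan(Enum):
    SYNCHRONIC = "synchronic"
    DIACHRONIC = "diachronic"


class Growth(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"  # also referred to as a monitor corpus


@dataclass(frozen=True)
class CorpusProfile:
    """Hypothetical record describing one corpus along the dimensions of Section 3.2."""
    name: str
    scope: Scope
    media: frozenset      # a corpus may contain written text, speech, or both
    time_span: TimeSpan
    languages: frozenset  # one language = monolingual, several = multilingual
    growth: Growth

    @property
    def is_multilingual(self) -> bool:
        return len(self.languages) > 1

    def describe(self) -> str:
        media = "+".join(sorted(m.value for m in self.media))
        lingual = "multilingual" if self.is_multilingual else "monolingual"
        return ", ".join(
            [self.scope.value, media, self.time_span.value, lingual, self.growth.value]
        )


# Classified as in this chapter: general, spoken, synchronic, monolingual, static.
spoken_dutch_corpus = CorpusProfile(
    name="Spoken Dutch Corpus",
    scope=Scope.GENERAL,
    media=frozenset({Medium.SPOKEN}),
    time_span=TimeSpan.SYNCHRONIC,
    languages=frozenset({"Dutch"}),
    growth=Growth.STATIC,
)

print(spoken_dutch_corpus.describe())
# prints: general, spoken, synchronic, monolingual, static
```

Modelling the medium and the languages as sets rather than single values is a design choice of this sketch: it allows a corpus that contains both written and spoken material, or several languages, to be described with the same record.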

General corpora versus specialized corpora

The first distinction that can be drawn is between corpora that are compiled for general-purpose research (general corpora) and corpora that are highly domain-specific (specialized corpora). Corpora compiled for general-purpose research are used for a wide variety of research objectives. Because the scope of a specialized corpus is more specific, the group of researchers interested in such a corpus is usually smaller. However, it can be used, for instance, to highlight particular differences between standard language and specific registers (Kennedy, 1998). An example of a general corpus is the Spoken Dutch Corpus (Oostdijk et al., 2002). The CHILDES project can be classified as a specialized corpus, because it contains only corpora on child language and impaired language (MacWhinney, 2000a,b).

Written corpora versus spoken corpora

Initially, all language corpora consisted of written material collected from already existing text sources that were often electronically available (e.g. novels, newspapers, manuals). Nowadays, spoken language corpora have also been developed; in such corpora recorded speech has been transcribed. However, the differences between text and speech data are very complex, as orthographically transcribed speech is not the same as written text. Gibbon et al. (1997) mention eight important differences between written texts and spoken language that have to be taken into account. One example is the durability of text: written text stays on the paper once it is written down, whereas speech is transient and therefore has to be recorded to make it accessible for future use. This is a rather trivial distinction. A more practical difference is the time and money involved in the development of corpora: developing spoken corpora is more time-consuming and more expensive than developing written corpora. A third difference concerns the editing behaviour of speakers: interruptions, hesitations, repetitions of words, and self-repairs are properties of spoken language usually not present in written texts. The Spoken Dutch Corpus is a prime example of a spoken corpus (Oostdijk et al., 2002), whereas the Brown Corpus contains only texts from written sources (Francis and Kucera, 1979).

Synchronic corpora versus diachronic corpora

Corpora can be designed and used for synchronic or diachronic studies. A synchronic corpus is an attempt to represent a language or a text type of one particular time span, whereas a diachronic corpus represents a language or text type over a period of time in order to make it possible to investigate language changes and differences over time (Kennedy, 1998). Most corpora are synchronic; examples are the British National Corpus (Aston and Burnard, 1998) and the Spoken Dutch Corpus (Oostdijk et al., 2002). The Helsinki Corpus of English Texts contains a diachronic part covering the period between 750 and 1700 (Kytö, 1996).

Monolingual corpora versus multilingual corpora

A corpus may contain texts in one language (monolingual corpus) or in multiple languages (multilingual corpus). Most corpora are monolingual, such as the British National Corpus (Aston and Burnard, 1998) and the Spoken Dutch Corpus (Oostdijk et al., 2002). Within the multilingual corpora a distinction is made between comparable corpora and parallel corpora. A parallel corpus is a collection of texts, each of which is translated into one or more languages other than the original. The simplest case is where only two languages are involved: one of the corpora is an exact translation of the other. Some parallel corpora, however, exist in several languages. Parallel corpora are considered to be a very interesting research topic at the moment, because of the opportunity to align the original text and the translation, and to gain insights into the nature of translation. The English-Norwegian Parallel Corpus is a parallel corpus of English and Norwegian texts (Oksefjell, 1999). A comparable corpus is one which selects similar texts in more than one language or variety. A comparable corpus makes it possible to compare different languages or varieties in similar circumstances of communication (McEnery and Wilson, 1996). The ECI Corpus is an example of a comparable corpus; it contains texts from several European languages (Armstrong-Warwick et al., 1994).

Dynamic corpora versus static corpora

Most corpora are finite in size. For instance, the British National Corpus (Aston and Burnard, 1998) and the Spoken Dutch Corpus (Oostdijk et al., 2002) are both static corpora. Dynamic corpora, also referred to as monitor corpora, on the other hand consist of a growing, non-finite collection of texts. A monitor corpus can be used to investigate language change (McEnery and Wilson, 1996). The Corpus di Italiano Scritto (CORIS) is a general reference corpus of present-day written Italian. It follows a dynamic corpus model and is updated every two years (Rossini Favretti et al., 2001).

3.3 General issues

According to Kennedy (1998), there are some points that have to be considered before a corpus can be developed. In this section these points are discussed.
Purpose

The compiler of the corpus has to formulate what its purpose will be. What kind of research questions will be addressed with it? Different goals require different types of corpora: a corpus used for lexical studies requires a different design than a corpus used for grammatical studies, and sociolinguistics raises different issues than psycholinguistics. Once the purpose of the corpus is known, it is possible to decide which of the characteristics mentioned in Section 3.1 the corpus has to fulfill.

Text types

Once the goal of the corpus is known, the developers have to decide what text types should be incorporated in the corpus. For a general corpus, as many text types as possible should be included, whereas for a specific corpus about the style of texts written by English authors in the 18th century, only English texts from the 18th century are needed. It is important that the corpus contains as many text types as possible of the language variety it represents. Because the corpus is a representation of that specific variety, it is important that it contains a balanced language sample of the variety.

Permission

Corpus compilers must observe copyright laws. This is not only the case for written texts, where permission must be obtained from authors and publishers, but also for spoken text. The key issue for the collection of spoken text is that there is no invasion of personal privacy.

Markup

Inconsistent methods of encoding text can cause confusion. Therefore, standards have been developed for the electronic encoding of text. Following a standard facilitates the portability of electronic texts, making it possible to re-use them in different contexts on different equipment. In 1988, the Text Encoding Initiative (TEI) started. The goal of this initiative was to formulate standards for text documentation, text representation, text analysis and interpretation, and metalanguage and syntax issues. This resulted in a first draft of the TEI guidelines in 1990 under the title Guidelines for the Encoding and Interchange of Machine-Readable Texts. Over the years the guidelines have changed; the current version, TEI P4, was published in 2002 (Sperberg-McQueen and Burnard, 2004).

The TEI guidelines provide means of representing those features of a text which need to be identified explicitly in order to facilitate processing of the text by computer programs (Sperberg-McQueen and Burnard, 2004). The TEI scheme is an application of the markup language SGML. The guidelines specify a set of tags which may be inserted in the electronic representation of the text, in order to mark the text structure and other textual features of interest. Without such explicit markers, many important features remain difficult to locate by mechanical means such as computer programs, and thus difficult to process effectively. The process of inserting such explicit markers for implicit textual features is often called markup, and the term markup language denotes the rules which govern the use of markup in a set of encodings (Sperberg-McQueen and Burnard, 2004; Kennedy, 1998).

Metadata

Metadata can be defined as data about data. When speaking about corpora, the term refers to the kind of data that is needed to describe a text in sufficient detail and with sufficient accuracy for some program to determine whether or not that text is relevant in a particular case. Likewise, it refers to the kind of data needed to describe a speaker in sufficient detail and with sufficient accuracy for some program to determine whether or not that person is relevant in a particular case. The metadata play a key role in organizing the ways in which a corpus can be meaningfully processed. Multiple levels of metadata may be associated with a corpus: first, information relating to the corpus as a whole (e.g., its title, its purpose); second, information relating to the individual components of the corpus (e.g., the bibliographic description of an article); and third, information about the speakers. The TEI guidelines also specify standards for metadata.
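The three levels of metadata, and their use by a program to decide whether a text or speaker is relevant, can be sketched as follows. This is a minimal illustration in Python and not the TEI header format: all class names, fields, and sample values are hypothetical and were invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpeakerMetadata:
    """Third level: information about a speaker."""
    speaker_id: str
    aphasia_type: str  # illustrative field, e.g. "Broca", or "?" if unclassified


@dataclass
class ComponentMetadata:
    """Second level: information about one component of the corpus."""
    component_id: str
    description: str  # e.g. a bibliographic description or a recording setting
    speakers: List[SpeakerMetadata] = field(default_factory=list)


@dataclass
class CorpusMetadata:
    """First level: information about the corpus as a whole."""
    title: str
    purpose: str
    components: List[ComponentMetadata] = field(default_factory=list)


def select_components(
    corpus: CorpusMetadata,
    is_relevant: Callable[[SpeakerMetadata], bool],
) -> List[ComponentMetadata]:
    """Return the components with at least one speaker accepted by is_relevant."""
    return [c for c in corpus.components if any(is_relevant(s) for s in c.speakers)]


# Invented sample data, for illustration only.
corpus = CorpusMetadata(
    title="Example corpus",
    purpose="illustration",
    components=[
        ComponentMetadata("c1", "interview", [SpeakerMetadata("s1", "Broca")]),
        ComponentMetadata("c2", "interview", [SpeakerMetadata("s2", "?")]),
    ],
)

broca_only = select_components(corpus, lambda s: s.aphasia_type == "Broca")
print([c.component_id for c in broca_only])  # prints ['c1']
```

Because the selection works on the metadata records alone, a query of this kind never has to open the texts or recordings themselves, which is the sense in which metadata organize how a corpus can be processed.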


More information

Formulaic Language and Fluency: ESL Teaching Applications

Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language and Fluency: ESL Teaching Applications Formulaic Language Terminology Formulaic sequence One such item Formulaic language Non-count noun referring to these items Phraseology The study

More information

Rubric for Scoring English 1 Unit 1, Rhetorical Analysis

Rubric for Scoring English 1 Unit 1, Rhetorical Analysis FYE Program at Marquette University Rubric for Scoring English 1 Unit 1, Rhetorical Analysis Writing Conventions INTEGRATING SOURCE MATERIAL 3 Proficient Outcome Effectively expresses purpose in the introduction

More information

IMPROVING SPEAKING SKILL OF THE TENTH GRADE STUDENTS OF SMK 17 AGUSTUS 1945 MUNCAR THROUGH DIRECT PRACTICE WITH THE NATIVE SPEAKER

IMPROVING SPEAKING SKILL OF THE TENTH GRADE STUDENTS OF SMK 17 AGUSTUS 1945 MUNCAR THROUGH DIRECT PRACTICE WITH THE NATIVE SPEAKER IMPROVING SPEAKING SKILL OF THE TENTH GRADE STUDENTS OF SMK 17 AGUSTUS 1945 MUNCAR THROUGH DIRECT PRACTICE WITH THE NATIVE SPEAKER Mohamad Nor Shodiq Institut Agama Islam Darussalam (IAIDA) Banyuwangi

More information

English Language Arts Summative Assessment

English Language Arts Summative Assessment English Language Arts Summative Assessment 2016 Paper-Pencil Test Audio CDs are not available for the administration of the English Language Arts Session 2. The ELA Test Administration Listening Transcript

More information

First Grade Curriculum Highlights: In alignment with the Common Core Standards

First Grade Curriculum Highlights: In alignment with the Common Core Standards First Grade Curriculum Highlights: In alignment with the Common Core Standards ENGLISH LANGUAGE ARTS Foundational Skills Print Concepts Demonstrate understanding of the organization and basic features

More information

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading

Welcome to the Purdue OWL. Where do I begin? General Strategies. Personalizing Proofreading Welcome to the Purdue OWL This page is brought to you by the OWL at Purdue (http://owl.english.purdue.edu/). When printing this page, you must include the entire legal notice at bottom. Where do I begin?

More information

ASSISTIVE COMMUNICATION

ASSISTIVE COMMUNICATION ASSISTIVE COMMUNICATION Rupal Patel, Ph.D. Northeastern University Department of Speech Language Pathology & Audiology & Computer and Information Sciences www.cadlab.neu.edu Communication Disorders Language

More information

Linguistics. Undergraduate. Departmental Honors. Graduate. Faculty. Linguistics 1

Linguistics. Undergraduate. Departmental Honors. Graduate. Faculty. Linguistics 1 Linguistics 1 Linguistics Matthew Gordon, Chair Interdepartmental Program in the College of Arts and Science 223 Tate Hall (573) 882-6421 gordonmj@missouri.edu Kibby Smith, Advisor Office of Multidisciplinary

More information

PART 1. A. Safer Keyboarding Introduction. B. Fifteen Principles of Safer Keyboarding Instruction

PART 1. A. Safer Keyboarding Introduction. B. Fifteen Principles of Safer Keyboarding Instruction Subject: Speech & Handwriting/Input Technologies Newsletter 1Q 2003 - Idaho Date: Sun, 02 Feb 2003 20:15:01-0700 From: Karl Barksdale To: info@speakingsolutions.com This is the

More information

COMPETENCY-BASED STATISTICS COURSES WITH FLEXIBLE LEARNING MATERIALS

COMPETENCY-BASED STATISTICS COURSES WITH FLEXIBLE LEARNING MATERIALS COMPETENCY-BASED STATISTICS COURSES WITH FLEXIBLE LEARNING MATERIALS Martin M. A. Valcke, Open Universiteit, Educational Technology Expertise Centre, The Netherlands This paper focuses on research and

More information

TRAITS OF GOOD WRITING

TRAITS OF GOOD WRITING TRAITS OF GOOD WRITING Each paper was scored on a scale of - on the following traits of good writing: Ideas and Content: Organization: Voice: Word Choice: Sentence Fluency: Conventions: The ideas are clear,

More information

Universal Design for Learning Lesson Plan

Universal Design for Learning Lesson Plan Universal Design for Learning Lesson Plan Teacher(s): Alexandra Romano Date: April 9 th, 2014 Subject: English Language Arts NYS Common Core Standard: RL.5 Reading Standards for Literature Cluster Key

More information

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 -

Think A F R I C A when assessing speaking. C.E.F.R. Oral Assessment Criteria. Think A F R I C A - 1 - C.E.F.R. Oral Assessment Criteria Think A F R I C A - 1 - 1. The extracts in the left hand column are taken from the official descriptors of the CEFR levels. How would you grade them on a scale of low,

More information

Author: Justyna Kowalczys Stowarzyszenie Angielski w Medycynie (PL) Feb 2015

Author: Justyna Kowalczys Stowarzyszenie Angielski w Medycynie (PL)  Feb 2015 Author: Justyna Kowalczys Stowarzyszenie Angielski w Medycynie (PL) www.angielskiwmedycynie.org.pl Feb 2015 Developing speaking abilities is a prerequisite for HELP in order to promote effective communication

More information

Extending Place Value with Whole Numbers to 1,000,000

Extending Place Value with Whole Numbers to 1,000,000 Grade 4 Mathematics, Quarter 1, Unit 1.1 Extending Place Value with Whole Numbers to 1,000,000 Overview Number of Instructional Days: 10 (1 day = 45 minutes) Content to Be Learned Recognize that a digit

More information

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing

Procedia - Social and Behavioral Sciences 141 ( 2014 ) WCLTA Using Corpus Linguistics in the Development of Writing Available online at www.sciencedirect.com ScienceDirect Procedia - Social and Behavioral Sciences 141 ( 2014 ) 124 128 WCLTA 2013 Using Corpus Linguistics in the Development of Writing Blanka Frydrychova

More information

Psychology and Language

Psychology and Language Psychology and Language Psycholinguistics is the study about the casual connection within human being linking experience with speaking and writing, and hearing and reading with further behavior (Robins,

More information

Human Factors Computer Based Training in Air Traffic Control

Human Factors Computer Based Training in Air Traffic Control Paper presented at Ninth International Symposium on Aviation Psychology, Columbus, Ohio, USA, April 28th to May 1st 1997. Human Factors Computer Based Training in Air Traffic Control A. Bellorini 1, P.

More information

1. REFLEXES: Ask questions about coughing, swallowing, of water as fast as possible (note! Not suitable for all

1. REFLEXES: Ask questions about coughing, swallowing, of water as fast as possible (note! Not suitable for all Human Communication Science Chandler House, 2 Wakefield Street London WC1N 1PF http://www.hcs.ucl.ac.uk/ ACOUSTICS OF SPEECH INTELLIGIBILITY IN DYSARTHRIA EUROPEAN MASTER S S IN CLINICAL LINGUISTICS UNIVERSITY

More information

Guidelines for Writing an Internship Report

Guidelines for Writing an Internship Report Guidelines for Writing an Internship Report Master of Commerce (MCOM) Program Bahauddin Zakariya University, Multan Table of Contents Table of Contents... 2 1. Introduction.... 3 2. The Required Components

More information

DOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY?

DOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY? DOES RETELLING TECHNIQUE IMPROVE SPEAKING FLUENCY? Noor Rachmawaty (itaw75123@yahoo.com) Istanti Hermagustiana (dulcemaria_81@yahoo.com) Universitas Mulawarman, Indonesia Abstract: This paper is based

More information

Training Staff with Varying Abilities and Special Needs

Training Staff with Varying Abilities and Special Needs Training Staff with Varying Abilities and Special Needs by Randy Boardman and Renée Fucilla In your role as a Nonviolent Crisis Intervention Certified Instructor, it is likely that at some point you will

More information

Course Law Enforcement II. Unit I Careers in Law Enforcement

Course Law Enforcement II. Unit I Careers in Law Enforcement Course Law Enforcement II Unit I Careers in Law Enforcement Essential Question How does communication affect the role of the public safety professional? TEKS 130.294(c) (1)(A)(B)(C) Prior Student Learning

More information

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur

Module 12. Machine Learning. Version 2 CSE IIT, Kharagpur Module 12 Machine Learning 12.1 Instructional Objective The students should understand the concept of learning systems Students should learn about different aspects of a learning system Students should

More information

Understanding the Relationship between Comprehension and Production

Understanding the Relationship between Comprehension and Production Carnegie Mellon University Research Showcase @ CMU Department of Psychology Dietrich College of Humanities and Social Sciences 1-1987 Understanding the Relationship between Comprehension and Production

More information

Speech Recognition at ICSI: Broadcast News and beyond

Speech Recognition at ICSI: Broadcast News and beyond Speech Recognition at ICSI: Broadcast News and beyond Dan Ellis International Computer Science Institute, Berkeley CA Outline 1 2 3 The DARPA Broadcast News task Aspects of ICSI

More information

How to analyze visual narratives: A tutorial in Visual Narrative Grammar

How to analyze visual narratives: A tutorial in Visual Narrative Grammar How to analyze visual narratives: A tutorial in Visual Narrative Grammar Neil Cohn 2015 neilcohn@visuallanguagelab.com www.visuallanguagelab.com Abstract Recent work has argued that narrative sequential

More information

The Cambridge Cookie-Theft Corpus: A Corpus of Directed and Spontaneous Speech of Brain-Damaged Patients and Healthy Individuals

The Cambridge Cookie-Theft Corpus: A Corpus of Directed and Spontaneous Speech of Brain-Damaged Patients and Healthy Individuals The Cambridge Cookie-Theft Corpus: A Corpus of Directed and Spontaneous Speech of Brain-Damaged Patients and Healthy Individuals Caroline Williams, Andrew Thwaites, Paula Buttery, Jeroen Geertzen Billi

More information

CALIFORNIA STATE UNIVERSITY, SAN MARCOS SCHOOL OF EDUCATION

CALIFORNIA STATE UNIVERSITY, SAN MARCOS SCHOOL OF EDUCATION CALIFORNIA STATE UNIVERSITY, SAN MARCOS SCHOOL OF EDUCATION COURSE: EDSL 691: Neuroscience for the Speech-Language Pathologist (3 units) Fall 2012 Wednesdays 9:00-12:00pm Location: KEL 5102 Professor:

More information

Preprint.

Preprint. http://www.diva-portal.org Preprint This is the submitted version of a paper presented at Privacy in Statistical Databases'2006 (PSD'2006), Rome, Italy, 13-15 December, 2006. Citation for the original

More information

Effective Instruction for Struggling Readers

Effective Instruction for Struggling Readers Section II Effective Instruction for Struggling Readers Chapter 5 Components of Effective Instruction After conducting assessments, Ms. Lopez should be aware of her students needs in the following areas:

More information

Longitudinal family-risk studies of dyslexia: why. develop dyslexia and others don t.

Longitudinal family-risk studies of dyslexia: why. develop dyslexia and others don t. The Dyslexia Handbook 2013 69 Aryan van der Leij, Elsje van Bergen and Peter de Jong Longitudinal family-risk studies of dyslexia: why some children develop dyslexia and others don t. Longitudinal family-risk

More information

The taming of the data:

The taming of the data: The taming of the data: Using text mining in building a corpus for diachronic analysis Stefania Degaetano-Ortlieb, Hannah Kermes, Ashraf Khamis, Jörg Knappen, Noam Ordan and Elke Teich Background Big data

More information

5. UPPER INTERMEDIATE

5. UPPER INTERMEDIATE Triolearn General Programmes adapt the standards and the Qualifications of Common European Framework of Reference (CEFR) and Cambridge ESOL. It is designed to be compatible to the local and the regional

More information

YMCA SCHOOL AGE CHILD CARE PROGRAM PLAN

YMCA SCHOOL AGE CHILD CARE PROGRAM PLAN YMCA SCHOOL AGE CHILD CARE PROGRAM PLAN (normal view is landscape, not portrait) SCHOOL AGE DOMAIN SKILLS ARE SOCIAL: COMMUNICATION, LANGUAGE AND LITERACY: EMOTIONAL: COGNITIVE: PHYSICAL: DEVELOPMENTAL

More information

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature

1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature 1 st Grade Curriculum Map Common Core Standards Language Arts 2013 2014 1 st Quarter (September, October, November) August/September Strand Topic Standard Notes Reading for Literature Key Ideas and Details

More information

ADHD Classroom Accommodations for Specific Behaviour

ADHD Classroom Accommodations for Specific Behaviour ADHD Classroom Accommodations for Specific Behaviour 1.Difficulty following a plan (has high aspirations but lacks follow-through); wants to get A s but ends up with F s and doesn t understand where he

More information

Fountas-Pinnell Level P Informational Text

Fountas-Pinnell Level P Informational Text LESSON 7 TEACHER S GUIDE Now Showing in Your Living Room by Lisa Cocca Fountas-Pinnell Level P Informational Text Selection Summary This selection spans the history of television in the United States,

More information

CS 598 Natural Language Processing

CS 598 Natural Language Processing CS 598 Natural Language Processing Natural language is everywhere Natural language is everywhere Natural language is everywhere Natural language is everywhere!"#$%&'&()*+,-./012 34*5665756638/9:;< =>?@ABCDEFGHIJ5KL@

More information

Introduction. Beáta B. Megyesi. Uppsala University Department of Linguistics and Philology Introduction 1(48)

Introduction. Beáta B. Megyesi. Uppsala University Department of Linguistics and Philology Introduction 1(48) Introduction Beáta B. Megyesi Uppsala University Department of Linguistics and Philology beata.megyesi@lingfil.uu.se Introduction 1(48) Course content Credits: 7.5 ECTS Subject: Computational linguistics

More information

Software Maintenance

Software Maintenance 1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories

More information

Signs, Signals, and Codes Merit Badge Workbook

Signs, Signals, and Codes Merit Badge Workbook Merit Badge Workbook This workbook can help you but you still need to read the merit badge pamphlet. The work space provided for each requirement should be used by the Scout to make notes for discussing

More information

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s))

PAGE(S) WHERE TAUGHT If sub mission ins not a book, cite appropriate location(s)) Ohio Academic Content Standards Grade Level Indicators (Grade 11) A. ACQUISITION OF VOCABULARY Students acquire vocabulary through exposure to language-rich situations, such as reading books and other

More information

Writing Functional Dysphagia Goals

Writing Functional Dysphagia Goals Writing Functional Dysphagia Goals Free PDF ebook Download: Writing Functional Dysphagia Goals Download or Read Online ebook writing functional dysphagia goals in PDF Format From The Best User Guide Database

More information

PRESENTED BY EDLY: FOR THE LOVE OF ABILITY

PRESENTED BY EDLY: FOR THE LOVE OF ABILITY HOW TO BE YOUR CHILD S BEST IEP ADVOCATE PRESENTED BY EDLY: FOR THE LOVE OF ABILITY 888-EDLYOWL (888-335-9695) info@edlyeducation.com Nothing presented either orally or written in this seminar should be

More information

Merbouh Zouaoui. Melouk Mohamed. Journal of Educational and Social Research MCSER Publishing, Rome-Italy. 1. Introduction

Merbouh Zouaoui. Melouk Mohamed. Journal of Educational and Social Research MCSER Publishing, Rome-Italy. 1. Introduction Acquiring Communication through Conversational Training: The Case Study of 1 st Year LMD Students at Djillali Liabès University Sidi Bel Abbès Algeria Doi:10.5901/jesr.2014.v4n6p353 Abstract Merbouh Zouaoui

More information

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty

Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Atypical Prosodic Structure as an Indicator of Reading Level and Text Difficulty Julie Medero and Mari Ostendorf Electrical Engineering Department University of Washington Seattle, WA 98195 USA {jmedero,ostendor}@uw.edu

More information