Manual annotation in a functional-typological grammar study (A study on the Javanese dialect of Kudus, Indonesia)
Noor Malihah



Location
Yogyakarta, Kudus, Solo

The grammar of Javanese
- SVO (verbal clauses).
- Javanese NPs lack number marking; plurality is indicated by a numeral.
- No tenses.
- Verbs may be combined with aspectual markers and modals.

Goal
- To manually annotate the spoken and written corpus of JDK (the Javanese dialect of Kudus).
- To use the annotation in a functional-typological grammar study, especially of the passive, the applicative, and the causative.

Why passive, applicative, and causative?
a. Many scholars have discussed the passive, applicative, and causative in the Austronesian languages.
b. They are the same kind of phenomenon: valency-changing constructions.
c. In JDK, they have distinctive features compared to standard Javanese.
d. The applicative and the causative have the same morphological markers in Javanese.

Passive
- It contrasts with another construction, the active.
- The subject of the active corresponds to a non-obligatory oblique phrase in the passive, or is not overtly expressed.
- The subject of the passive, if there is one, corresponds to the direct object of the active.
- The construction is pragmatically restricted relative to the active.
- The construction displays special morphological marking of the verb.
(Siewierska, 2005)

Example of passive in English
a. John bought the book.
b. The book was bought by John.

The applicative
A sentence where an extra object is added.
Haspelmath (2001): the applicative is a valency-increasing phenomenon where a direct object is added to a verb.
It is similar to the following English pair (Gropen et al. 1989: 204):
a. John gave a gift to Mary.
b. John gave Mary a gift.

Example of an applicative in JDK
a. FS:03:M:A:C: 136
Lha otomatise asu iku mau kan yo nyedak-i bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach-APPL turtle that DEF
b. Non-applicative (manipulated example)
Lha otomatise asu iku mau kan yo nyedak ning bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach to turtle that DEF
"Huh, automatically, that dog also approached that turtle."

Causative
Causativization creates a new predicate with an agent causer added: somebody makes someone do something.
Talmy (2000) and Shibatani (1976) define a causative situation as one that can be analyzed into two sub-events: a causing event and a caused event. The caused event must follow causally from the causing event:
a. The caused event would not occur if the causing event did not occur;
b. The caused event does indeed occur.

Example in English
(1) a. The children danced.
    b. The teacher made the children dance.
(2) a. The robber died.
    b. The policeman killed the robber.

General ideas
- A relatively small data collection.
- The data were manually annotated for various grammatical features.
- The tags are used to examine the correlation between one code and the other code(s).

Data collection
Type of data: elicited narrative, spontaneous speech, written data.
Period: September 2010 to January 2011 (5-month data collection).
Place: Kudus regency, Central Java, Indonesia.

Manual annotation
Goal: to produce a corpus for a grammar study.
I am not producing the perfect corpus for future generations, but a workable corpus for my own use.
The annotated corpus will be used to analyze the JDK grammar.
The manual annotation adds linguistically rich information to the JDK data, ranging from morphology through syntax to semantics.

Why manual annotation?
The data set is small (see Table 1):
a. Recordings from 49 JDK native speakers
b. Written data from six articles from a local newspaper

Table 1. The distribution of informants, clauses, and words across the data sources
Corpus | Number of informants | Number of clauses | Number of words
Narrative (frog story) | 41 | 2,431 | 37,716
Spontaneous speech | 8 | 1,045 | 6,103
Written data | 6 | 586 | 3,547
TOTAL | 55 | 4,062 | 47,366

Preparation
- A Word document is used to transcribe and annotate.
- An Excel sheet is used to record the quantitative results.

Step 1
Decided on the codes used for annotation, including:
a. Type of clause (active, passive, and ergative-like);
b. Applicatives and causatives;
c. Transitivity of the verb base;
d. Grammatical relations;
e. Semantic features of the nouns;
f. Semantic roles of the nouns;
g. Part of speech (POS);
h. Data sources.

Step 2
Read through and annotated every single clause.
Explicitly added information to each clause and word in each text in the corpus.
These tags were used to look at the correlation between a particular grammatical feature and the others.

A single clause
Rules:
- Indicates a single situation, action, or event
- A dependency of a predicate and an argument (Ewing, 1998: 14)
Annotation
Each annotation was placed in angle brackets; the position of the tags varies.
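The angle-bracket convention can be read mechanically. The following is a minimal sketch, not the procedure used in the study: it assumes that tags are separated from words by whitespace and that a tag belongs to the nearest preceding word. The function name `attach_tags` is hypothetical; the sample clause is taken from the examples in this presentation.

```python
import re

# A token is either an angle-bracket tag or a run of non-space characters.
# Assumption: tags are separated from words by whitespace.
TOKEN_RE = re.compile(r"<([^<>]+)>|(\S+)")

def attach_tags(clause):
    """Return (word, [tags]) pairs, attaching each tag to the nearest preceding word."""
    pairs = []
    for tag, word in TOKEN_RE.findall(clause):
        if word:
            pairs.append((word, []))
        elif pairs:
            # A tag that appears before any word has nothing to attach to and is skipped.
            pairs[-1][1].append(tag.strip())
    return pairs

pairs = attach_tags("Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike")
for word, tags in pairs:
    print(word, tags)
```

For a multi-word noun phrase, the tags end up on the last word of the phrase, because that is where they are written in the transcript.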

Step 3: Code for data sources
My transcripts were coded to indicate information about the speakers who produced each clause. Each clause is labeled using a uniform format. The ID code preceding each clause identifies the type of data; the sex, age, and place of residence of the speaker; and the clause number.

How to use the codes for data sources
A combination of codes serves as a unique identifier for a particular clause; no two clauses share the same string.
Example: FS:01:F:A:C: 008 refers to data elicited using the frog story method, narrated by informant number one, who is female, adult, and lives in an urban area; this is clause number eight in the transcript.
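As an illustration, an ID string of this shape can be split into its fields with a regular expression. This is a sketch under assumptions: the names `ClauseID` and `parse_clause_id` are hypothetical, the field letters are kept as written rather than decoded, and the speaker fields are treated as optional because the written-data IDs in the examples (such as WR:07: 042) do not carry them.

```python
import re
from typing import NamedTuple, Optional

class ClauseID(NamedTuple):
    source: str               # type of data, e.g. "FS" (frog story)
    informant: int            # informant (or text) number
    sex: Optional[str]        # e.g. "F"; None when absent
    age: Optional[str]        # e.g. "A"; None when absent
    residence: Optional[str]  # e.g. "C"; None when absent
    clause: int               # clause number in the transcript

ID_RE = re.compile(
    r"^(?P<source>[A-Z]{2}):(?P<informant>\d+):"
    r"(?:(?P<sex>[A-Z]):(?P<age>[A-Z]):(?P<residence>[A-Z]):)?"
    r"\s*(?P<clause>\d+)$"
)

def parse_clause_id(text):
    """Split an ID such as 'FS:01:F:A:C: 008' into its fields."""
    m = ID_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a clause ID: {text!r}")
    return ClauseID(m["source"], int(m["informant"]), m["sex"],
                    m["age"], m["residence"], int(m["clause"]))

spoken = parse_clause_id("FS:01:F:A:C: 008")
written = parse_clause_id("WR:07: 042")
print(spoken)
print(written)
```

Because the combination of fields is meant to be unique, the parsed tuples could also serve as dictionary keys when collecting clauses.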

Codes applied to verbs
Codes | Information | Position
TR1 or TR2 or INT1 or INT2 | Active transitive/intransitive verbs. Each TR or INT is identified as 1 (for verbs with the nasal prefix) or 2 (for verbs without the nasal prefix). | Immediately after the verb
PASS1 or PASS2 or PASS3 | Passive type 1, passive type 2, or passive type 3. The classification is based on the presence of agent, patient, and preposition in a clause. | Immediately after the verb
UNMARKED | Passive without morphology | Immediately after PASS1 or PASS2 or PASS3
ERGL1 or ERGL2 | ERGL1 labels an ergative-like clause where the agent is the first person singular pronoun; ERGL2 codes an ergative-like clause where the agent is the second person pronoun. | Immediately after the verb
APPL1 or APPL2 or APPL3 | APPL1 labels a verb with -(a)ke; APPL2 shows a verb with -na; and APPL3 indicates a verb with -i. | Immediately after TR1 or TR2 or PASS1 or PASS2 or PASS3 or ERGL1 or ERGL2

(continued)
Codes | Information | Position
CAUS1 or CAUS2 or CAUS3 | CAUS1 labels a verb with -(a)ke; CAUS2 shows a verb with -na; and CAUS3 indicates a verb with -i. | Immediately after TR1 or TR2 or PASS1 or PASS2 or PASS3 or ERGL1 or ERGL2
ADV | Indicates an adversative passive | Immediately after the verb
ANS | Active clause without subject | Immediately after TR1 or TR2
PNS | Passive clause without subject | Immediately after PASS1 or PASS2 or PASS3

Example
(1) FS:01:M:A:C: 003
terus kui bocah-bocah kui mancing <INT2>
then that child-child that ACT.go.fishing
"Then those children went fishing."
(2) WR:07: 042
Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike
Suplo ACT.surprise.CAUS uncle and aunty
"Suplo caused his uncle and his aunty to be surprised."

Codes applied to clauses
Codes | Information | Position
NOM1 or NOM2 | NOM1 indicates a non-verbal clause and NOM2 labels an existential clause | At the end of the clause
IMPER | Imperative clause | At the end of the clause
REL | Relative marker | Immediately after the Javanese relative marker sing or kang

Examples
(1) FS:02:M:A:C: 006
nanging Budi orak kuat <NOM1>
but Budi NEG strong
"But Budi was not strong."
(2) FS:03:M:A:C: 025
Loh kok malah ono bulus <NOM2>
Huh EMP actually exist turtle
"Huh, actually there was a turtle."

Codes applied to nouns 1: Indicating semantic features of the nouns
Codes | Information | Position
HUM or NONH | Human or non-human noun | Immediately after a noun
ANIM or INA | Animate or inanimate | Immediately after label HUM or NONH
DEF NP or INDEF NP | Definite or indefinite noun phrase | Immediately after label ANIM or INA. Only for common nouns. NAME is used instead when a noun is a name. 1 or 2 or 3 is used instead when the noun is a first, second, or third person pronoun.
1 or 2 or 3 | First person pronoun, second person pronoun, or third person pronoun | Immediately after label ANIM or INA
S or P | Singular or plural | Immediately after DEF NP or INDEF NP or NAME or 1 or 2 or 3

Examples
(1) SS:02:F:A:C: 305
Setange iki <NONH> <INA> <DEF NP> <S> hurung dibenak-benakke <PASS3> <CAUS2>
steering this NEG PASS.fix-fix.CAUS
"This steering has not been fixed."

Codes indicating semantic roles
Codes | Information | Position
AGT | Agent | After the codes indicating the semantic features of the noun, immediately after S or P
PAT | Patient | After the codes indicating the semantic features of the noun, immediately after S or P
BEN | Benefactive | After the codes indicating the semantic features of the noun, immediately after S or P
REC | Recipient | After the codes indicating the semantic features of the noun, immediately after S or P
LOC | Location | After the codes indicating the semantic features of the noun, immediately after S or P
INST | Instrument | After the codes indicating the semantic features of the noun, immediately after S or P
GOAL | Goal | After the codes indicating the semantic features of the noun, immediately after S or P

Examples
SS:02:F:A:C: 305
setange iki <INA> <NONH> <DEF NP> <S> <PAT> hurung dibenak-benakke <PASS3> <CAUS2>
"This steering has not been fixed."
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> marani <TR2> <APPL3> buluse <GOAL> karo njegogi <TR2> <APPL3>
"Budi's dog approached the turtle and barked."

Codes indicating the grammatical relations of the nouns
Codes | Information | Position
SUBJ | Subject of the clause | After the code indicating the semantic role of the noun
OBJ | Object of the clause | After the code indicating the semantic role of the noun
IO | Indirect object of the clause | After the code indicating the semantic role of the noun

Examples
SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung dibenak-benakke <PASS3> <CAUS2>
"This steering has not been fixed."
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3>
"Budi's dog approached the turtle and barked."

Codes indicating the lexical and morphosyntactic features of the dialect
Code | Information | Position
JDK | Lexical or morphosyntactic features of JDK. The features need to be coded so that I can demonstrate that the clauses were originally produced by native speakers of JDK. I will only analyze texts containing clauses with JDK features. | Immediately after the feature in the clause

Examples
SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung <JDK> dibenak-benakke <PASS3> <CAUS2>
"This steering has not been fixed."
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
"Budi's dog approached the turtle and barked."

Results
- An annotated dataset containing the information needed to answer my research questions.
- Quantitative results are obtained by counting the co-occurrences of particular features in the dataset.
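Counting co-occurrences of this kind could be automated along the following lines. This is a hedged sketch rather than the workflow described in this presentation (which relies on Word and Excel): it counts, at clause level only, how often two different tags appear in the same clause. The function name `cooccurrence_counts` is an assumption; the sample clauses come from the examples above.

```python
import re
from collections import Counter
from itertools import combinations

TAG_RE = re.compile(r"<([^<>]+)>")

def cooccurrence_counts(clauses):
    """Count how many clauses contain each unordered pair of distinct tags."""
    counts = Counter()
    for clause in clauses:
        # A set, so that a tag repeated within one clause is counted once.
        tags = sorted({t.strip() for t in TAG_RE.findall(clause)})
        counts.update(combinations(tags, 2))
    return counts

clauses = [
    "Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike",
    "terus kui bocah-bocah kui mancing <INT2>",
    "Setange iki <NONH> <INA> <DEF NP> <S> hurung dibenak-benakke <PASS3> <CAUS2>",
]
counts = cooccurrence_counts(clauses)
# Keys are alphabetically sorted pairs, so look them up in that order.
print(counts[("CAUS1", "TR2")], counts[("CAUS2", "PASS3")])
```

Clause-level counts do not show which word a tag belongs to; questions such as the animacy of the subject need the tags grouped per word.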

(continued)
From these tags, I can describe a particular construction in data number xxx, for example:
a. The type of clause
b. The transitivity of the verb base
c. The animacy of the subject
d. The animacy of the promoted argument
e. The semantic role of the promoted argument

Example
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
a. The data in FS:02:M:A:C: 010 is an applicative type 3.
b. The agent is the subject and is a non-human animate (animal).
c. The promoted argument, the object, is also a non-human animate (animal), and it is a goal.

How to use the results
Combine one piece of information with another to answer questions about the use of a particular grammatical construction.
For example, information about the semantic role of a noun phrase can be combined with the applicative tags to show which semantic roles the promoted argument has with applicative type 1.

How to use the tags (1)
- Search for the occurrences of a particular construction, for example the applicative.
- Highlight all entries with an applicative tag (APPL1, APPL2, APPL3).
- Put the entries for a particular construction in a separate file. For example, searching for the applicative gives me four separate files: one each for APPL1, APPL2, and APPL3, and one for all applicatives together.
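The search-and-separate step could be scripted as in the sketch below. It is an assumption-laden illustration, not the method used here: entries are taken to be plain-text lines, the function name `split_by_construction` and the output file names (`APPL3.txt`, `APPL_all.txt`) are hypothetical, and the demonstration writes to a temporary directory.

```python
import re
import tempfile
from pathlib import Path

TAG_RE = re.compile(r"<([^<>]+)>")

def split_by_construction(entries, prefix, out_dir):
    """Write entries tagged with codes starting with `prefix` to one file per
    code plus one combined file. Returns {file name: number of entries}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    buckets = {}
    for entry in entries:
        tags = {t.strip() for t in TAG_RE.findall(entry)}
        matching = sorted(t for t in tags if t.startswith(prefix))
        if not matching:
            continue
        buckets.setdefault(f"{prefix}_all", []).append(entry)
        for tag in matching:
            buckets.setdefault(tag, []).append(entry)
    summary = {}
    for name, lines in buckets.items():
        (out_dir / f"{name}.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
        summary[f"{name}.txt"] = len(lines)
    return summary

entries = [
    "FS:02:M:A:C: 010 Kirike budi <AGT> <SUBJ> marani <TR2> <APPL3> buluse <GOAL> <OBJ>",
    "WR:07: 042 Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike",
    "FS:01:M:A:C: 003 terus kui bocah-bocah kui mancing <INT2>",
]
with tempfile.TemporaryDirectory() as tmp:
    summary = split_by_construction(entries, "APPL", tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())
print(summary, written)
```

Files are only created for codes that actually occur, so a sample without APPL1 or APPL2 entries produces no file for them.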

(continued)
At the same time, I used an Excel sheet for several purposes: to list the verbs or other information needed, to record the quantitative results, and to create graphs based on the quantitative results.

List of verbs in APPL1

Quantitative results

Graph
[Bar chart: The distribution of subject animacy with the different applicative markers. Series: animate subject, inanimate subject. Categories: -na, -(a)ke, -i, all applicatives, baseline. Percentage labels as extracted: 64.9, 80.0, 76.2, 73.8, 78.3; 35.1, 20.0, 23.8, 26.2, 21.7.]

How to use the tags (2)
- To examine the transitivity of the verb bases in the applicative constructions, I looked at the tags on the verbs (TR1 or TR2 or INT1 or INT2 or ERGL1 or ERGL2).
- To see the animacy of the subject in the applicatives, I used the tags ANIM or INA together with SUBJ.

(continued)
- To see the animacy of the promoted argument in the applicatives, I looked at the tags ANIM or INA together with OBJ (the promoted argument).
- To investigate the semantic role of the promoted argument in the applicative, I examined the tags for semantic roles (PAT or BEN or INST or LOC or GOAL or REC).
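The lookups described above combine tags that sit on the same word (for example ANIM together with SUBJ) with a construction tag found elsewhere in the clause. A minimal sketch of that logic follows; it assumes that tags attach to the nearest preceding word, and the names `tag_groups` and `animacy_by_relation` are hypothetical.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"<([^<>]+)>|(\S+)")

def tag_groups(clause):
    """Return one set of tags per tagged word (tags attach to the preceding word)."""
    groups = []
    for tag, word in TOKEN_RE.findall(clause):
        if word:
            groups.append(set())
        elif groups:
            groups[-1].add(tag.strip())
    return [g for g in groups if g]

def animacy_by_relation(clauses, relation, construction_prefix="APPL"):
    """Count ANIM vs INA on the word tagged `relation`, restricted to clauses
    that contain a tag starting with `construction_prefix`."""
    counts = Counter()
    for clause in clauses:
        groups = tag_groups(clause)
        if not any(t.startswith(construction_prefix) for g in groups for t in g):
            continue
        for g in groups:
            if relation in g:
                for animacy in ("ANIM", "INA"):
                    if animacy in g:
                        counts[animacy] += 1
    return counts

clauses = [
    "Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> "
    "buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>",
    "Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung <JDK> "
    "dibenak-benakke <PASS3> <CAUS2>",
]
subj = animacy_by_relation(clauses, "SUBJ")
obj = animacy_by_relation(clauses, "OBJ")
print(subj, obj)
```

The second clause is a passive causative, so it is ignored when the construction prefix is `APPL`; passing `construction_prefix="CAUS"` would select it instead.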

(continued)
I also used these tags to count how frequently the grammatical phenomena co-occur, for example to examine which affixes are used to promote each semantic role.
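Once pairs such as (semantic role, applicative code) have been counted, turning them into a percentage distribution per role is simple arithmetic. The sketch below shows one way to do it; the function name `row_percentages` is hypothetical and the counts are toy values invented purely for illustration, not figures from the JDK corpus.

```python
from collections import Counter, defaultdict

def row_percentages(pair_counts):
    """Turn {(role, affix_code): count} into {role: {affix_code: percent}},
    where the percentages for each role sum to about 100."""
    totals = defaultdict(int)
    for (role, _code), n in pair_counts.items():
        totals[role] += n
    table = defaultdict(dict)
    for (role, code), n in pair_counts.items():
        if totals[role]:  # skip roles with no observations to avoid dividing by zero
            table[role][code] = round(100 * n / totals[role], 1)
    return dict(table)

# Toy counts, invented for illustration only (not taken from the corpus).
toy = Counter({("GOAL", "APPL3"): 3, ("GOAL", "APPL1"): 1, ("BEN", "APPL1"): 2})
table = row_percentages(toy)
print(table)
```

The resulting nested dictionary has one row per semantic role, which matches the layout of a chart with semantic roles as categories and affixes as series.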

Example
[Bar chart: The distribution of the affixes used to promote each semantic role. Series: -na, -(a)ke, -i. Categories: benefactive, recipient, location, goal, instrument, patient. Percentage labels as extracted: 100.0, 100.0, 100.0, 62.7, 75.6, 73.5, 37.3, 18.3, 26.5, 0.0, 0.0, 0.0, 0.0, 0.0, 6.1, 0.0, 0.0, 0.0.]

Challenge 1
Deciding on appropriate codes for the annotation that are relevant to the main research questions. The annotation should make it possible to search for specific information in the data set.
For example: whether to adopt INT or INTR for an intransitive verb, and S or SUBJ for the subject of a clause.

Challenge 2
Consistency
For example: adopting clear criteria for what counts as an animate or inanimate noun, and for other grammatical terms. Is sikile asu "the dog's leg" an animate or an inanimate noun?

Challenge 3
High accuracy
For example:
a. Mistyped: <APPL1> → <APLL1>
b. Extra space: <ANIM> → < ANIM>
c. Human mistakes: <HUM> → <NONH>
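The first two kinds of error (mistyped codes and stray spaces) can be caught by checking every tag against the code inventory. The sketch below is one possible check, not part of the original workflow; the inventory is assembled from the code tables in this presentation with normalized spellings (ERGL1/ERGL2, S/P, INDEF NP), which is an assumption, and the function name `check_tags` is hypothetical. The third kind of error, a valid but wrong code such as <NONH> for <HUM>, cannot be detected this way and still needs a human reader.

```python
import re

# Code inventory assembled from the code tables; spellings are normalized (assumption).
KNOWN_TAGS = {
    "TR1", "TR2", "INT1", "INT2", "PASS1", "PASS2", "PASS3", "UNMARKED",
    "ERGL1", "ERGL2", "APPL1", "APPL2", "APPL3", "CAUS1", "CAUS2", "CAUS3",
    "ADV", "ANS", "PNS", "NOM1", "NOM2", "IMPER", "REL",
    "HUM", "NONH", "ANIM", "INA", "DEF NP", "INDEF NP", "NAME", "1", "2", "3",
    "S", "P", "AGT", "PAT", "BEN", "REC", "LOC", "INST", "GOAL",
    "SUBJ", "OBJ", "IO", "JDK",
}

TAG_RE = re.compile(r"<([^<>]*)>")

def check_tags(line):
    """Return (tag as written, problem) pairs for one annotated line."""
    problems = []
    for raw in TAG_RE.findall(line):
        tag = raw.strip()
        if raw != tag:
            problems.append((f"<{raw}>", "extra space inside brackets"))
        if tag not in KNOWN_TAGS:
            problems.append((f"<{raw}>", "unknown code"))
    return problems

print(check_tags("nyedak-i <TR1> <APLL1>"))
print(check_tags("asu < ANIM> <SUBJ>"))
```

Running such a check over every line before counting would keep a mistyped code from silently dropping out of the quantitative results.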

Challenge 4
Many files
Each particular construction is saved in a separate file. For example, for the applicative there were at least five files: one for the whole dataset, one for all applicatives together, and one each for applicative types 1, 2, and 3.

Challenge 5
Time-consuming
Why?
- The analysis is entered manually.
- When one piece of information changed, the whole dataset had to be revised, starting the tagging again from the beginning.
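Some revisions, such as renaming a code everywhere, can be applied mechanically. The sketch below illustrates this under the assumption that the annotated text is available as plain-text lines; the function name `rename_tag` is hypothetical, and the INT2 to INTR2 rename is only an illustration of the INT/INTR choice mentioned under Challenge 1. Revisions that require re-judging each clause still have to be done by hand.

```python
import re

def rename_tag(lines, old, new):
    """Replace every <old> tag with <new>; return the new lines and the
    number of lines that changed."""
    # Tolerate stray spaces inside the brackets, but match the whole code only.
    pattern = re.compile(r"<\s*" + re.escape(old) + r"\s*>")
    updated = [pattern.sub(lambda m: f"<{new}>", line) for line in lines]
    changed = sum(1 for before, after in zip(lines, updated) if before != after)
    return updated, changed

lines = [
    "terus kui bocah-bocah kui mancing <INT2>",
    "Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike",
]
updated, changed = rename_tag(lines, "INT2", "INTR2")
print(changed, updated[0])
```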

Summary
- Manual annotation is feasible in a functional-typological grammar study.
- Some good points
- Some challenges

Thank you
Questions and suggestions? Or email me at n.malihah@lancs.ac.uk