Manual annotation in a functional-typological grammar study
(A study on the Javanese dialect of Kudus, Indonesia)
Noor Malihah
Location
[Map: Yogyakarta, Kudus, Solo]
The grammar of Javanese
SVO word order (verbal clauses).
Javanese NPs lack number marking; plurality is indicated by a numeral.
There is no tense marking.
Verbs may be combined with aspectual markers and modals.
Goal
To manually annotate the JDK (Javanese dialect of Kudus) spoken and written corpus.
To use the annotation in a functional-typological grammar study, especially of the passive, the applicative, and the causative.
Why passive, applicative, and causative?
a. The passive, applicative, and causative have been discussed extensively for the Austronesian languages.
b. They are instances of the same phenomenon: valency-changing constructions.
c. In JDK, they have distinctive features compared to standard Javanese.
d. The applicative and the causative share the same morphological markers in Javanese.
Passive
a. It contrasts with another construction, the active;
b. The subject of the active corresponds to a non-obligatory oblique phrase in the passive, or is not overtly expressed;
c. The subject of the passive, if there is one, corresponds to the direct object of the active;
d. The construction is pragmatically restricted relative to the active;
e. The construction displays special morphological marking of the verb.
(Siewierska, 2005)
Example of a passive in English
a. John bought the book.
b. The book was bought by John.
The applicative
A construction in which an extra object is added.
Haspelmath (2001): the applicative is a valency-increasing phenomenon in which a direct object is added to a verb.
A comparable alternation exists in English (Gropen et al. 1989: 204):
a. John gave a gift to Mary.
b. John gave Mary a gift.
Example of an applicative in JDK
a. FS:03:M:A:C: 136
Lha otomatise asu iku mau kan yo nyedak-i bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach-APPL turtle that DEF
b. Non-applicative (manipulated example)
Lha otomatise asu iku mau kan yo nyedak ning bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach to turtle that DEF
Huh, automatically, that dog also approached that turtle.
Causative
Causativization creates a new predicate with an added agent, the causer: somebody makes someone do something.
Talmy (2000) and Shibatani (1976) define a causative situation as one that can be analyzed into two sub-events, a causing event and a caused event. The caused event must follow causally from the causing event:
a. The caused event would not occur if the causing event did not occur;
b. The caused event does indeed occur.
Examples in English
(1) a. The children danced.
    b. The teacher made the children dance.
(2) a. The robber died.
    b. The policeman killed the robber.
General ideas
A relatively small data collection.
The data were manually annotated for various grammatical features.
The tags are used to examine the correlation between one code and other codes.
Data collection
Type of data: elicited narratives, spontaneous speech, written data
Period: September 2010 to January 2011 (five months of data collection)
Place: Kudus regency, Central Java, Indonesia
Manual annotation
Goal
To produce a corpus for a grammar study. I am not producing the perfect corpus for future generations, but a workable corpus for my own use.
The annotated corpus will be used to analyze the JDK grammar.
The manual annotation adds linguistically rich information to the JDK data, ranging from morphology through syntax to semantics.
Why manual annotation?
The dataset is relatively small (see Table 1):
a. Recordings from 49 JDK native speakers
b. Written data from six articles in a local newspaper
Table 1. The distribution of informants, clauses and words across data sources
Corpus | Number of informants | Number of clauses | Number of words
Narrative (Frog story) | 41 | 2,431 | 37,716
Spontaneous speech | 8 | 1,045 | 6,103
Written data | 6 | 586 | 3,547
TOTAL | 55 | 4,062 | 47,366
Preparation
A Word document is used for transcription and annotation.
An Excel sheet is used to record the quantitative results.
Step 1
Decided on the codes used for annotation, including:
a. Type of clause (actives, passives, and ergative-like clauses);
b. Applicatives and causatives;
c. Transitivity of the verb base;
d. Grammatical relations;
e. Semantic features of the nouns;
f. Semantic roles of the nouns;
g. Parts of speech (POS);
h. Data sources.
Step 2
Read through and annotated every single clause.
Explicitly added information on each clause and word in each text in the corpora.
These tags were used to look at the correlation between a particular grammatical feature and others.
A single clause
Rules:
- It indicates a single situation, action or event.
- It is a dependency between a predicate and an argument (Ewing, 1998: 14).
Annotation
Each annotation tag was placed in angle brackets; the position of the tags varies.
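Because every tag sits in its own pair of angle brackets, the tags can be pulled out of a clause mechanically. A minimal Python sketch, assuming the annotated clauses are exported from the Word document as plain text and that tags never contain `<` or `>`; the function names are hypothetical and this is not the procedure used in the study:

```python
import re

# Assumption: a tag is any text inside one pair of angle brackets, e.g. <TR2> or <DEF NP>.
TAG_PATTERN = re.compile(r"<([^<>]+)>")


def extract_tags(annotated_clause: str) -> list[str]:
    """Return the tags of one annotated clause in their original order."""
    return TAG_PATTERN.findall(annotated_clause)


def strip_tags(annotated_clause: str) -> str:
    """Remove all tags and collapse the leftover whitespace, giving the plain clause."""
    without_tags = TAG_PATTERN.sub(" ", annotated_clause)
    return " ".join(without_tags.split())


clause = "Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike"
print(extract_tags(clause))  # ['TR2', 'CAUS1']
print(strip_tags(clause))    # Suplo ngagetna paklike lan mboklike
```

Keeping the plain clause recoverable in this way means the same file can serve both as the annotated corpus and as the transcript.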
Step 3: Code for data sources
My transcripts were coded to indicate information about the speaker who produced each clause. Each clause is labeled using a uniform format: the ID code preceding each clause identifies the type of data, the informant number, the sex, age and place of residence of the speaker, and the clause number.
How to use the codes for data sources
The combination of codes serves as a unique identifier for a particular clause: no two clauses share the same string.
Example: FS:01:F:A:C: 008 refers to data elicited using the frog story method, narrated by informant number one, who is female, adult and lives in an urban area; this is clause number eight in the transcript.
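Since the ID is a fixed sequence of colon-separated fields, it can be decoded and checked for uniqueness automatically. A minimal Python sketch: only FS, F, A and C are explained on the slide; the readings of SS, WR and M are inferred from the example IDs and are assumptions, and the two-field layout of written-data IDs (as in WR:07: 042) is likewise assumed. All names are hypothetical:

```python
import re
from dataclasses import dataclass
from typing import Optional

# FS, F, A and C are explained in the text; SS, WR and M are assumptions.
# Letters without a known meaning are returned unchanged.
DATA_TYPES = {"FS": "frog story", "SS": "spontaneous speech", "WR": "written data"}
SEX = {"F": "female", "M": "male"}
AGE = {"A": "adult"}
RESIDENCE = {"C": "urban"}

# Spoken data: TYPE:INFORMANT:SEX:AGE:RESIDENCE: CLAUSE; written data: TYPE:NN: CLAUSE.
# The space before the clause number is optional.
SPOKEN_ID = re.compile(r"^(FS|SS):(\d+):([A-Z]):([A-Z]):([A-Z]):\s*(\d+)$")
WRITTEN_ID = re.compile(r"^(WR):(\d+):\s*(\d+)$")


@dataclass
class ClauseID:
    data_type: str
    source: int  # informant number for speech; its meaning for written data is assumed
    clause: int
    sex: Optional[str] = None
    age: Optional[str] = None
    residence: Optional[str] = None


def parse_clause_id(code: str) -> ClauseID:
    code = code.strip()
    m = SPOKEN_ID.match(code)
    if m:
        dtype, source, sex, age, res, clause = m.groups()
        return ClauseID(DATA_TYPES.get(dtype, dtype), int(source), int(clause),
                        SEX.get(sex, sex), AGE.get(age, age), RESIDENCE.get(res, res))
    m = WRITTEN_ID.match(code)
    if m:
        dtype, source, clause = m.groups()
        return ClauseID(DATA_TYPES.get(dtype, dtype), int(source), int(clause))
    raise ValueError(f"unrecognised clause ID: {code!r}")


def duplicate_ids(codes):
    """Return IDs occurring more than once, ignoring whitespace differences."""
    seen, dupes = set(), set()
    for code in codes:
        key = "".join(code.split())
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


print(parse_clause_id("FS:01:F:A:C: 008"))
```

Normalizing away whitespace before the duplicate check treats `C: 008` and `C:008` as the same ID, which matches the intent that each clause has exactly one identifier.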
Codes applied to verbs
Codes | Information | Position
TR1 or TR2 or INT1 or INT2 | Active transitive/intransitive verbs. Each TR or INT is identified as 1 (for verbs with the nasal prefix) or 2 (for verbs without the nasal prefix). | Immediately after the verb
PASS1 or PASS2 or PASS3 | Passive type 1, 2 or 3. The classification is based on the presence of agent, patient, and preposition in a clause. | Immediately after the verb
UNMARKED | Passive without morphology | Immediately after PASS1 or PASS2 or PASS3
ERGL1 or ERGL2 | ERGL1 labels an ergative-like clause where the agent is the first person singular pronoun; ERGL2 codes an ergative-like clause where the agent is the second person pronoun. | Immediately after the verb
APPL1 or APPL2 or APPL3 | APPL1 labels a verb with -(a)ke; APPL2 a verb with -na; APPL3 a verb with -i. | Immediately after TR1 or TR2 or PASS1 or PASS2 or PASS3 or ERGL1 or ERGL2
(continued)
Codes | Information | Position
CAUS1 or CAUS2 or CAUS3 | CAUS1 labels a verb with -(a)ke; CAUS2 a verb with -na; CAUS3 a verb with -i. | Immediately after TR1 or TR2 or PASS1 or PASS2 or PASS3 or ERGL1 or ERGL2
ADV | Adversative passive | Immediately after the verb
ANS | Active clause without subject | Immediately after TR1 or TR2
PNS | Passive clause without subject | Immediately after PASS1 or PASS2 or PASS3
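The Position column amounts to a set of adjacency rules, so misplaced verb tags can be caught automatically. A minimal Python sketch under two assumptions: "immediately after" means that only whitespace separates the two tags, and ERGL1/ERGL2 are the ergative-like verb tags that APPL and CAUS may follow. The rule table and function name are hypothetical:

```python
import re

TAG = re.compile(r"<([^<>]+)>")

VERB_HEADS = {"TR1", "TR2", "PASS1", "PASS2", "PASS3", "ERGL1", "ERGL2"}
PASSIVES = {"PASS1", "PASS2", "PASS3"}

# Each dependent tag -> the tags allowed immediately before it (read off the verb-code tables).
MUST_FOLLOW = {
    "APPL1": VERB_HEADS, "APPL2": VERB_HEADS, "APPL3": VERB_HEADS,
    "CAUS1": VERB_HEADS, "CAUS2": VERB_HEADS, "CAUS3": VERB_HEADS,
    "UNMARKED": PASSIVES,
    "ANS": {"TR1", "TR2"},
    "PNS": PASSIVES,
}


def position_errors(clause: str) -> list[str]:
    """Report dependent tags that do not directly follow one of their allowed tags."""
    errors = []
    prev_tag, prev_end = None, None
    for m in TAG.finditer(clause):
        tag = m.group(1)
        allowed = MUST_FOLLOW.get(tag)
        if allowed is not None:
            # "Immediately after": only whitespace may separate the two tags.
            adjacent = prev_end is not None and not clause[prev_end:m.start()].strip()
            if not (adjacent and prev_tag in allowed):
                errors.append(f"<{tag}> should directly follow one of {sorted(allowed)}")
        prev_tag, prev_end = tag, m.end()
    return errors


print(position_errors("marani <TR2> <APPL3> buluse"))  # []
print(position_errors("marani <APPL3> buluse"))        # one error for <APPL3>
```

Tags without a rule (TR1, PASS2, noun tags and so on) are simply skipped, so the check can run over whole annotated clauses.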
Examples
(1) FS:01:M:A:C: 003
terus kui bocah-bocah kui mancing <INT2>
then that child-child that ACT.go.fishing
Then those children went fishing.
(2) WR:07: 042
Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike
Suplo ACT.surprise.CAUS uncle and aunty
Suplo caused his uncle and his aunty to be surprised.
Codes applied to clauses
Codes | Information | Position
NOM1 or NOM2 | NOM1 indicates a non-verbal clause; NOM2 labels an existential clause | At the end of the clause
IMPER | Imperative clause | At the end of the clause
REL | Relative marker | Immediately after the Javanese relative marker sing or kang
Examples
(1) FS:02:M:A:C: 006
nanging Budi orak kuat <NOM1>
but Budi NEG strong
But Budi was not strong.
(2) FS:03:M:A:C: 025
Loh kok malah ono bulus <NOM2>
huh EMPH actually exist turtle
Huh, actually there was a turtle.
Codes applied to nouns 1: Indicating semantic features of the nouns
Codes | Information | Position
HUM or NONH | Human or non-human noun | Immediately after a noun
ANIM or INA | Animate or inanimate | Immediately after the label HUM or NONH
DEF NP or INDEF NP | Definite or indefinite noun phrase. Only for common nouns: NAME is used instead when the noun is a name, and 1 or 2 or 3 when the noun is a first, second or third person pronoun. | Immediately after the label ANIM or INA
1 or 2 or 3 | First person, second person or third person pronoun | Immediately after the label ANIM or INA
S or P | Singular or plural | Immediately after DEF NP or INDEF NP or NAME or 1 or 2 or 3
Examples
(1) SS:02:F:A:C: 305
Setange iki <NONH> <INA> <DEF NP> <S> hurung dibenak-benakke <PASS3> <CAUS2>
steering this NEG PASS.fix-fix.CAUS
This steering has not been fixed.
Codes indicating semantic roles
Codes | Information
AGT | Agent
PAT | Patient
BEN | Benefactive
REC | Recipient
LOC | Location
INST | Instrument
GOAL | Goal
Position (all codes): after the codes indicating the semantic features of the noun, immediately after S or P.
Examples
SS:02:F:A:C: 305
setange iki <INA> <NONH> <DEF NP> <S> <PAT> hurung dibenak-benakke <PASS3> <CAUS2>
This steering has not been fixed.
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> marani <TR2> <APPL3> buluse <GOAL> karo njegogi <TR2> <APPL3>
Budi's dog approached the turtle and barked.
Codes indicating the grammatical relations of the nouns
Codes | Information | Position
SUBJ | Subject of the clause | After the code indicating the semantic role of the noun
OBJ | Object of the clause | After the code indicating the semantic role of the noun
IO | Indirect object of the clause | After the code indicating the semantic role of the noun
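Taken together, the noun tables imply a fixed order of tags after each noun: humanness, animacy, definiteness (or NAME, or person), number, semantic role, grammatical relation. A minimal Python sketch that flags tags breaking this order; the slot list is an interpretation of the tables rather than a rule stated in the study, and skipped slots are allowed. For instance, the semantic-roles example for SS:02:F:A:C: 305 lists <INA> before <NONH>, which this check would report:

```python
import re

# Slot order for noun tags as read off the noun-code tables (an interpretation):
# humanness, animacy, definiteness/name/person, number, semantic role, relation.
NOUN_SLOTS = [
    {"HUM", "NONH"},
    {"ANIM", "INA"},
    {"DEF NP", "INDEF NP", "NAME", "1", "2", "3"},
    {"S", "P"},
    {"AGT", "PAT", "BEN", "REC", "LOC", "INST", "GOAL"},
    {"SUBJ", "OBJ", "IO"},
]
TOKEN = re.compile(r"<([^<>]+)>|[^<>\s]+")


def slot_of(tag):
    return next((i for i, slot in enumerate(NOUN_SLOTS) if tag in slot), None)


def noun_order_errors(clause):
    """Flag noun tags that come after a tag from a later slot in the same run of tags."""
    errors, last_slot = [], -1
    for m in TOKEN.finditer(clause):
        tag = m.group(1)
        if tag is None:        # a plain word starts a new run of tags
            last_slot = -1
            continue
        slot = slot_of(tag)
        if slot is None:       # verb, clause and dialect tags are not checked here
            continue
        if slot < last_slot:
            errors.append(f"<{tag}> is out of order")
        last_slot = max(last_slot, slot)
    return errors


print(noun_order_errors("setange iki <INA> <NONH> <DEF NP> <S> <PAT> hurung"))
```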
Examples
SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung dibenak-benakke <PASS3> <CAUS2>
This steering has not been fixed.
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3>
Budi's dog approached the turtle and barked.
Codes indicating the lexical and morphosyntactic features of the dialect
Code | Information | Position
JDK | Lexical or morphosyntactic features of JDK. These features need to be coded so that I can demonstrate that the clauses were produced by native speakers of JDK. I only analyze texts containing clauses with JDK features. | Immediately after the features in the clause
Examples
SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung <JDK> dibenak-benakke <PASS3> <CAUS2>
This steering has not been fixed.
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
Budi's dog approached the turtle and barked.
Results
An annotated dataset containing the information relevant to my research questions.
Quantitative results are obtained by counting the co-occurrence of particular features in the dataset.
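Counting co-occurrence can be done directly on the tagged text. A minimal Python sketch, assuming one annotated clause per line of plain text; it counts, for every pair of distinct tags, the number of clauses containing both. This is a clause-level count only: it cannot tell which noun an ANIM tag belongs to, which requires grouping the tags by the word they follow. Function names are hypothetical:

```python
import re
from collections import Counter
from itertools import combinations

TAG = re.compile(r"<([^<>]+)>")


def cooccurrence(clauses):
    """For each unordered pair of distinct tags, count the clauses containing both."""
    counts = Counter()
    for clause in clauses:
        tags = sorted(set(TAG.findall(clause)))  # a tag counts once per clause
        counts.update(combinations(tags, 2))
    return counts


def pair_count(counts, a, b):
    """Look up a pair regardless of the order in which it is given."""
    return counts[tuple(sorted((a, b)))]


clauses = [
    "Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> "
    "buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>",
    "Setange iki <NONH> <INA> <DEF NP> <S> <PAT> <SUBJ> hurung <JDK> "
    "dibenak-benakke <PASS3> <CAUS2>",
    "Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike",
]
counts = cooccurrence(clauses)
print(pair_count(counts, "TR2", "APPL3"))  # 1
print(pair_count(counts, "SUBJ", "NONH"))  # 2
```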
(continued)
From these tags, I can describe a particular construction in data number xxx, for example:
a. The type of clause
b. The transitivity of the verb base
c. The animacy of the subject
d. The animacy of the promoted argument
e. The semantic role of the promoted argument
Example
FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
a. Clause FS:02:M:A:C: 010 is an applicative of type 3.
b. The agent is the subject and is non-human animate (an animal).
c. The promoted argument, i.e. the object, is also non-human animate (an animal), and it is a goal.
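The description in a to c can be read off the tags once each run of tags is attached to the words before it. A minimal Python sketch, assuming that a run of tags belongs to the untagged words immediately preceding it (so "karo njegogi" forms one group); the helper names and the dictionary layout are hypothetical:

```python
import re

# A token is either a tag in angle brackets or a run of non-space characters.
TOKEN = re.compile(r"<([^<>]+)>|([^<>\s]+)")

VERB_TYPES = {"TR1", "TR2", "INT1", "INT2", "PASS1", "PASS2", "PASS3", "ERGL1", "ERGL2"}
APPLICATIVES = {"APPL1", "APPL2", "APPL3"}
ROLES = {"AGT", "PAT", "BEN", "REC", "LOC", "INST", "GOAL"}


def tag_groups(clause):
    """Pair each stretch of untagged words with the run of tags that follows it."""
    groups, words, tags = [], [], []
    for m in TOKEN.finditer(clause):
        tag, word = m.groups()
        if tag is not None:
            tags.append(tag)
            continue
        if tags:  # a word after a run of tags closes the previous group
            groups.append((" ".join(words), tags))
            words, tags = [], []
        words.append(word)
    if words or tags:
        groups.append((" ".join(words), tags))
    return groups


def first_in(tags, options):
    return next((t for t in tags if t in options), None)


def describe(clause):
    """Summarise the first applicative and the subject/object noun phrases of a clause."""
    summary = {}
    for words, tags in tag_groups(clause):
        appl = first_in(tags, APPLICATIVES)
        if appl and "applicative" not in summary:
            summary["applicative"] = appl
            summary["verb_base"] = first_in(tags, VERB_TYPES)
        for relation in ("SUBJ", "OBJ"):
            if relation in tags and relation not in summary:
                summary[relation] = {
                    "words": words,
                    "humanness": first_in(tags, {"HUM", "NONH"}),
                    "animacy": first_in(tags, {"ANIM", "INA"}),
                    "role": first_in(tags, ROLES),
                }
    return summary


clause = ("Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> "
          "buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>")
print(describe(clause))
```

For this clause the summary records APPL3 on a TR2 verb, a non-human animate agent as SUBJ, and a non-human animate goal as OBJ.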
How to use the results
Combine one piece of information with another to answer questions about the use of a particular grammatical construction.
For example, information about the semantic role of a noun phrase can be combined with the applicative tags to find out which semantic roles of the promoted argument are promoted with applicative type 1.
How to use the tags (1)
Search for the occurrences of a particular construction, for example the applicative.
Highlight all entries with applicative tags (APPL1, APPL2, APPL3).
Put the entries for a particular construction in a separate file; for example, when I searched for the applicative, I ended up with four separate files: APPL1, APPL2, APPL3, and all applicatives together.
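The search-and-split step can also be scripted instead of done by highlighting. A minimal Python sketch, assuming one annotated clause per line and plain-text output files; the file names (APPL1.txt, ..., APPL_all.txt) and the function name are hypothetical:

```python
import re
from pathlib import Path

APPL = re.compile(r"<(APPL[123])>")


def split_applicatives(clauses, out_dir):
    """Write APPL1/2/3 subsets and an all-applicatives file; return clause counts."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    buckets = {"APPL1": [], "APPL2": [], "APPL3": [], "APPL_all": []}
    for clause in clauses:
        found = set(APPL.findall(clause))  # a clause is filed once per type
        for appl_type in sorted(found):
            buckets[appl_type].append(clause)
        if found:
            buckets["APPL_all"].append(clause)
    for name, selected in buckets.items():
        text = "".join(line + "\n" for line in selected)
        (out_dir / f"{name}.txt").write_text(text, encoding="utf-8")
    return {name: len(selected) for name, selected in buckets.items()}
```

A clause with two verbs tagged APPL3 (as in FS:02:M:A:C: 010) is written to APPL3.txt only once, because the types are collected into a set first.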
(continued)
At the same time, I used an Excel sheet for several purposes, such as listing the verbs or other information needed, recording the quantitative results, and creating graphs based on the quantitative results.
List of verbs in APPL1
Quantitative results
Graph
[Figure: The distribution of subject animacy with the different applicative markers (%). Series: animate subject, inanimate subject. Categories: -na, -(a)ke, -i, all applicatives, baseline.]
How to use the tags (2)
To examine the transitivity of the verb bases in the applicative constructions, I looked at the tags on the verbs (TR1, TR2, INT1, INT2, ERGL1 or ERGL2).
To see the animacy of the subject in the applicatives, I used the ANIM or INA tags together with SUBJ.
(continued)
To see the animacy of the promoted argument in the applicatives, I looked at the ANIM or INA tags together with OBJ (the promoted argument).
To investigate the semantic role of the promoted argument in the applicative, I examined the semantic role tags (PAT, BEN, INST, LOC, GOAL or REC).
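These look-ups combine a construction tag with the tags on one particular noun. A minimal Python sketch, assuming that each run of adjacent tags belongs to one word group and that a clause is classified by the first APPL tag it contains; the function names are hypothetical, and the second test clause used below is an invented string, not corpus data:

```python
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"<([^<>]+)>|[^<>\s]+")
APPLICATIVES = ("APPL1", "APPL2", "APPL3")
ROLES = ("PAT", "BEN", "INST", "LOC", "GOAL", "REC")


def tag_runs(clause):
    """Split a clause into runs of adjacent tags; each run belongs to one word group."""
    runs, current = [], []
    for m in TOKEN.finditer(clause):
        if m.group(1) is not None:
            current.append(m.group(1))
        elif current:  # a plain word closes the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def feature_by_applicative(clauses, relation="SUBJ", features=("ANIM", "INA")):
    """Tally, per applicative type, which feature tag the noun bearing `relation` carries."""
    table = defaultdict(Counter)
    for clause in clauses:
        runs = tag_runs(clause)
        appl = next((t for run in runs for t in run if t in APPLICATIVES), None)
        if appl is None:
            continue  # not an applicative clause
        for run in runs:
            if relation in run:
                value = next((t for t in run if t in features), "untagged")
                table[appl][value] += 1
                break  # only the first noun phrase with this relation
    return table
```

With the defaults it answers the subject-animacy question; `relation="OBJ"` gives the animacy of the promoted argument, and `relation="OBJ", features=ROLES` gives its semantic role.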
(continued)
I also used these tags to count the frequency with which each grammatical phenomenon co-occurs with others, for example, to examine which affixes are used to promote each semantic role.
Example
[Figure: The distribution of the affixes used to promote each semantic role (%). Series: -na, -(a)ke, -i. Categories: benefactive, recipient, location, goal, instrument, patient.]
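Graphs of this kind show each row of a count table as percentages of its row total. A minimal Python sketch that turns counts into row percentages and writes them to a CSV file that Excel can open and chart; the counts below are hypothetical and are not the study's figures, and the function names are illustrative:

```python
import csv


def row_percentages(table):
    """Turn {row: {column: count}} into {row: {column: percent of the row total}}."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: round(100 * n / total, 1) if total else 0.0
                       for col, n in counts.items()}
    return result


def write_csv(percentages, path, columns, row_label="role"):
    """Write the percentages as a CSV file, one row per semantic role."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([row_label, *columns])
        for row, values in percentages.items():
            writer.writerow([row, *(values.get(col, 0.0) for col in columns)])


# Hypothetical counts, for illustration only (not the study's figures).
counts = {"GOAL": {"-na": 0, "-(a)ke": 1, "-i": 3}, "BEN": {"-na": 2, "-(a)ke": 0, "-i": 0}}
print(row_percentages(counts))
```

Rounding to one decimal matches the precision used in the graphs; an empty row yields zeros rather than a division error.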
Challenge 1
Deciding on appropriate annotation codes relevant to the main research questions.
The annotation should make it possible to search for specific information in the dataset.
For example: whether to adopt INT or INTR for an intransitive verb, and S or SUBJ for the subject of a clause (S is also used here for singular nouns).
Challenge 2
Consistency
For example: adopting clear criteria on what counts as an animate or inanimate noun, and likewise for other grammatical terms.
Is sikile asu 'the dog's leg' an animate or an inanimate noun?
Challenge 3
High accuracy
For example:
a. Mistyped tags: <APPL1> typed as <APLL1>
b. Extra spaces: <ANIM> typed as < ANIM>
c. Human mistakes: <HUM> entered as <NONH>
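Errors of types a and b can be caught automatically by comparing every tag against the code inventory; type c cannot, because a wrong but valid tag looks like any other, although contradictory combinations on one word can be flagged. A minimal Python sketch; the inventory is assembled from the code tables, and the exclusive pairs and the HUM-with-INA rule are assumed heuristics rather than rules stated in the study:

```python
import re

KNOWN_TAGS = {
    # verbs
    "TR1", "TR2", "INT1", "INT2", "PASS1", "PASS2", "PASS3", "UNMARKED",
    "ERGL1", "ERGL2", "APPL1", "APPL2", "APPL3", "CAUS1", "CAUS2", "CAUS3",
    "ADV", "ANS", "PNS",
    # clauses
    "NOM1", "NOM2", "IMPER", "REL",
    # nouns: semantic features, semantic roles, grammatical relations
    "HUM", "NONH", "ANIM", "INA", "DEF NP", "INDEF NP", "NAME", "1", "2", "3", "S", "P",
    "AGT", "PAT", "BEN", "REC", "LOC", "INST", "GOAL", "SUBJ", "OBJ", "IO",
    # dialect features
    "JDK",
}
# Tags that should never appear together on one word group (assumed heuristic).
EXCLUSIVE = [{"HUM", "NONH"}, {"ANIM", "INA"}, {"DEF NP", "INDEF NP"}, {"S", "P"}]
TOKEN = re.compile(r"<([^<>]*)>|[^<>\s]+")


def check_clause(clause):
    """Return a list of suspected tagging errors in one annotated clause."""
    problems, run = [], []

    def check_run(tags):
        for pair in EXCLUSIVE:
            if pair <= set(tags):
                problems.append(f"conflicting tags {sorted(pair)} on one word group")
        if "HUM" in tags and "INA" in tags:  # a human noun is expected to be animate
            problems.append("HUM together with INA")

    for m in TOKEN.finditer(clause):
        raw = m.group(1)
        if raw is None:  # an ordinary word ends the current run of tags
            check_run(run)
            run = []
            continue
        if raw != raw.strip():
            problems.append(f"extra space in <{raw}>")
        elif raw not in KNOWN_TAGS:
            problems.append(f"unknown tag <{raw}>")
        run.append(raw.strip())
    check_run(run)
    return problems


print(check_clause("marani <TR2> <APLL1>"))  # ['unknown tag <APLL1>']
print(check_clause("asu < ANIM>"))           # ['extra space in < ANIM>']
```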
Challenge 4
Many files
The entries for each construction are saved in a separate file. For example, for the applicative there were at least five files: one for the whole dataset, one for all applicatives together, and one each for applicative types 1, 2 and 3.
Challenge 5
Time-consuming
Why?
The analysis is entered manually.
Whenever one piece of information changed, the whole dataset had to be revised, starting the tagging from the beginning.
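Part of that revision work, renaming a code throughout the dataset, can be automated once the corpus is available as plain text. A minimal Python sketch, assuming one annotated clause per line; the renaming INT2 to INTR2 is a hypothetical example, and the function name is illustrative:

```python
import re


def rename_tag(clauses, old, new):
    """Rename the tag <old> to <new> in every clause; return (new clauses, replacements)."""
    # Stray spaces inside the brackets, e.g. "< INT2 >", are matched and cleaned up too.
    pattern = re.compile(r"<\s*" + re.escape(old) + r"\s*>")
    renamed, total = [], 0
    for clause in clauses:
        updated, n = pattern.subn(lambda _m: f"<{new}>", clause)
        renamed.append(updated)
        total += n
    return renamed, total


clauses = ["terus kui bocah-bocah kui mancing <INT2>",
           "Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike"]
renamed, n = rename_tag(clauses, "INT2", "INTR2")
print(n)  # 1
```

Because the pattern requires the closing bracket right after the tag name (apart from spaces), renaming INT2 leaves INT1 and any longer tags untouched. Changes that require a new linguistic judgment for each clause still have to be made by hand.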
Summary
Manual annotation is feasible in a functional-typological grammar study.
Some good points
Some challenges
Thank you
Questions and suggestions?
Or email me at n.malihah@lancs.ac.uk