Italian Sign Language (LIS) Corpus
Mirko Santoro & Carlo Geraci, CNRS, Institut Jean-Nicod

Roadmap
  The IJN Sign Language Group
  Our (ID-)Practice
  Comparing (ID-)Practices
  Conclusions

The IJN SL Group
  Carlo Geraci
    LIS Corpus (Linguistic Corpus)
    Atlas (Avatar project: Politecnico of Turin & Univ. of Turin)
    LIS4ALL (Avatar project: Politecnico of Turin & Univ. of Turin)
    ELISIR (Avatar project: University of Turin & Venice)
  Valentina Aristodemo & Mirko Santoro (& Lara Mantovan, Ca' Foscari University, Venice)
    LIS Corpus (Linguistic Corpus)
  Yann Cantin
    Corpus-LSF-Paris (Linguistic Corpus under construction)

The LIS Corpus project
  The PRIN 2007 project "Dimensions of Variation in Italian Sign Language" (PI: Caterina Donati)
  165 participants (from 10 cities)
  1 hour of recordings each
  Three age groups
    young group (18-30 years old)
    middle group (31-54 years old)
    old group (over 55 years old)
  Task/data type
    Free conversation (45 minutes)
    Wh-question elicitation task (5 minutes)
    Spontaneous narration (10 minutes)
    Picture naming task
  No sign bank (yet)

Our template (in collaboration with Kyle Duarte)
  Utterance
  (ID-)Gloss tier 1 = Dominant hand
    Phonology
    Morphology
    Syntax
    Semantics
    Dominant hand phonetics
  (ID-)Gloss tier 2 = Non-dominant hand
    Non-dominant hand phonetics
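The tier layout above can be pictured as a small tree in which each hand has one (ID-)gloss tier and the linguistic tiers hang off it. The Python sketch below is only an illustration: the names `Tier`, `build_template` and `tier_names` are hypothetical, and the parent-child nesting is an assumed reading of the slide, which lists the tier names but does not spell out how they are linked in the ELAN template.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Tier:
    """One annotation tier plus the tiers assumed to depend on it."""
    name: str
    children: List["Tier"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (depth, name) for this tier and every dependent tier."""
        yield depth, self.name
        for child in self.children:
            yield from child.walk(depth + 1)


def build_template() -> List[Tier]:
    """Top-level tiers of the template; the nesting is an assumption."""
    dominant = Tier("Gloss tier 1 (dominant hand)", [
        Tier("Phonology"),
        Tier("Morphology"),
        Tier("Syntax"),
        Tier("Semantics"),
        Tier("Dominant hand phonetics"),
    ])
    non_dominant = Tier("Gloss tier 2 (non-dominant hand)", [
        Tier("Non-dominant hand phonetics"),
    ])
    return [Tier("Utterance"), dominant, non_dominant]


def tier_names(tiers: List[Tier]) -> List[str]:
    """Flatten the tier tree into a list of names, parents before children."""
    return [name for tier in tiers for _, name in tier.walk()]
```

Calling `tier_names(build_template())` lists the nine tiers in template order, which is a convenient way to check that the two gloss tiers stay minimal while everything else lives on dependent tiers.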

My first 100 signs
  Text type: Narration
  Number of glosses at the ID-gloss tier: the first 100 signs for each signer (16500 tokens)
  No sign bank
  Once additional tiers specify phonological and morphological properties
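The token count follows directly from the sampling rule: 165 signers times 100 signs gives 16500 tokens. A minimal Python sketch of that sampling step is given below. The function names and the input shape (a dictionary from signer ID to a list of glosses) are assumptions made for illustration, not the project's actual export format.

```python
from typing import Dict, List


def first_n_signs(glosses_by_signer: Dict[str, List[str]], n: int = 100) -> Dict[str, List[str]]:
    """Keep only the first n glosses of each signer's narration."""
    return {signer: glosses[:n] for signer, glosses in glosses_by_signer.items()}


def token_count(sample: Dict[str, List[str]]) -> int:
    """Total number of gloss tokens in the sample."""
    return sum(len(glosses) for glosses in sample.values())
```

With 165 signers who each produced at least 100 signs, `token_count(first_n_signs(corpus))` comes to 16500.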

What do we have in mind?
  A research project (short-term perspective)
    Collect some data
    Quick and dirty results
    Publish or perish
  A tool for research (long-term perspective)
    Something that can be re-used
    Something that we can add knowledge to
    No need to publish soon

Conflicting perspectives
  The researcher view
    expert linguist (at least in one field)
    the more information the better
    "I want it yesterday"
  The annotator view (for the ID-gloss level)
    not necessarily a linguist (student, informant, signer, Deaf)
    little information
    "maybe tomorrow"
  The data analyser view
    possibly a linguist
    columns & cells with non-overlapping values

ID-glossing
  Gloss tier 1 = Dominant hand
  Gloss tier 2 = Non-dominant hand
  Why?
    It is a phonological criterion (happy linguist :-)
    The annotator does not have to switch tiers
    Data extraction can be done only once
  What if I am interested in handedness switching?
    (ID-)glosses are not suited for that. Other tiers are needed.

ID-Glossing = memory task?
Some rules of thumb showing that the ideal world is not so perfect after all:
  1. The task must be simple (little specific knowledge required)
     No long training, no long-term memory overload
  2. Avoid complex procedures (few things at a time)
     No procedural memory overload
  3. Avoid conventions (the fewer symbols the better)
     No short-term memory overload
  4. Avoid ambiguities (conflicts with "avoid conventions")
     No short-term memory overload

The task must be simple
  Mechanical tasks
    1. Select tier
    4. Select the duration
    6. Enter basic annotation
    8. Add extra symbols
  Linguistic tasks
    2. Identify the sign
    3. Apply criteria for sign boundaries
    5. Remember basic symbols
    7. Remember extra symbols

Identify the sign
  The criteria are theoretically based, after Brentari (1998).
  We look at the dynamic component of the sign:
    HS (handshape) change
    Or (orientation) change
    Loc (location) change
  In case of complex movements, we use the more proximal movement as the reference movement.

Symbols (i)
  What theory of the lexicon? Brentari and Padden (2000)
  Core signs
    Italian word: MAMMA (mummy)
  ID-glossing?
    No special coding for lexical or phonological variants at this stage
    We need further levels of phonology and morphology to be spelled out
  MACCHINA (car)   GUIDARE (drive)

Symbols (ii)
  Special signs
    Pronouns = IX- + person number (IX-1, IX-2, IX-3)
    Buoys = IX-LOC (+ additional info on a separate tier)
    Classifiers = PASSARE-CL (meaning + symbol; want more? more tiers)
    Fingerspelling = C-I-A-O

Symbols (iii)
  Extra phonological information
    no extra information is added at the (ID-)gloss level
    everything is added in dependent tier(s) under phonology
  Extra morphological information
    Pointing signs: person & locative function is added (IX-1, IX-2, IX-3, IX-LOC). Is it really relevant? (maybe not, definitely redundant)
    Negative incorporation: -NEG is added
    Compounds: "-" separates the two (or more) stems

Conventions (i)
  Basic conventions imposed by Italian morphology
    no inflection on verbs (infinitival forms: *guid vs. guidare)
    adjectives always in masculine singular
    nouns always singular
  MACCHINA (car)   GUIDARE (drive)

Conventions (ii)
  Special conventions/symbols
    CL   segno-nome (= name-sign)   IX?   NEG   -

Avoid ambiguities vs. avoid conventions
  Is "-" ambiguous in our notation language?
    MOTHER-FATHER (separates compounds)
    C-I-A-O (separates fingerspelling)
    PASSARE-CL (identifies classifiers)
    METTERE-A-POSTO (PUT)
  Notice that "-" is not ambiguous. It means: one single gloss is not enough to describe the sign.
  Notice that to avoid ambiguity, new symbols and new conventions are required.
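Because "-" only signals that one gloss is not enough, the category of a hyphenated gloss has to be recovered from what surrounds the hyphen. The Python sketch below shows one way a researcher might do that after annotation. The function name `classify`, the category labels, and the order of the checks are assumptions for illustration; only the gloss conventions themselves (IX-1/2/3, IX-LOC, -CL, -NEG, letter-by-letter fingerspelling) come from the slides.

```python
import re

# Patterns follow the conventions listed on the "Symbols" slides.
PRONOUN = re.compile(r"IX-[123]")
FINGERSPELLING = re.compile(r"[A-Z](?:-[A-Z])+")  # single letters joined by "-"


def classify(gloss: str) -> str:
    """Return an (assumed) category label for one (ID-)gloss."""
    if PRONOUN.fullmatch(gloss):
        return "pronoun"
    if gloss == "IX-LOC":
        return "locative pointing"
    if gloss.endswith("-CL"):
        return "classifier"
    if gloss.endswith("-NEG"):
        return "negative incorporation"
    if FINGERSPELLING.fullmatch(gloss):
        return "fingerspelling"
    if "-" in gloss:
        # Compound stems (MOTHER-FATHER) and multi-word glosses
        # (METTERE-A-POSTO) look alike at this level; telling them apart
        # needs information from other tiers.
        return "multi-part"
    return "plain"
```

For example, `classify("C-I-A-O")` gives "fingerspelling" and `classify("PASSARE-CL")` gives "classifier", while MOTHER-FATHER and METTERE-A-POSTO both come out as "multi-part", which reflects the point that the gloss tier alone does not distinguish them.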

Comparing

  Phonological info | BSL | NGT | LIS | Summary
  2 hands vs 1 hand | Y   | Y   | Y   | same
  Pointing          | Ø   | Y   | Ø   | LIS and BSL are simpler
  Classifiers       | Y   | Y   | Ø   | LIS simpler

Comparing

  Morphological info | BSL  | NGT   | LIS  | Summary
  Pointing           | Y    | Ø     | Y    | NGT is simpler
  Compound           | ^    | -     | -    | same
  Neg. incorporation | -NOT | -NOT  | -NEG | same
  Directional verb   | Ø    | only1 | Ø    | LIS and BSL are simpler than NGT
  Plurality          | Ø    | PL    | Ø    | LIS and BSL are simpler than NGT
  Classifiers        | Y    | Y     | Ø    | LIS simpler

Comparing

  Special signs    | BSL       | NGT            | LIS                               | Summary
  Buoy             | sem.+buoy | COUNTING-HAND- | IX-LOC                            | LIS is simpler
  Lexical variants | 1, 2, 3,  | A, B, C,       | Ø                                 | LIS is simpler
  Numbers          | ONE       | 1              | ONE                               | NGT is simpler
  Fingerspelling   | FS: WORD  | #:WORD         | W-O-R-D                           | LIS is more complex
  Pointing         | PT:       | PT             | IX-number, IX-LOC, IX-POSSnumber  | LIS is simpler

Comparing

  Special signs        | BSL                 | NGT    | LIS                       | Summary
  Classifiers          | sym+                | sym+   | -CL                       | LIS is simpler
  Gesture              | G:                  | %      | gesture                   | same
  Construed action     | G:CA:               | %      | -CL                       | LIS is simpler
  Number sequence      | NINETEEN^EIGHT^NINE | 1989   | MILLENOVECENTOOTTANTANOVE | LIS and BSL are more complex
  Sign-names           |                     |        | SEGNO-NOME                | LIS is simpler
  Number incorporation | HOUR-FOUR           | HOUR-4 | QUATTRO-ORA               | LIS and BSL are more complex

Discussion
  Overall, LIS (ID-)glosses are simpler than BSL and NGT
  The task of the annotator is simpler (close to zero interpretation of data or phenomena)
  LIS (ID-)glosses have more in common with BSL than with NGT
  More tiers are required to get the same amount of information
  The ELAN template is overall more complex (hiding tiers is the key feature)

Conclusions
  Annotation is not an easy task
  Different and conflicting perspectives have to be taken into account, even at the very basic level of (ID-)glosses
  Our practice avoids
    additional phonological info
    additional morphological info
  Linguistic phenomena are to be glossed at the relevant (dependent) tier
  LIS glosses are more similar to BSL than to NGT

Many thanks to
  Fabio Poletti
  Kyle Duarte
  Lara Mantovan
  Valentina Aristodemo
  Yann Cantin
Part of this work was made possible by the contribution of ERC Advanced Grant FRONTSEM (PI: Philippe Schlenker) & Labex Funds (PI: Carlo Geraci)

References
  Brentari, Diane. 1998. A Prosodic Model of Sign Language Phonology. Cambridge, MA: MIT Press.
  Brentari, Diane & Carol A. Padden. 2000. Native and Foreign Vocabulary in American Sign Language: A Lexicon With Multiple Origins. In Diane Brentari (ed.), Foreign Vocabulary in Sign Language: A Cross-linguistic Investigation of Word Formation, 87-119. Mahwah, NJ: Lawrence Erlbaum Associates.

Discussion: is it simple enough?
  On the excessive load (10 min.)
    Is it possible to find a reasonable compromise?
    How long is the training of an (ID-)annotator before s/he can provide reliable annotations?
  Is the extra information in the ID-gloss necessary? (10 min.)
    To what extent can the use of regular expressions in ELAN searches help avoid complex (ID-)practices?
    Can we shift the burden of complexity onto the shoulders of the researcher rather than the annotator?
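As a rough illustration of the regular-expression question, the Python sketch below searches a sequence of simple glosses for patterns that would otherwise need extra symbols at annotation time. It uses Python's `re` module on a plain list of glosses, not ELAN's own search interface, whose regex dialect may differ. The function `search_sequence`, the pattern names, and the sample gloss VOLERE-NEG are hypothetical.

```python
import re
from typing import List

# A gloss starts at the beginning of the text or after a space, and ends
# before a space or the end of the text; the lookarounds enforce that.
POINTING = r"(?<!\S)IX-(?:[123]|LOC)(?!\S)"
PRONOUN_THEN_NEG = r"(?<!\S)IX-[123] \S+-NEG(?!\S)"


def search_sequence(glosses: List[str], pattern: str) -> List[str]:
    """Join the glosses with single spaces and return every regex match."""
    text = " ".join(glosses)
    return [match.group(0) for match in re.finditer(pattern, text)]


# Hypothetical gloss sequence, invented for this example only.
SAMPLE = ["IX-1", "VOLERE-NEG", "MACCHINA", "IX-LOC", "IX-3", "GUIDARE"]
```

Here `search_sequence(SAMPLE, POINTING)` retrieves the three pointing signs, and `PRONOUN_THEN_NEG` finds a pronoun immediately followed by a verb with negative incorporation. The annotator only types the simple glosses; the pattern, which the researcher writes, carries the complexity.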