How does the brain acquire phonetic (and phonological) knowledge and where is it stored? Bernd J. Kröger. Thank you for the invitation!


How does the brain acquire phonetic (and phonological) knowledge and where is it stored? Bernd J. Kröger Neurophonetics Group Department of Phoniatrics, Pedaudiology, and Communication Disorders RWTH Aachen University, Germany and School of Computer Science and Technology Tianjin University, China Thank you for the invitation!

Preliminary Note: This talk is mainly based on computer simulation experiments using a neurocomputational model of speech production, perception, and acquisition (Kröger et al. 2009). Three working modes: speech acquisition (babbling and imitation; Kröger et al. 2012), speech production, and speech perception. The hypotheses concerning brain regions draw on phonetics, physics, computer science, cognitive science, and neuroscience.

Outline The Structure of the Model Speech Acquisition: How to feed in Knowledge? Related brain regions Further Work


Assumptions for the Structure of the Model: Four neural maps (layers), i.e. four different assemblies of model neurons. Three state maps serve as parts of working memory (distributed motor and sensory representations); one SOM serves as part of long-term memory. The SOM learns sensorimotor associations; training leads to synaptic weight adjustment. A random pattern generator supplies the babbling training set. [Diagram: motor plan map; long-term memory SOM (Kohonen); working memory t = 250 msec; auditory map; execution: t = 12.5 msec; somatosensory map; sensory processing: t = 12.5 msec, then temporal storage; lower-level production-perception loop; vocal tract model (Birkholz et al. 2007)]

Structure of the Model: SOM states are local: one model neuron represents a syllabic state. For production, we need synaptic connections with the same link weights back to the state maps. [Diagram: long-term memory SOM (Kohonen); working memory t = 250 msec; motor plan map; auditory map; somatosensory map; execution: t = 12.5 msec; sensory processing: t = 12.5 msec, then temporal storage; vocal tract model (Birkholz et al. 2007)]

Structure of the Model: One further extension of the model is needed: a connection between the sensorimotor and cognitive modules, giving four state maps. Babbling: exploring my own vocal tract (learning sensorimotor relations); three state maps (as introduced earlier). Imitation: acoustic data from an external speaker plus linguistic information (face-to-face communication, triangulation). [Diagram: motor plan map; long-term memory SOM (Kohonen); phonemic map; working memory t = 200 msec; auditory map; somatosensory map; execution; sensory processing; vocal tract model; external speaker]

Structure of the Model: After training, the synaptic link weights represent the different states for each SOM neuron. Production: activate a SOM neuron (from the top); co-activation of motor plan and auditory states. Perception: calculate a winner neuron (from the bottom); co-activation of the phonemic state. Knowledge is stored in the neural links.

Model Neurons: Neural activation is quantified by mean activation rates within a specific time period (here 250 msec, the duration of a syllable). Activation-rate models are simple but capable of modeling important aspects of working and long-term memory (Oberauer 2009: memory capacity). In addition, a model neuron summarizes the activity of an assembly of real neurons (near in space, e.g. a cortical column?). Thus our model neurons average over space and time. Cortical model neurons are ordered in 2D maps. [Diagram: map 1 and map 2 (SOM)] (Spitzer 2000, after Mumford 1992)
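Since a model neuron averages the spiking of an assembly over the syllable-sized window, the idea can be sketched as follows (spike times and assembly size are illustrative assumptions, not data from the talk):

```python
# Sketch: a "model neuron" as a mean activation rate, averaging an assembly
# of real neurons over one syllable-sized window (250 msec).
# Spike times and assembly size below are illustrative only.

def mean_activation_rate(spike_times_per_neuron, window_ms=250.0):
    """Mean firing rate (spikes/sec) across the assembly within the window."""
    n_neurons = len(spike_times_per_neuron)
    in_window = sum(sum(1 for t in spikes if 0.0 <= t < window_ms)
                    for spikes in spike_times_per_neuron)
    return in_window / n_neurons / (window_ms / 1000.0)

# assembly of 4 neurons; spike times in msec within one 250-msec syllable
assembly = [
    [10, 60, 120, 200],
    [30, 90, 180],
    [5, 140, 220, 240],
    [70, 160],
]
rate = mean_activation_rate(assembly)  # one scalar per neuron per syllable
```

The single scalar per window is exactly what the state maps carry, which is why such rate models stay tractable for memory simulations.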

Learning: First, a winner neuron (best matching unit, BMU) is identified for each training item. Hebbian learning: within a neighborhood kernel (center = winner neuron), the synaptic weights w_ij between the SOM and the state maps are updated as w_ij(t+1) = w_ij(t) + N_j(t) * L(t) * (s_i - w_ij(t)), where N is the neighborhood kernel around the BMU for a specific training stimulus S = (s_1, s_2, ..., s_n), decreasing constantly with time during learning; L is the learning factor, also decreasing constantly with time during learning; i = 1, ..., N runs over all state maps (input) and j = 1, ..., M over the SOM. The synaptic weights w_ij approach the (generalized) stimulus activation patterns s_i: unsupervised learning. Input comes from across all state maps!
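The update rule above can be sketched in a few lines. This is a minimal toy version, not the talk's implementation: the talk only states that the learning factor and kernel width decrease over time, so the Gaussian kernel, the linear schedules, and all parameter values here are assumptions.

```python
import math
import random

# Minimal sketch of the SOM update rule quoted above:
#   w_ij(t+1) = w_ij(t) + N_j(t) * L(t) * (s_i - w_ij(t))
# Schedules and parameter values are illustrative assumptions.

def best_matching_unit(weights, stimulus):
    """Index of the SOM neuron whose weight vector is closest to the stimulus."""
    return min(range(len(weights)),
               key=lambda j: sum((w - s) ** 2 for w, s in zip(weights[j], stimulus)))

def som_update(weights, grid, stimulus, bmu, t, n_steps, lr0=0.5, sigma0=3.0):
    lr = lr0 * (1.0 - t / n_steps)                   # learning factor L(t), decreasing
    sigma = max(sigma0 * (1.0 - t / n_steps), 0.5)   # kernel width, decreasing
    for j, w in enumerate(weights):
        d2 = sum((a - b) ** 2 for a, b in zip(grid[j], grid[bmu]))
        kernel = math.exp(-d2 / (2.0 * sigma ** 2))  # neighborhood kernel N_j(t)
        for i in range(len(w)):
            w[i] += kernel * lr * (stimulus[i] - w[i])

# tiny 3x3 SOM over a 4-dimensional state-map input
random.seed(0)
grid = [(r, c) for r in range(3) for c in range(3)]
weights = [[random.random() for _ in range(4)] for _ in range(9)]
stim = [1.0, 0.0, 0.5, 0.2]
bmu = best_matching_unit(weights, stim)
before = sum((w - s) ** 2 for w, s in zip(weights[bmu], stim))
som_update(weights, grid, stim, bmu, t=0, n_steps=100)
after = sum((w - s) ** 2 for w, s in zip(weights[bmu], stim))
```

After one update, the BMU's weight vector has moved toward the stimulus, which is the unsupervised "weights approach the stimulus pattern" behavior described above.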

Outline Introduction: Speech is Movement! The Structure of the Model Speech Acquisition: How to feed in Knowledge? Related brain regions Further Work

Speech Acquisition: Six simulation experiments for speech acquisition. Later: three simulation experiments testing performance (speech production, speech perception).

Speech Acquisition: List of simulation experiments:
1. Protovocalic babbling: 1076 training items; 15x15 SOM (prelinguistic)
2. Protoconsonantal babbling: 279 training items; 15x15 SOM (prelinguistic)
3. Vocalic imitation, model language (5 vowels [i, e, a, o, u]): 500 training items; 15x15 SOM (linguistic: artificial language)
4. Consonantal imitation, model language (15 CV syllables with [b, d, g]): 465 training items; 15x15 SOM (linguistic: artificial language)
5. Imitation of a symmetrical model language (60 syllables: V, CV, CCV; [b, d, g, p, t, k, m, n, l], [bl, gl, pl, kl]): 600 training items; 25x25 SOM; no generalization (linguistic: artificial language)
6. Imitation of a natural language (200 most frequent syllables of Standard German): 703 training items; 25x25 SOM; no generalization (linguistic: natural language)
500 to 150 training cycles per experiment (babbling to imitation); one cycle = random application of all training items. The main results: (a) association of sensory and motor states; (b) ordering of states (syllables) with respect to phonetic features; (c) emergence of phoneme regions at the SOM level.

Experiments 1 and 3: Vowel Babbling and Imitation: Training Items. [Figure: vowel space with corner vowels [i], [a], [u]; red points: 1076 babbling items]

Experiments 1 and 3: Vowel Babbling and Imitation: Training Items. Red points: 1076 babbling items. Imitation items (blue squares, green diamonds): 100 realizations of each phoneme /i/, /e/, /a/, /o/, /u/; the variability of the phoneme realizations is adapted from natural data (with overlap).

Training Results: Phonetic Map for Vowels. The phonetic map now associates motor plan, sensory, and phonemic states. After babbling: (1) an ordering emerges with respect to the vocalic dimensions back-front and low-high; (2) an association of sensory and motor states emerges (grey bars, red lines). After imitation, in addition: (3) an association with phonemic states emerges, and phoneme regions appear (a neuron/box is outlined if its phonemic link weight for a phoneme is > 0.8, i.e. 80%); variation within a region reflects exemplars. [Figure axes: back-front, low-high; phoneme regions /i/, /e/, /a/, /o/, /u/]

Training Results: Phonetic Map of CV Items. 15x15 phonetic map: each box represents one neuron. Grey bars and red lines represent the neural link weights to the state maps. Auditory link weights: formant transitions. Motor plan link weights: 5 grey bars; the first three encode the closure (labial/apical/dorsal), the last two the proto-vocalic dimensions (back-front, low-high). An association of motor plan and sensory states emerges for each neuron, and an ordering emerges with respect to (1) place of articulation (labial/apical/dorsal) and (2) the proto-vocalic dimensions low-high and front-back (within each consonantal place). [Map labels: lab, api, dor; front, back, low]

Phonetic Map of the Model Language (V, CV, CCV). Model language: V = /i, e, a, o, u/; C = plosives /p, t, k, b, d, g/, nasals /m, n/, and lateral /l/; CCV: first C = plosive /b, g, p, k/, second C = lateral /l/. 60 syllables with 10 realizations per syllable = 600 stimuli, each exposed to the network 10 times. Result: strong phonetic ordering: (1) V, CV, and CCV regions are separated; (2) place and manner of articulation; (3) vowel quality; (4) voicing. The strong phonetic ordering results because the model language is completely symmetric, i.e. all syllables/phoneme combinations have the same frequencies. [Map labels: i, e, a, o, u; lab, api, dor; plos, nas, lat; voiced, voiceless]

Experiment 6: Training a Natural Language. Children's-book database: 40 books (up to age 6), Standard German (transcribed); 6513 sentences; 70512 words; 8217 different words; 4763 different syllables. The 200 most frequent syllables were realized by one speaker (in sentences), 27 down to one time(s) each, 703 realizations in total (proportional to frequency), via articulatory resynthesis. This yields 703 motor plan states and the appropriate sensory states; 300 exposures to the network per training item = 210900 training steps.
Rank of syllable | Frequency in corpus | Number of training items
1 | 2367 | 27
20 | 692 | 8
50 | 390 | 4
100 | 193 | 2
200 | 88 | 1
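One simple rule consistent with the counts in the table is to divide each syllable's corpus frequency by that of the least frequent trained syllable and round. This is an inference from the table, not a procedure stated in the talk:

```python
# Sketch: training items proportional to corpus frequency.
# Assumption (inferred, not stated in the talk): items = round(freq / min_freq),
# with min_freq = 88 (the 200th-ranked syllable). This reproduces the table.

def items_for_frequency(freq, min_freq=88):
    return max(1, round(freq / min_freq))

table = {2367: 27, 692: 8, 390: 4, 193: 2, 88: 1}
counts = {f: items_for_frequency(f) for f in table}
```

Training frequent syllables more often is what later lets the SOM devote more neurons (exemplars) to them.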

Hypermodal Phonetic Map: 25x25 SOM after training (zoomed in). Neurons are marked if they represent a phonemic state (excitatory synaptic weight > 80%).

Hypermodal Phonetic Map. More than one SOM neuron may represent a syllable (different realizations). A weak ordering of syllables emerges with respect to phonetic features: phonetic features appear at the level of this SOM. [Map labels: @-cluster; C1 place of articulation; C1 manner: plosive, fricative, nasal; CVC-, CV-, and CCV-regions]

Hypermodal Phonetic Map. Display of the link weights to the auditory state map: the map stores how a syllable sounds (auditory memory). [Map labels: @-cluster; C1 place of articulation; stronger [e]-F2; less phonation; C1 manner: plosive, nasal, fricative; CVC-, CV-, and CCV-regions]

Hypermodal Phonetic Map. Display of the link weights to the motor plan state map (for the same SOM neurons): a motor plan repository. [Map labels: @-cluster; C1 place of articulation; longer [e]-activation; less phonation; C1 manner: plosive, nasal, fricative; CVC-, CV-, and CCV-regions]

Hypermodal Phonetic Map: 25x25 SOM after training. In a second training, the learning parameters (learning rate and neighborhood kernel factor) were slightly changed in order to leave fewer gaps in the map. [Three figures: link weights to the phonemic map, to the auditory map, and to the motor plan map]

Training Results: Exemplar Representation. The number of SOM neurons representing a syllable is proportional to the number of training items for that syllable (its frequency in the target language). Neural plasticity: more stored exemplars for frequent syllables (they require more space in the brain). [Figure: number of neurons vs. number of training items] (Kannampuzha 2012)

Learning Curve. Number of syllables already learned by the SOM as a function of training cycles. The increase should be less abrupt: we need growing SOMs. [Figure: number of syllables vs. number of cycles] (Kannampuzha 2012)

Training Results: Performance. Training is stopped once production and perception have been learned (i.e. each syllable is represented in the SOM by a phonemic link weight > 0.8). Production (states represented by SOM neurons): identification rate of 96% for the 50 most frequent syllables (judged by one subject). Perception (done by the model itself): test items different from the training items (same speaker) are identified with a 92% identification rate for the 50 most frequent syllables; the identification rate drops for less frequent syllables (time normalization needs to be included). These are results for optimal training data sets. Perception also replicates important behavioral phenomena: categorical perception is stronger for CV than for V (Kröger et al. 2009, Speech Communication 51: 793-809); this requires training 20 different instances of the model, i.e. 20 different virtual listeners.

Categorical Perception: a nonlinear relation between the acoustic and the perceptual domain: regions of perceptual constancy are preferred as phoneme regions. Phoneme regions are identified by identification experiments. At phoneme boundaries, discrimination of acoustically equidistant stimuli is better (peaks), as measured by discrimination experiments (e.g. ABX experiments: is A = X or B = X?). [Figure: identification curves for ba, da, ga]

Categorical Perception: Basis for the experiments: an acoustically equidistant stimulus set/continuum (for V and CV) and a pool of around 20 listeners for performing the experiments (we trained 20 instances of the model!). Modeling: identification: the SOM winner neuron activates a phonemic state; discrimination is assumed to increase with the physical distance between the activated states within the SOM.
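The two modeling assumptions above can be sketched on a toy continuum. Everything here is illustrative (a 1D "map" with four neurons and made-up positions), assuming only what the slide states: identification via the winner neuron's phonemic state, discrimination via map distance between winners:

```python
# Sketch of the identification / discrimination modeling described above.
# The "trained SOM" (weights, labels, grid positions) is a toy assumption.

def winner(weights, stimulus):
    """Best-matching SOM neuron for an auditory stimulus."""
    return min(range(len(weights)),
               key=lambda j: sum((w - s) ** 2 for w, s in zip(weights[j], stimulus)))

def identify(weights, labels, stimulus):
    """Identification: the winner neuron activates its phonemic state."""
    return labels[winner(weights, stimulus)]

def discriminability(weights, grid, stim_a, stim_b):
    """Discrimination: map distance between the two winners' positions."""
    a, b = grid[winner(weights, stim_a)], grid[winner(weights, stim_b)]
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# toy 1D acoustic continuum mapped onto 4 neurons forming two phoneme regions
weights = [[0.0], [0.2], [0.8], [1.0]]
grid = [[0.0], [1.0], [5.0], [6.0]]   # spatial gap between the phoneme clusters
labels = ["i", "i", "a", "a"]

# equal acoustic spacing (0.3), within-category vs. cross-boundary pair
within = discriminability(weights, grid, [0.0], [0.3])
across = discriminability(weights, grid, [0.4], [0.7])
```

Because the two phoneme regions are spatially separated on the map, the cross-boundary pair yields a larger discrimination score than the within-category pair, i.e. a discrimination peak at the boundary, exactly the categorical-perception signature discussed next.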

Categorical Perception: Two stimulus continua: for V from /i/ to /a/, and for CV from /ba/ to /ga/. Typical results: stronger categorical perception for CV than for V (see the phoneme boundaries in the measured discrimination!). [Figure: identification and calculated discrimination curves over 13 V- and CV-stimuli; V = /i e a/, CV = /ba da ga/; interpolation between measured points]

Categorical Perception: Behavioral data (Pompino-Marschall 1995, adapted from Stevens et al. 1969) compared with the modeling results (Kröger et al. 2009). [Two figure panels]

Why is categorical perception stronger for CV? An answer from modeling: the V-stimuli are continuously distributed within the V region of the SOM space, forming one big cluster (V = /i e a o u/), whereas the CV-stimuli are more clustered within the CV region, forming three small clusters (CV = /ba da ga/; display for one of the 20 "brains"). This may result from the topological ordering of phonetic features: fitting 3 feature dimensions into the 2 anatomical map dimensions is difficult for CV. [Figure: supramodal phonetic map including phoneme regions; 13 V- and CV-stimuli [i] [e] [a] [ba] [da] [ga]]

Outline Introduction: Speech is Movement! The Structure of the Model Speech Acquisition: How to feed in Knowledge? Related brain regions (not published thus far!) Further Work

Related Brain Regions: Where are the maps located? A hypothesis! Four state maps (working memory) and one SOM (long-term memory). [Diagram: motor plan map; long-term memory SOM (Kohonen); phonemic map; working memory t = 200 msec; auditory map; somatosensory map; execution; sensory processing; vocal tract model; external speaker; face-to-face communication, triangulation]

Hypothetical Cortical Regions associated with the specific neural maps in our model: primary maps in primary areas (PAs); state maps in unimodal and heteromodal association areas (UAs, HAs); following Guenther (2006): error maps. The phonetic map (SOM) needs to be close to all state maps (complex mappings, each to each); the only solution: two hubs. Neural pathways simply copy/mirror activation patterns one to one over long distances (arcuate fasciculus). [Diagram: frontal, parietal, temporal, occipital lobes; motor plan, somatosensory, auditory, and phonemic maps; LA: limbic area, PA: primary area, UA: unimodal association, HA: heteromodal association; Prosiegel & Paulig (2002)]

Structure of the Model: Goal: verification of the model structure by imaging experiments. [Diagram: mirroring pathway (long distance); heteromodal cortical areas: phonemic map; unimodal association areas: motor plan map (frontal), auditory map (temporal); phonetic map; primary cortical areas: somatosensory map (parietal); peripheral: vocal tract model; external speaker]

Outline Introduction: Speech is Movement! The Structure of the Model Speech Acquisition: How to feed in Knowledge? Related brain regions Further Work

Further Work: More realistic (non-ideal) settings for obtaining training data, including imperfect imitation and different speakers for imitation (how does speaker normalization take place in the model?). A growing-SOM approach (acquisition: maps grow with input). Underpinning the model with more behavioral and brain-imaging data (e.g. imaging studies: Eckers, Heim, Kröger).

Acknowledgements: Jim Kannampuzha (Dipl.-Inf.), programming; Dept. of Phoniatrics, Pedaudiology, and Communication Disorders, RWTH Aachen University; now at Head Acoustics GmbH, Aachen. Cornelia Eckers (M.Sc.), fMRI experiments; Dept. of Phoniatrics, Pedaudiology, and Communication Disorders, RWTH Aachen University.

Please add a realistic brain model! Thank you! Literature: www.speechtrainer.eu