Landmark in Chinese CAPT Xie Yanlu Beijing Language and Culture University
Outline: English landmark; Methods to select Chinese landmark; Experiments in Chinese CAPT; Discussion
06// 3
Objective in using computer-aided pronunciation training (CAPT)
Basic fact: a learner's erroneous sound always deviates only slightly from the canonical sound.
Lip: spread (Pinyin e) vs. rounding (Pinyin o)
Rounding of the e sound: e{o}; spreading of the o sound: o{w}
Mispronunciation detection is a typical distinctive-feature selection problem.
Quantal nonlinearities: High-Slope Nonlinearities are Natural Category Boundaries (Stevens, 1989)
[Figure: acoustics plotted against articulation, with the stable regions marked.]
Natural category = robustness to noise and variation; therefore languages tend to choose natural boundaries as their distinctive features.
Nonlinear Map from Acoustic Features to Perceptual Features (Kuhl 99)
Consonant Confusions at -6 dB SNR
[Table: confusion matrix over the consonants P, T, K, F, TH, S, SH, B, D, G, V, DH, Z, ZH, M, N; cell counts not recoverable.]
Distinctive Features: ±nasal, ±voiced, ±fricative, ±strident
Pronunciation Erroneous Tendency (PET)
Confusions in CAPT: raising, lowering, advancing, backing, lengthening, shortening, centralizing, rounding, spreading, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateral, nasalizing
PET | Diacritic | Notation | E.g.
Spreading | w | u{w} | Round sound u has a spreading lip
Backing | - | n{-} | The tongue position of phoneme n is a little back
Shortening | ; | p{;} | The aspiration duration of phoneme p is shorter
Laminalizing | sh | sh{sh} | Blade-palatal phoneme sh is pronounced as Japanese lamino-alveolar
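The phone{diacritic} notation lends itself to simple parsing. The following is a minimal sketch, not the authors' tooling: the function name `parse_pet` and the table `PET_DIACRITICS` are hypothetical, and the diacritic meanings are taken only from the examples on the slides (w, -, ;, sh, plus o for rounding as in e{o}).

```python
import re

# Diacritic meanings copied from the slide examples; any other mark is reported as "unknown".
PET_DIACRITICS = {
    "w": "spreading",
    "-": "backing",
    ";": "shortening",
    "sh": "laminalizing",
    "o": "rounding",
}

def parse_pet(label):
    """Split a label such as 'u{w}' into (phone, diacritic, PET name).

    A label without braces is treated as a canonical phone with no PET mark.
    """
    match = re.fullmatch(r"([a-z]+)\{([^{}]+)\}", label)
    if match is None:
        return label, None, None
    phone, mark = match.groups()
    return phone, mark, PET_DIACRITICS.get(mark, "unknown")

for item in ["u{w}", "n{-}", "p{;}", "sh{sh}", "a"]:
    print(item, "->", parse_pet(item))
```

Keeping the diacritic table separate from the parser means further PETs can be added without touching the parsing logic.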
Confusions in CAPT
[Figure: phones grouped by PET and diacritic (laminalizing, backing, spreading, shortening). Phones shown: sh, zh, ch, x, z, j, q, en, an, ang, e, v, ing, u, iu, f, eng, i, k, g, r, uo.]
Phonetic landmark
A phonetic landmark is an instantaneous speech event that is perceptually salient ("salient" = easy to detect), and that has high information density about the message the speaker wishes to communicate.
Landmarks are Redundant (Stevens, 1999)
Example: "backed"
To recognize a stop consonant, it is necessary and sufficient to hear any one of these:
1) Release into vowel
2) Closure from vowel
3) Ejective burst
These are three acoustic landmarks with very different spectral patterns.
Landmark locations
Four different candidate landmark locations:
1) the temporal midpoint of the vowel
2) the boundary between the vowel and the consonant
3) the middle of the consonant
4) the boundary between the consonant and its following segment
English Landmark
1) For all vowel-type phones (labels usually start with the letters a, e, i, o, u, for example [ih], [ae], etc.) => find the middle of the interval = (start time + end time)/2 and put a V landmark
2) For all glide-type phones ([h], [w], [y], [r], [l]) => find the middle of the interval and put a G landmark
3) For all nasal-type phones ([m], [n], [ng]) => at the start time, put the Nc landmark, and at the end time, put the Nr landmark
4) For all stop-closure phones ([b-cl], [d-cl], etc.) => at the start time, put the Sc landmark
5) For all stop-type phones ([b], [d], etc.) => at the start time, put the Sr landmark
6) For all fricative-type phones ([v], [dh], [z], etc.) => at the start time, put the Fc landmark, and at the end time, put the Fr landmark
7) For all affricate-type phones ([jh] or [dj], [ch]) => at the start time, put the Sr landmark and also the Fc landmark, and at the end time, put the Fr landmark
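The seven rules above can be sketched as a small function that maps one labeled phone interval to its landmarks. This is an illustrative sketch only: the function name `landmarks` is hypothetical, and the stop and fricative sets are assumptions that fill in the "etc." of the rules with common phone labels.

```python
# Phone classes. GLIDES, NASALS and AFFRICATES come from the rules above;
# STOPS and FRICATIVES are assumed completions of the "etc." in rules 5 and 6.
GLIDES = {"h", "w", "y", "r", "l"}
NASALS = {"m", "n", "ng"}
AFFRICATES = {"jh", "dj", "ch"}
STOPS = {"b", "d", "g", "p", "t", "k"}
FRICATIVES = {"f", "v", "th", "dh", "s", "z", "sh", "zh"}
VOWEL_LETTERS = {"a", "e", "i", "o", "u"}

def landmarks(label, start, end):
    """Return a list of (time, landmark) pairs for one phone interval."""
    mid = (start + end) / 2
    if label in AFFRICATES:                    # rule 7
        return [(start, "Sr"), (start, "Fc"), (end, "Fr")]
    if label.endswith("-cl"):                  # rule 4: stop closure
        return [(start, "Sc")]
    if label in STOPS:                         # rule 5: stop release
        return [(start, "Sr")]
    if label in NASALS:                        # rule 3
        return [(start, "Nc"), (end, "Nr")]
    if label in FRICATIVES:                    # rule 6
        return [(start, "Fc"), (end, "Fr")]
    if label in GLIDES:                        # rule 2
        return [(mid, "G")]
    if label[:1] in VOWEL_LETTERS:             # rule 1
        return [(mid, "V")]
    return []                                  # unknown label: no landmark

print(landmarks("ae", 0.25, 0.75))
print(landmarks("b-cl", 2.0, 2.5))
```

Consonant classes are tested before the vowel rule, so that a label is only treated as a vowel when it matches none of the consonant sets.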
How to find Chinese landmark: refer to English landmark in IPA; perception; observation; intuition/guess?
How to find Chinese landmark: English landmark in CAPT -> IPA projection -> Chinese landmark in CAPT
Nasal: an/ang, en/eng, in/ing
Dorsal: j q x k / z c s
Vowel: v u eng r uo
zh/ch
Phones: sh zh ch x j an v ang ing u f eng q k r uo
How to find Chinese landmark: perception of modified speech
Segments: pure vowel (V), nasalized vowel (T), nasal consonant (N)
IV+t-N (I V T): nasal consonant is cut and nasalized vowel is exchanged
IV-T+N (I V N): nasalized vowel is cut
IV-T+n (I V N): nasalized vowel is cut and nasal consonant is exchanged
/ban/ vs /bang/
[Figure: the original ban and bang (segments V, T, N) and six revised stimuli, Revised1 to Revised6, built under the IV+t-N, IV-T+N and IV-T+n conditions.]
The nasalized vowels play a dominant role in perception.
How to find Chinese landmark: Dorsal
Following vowel landmark
T and VOT (Wu 1989)
Coarticulation (Öhman 1966)
Initial C, first V, T and P all start at the syllable onset (Xu 2006)
We cannot explain the result for the dorsals: is it due to the landmark, or due to the coarticulation?
English Landmark & Chinese Landmark
System validation
[Table: corpus statistics (text, #speakers, #utterances, #phonemes, average utterance length, #kinds of specific PETs); extracted values: 30 utterances, 7 females, 899, 643, 4, 65.]
Metrics: F score, true positive rate (TPR), false positive rate (FPR)
Receiver Operating Characteristic (ROC): a metric that formulates the relationship between true positive rate (TPR) and false positive rate (FPR).
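As an illustration of these metrics, the sketch below computes TPR, FPR and an F score from binary detection decisions, and sweeps a threshold over detector scores to trace ROC points. It rests on assumptions not stated on the slide: "positive" means that a PET is present, the F score is taken to be F1 (harmonic mean of precision and recall), and the function names are hypothetical.

```python
def detection_metrics(reference, predicted):
    """Return (TPR, FPR, F1) for boolean sequences, where True = PET present."""
    tp = fp = fn = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif pred:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f1

def roc_points(reference, scores):
    """Return (FPR, TPR) pairs obtained by thresholding the scores from high to low."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [score >= threshold for score in scores]
        tpr, fpr, _ = detection_metrics(reference, predicted)
        points.append((fpr, tpr))
    return points

# Hypothetical reference labels and detector scores, for illustration only.
reference = [True, True, False, False]
scores = [0.9, 0.8, 0.3, 0.1]
print(roc_points(reference, scores))
```

Plotting the returned pairs with FPR on the x-axis and TPR on the y-axis gives the ROC curve; a curve closer to the top-left corner indicates a better detector.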
Phonetic Labels
Best acoustic cues selected for individual phones
Landmark: onset of vowel
[Figure: Receiver Operating Characteristic (ROC) curves; panels labeled "Nearly the same", "Eng>Chn" and "Chn>Eng".]
Landmark: following vowel
[Figure: ROC curves; panels labeled "Eng>Chn", "Chn>Eng", "Eng>Chn" and "Eng>Chn".]
Discussion
English landmarks, located at both the start and the end of the phone duration, slightly outperformed the Chinese landmarks for most of the 6 phones. The Chinese landmarks were defined by empirical analysis of error pairs in the large-scale corpus.
Chinese landmarks might lose some significant information for discriminating pronunciation errors, especially for the nasal and fricative phones.
Convolution Forgetting Curve Model Xie Yanlu Beijing Language and Culture University
Outline: Introduction; Exponential shape forgetting curve model; Convolution Forgetting Curve Model; Experiments in cognitive learning
The procedure of memory (Ebbinghaus H., 1913): quantitative description
Exponential function in forgetting: mathematical description
$f(t) = a_1 \exp(-a_2 t) + a_3$ (Wixted, J. T., et al. 99)
$f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$ (Rubin, David C., et al. 1999)
Exponential shape forgetting curve model Forgetting curve from University of Waterloo
Procedure of convolution memory model (Baddeley A. D., 2000)
[Diagram: Input; Central Executive; Visuo-spatial sketchpad, Episodic Buffer, Phonological loop; Long term memory; Output.]
Convolution Forgetting Curve Model
Long-term memory formation is the result of the interaction of the input and the central executive in working memory. Considering the relationship between stimulation (study) and memory, this resembles the interaction of signal and system in circuit theory:
$y(t) = \int f(\tau)\, h(t - \tau)\, d\tau = f(t) * h(t)$
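The signal-and-system analogy can be checked numerically. The sketch below approximates the convolution by a Riemann sum on a uniform time grid and shows that a unit-area impulse as input returns the system response itself. The exponential-plus-constant form of h, its parameter values, the assumption that h is zero for t < 0, and the names `h` and `convolve` are all illustrative assumptions.

```python
import math

def h(t, a1=0.5, a2=0.1, a3=0.4):
    """Assumed memory response: exponential decay plus a constant, zero before t = 0."""
    return a1 * math.exp(-a2 * t) + a3 if t >= 0 else 0.0

def convolve(f, g, dt):
    """Riemann-sum approximation of (f * g) on a uniform grid: y[k] = sum_i f[i] g[k-i] dt."""
    return [sum(f[i] * g[k - i] for i in range(k + 1)) * dt for k in range(len(f))]

dt = 0.5
times = [k * dt for k in range(40)]
response = [h(t) for t in times]

# Discrete stand-in for the Dirac impulse: a single sample of height 1/dt (unit area).
impulse = [1.0 / dt] + [0.0] * (len(times) - 1)

y = convolve(impulse, response, dt)
print(max(abs(a - b) for a, b in zip(y, response)))
```

Any other study signal f can be substituted for the impulse to see how the same response h shapes the resulting memory curve.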
One-time learning convolution model (OCM)
$y(t) = \delta(t) * h(t) = h(t)$
The parameters represent the personal intrinsic characteristics of the learner:
$y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$
Repeated learning convolution model (RCM)
$y(t) = \sum_{n=1}^{N} f(t - n t_n) * h(t)$
$y(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$
$y(t) = \sum_{n=1}^{N} a_1 \exp(-a_2 (t - T_n)) + N a_3$
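A minimal sketch of the repeated learning model follows, assuming the response h(t) = a_1 exp(-a_2 t) + a_3 and assuming that a learning event contributes nothing before it happens (h is zero for negative arguments). The function names and all parameter values are hypothetical.

```python
import math

def h(t, a1, a2, a3):
    """Single-event response; assumed to be zero before the learning event."""
    return a1 * math.exp(-a2 * t) + a3 if t >= 0 else 0.0

def rcm(t, learn_times, a1, a2, a3):
    """Repeated learning model: sum of responses to impulses at the times T_n."""
    return sum(h(t - T_n, a1, a2, a3) for T_n in learn_times)

# Hypothetical learning times and parameters, for illustration only.
learn_times = [0.0, 2.0, 4.0]
a1, a2, a3 = 0.5, 0.1, 0.2

for t in [1.0, 3.0, 5.0, 10.0]:
    print(t, rcm(t, learn_times, a1, a2, a3))
```

Once t is later than every T_n, the sum equals the closed form with the N a_3 term; before that, only the learning events that have already occurred contribute.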
General repeated learning convolution model (GRCM)
$y(t) = \sum_{n=1}^{N} [u(t - T_n) - u(t - T_n - \tau)] * h(t) = \sum_{n=1}^{N} [u(t - T_n) - u(t - T_n - \tau)] * [a_1 \exp(-a_2 t) + a_3]$
$y(t) = \sum_{n=1}^{N} \frac{a_1}{a_2} \exp(-a_2 (t - T_n)) [\exp(a_2 \tau) - 1] + a_3 N \tau$
where $\tau$ is the duration of each learning session.
[Figures: two plots of the model output; axis labels not recoverable.]
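The general model can be checked numerically under stated assumptions: each learning session is taken to be a rectangular pulse u(t - T_n) - u(t - T_n - tau) of a common width tau (the symbol tau and the common width are assumptions), and h(t) = a_1 exp(-a_2 t) + a_3. Under those assumptions, convolving one pulse with h amounts to integrating h over the session window, which gives a closed form that the sketch compares against midpoint-rule integration. Function names and parameter values are hypothetical.

```python
import math

def grcm_closed(t, learn_times, tau, a1, a2, a3):
    """Closed form, valid once every session has ended (t >= max(T_n) + tau)."""
    decay = sum(math.exp(-a2 * (t - T_n)) for T_n in learn_times)
    return a1 / a2 * (math.exp(a2 * tau) - 1.0) * decay + a3 * len(learn_times) * tau

def grcm_numeric(t, learn_times, tau, a1, a2, a3, steps=2000):
    """Midpoint-rule integration of h(t - s) over each session window [T_n, T_n + tau]."""
    ds = tau / steps
    total = 0.0
    for T_n in learn_times:
        for k in range(steps):
            s = T_n + (k + 0.5) * ds
            total += (a1 * math.exp(-a2 * (t - s)) + a3) * ds
    return total

# Hypothetical session times, session length and parameters.
learn_times = [0.0, 10.0, 20.0]
tau, a1, a2, a3 = 1.0, 0.5, 0.1, 0.2
t = 30.0

print(grcm_closed(t, learn_times, tau, a1, a2, a3))
print(grcm_numeric(t, learn_times, tau, a1, a2, a3))
```

The two values should agree to several decimal places; the closed form does not cover times inside a session window, where the integration has to stop at t.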
Perceptual training
[Diagram of the training schedule through Day 7: Pretest; Adaptive training (synthesized F0 + continuity samples, Mandarin perception pattern); Midtest; High variability training (single syllable database); Post test.]
Experiments in cognitive learning
The test materials are all the same 60/0 natural words, spoken by a native speaker. Learners must judge the tone of each word within 5 minutes.
The probability of recall for the experiments of 60 trials (Learner \ Day: 3 7):
0.60 0.75 0.80 3 0.87 0.75 0.9 4 0.68 0.85 0.85 5 0.85 0.93 0.95 6 0.93 0.98 0.98 8 0.97 0.95 0.9 0.97 0.98 3 0.87 0.98 0.98 Avg 0.843 0.899 0.94
The probability of recall for the experiments of 0 trials (Learner \ Day: 3 4 5 6):
0.75 0.6 0.75 0.95 0.9 0.8 0.9 0.9 0.95 3 0.95 0.65 0.95 0.9 4 0.65 0.85 0.9 0.85 0.9 0.9 5 0.85 0.95 0.75 0.65 0.85 6 0.8 0.95 7 0.9 0.95 0.75 0.85 0.85 0.95 8 0.85 0.8 0.95 0.95 0.95 9 0.9 0.95 0.85 0.85 0.9 0.9 0 0.85 0.9 0.95 0.95 0.9 0.9 0.85 0.9 0.9 0.9 Avg 0.85 0.86 0.88 0.9 0.93 0.94
Experiments in cognitive learning
The calculated $a_1$, $a_2$, $a_3$ with the 60 trials test results (averaged)
[Table: for each formula, MSE of day 1 and 3, MSE of day 7, r, and the fitted $a_1$, $a_2$, $a_3$; cell values not recoverable.]
[Figure: The forgetting curves of convolution model (average). x-axis: half day; y-axis: probability of recall; legend: formula curves and the data of each test day.]
Experiments in cognitive learning
The MSE and r with the 0 trials test results
[Table: columns are train day, formula, MSE of all day, MSE of train day, MSE of predict day, r; train-day rows are 1; 1 and 2; 1, 2 and 3; 1, 2, 3 and 4; 1, 2, 3, 4 and 5; 1, 2, 3, 4, 5 and 6; cell values not recoverable.]
[Figure: x-axis: half day; y-axis: probability of recall; legend: formula curves and day1 to day6.]
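The two fit measures reported in these tables can be computed as in the sketch below. It assumes that MSE is the mean squared difference between observed and predicted recall probabilities and that r is the Pearson correlation coefficient; the slides do not define either. The sample values are made up for illustration and are not the experimental data.

```python
import math

def mse(observed, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical observed recall probabilities and model predictions.
observed = [0.80, 0.85, 0.90, 0.95]
predicted = [0.82, 0.86, 0.89, 0.93]

print("MSE:", mse(observed, predicted))
print("r:", pearson_r(observed, predicted))
```

Computing the MSE separately on the training days and on the held-out day mirrors the split between "MSE of train day" and "MSE of predict day".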
Discussion
Improving the traditional forgetting curve model.
With few memory data, the individual's forgetting curve can be drawn.
Providing a certain basis to design better teaching methods.
Some factors that affect the phonetic teaching performance can be analyzed.
Thank you for your attention! Any questions?