Landmark in Chinese CAPT


Landmark in Chinese CAPT
Xie Yanlu, Beijing Language and Culture University

Outline
English landmarks
Methods to select Chinese landmarks
Experiments in Chinese CAPT
Discussion


Objective in using computer-aided pronunciation training (CAPT)
Basic fact: a learner's erroneous sound always deviates only slightly from the canonical sound.
Example with lip posture: Pinyin e is spread and o is rounded. An e produced with rounding is notated e{o}; an o produced with spreading is notated o{w}.
Mispronunciation detection is therefore a typical distinctive-feature selection problem.

Quantal nonlinearities
High-slope nonlinearities are natural category boundaries (Stevens, 1989).
[Figure: acoustics plotted against articulation, with two stable regions separated by a steep transition.]
A natural category is robust to noise and variation, so languages tend to choose natural boundaries as their distinctive features.

Nonlinear map from acoustic features to perceptual features (Kuhl, 1991)

Consonant confusions at -6 dB SNR
[Table: confusion counts among the consonants P, T, K, F, TH, S, SH, B, D, G, V, DH, Z, ZH, M, N.]
Distinctive features: ±nasal, ±voiced, ±fricative, ±strident

Pronunciation Erroneous Tendency (PET): confusions in CAPT
PET types: raising, lowering, advancing, backing, lengthening, shortening, centralizing, rounding, spreading, labiodentalizing, laminalizing, devoicing, voicing, insertion, deletion, stopping, fricativizing, lateralizing, nasalizing.
Examples of PET notation with diacritics:
Spreading (diacritic w): the rounded sound u is produced with spread lips, notated u{w}
Backing (diacritic -): the tongue position for the phoneme n is slightly backed, notated n{-}
Shortening (diacritic ;): the aspiration duration of the phoneme p is shorter, notated p{;}
Laminalizing (diacritic sh): the blade-palatal phoneme sh is pronounced as a Japanese lamino-alveolar, notated sh{sh}

Confusions in CAPT
[Table: PETs (laminalizing, backing, spreading, shortening), their diacritics, and the confused phone pairs observed in the corpus, involving sh, zh, ch, x, j, q, z, en, an, ang, e, v, ing, u, iu, f, eng, i, k, g, r, and uo.]

Phonetic landmark
A phonetic landmark is an instantaneous speech event that is perceptually salient ("salient" = easy to detect) and that carries high information density about the message the speaker wishes to communicate.

Landmarks are redundant (Stevens, 1999)
To recognize a stop consonant, it is necessary and sufficient to hear any one of these three acoustic landmarks, each with a very different spectral pattern:
release into the vowel
closure from the vowel
ejective burst

Landmark locations
Four different candidate landmark locations:
the temporal midpoint of the vowel
the boundary between the vowel and the consonant
the middle of the consonant
the boundary between the consonant and its following segment

English landmarks
1) For all vowel-type phones (labels usually start with the letters a, e, i, o, u, for example [ih], [ae], etc.): find the middle of the interval = (start time + end time)/2 and put a V landmark.
2) For all glide-type phones ([h], [w], [y], [r], [l]): find the middle of the interval and put a G landmark.
3) For all nasal-type phones ([m], [n], [ng]): at the start time put the Nc landmark, and at the end time put the Nr landmark.
4) For all stop-closure phones ([b-cl], [d-cl], etc.): at the start time put the Sc landmark.
5) For all stop-type phones ([b], [d], etc.): at the start time put the Sr landmark.
6) For all fricative-type phones ([v], [dh], [z], etc.): at the start time put the Fc landmark, and at the end time put the Fr landmark.
7) For all affricate-type phones ([jh] or [dj], [ch]): at the start time put the Sr landmark and also the Fc landmark, and at the end time put the Fr landmark.
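A minimal code sketch of the landmark-placement rules above, assuming TIMIT-style phone labels and a simple (label, start, end) segmentation; the category sets are illustrative and incomplete, not the exact inventory used in the talk.

```python
# Sketch only: phone categories below are illustrative, not the authors' full lists.
VOWELS     = {"ih", "ae", "aa", "eh", "iy", "uw", "ah", "ao", "ow", "ey"}
GLIDES     = {"h", "w", "y", "r", "l"}
NASALS     = {"m", "n", "ng"}
STOP_CLOS  = {"b-cl", "d-cl", "g-cl", "p-cl", "t-cl", "k-cl"}
STOPS      = {"b", "d", "g", "p", "t", "k"}
FRICATIVES = {"v", "dh", "z", "f", "th", "s", "sh", "zh"}
AFFRICATES = {"jh", "dj", "ch"}

def assign_landmarks(phones):
    """phones: list of (label, start, end); returns a sorted list of (time, landmark)."""
    landmarks = []
    for label, start, end in phones:
        mid = (start + end) / 2                      # rules 1 and 2: interval midpoint
        if label in VOWELS:
            landmarks.append((mid, "V"))
        elif label in GLIDES:
            landmarks.append((mid, "G"))
        elif label in NASALS:
            landmarks += [(start, "Nc"), (end, "Nr")]
        elif label in STOP_CLOS:
            landmarks.append((start, "Sc"))
        elif label in STOPS:
            landmarks.append((start, "Sr"))
        elif label in FRICATIVES:
            landmarks += [(start, "Fc"), (end, "Fr")]
        elif label in AFFRICATES:
            landmarks += [(start, "Sr"), (start, "Fc"), (end, "Fr")]
    return sorted(landmarks)

print(assign_landmarks([("b-cl", 0.10, 0.14), ("b", 0.14, 0.16),
                        ("ae", 0.16, 0.30), ("n", 0.30, 0.38)]))
```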

How to find Chinese landmarks
Refer to English landmarks via IPA
Perception
Observation
Intuition/guess?

How to find Chinese landmarks
English landmarks in CAPT are projected through IPA onto Chinese landmarks in CAPT:
Nasal: an/ang, en/eng, in/ing
Dorsal: j, q, x, k / z, c, s
Vowel: v, u, eng, r, uo
zh/ch
Resulting Chinese landmark phones: sh, zh, ch, x, j, an, v, ang, ing, u, f, eng, q, k, r, uo

How to find Chinese landmarks: perception of modified speech
Each syllable is decomposed into the initial (I), a pure vowel (V), a nasalized vowel (T), and a nasal consonant (N). Three modified stimuli are constructed:
IV+t-N: the nasal consonant is cut and the nasalized vowel is exchanged (I V T)
IV-T+N: the nasalized vowel is cut (I V N)
IV-T+n: the nasalized vowel is cut and the nasal consonant is exchanged (I V N)

/ban/ vs /bang/
[Figure: segmentation of ban and bang into V, T, N, and the revised stimuli for the three conditions IV+t-N, IV-T+N, and IV-T+n.]
IV+t-N: the nasal consonant is cut and the nasalized vowel is exchanged; IV-T+N: the nasalized vowel is cut; IV-T+n: the nasalized vowel is cut and the nasal consonant is exchanged.

Result: the nasalized vowels play a dominant role in perception.

How to find Chinese landmarks: dorsals
[Figure: results for the dorsal consonants.]

Following-vowel landmark
T and VOT (Wu, 1989)
Coarticulation (Öhman, 1966)
The initial C, the first V, T, and P all start at the syllable onset (Xu, 2006)
We cannot explain the result for the dorsals: is it due to the landmark, or due to coarticulation?

English landmarks & Chinese landmarks

System validation
Corpus: 30 speakers (7 females), 899 utterances, 643 phonemes, average utterance length 4, 65 kinds of specific PETs.
Evaluation metrics: F score and the Receiver Operating Characteristic (ROC), a metric that formulates the relationship between the true positive rate (TPR) and the false positive rate (FPR).
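As a reminder of how the ROC is obtained, here is a minimal sketch that sweeps a decision threshold over detector scores and computes TPR and FPR at each point; the scores and labels are made up for illustration, not taken from the validation corpus.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """Return (FPR, TPR) pairs; labels: 1 = real pronunciation error, 0 = correct."""
    points = []
    for th in thresholds:
        detected = scores >= th
        tp = np.sum(detected & (labels == 1))
        fp = np.sum(detected & (labels == 0))
        fn = np.sum(~detected & (labels == 1))
        tn = np.sum(~detected & (labels == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))   # (FPR, TPR)
    return points

scores = np.array([0.9, 0.8, 0.35, 0.6, 0.2, 0.1])        # detector confidence per phone
labels = np.array([1,   1,   0,    1,   0,   0])          # annotated PET present or not
print(roc_points(scores, labels, thresholds=np.linspace(0, 1, 5)))
```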

Phonetic Labels

Best acoustic cues selected for individual phones

Landmark: onset of the vowel
[Receiver Operating Characteristic (ROC) curves, labelled: nearly the same / Eng > Chn / Chn > Eng.]

Landmark: following vowel
[ROC curves: Eng > Chn in three panels, Chn > Eng in one.]

Discussion
For most of the 16 phones, English landmarks, which are located at both the start and the end of the phone durations, slightly outperformed the Chinese landmarks defined by empirical analysis of error pairs in the large-scale corpus.
The Chinese landmarks may lose some information that is significant for discriminating pronunciation errors, especially for the nasal and fricative phones.

Convolution Forgetting Curve Model
Xie Yanlu, Beijing Language and Culture University

Outline
Introduction
Exponential-shape forgetting curve model
Convolution forgetting curve model
Experiments in cognitive learning

Introduction
Quantitative description: the procedure of memory (Ebbinghaus, 1913).
Mathematical description: exponential functions for forgetting (Wixted et al., 1991; Rubin et al., 1999):
$f(t) = a_1 \exp(-a_2 t) + a_3$
$f(t) = a_1 \exp(-t/T_1) + a_2 \exp(-t/T_2) + a_3$
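For concreteness, here is a minimal sketch evaluating the two forgetting functions above; the parameter values are purely illustrative, not fitted to any data in the talk.

```python
import numpy as np

def single_exp(t, a1, a2, a3):
    """f(t) = a1*exp(-a2*t) + a3: single exponential decay toward a floor a3."""
    return a1 * np.exp(-a2 * t) + a3

def double_exp(t, a1, T1, a2, T2, a3):
    """f(t) = a1*exp(-t/T1) + a2*exp(-t/T2) + a3: fast and slow decay components."""
    return a1 * np.exp(-t / T1) + a2 * np.exp(-t / T2) + a3

t = np.linspace(0, 30, 7)  # days since learning
print(single_exp(t, a1=0.6, a2=0.3, a3=0.2))
print(double_exp(t, a1=0.4, T1=2.0, a2=0.3, T2=15.0, a3=0.2))
```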

Exponential-shape forgetting curve model
[Figure: forgetting curve from the University of Waterloo.]

Procedure of the convolution memory model (Baddeley, 2000)
[Diagram: input, central executive, visuo-spatial sketchpad, episodic buffer, phonological loop, long-term memory, output.]

Convolution Forgetting Curve Model
Long-term memory formation is the result of the interaction between the input and the central executive in working memory. Considering the relationship between stimulation (study) and memory, it is analogous to the interaction of a signal and a system in circuit theory:
$y(t) = \int f(\tau)\, h(t-\tau)\, d\tau = f(t) * h(t)$
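A minimal numerical sketch of this signal-and-system analogy, assuming the exponential kernel $h(t) = a_1 \exp(-a_2 t) + a_3$ from the following slides; the stimulation times and parameter values are illustrative only, and the output is not normalized to a probability.

```python
import numpy as np

# Memory kernel from the slides, with illustrative (not fitted) parameters.
a1, a2, a3 = 0.6, 0.15, 0.2
dt = 0.1                                   # time step (e.g. in half-days)
t = np.arange(0, 40, dt)
h = a1 * np.exp(-a2 * t) + a3

# Stimulation signal f(t): one impulse per study episode (illustrative times).
f = np.zeros_like(t)
for study_time in (0, 1, 3, 7):
    f[int(round(study_time / dt))] = 1.0

# y(t) = f(t) * h(t): each study episode launches its own forgetting curve,
# and the contributions add up, as in the repeated-learning model below.
y = np.convolve(f, h)[: len(t)]
print(y[::50])
```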

One-time learning convolution model (OCM)
$y(t) = \delta(t) * h(t) = h(t)$
The parameters represent the personal intrinsic characteristics of the learner:
$y(t) = h(t) = a_1 \exp(-a_2 t) + a_3$

Repeated learning convolution model (RCM)
$y(t) = \sum_{n=1}^{N} f(t - T_n) * h(t) = \sum_{n=1}^{N} \delta(t - T_n) * h(t) = \sum_{n=1}^{N} h(t - T_n)$
$y(t) = \sum_{n=1}^{N} a_1 \exp\big(-a_2 (t - T_n)\big) + N a_3$
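A minimal sketch of the RCM closed form, with illustrative parameters and study times; restricting each episode's contribution to $t \ge T_n$ is a sketch choice to keep the model causal, not something stated on the slide.

```python
import numpy as np

def rcm_recall(t, study_times, a1, a2, a3):
    """Closed-form RCM: one shifted exponential kernel per study episode."""
    t = np.asarray(t, dtype=float)
    y = np.zeros_like(t)
    for T_n in study_times:
        active = t >= T_n                  # only episodes that have happened contribute
        y[active] += a1 * np.exp(-a2 * (t[active] - T_n)) + a3
    return y

# Illustrative values: study on days 0, 1 and 3, evaluated daily over 10 days.
print(rcm_recall(np.arange(0, 10), study_times=[0, 1, 3], a1=0.6, a2=0.15, a3=0.1))
```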

General repeated learning convolution model (GRCM)
Each study episode is modelled as a rectangular pulse (unit step $u$) rather than an impulse:
$y(t) = \sum_{n=1}^{N} \big[u(t - T_n) - u(t - T_n')\big] * h(t) = \sum_{n=1}^{N} \big[u(t - T_n) - u(t - T_n')\big] * \big(a_1 \exp(-a_2 t) + a_3\big)$
where $T_n$ and $T_n'$ bound the $n$-th learning episode.
[Plots: recall curves predicted by the GRCM.]

Perceptual training
[Schedule: pretest on day 1; adaptive training on synthesized F0 + continuity samples (Mandarin perception pattern), followed by a midtest; high-variability training on a single-syllable database, followed by a post-test on day 7.]

Experiments in cognitive learning
The test materials are the same natural words in every session, voiced by a native speaker. Learners are forced to judge each word's tone within 5 minutes.
[Table: probability of recall per learner in the 60-trial experiment, tested on days 1, 3, and 7; averages 0.843, 0.899, 0.94.]
[Table: probability of recall per learner in the smaller-trial experiment, tested on days 1 through 6; averages 0.85, 0.86, 0.88, 0.9, 0.93, 0.94.]

Experiments in cognitive learning
[Table: parameters a1, a2, a3 calculated from the averaged 60-trial test results, with the MSE for days 1 and 3, the MSE for day 7, and the correlation r.]
[Figure: forgetting curves of the convolution model (averaged), probability of recall plotted against time in half-days, with the day-1, day-3, and day-7 data points.]

Experiments in cognitive learning
[Table: MSE and r for the smaller-trial tests, broken down by which training days were used for fitting (day 1; days 1 and 2; ...; days 1 through 6), reporting the MSE over all days, over the training days, and over the predicted days, together with r.]
[Figure: forgetting curves fitted by the convolution model against the day-1 to day-6 data, probability of recall plotted against time in half-days.]

Discussion
The model improves on the traditional forgetting curve model.
With only a few memory data points, an individual's forgetting curve can be drawn.
It provides a basis for designing better teaching methods.
Some factors that affect phonetic teaching performance can be analyzed.

Thank you for your attention! Any questions?