AN EFFECTIVE METHOD FOR EDUCATION IN ACOUSTICS AND SPEECH SCIENCE Integrating textbooks, computer simulation and physical models

PACS: 43.10.Sv

Arai, Takayuki
Dept. of Electrical and Electronics Eng., Sophia University
7-1 Kioi-cho, Chiyoda-ku
Tokyo, 102-8554 Japan
Tel: +81-3-3238-3411
Fax: +81-3-3238-3321
E-mail: arai@sophia.ac.jp

ABSTRACT

We proposed an effective method for education in acoustics that integrates three educational tools: textbooks, computer simulation and physical models. Our focus was the mechanism of vowel production in the speech and hearing sciences. We implemented a computer model by approximating a vowel as a plane wave propagating inside an acoustic tube, the diameters of which vary successively. In addition to the simulation, we made several physical models of the human vocal tract, composed of transparent acrylic materials, to give students an intuitive understanding of vowel production. As a result, we confirmed that the integration of computer simulations, textbook explanations and physical models was extremely powerful, especially for students with less technical backgrounds.

INTRODUCTION

Fields related to speech communication intersect in crucial ways with acoustics. Fig. 1 shows a simplified projection of this relationship. In this figure, imagine a speaker on the left and a listener on the right. The center of the figure pictures acoustics as the bridge between speech production and perception. Below that, the field of Speech and Hearing Sciences spans speech production and perception, and may be seen as an application of those areas of research. Speech Pathology, which also spans these sections, is based crucially on those fields, with acoustics being an important contributor. Parallel to these and above them are the linguistic fields of Phonetics and Phonology. Phonetics comprises three important subfields: Articulatory Phonetics, Acoustic Phonetics and Auditory Phonetics. The last of these is related to Psycho-acoustics, which is the term used in the figure. Speech technology, including automatic speech recognition, speech synthesis and speech coding, is overlaid as an application of the fields of Phonetics and Phonology.

That acoustics is related to so many fields explains the variety of backgrounds found in acoustics student populations. At Sophia University, the author teaches acoustics not only to technical students but also to students majoring in fields such as Linguistics, Psychology, and Speech Pathology. We believe that an education in acoustics is important not only for college-level students, but also for high-school and potentially even elementary-school students. We are therefore motivated to develop intuitive and effective methods for educating students of different ages and from varied backgrounds. In this paper, we propose an integration of educational tools that enables students to grasp concepts in acoustics more intuitively. Our method incorporates the use of textbooks, computer simulation, and physical models.

Fig. 1.- Speech and Acoustics

TEXTBOOKS

Textbooks are an excellent tool for presenting a subject systematically. But in speech science, they generally contain a large dose of mathematical and technical information which, though necessary for detail, might pose a barrier for beginners or those lacking a technical background. For these students, more intuitive textbooks are needed. An ideal textbook for such a diverse readership would:
- rely heavily on figures and descriptions, not only equations and formulas,
- use more examples and colloquial language for describing phenomena, and
- repeat explanations of a single topic from different angles.
Although such textbooks exist, more specialized ones are needed to cover Acoustics and related fields.

COMPUTER SIMULATION

With the widespread use of computers, computer-based educational tools are increasingly available. One of their advantages is the ability to show a complex phenomenon virtually, by simulation. Additionally, in multimedia environments, we can record, play back, and analyze sounds. Another strength of computer-based learning is that it can address different styles of learning, as students are able to access the system interactively and at their own pace. We are seeing more papers and demonstrations on the topic of education in acoustics, and several attempts have been made to address those aspects of acoustics education (e.g., Eurospeech [1]).

One such attempt is an electronic tool for education [2]. This tool contains topics relating to speech production and perception, as well as basic speech science, including:
- how spectrograms are constructed,
- how the source and filter act in a linear speech-production model, and
- how vowel quality varies across the F1-F2 formant plane.

Another useful electronic tool is the simulation portrayed in Fig. 2, where users change the configuration of a vocal tract in real time and simultaneously hear the resulting vowel sound and see its spectrum and its location on the F1-F2 plane. In this simulation, vowels are produced which correspond to the area functions of the vocal tract. Users can experiment with visual input (the shape of the vocal tract) and acoustic output (vowel sounds).

Fig. 2.- Vocal-tract simulator

Thus, computer simulations have huge potential, and further development is anticipated. However, the same attributes that provide the clearest advantages for learners are also a source of difficulty. Depending on the nature of the computer model, physical constraints existing in the real world may be obscured in a computer simulation, where the boundary between virtual and real is not clear. To address this problem, we have designed physical models that respect real-world physical constraints, as described in the following section.

PHYSICAL MODELS AS AN EDUCATION TOOL

Acoustics is a naturally intuitive science. We can both produce and perceive sounds. We have found that education in acoustics is more effective when students have access to tools that produce the sounds they are studying. Nevertheless, although tools for basic acoustic phenomena, such as vibrating tuning forks and resonators, are widely used, there are fewer physical tools for speech-related areas. Because of this, we believe such tools should be made more widely available in the field of Speech Science.

Mechanical models of human speech organs have been reported in the past for various purposes. In the 18th century, Kratzenstein and Von Kempelen proposed mechanical models for vowel and consonant production [3].
In the 20th century, Chiba and Kajiyama (1941) confirmed that vowel quality is determined by the configuration of the vocal tract, and they used mechanical models to support their findings [4]. Later, several more models were reported. For example, Umeda and Teranishi (1966) designed mechanical models to investigate vowel and voice quality [5]. More recently, Dang and Honda (1995) used a physical model to illustrate the effects of the pyriform fossa, a side branch at the larynx, on vowel spectra [6]. Mochida et al. (1999) made a mechanical model to test their method for measuring the configuration of the human vocal tract using acoustic signals [7].

The literature on models developed specifically for education is relatively sparse, and only a few of the reported models have been put on exhibition [8],[9]. To address this scarcity, we designed mechanical models of the human vocal tract to be used in speech science classrooms [10],[11]. The models give students an intuitive understanding of vowel production, particularly its linearity, and are intended to complement (not replace) the technical explanations found in the excellent textbooks available for speech education. Our models are based on Chiba and Kajiyama's mechanical models [4]. In their section "Artificial Vowels" [4], they confirm that the mechanically produced sounds do have many of the same characteristics as naturally produced vowels.

Fig. 3.- Two types of mechanical models of the human vocal-tract (from Arai, 2001 [10])

Fig. 3 shows two types of mechanical models of the human vocal tract: the plate model (on the left) and the cylinder model (on the right). The two models are made of acrylic resin because it is both transparent and easy to sculpt. For the plate model, each plate has a hole in the center so that when the plates are placed side-by-side the holes form an acoustic tube, the cross-sectional area of which changes in a step-wise fashion. For the cylinder model, the cavity forms a round bottle-shape, based on the measurements by Chiba and Kajiyama [4]. When the sound source is connected to one end of either of the models, a vowel-like sound is emitted from the other end.

We confirmed that our models, when used in a classroom environment, are particularly effective for increasing student understanding of the theories of speech production.

First, because of the tube's transparency, the location of the constriction is visible to the naked eye, as is the overall shape of the cavity. This design helped observers associate the quality of a vowel with the location of constriction on the model.

Second, the relationship between frequency and pitch was illustrated by feeding source signals with different frequencies through the tube. Students were able to observe that the pitch of the output sound is determined by the fundamental frequency of the input signal.

Third, by changing the order of the plates to simulate constrictions at nodes and antinodes, students were able to hear the effects of formants shifting position. Additionally, we provided spectral analyses of the output sounds, so students were able to see how the frequencies of the formants changed as well. Being able to hear and see the effects of formant shift helped learners understand how vowels change depending on the location of constriction(s) in the vocal tract.

Fourth, measurements taken from the models are reproducible, so students can return to an arbitrary configuration and get the same result, which helps them test their hypotheses as they learn these concepts.

Fifth, using the models along with computer simulation software makes it possible to compare a measured spectrum with one derived from theoretical computation, which is useful for advanced students.
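
The kind of theoretical computation mentioned in the fifth point can be illustrated with a minimal Python sketch. It is not the simulation software used in this work. It assumes lossless plane-wave propagation through a chain of short uniform tube sections (as in the plate model), an ideal volume-velocity source at the glottis end, and zero acoustic pressure at the lip end, so radiation and wall losses are ignored. The function names `chain_d` and `formants`, the physical constants, and the example dimensions (ten sections of 1.75 cm) are illustrative assumptions, not measurements of the actual models.

```python
import numpy as np

C = 35000.0   # speed of sound in cm/s (assumed value)
RHO = 0.00114 # air density in g/cm^3 (assumed; cancels out in the result)


def chain_d(freq_hz, areas_cm2, lengths_cm):
    """D element of the glottis-to-lips chain matrix of a stepped tube.

    With zero pressure at the lips, U_glottis = D * U_lips, so the
    volume-velocity transfer function is 1 / D and its resonances
    (formants) are the zeros of D. For lossless sections D is real.
    """
    k = 2.0 * np.pi * freq_hz / C  # wavenumber
    total = np.eye(2, dtype=complex)
    for area, length in zip(areas_cm2, lengths_cm):
        z = RHO * C / area  # characteristic impedance of this section
        section = np.array([
            [np.cos(k * length), 1j * z * np.sin(k * length)],
            [1j * np.sin(k * length) / z, np.cos(k * length)],
        ])
        total = total @ section
    return total[1, 1].real


def formants(areas_cm2, lengths_cm, f_max=4000.0, step=1.0):
    """Estimate formant frequencies (Hz) below f_max as zeros of D."""
    # Offset the grid by half a step so that a zero does not fall
    # exactly on a grid point.
    freqs = np.arange(step / 2.0, f_max, step)
    d = np.array([chain_d(f, areas_cm2, lengths_cm) for f in freqs])
    crossings = np.where(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    # Refine each sign change by linear interpolation.
    return [
        freqs[i] - d[i] * (freqs[i + 1] - freqs[i]) / (d[i + 1] - d[i])
        for i in crossings
    ]


if __name__ == "__main__":
    lengths = [1.75] * 10                     # ten sections, 17.5 cm in total
    uniform = [4.0] * 10                      # uniform tube, glottis to lips
    lip_constriction = [4.0] * 9 + [1.0]      # narrow section at the lip end
    print("uniform tube:    ", np.round(formants(uniform, lengths)))
    print("lip constriction:", np.round(formants(lip_constriction, lengths)))
```

For the uniform tube, the computed values fall close to the quarter-wavelength resonances of a 17.5 cm tube closed at one end (about 500, 1500, 2500 and 3500 Hz under the assumed speed of sound). Narrowing the section at the lip end lowers the first formant, which is the kind of shift students can hear and measure when they rearrange the plates. A measured spectrum from a physical model would differ from this idealized result because of losses and radiation at the open end.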

SUMMARY

An effective method for education in acoustics was proposed which integrates the use of textbooks, computer simulation and physical models. Our proposal has potential beyond the fields of acoustics and speech science, in that it is applicable to any field relating to education. We should, of course, continue to expend effort developing each of these three educational tools in their own right, but at the same time, it is our feeling that more resources should be devoted to the organic melding together of these three, for a future of increasingly sophisticated and effective educational methodologies.

ACKNOWLEDGMENTS

I would like to thank all of the people who have provided opportunities for me to consider this topic in education, and from whose comments and discussions I have benefited, especially Prof. Tsutomu Sugawara, Prof. Kyoko Iitaka, Prof. Mitsuko Shindo, Michiko Yoshida, Nobuyuki Usuki, Setsuko Imatomi, Hirokazu Sato, Naoki Ishii, and Terri Lander.

BIBLIOGRAPHICAL REFERENCES

[1] http://eurospeech2001.org/ese/education_areana/programme.html
[2] Sensimetrics: Speech Production and Perception I (http://www.sens.com/spp1.htm).
[3] B. Gold and N. Morgan, Speech and Audio Signal Processing, John Wiley & Sons, 2000.
[4] T. Chiba and M. Kajiyama, The Vowel: Its Nature and Structure, Tokyo-Kaiseikan Pub. Co., Ltd., Tokyo, 1941.
[5] N. Umeda and R. Teranishi, "Phonemic feature and vocal feature: Synthesis of speech sounds, using an acoustical model of vocal tract," J. Acoust. Soc. Jpn., Vol. 22, No. 4, pp. 195-203, 1966.
[6] J. Dang and K. Honda, "Acoustic effects of the pyriform fossa on vowel spectra," Technical Report of IEICE, Vol. SP95-10, pp. 1-6, 1995 (in Japanese).
[7] T. Mochida et al., "Acoustical measurement of vocal tract area function using replicas of oral cavity," Meeting of the Acoust. Soc. Jpn., Vol. 1, pp. 307-308, Sep.-Oct. 1999 (in Japanese).
[8] http://www.exploratorium.edu/exhibit_services/exhibits/
[9] http://www.kagakukan.city.hamamatsu.shizuoka.jp/tenji/
[10] T. Arai, "The replication of Chiba and Kajiyama's mechanical models of the human vocal cavity," J. Phonetic Soc., Vol. 5, No. 2, pp. 31-38, 2001.
[11] T. Arai et al., "Prototype of a vocal-tract model for vowel production designed for education in speech science," Proc. of Eurospeech, Vol. 4, pp. 2791-2794, 2001.