Soft-computing Methods for Text-to-Speech Driven Avatars
MARIO MALCANGI
DICo - Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano
Via Comelico, Milano, ITALY
malcangi@dico.unimi.it

Abstract: This paper presents a new approach for driving avatars with text-to-speech synthesis that uses pure text as an information source. The goal is to move the lips and face muscles on the basis of the phonetic nature of the utterance and the related expression. Several methods come together to define this solution. Rule-based text-to-speech synthesis generates the phonetic and expression transcription of the text to be uttered by the avatar. The phonetic transcription is used to train two artificial neural networks, one for text-to-phone transcription and the other for phone-to-viseme mapping. Two fuzzy-logic engines were then tuned for smoothed control of lip and face movements.

Key-Words: phone-to-viseme conversion, text-to-speech synthesis, artificial neural networks, fuzzy logic

1 Introduction
Speech communication can be considered a single medium with a multimodal representation of the information. When a person utters speech, the information communicated to another is not only semantic and syntactic but also emotional, expressive, gestural, and so forth. In lip-synching applications based on direct synchronization of uttered speech with lip and face movements [1], information embedded in speech is often lost because it is too difficult to extract information such as emotion or gesture. Only a few general speech parameters, such as amplitude and pitch variability, can be measured and tracked. However, these low-level measurements fall far short of what is needed to drive an avatar with the full information content of the uttered speech. This approach leads to very good results for lip synchronization, but only greatly impoverished expression can be driven onto the avatar, resulting in very limited naturalness.
To overcome this problem, text-based synthetic speech (text-to-speech) can be used instead of natural speech to drive the avatar. Text-to-speech synthesis is currently used to drive avatars' lip movements, but only for text-reading tasks. The avatar's face seems unnatural during utterance because no emotion or gesture information is provided by current text-to-speech systems. Text-to-viseme may be the right approach to controlling an avatar for natural utterance. The text-to-viseme process can translate text into the appropriate viseme and supplement this basic information with other related information such as emotion or gesture [2][3][4]. Rule-based text-to-viseme synthesis has been successfully implemented by considering emotion as an additional item of information [5] and for direct visual-speech synthesis [6]. In these approaches, speech synthesis and face-control synthesis are separate tasks, although in human utterance behavior they belong to an integrated task. Artificial-neural-network-based text-to-viseme synthesis has also been explored [7][8], demonstrating that greater naturalness can be achieved with a soft-computing rather than a hard-computing approach. Fuzzy logic has proven highly effective in smoothing the action of the logical control rules that move an avatar's face muscles during emotional behavior [9]. This research combines the use of artificial neural networks and fuzzy logic to generate the phoneme and viseme information that drives face movements during the utterance of a text, as humans do. Our goal is to use pure text to feed the whole process, as a human does when reading a text. Reading text aloud consists of a complex set of tasks. The lowest level of these tasks involves correctly uttering each word in the text according to a set of hidden pronunciation rules. Our research tries to solve the problem of reading the words of a pure text aloud by generating both the speech and the related whole-avatar face motion.
2 Process framework
To design the expressive synchronized-speech and face-synthesis system, a two-phase process framework was built. The whole process can be considered a general-purpose model for designing an integrated system of expressive, avatar-based speech communication in human-computer interfaces. The first phase involves training and tuning two artificial neural networks (ANNs), one for text-to-phone and one for phone-to-viseme synthesis. Two fuzzy-logic engines are also used to smooth speech and face-muscle control. As shown in Figure 1, a rule-based text-to-phone/expression transcriber trains the ANN-based text-to-phone generator and the ANN-based text-to-viseme generator. Using such a transcriber, only pure ASCII text is used to train the ANNs. Ancillary data for speech and facial expressiveness is automatically extracted from the text by means of regular-expression-based description rules. The two fuzzy-logic engines are manually tuned using a fuzzy-logic development environment. This enables us to edit the fuzzy rules and membership functions according to expert experience. (The tuning task can also be performed by a genetic algorithm.) A formant-based speech synthesizer and a viseme generator comprise the additional components of the test process. The formant-based synthesizer allows full control of all speech parameters, so any modulation of speech can be achieved. The viseme generator allows control of face movements and expression during utterance.

Figure 1. Training and tuning process of the ANNs and the fuzzy-logic engines.

The second phase consists of testing the speech synthesis in synchronous execution with face motion, as shown in Figure 2.

Figure 2. Testing process for expressive speech synthesis and face-motion control.

3 Text-to-phone/expression transcription by rules
Text-to-phone/expression transcription consists of a series of processing steps applied to the text. The text is first preprocessed to convert non-alphabetical elements such as numbers, sequences, abbreviations, and special ASCII symbols into the corresponding expanded text. Punctuation and word boundaries are processed by a set of rules that encodes the expression. Each word in the text is converted into phone/expression streams by a language-specific set of rules. The rules have the following format:

C(A)D = B    (1)

A is the text transformed into the phonetic/expression transcription B if the text to which it belongs matches A in the sequence CAD. C is a pre-context string and D is a post-context string. To compile the rules, the following classes of elements were defined:

(!) (^) ($) (#) ([AEIOUY]+) (:) ([^AEIOUY]*) (+) ([EIY])    (2)
($) ([^AEIOUY]) (.) ([BDGJMNRVWZ]) (^) ([NR])
For each class, a regular expression has been used for compact encoding of the rules.
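As an illustration, the application of such context rules can be sketched in a few lines of Python. The rules and phone symbols below are hypothetical examples, not the paper's actual language-specific rule set:

```python
import re

# A minimal sketch of rule-based letter-to-phone transcription in the
# C(A)D = B format of equation (1). The rules and phone symbols below are
# hypothetical examples, not the paper's actual language-specific rule set.
RULES = [
    # (pre-context C, fragment A, post-context D, phone B)
    (r"", "PH", r"", "f"),      # "ph" -> /f/ in any context
    (r"", "C", r"[EIY]", "s"),  # "c" before a front vowel -> /s/
    (r"", "C", r"", "k"),       # "c" elsewhere -> /k/
    (r"", "A", r"", "a"),
    (r"", "E", r"", "e"),
]

def transcribe(word):
    """Scan the word left to right, firing the first rule whose fragment
    and contexts match, as in the sequence CAD."""
    word = word.upper()
    phones, i = [], 0
    while i < len(word):
        for pre, a, post, b in RULES:
            if (word.startswith(a, i)
                    and re.search(pre + r"$", word[:i])      # C matches what precedes A
                    and re.match(post, word[i + len(a):])):  # D matches what follows A
                phones.append(b)
                i += len(a)
                break
        else:
            i += 1  # no rule matched: skip the character
    return phones
```

Because rules are tried in order, a context-specific rule (e.g. "C" before a front vowel) is listed before the general fallback for the same fragment, which is how exception-before-default rule sets of this kind are usually organized.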
4 Artificial neural-network architecture
The two ANNs used for text-to-phone/expression transcription and for phone/expression-to-viseme conversion are both three-layer, feed-forward, back-propagation architectures (FFBP-ANN). The first ANN takes text as input and yields the phone/expression transcription. This output is the input for the second ANN, whose output is the viseme encoding. A linear activation function controls the connections at the input and hidden-layer nodes. A non-linear (sigmoid) activation function connects the hidden-layer nodes to the output layer. The non-linear activation function is:

s_i = 1 / (1 + e^(-I_i)),    I_i = Σ_j w_ij s_j

where:
s_i is the output of the i-th unit
I_i is the total input to the i-th unit
w_ij is the weight from the j-th to the i-th unit

Figure 3. Architecture of the FFBP-ANN.

The first ANN's input is a text window of nine consecutive characters. This window slides from right to left. The current output encodes the phone and the expression that correspond to the middle character in the input-layer string, taking into account the pre-context and post-context of the current input character.

Figure 4. Sliding window.

The rule-based text-to-phone/expression transcription system is used to train the ANN for text-to-phone/expression transcription. This generates the ANN input-output training patterns for a large variety of texts, so the ANN learns how to read an unknown text with expression. Training the second ANN proceeds in similar fashion, but it is conducted only after the first ANN has been fully trained. The first ANN's output is used as input for the second ANN, employing the same sliding-window strategy. A basic viseme set is used as reference for ANN training during the error back-propagation process.

5 Fuzzy-logic engines for controlling smoothed speech and face movement
The two trained ANNs are able to drive the speech synthesizer and the avatar face.
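The forward pass of the FFBP-ANN with its sliding nine-character window can be sketched as follows. The layer sizes, character set, and random (untrained) weights are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Sketch of the three-layer FFBP-ANN with a sliding window of nine
# characters. Layer sizes, character set, and random (untrained) weights
# are illustrative assumptions.
CHARS = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
WIN = 9                      # nine consecutive input characters
N_IN = WIN * len(CHARS)      # one-hot input layer
N_HID = 80                   # hidden units (assumed)
N_OUT = 40                   # phone/expression codes (assumed)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))   # input -> hidden (linear)
W2 = rng.normal(0.0, 0.1, (N_OUT, N_HID))  # hidden -> output (sigmoid)

def encode_window(text, center):
    """One-hot encode the nine-character window whose middle character is
    text[center], padding with spaces at the edges."""
    half = WIN // 2
    padded = " " * half + text.upper() + " " * half
    window = padded[center:center + WIN]
    x = np.zeros(N_IN)
    for k, ch in enumerate(window):
        x[k * len(CHARS) + CHARS.index(ch)] = 1.0
    return x

def forward(x):
    """s_i = 1 / (1 + e^(-I_i)) with I_i = sum_j w_ij s_j."""
    hidden = W1 @ x                              # linear hidden layer
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden)))  # sigmoid output layer

out = forward(encode_window("HELLO", 0))  # output for the window centered on 'H'
```

During training, the output for each window position would be compared with the rule-based transcriber's phone/expression code for the middle character and the error back-propagated through W2 and W1.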
However, to give greater naturalness to speech utterance and face movement, a smoothing action needs to be performed on the ANNs' outputs before they are applied to the speech synthesizer and the avatar's face controller. Two fuzzy-logic engines were tuned to accomplish this. The two fuzzy subsystems must convert the ANN-output expression state into control levels for speech dynamics and for face muscles. Crisp information (intensity, level, etc.) about expression was transformed into fuzzy rules. The resulting crisp control level comes from an appropriate defuzzifying process.
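Such a fuzzy subsystem (fuzzifier, rule inference, singleton defuzzifier) can be sketched end-to-end in a few lines. The set boundaries, rules, and singleton positions below are illustrative assumptions:

```python
# A minimal sketch of one fuzzy smoothing subsystem: triangular
# fuzzification, IF x AND y THEN z inference (min for AND), and singleton
# weighted-average (center-of-gravity) defuzzification. The set boundaries,
# rules, and singleton positions are illustrative assumptions.

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Three of the seven fuzzy sets, on a normalized [0, 1] universe (assumed)
SETS = {"low": (0.0, 0.0, 0.5), "medium": (0.25, 0.5, 0.75), "high": (0.5, 1.0, 1.0)}

# IF intensity is X AND level is Y THEN control singleton z (assumed rules)
RULES = [("low", "low", 0.1), ("medium", "medium", 0.5), ("high", "high", 0.9)]

def smooth_control(intensity, level):
    """Crisp smoothed control from crisp intensity and level inputs."""
    num = den = 0.0
    for sx, sy, z in RULES:
        grade = min(tri(intensity, *SETS[sx]), tri(level, *SETS[sy]))  # AND = min
        num += grade * z  # weighted average of output singletons
        den += grade
    return num / den if den else 0.0
```

Because neighboring fuzzy sets overlap, small changes in the crisp inputs produce gradual changes in the defuzzified control, which is the smoothing effect sought here.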
The two fuzzy subsystems have identical engine structure and differ only in their settings (knowledge base). They consist of a fuzzifying front end, a rule-based inference engine, and a defuzzifying back end.

The first step in the fuzzy-engine tuning process consists of modeling crisp intensity and level information into fuzzy measurements. This is done by modeling seven fuzzy sets:

Imperceptibly low
Very low
Moderately low
Medium
Moderately high
Very high
Strongly high

Triangular and trapezoidal membership functions are used to implement these fuzzy sets. The shapes of and relations among these are qualitatively reported in Figure 5. Tuning is accomplished by an expert who uses a fuzzy-logic development environment to simulate and evaluate the resulting membership degrees for each crisp input.

The second step consists of editing and tuning a set of inference rules such as:

IF x AND y THEN z

where x and y are membership grades for the intensity and level of the speech and facial expression we intend to smooth before they are applied as controls, and z is the degree of control to be applied.

The third step consists of defuzzifying the control output grade. To do this, a set of singleton membership functions and a weighted-average calculation (center of gravity) were used to convert the control degree into crisp control:

Control = Σ_i (A_i × B_i) / Σ_i A_i

where A_i is the activation grade of the i-th rule and B_i is the position of its output singleton. Figure 6 illustrates the membership-function shapes used to defuzzify the inferred smoothed controls.

Figure 6. Singleton membership functions used to defuzzify controls.

6 Speech synthesis model
The speech synthesizer model we refer to emulates the human vocal tract. This model was chosen because unlimited utterances need to be generated. Naturalness in speech production by this speech-synthesis model is achieved by means of dynamic control of its processing elements: filters, generators, and modulators. Coarticulation, phonetic articulation rate, and inflection (pitch) are all controllable, in static or dynamic mode.
Speech nature (male, female, child, etc.) and alteration (bass, baritone, etc.) can also be controlled.

Figure 5. Fuzzy modeling of speech-synthesis and facial-control inputs.

7 Facial control modeling
Speech intensity is used to control two different components of facial modeling: the lips and the facial modifications during expressive utterance. Lips and facial expression are controlled in terms of mouth opening and strength of the expression-control muscles.
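The dynamic modulation of lip opening during a stationary speech unit can be sketched as an amplitude envelope applied to the viseme's target opening. The envelope shape and modulation depth below are illustrative assumptions, not the paper's actual control law:

```python
import math

# Sketch of dynamic lip control: the smoothed fuzzy output modulates the
# amplitude of the lip-opening strength over a stationary speech unit's
# duration. The envelope shape and modulation depth are illustrative
# assumptions.
def lip_opening(t, duration, target, control, depth=0.2):
    """Lip opening at time t in [0, duration] for a unit whose viseme asks
    for `target` opening; `control` in [0, 1] is the smoothed fuzzy output."""
    # raised-cosine envelope: dips to (1 - depth) of the target at the
    # unit's edges and reaches the full target at its middle, so the mouth
    # never jumps between consecutive visemes
    envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * t / duration))
    return target * control * (1.0 - depth + depth * envelope)
```

The same scaling idea can be applied to each expression-control muscle, with its own target and depth.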
The fuzzy, smoothed control produces variable dynamics during the utterance of stationary speech units such as phonemes and allophones. This dynamic control is used to modulate the amplitude of the lip-opening strength, resulting in more natural movement. The expression-control muscles are also dynamically controlled to produce modifications such as:

Facial muscles stretching/relaxing
Eyebrows frowning
Forehead wrinkling
Nostrils extending/contracting

8 Conclusion
Preliminary results of this research demonstrate that soft computing offers a good solution for the smoothed control of avatars during the expressive utterance of text. Using pure text as input information, correct expressive utterance of each word (letter sequence) was achieved. Furthermore, the related expressive avatar face movements were synchronized. The next step will apply a similar approach to the automatic extraction of high-level expression information related to word sequences.

References:
[1] M. Malcangi, R. de Tintis, Audio based real-time speech animation of embodied conversational agents, in A. Camurri, G. Volpe (Eds.): Gesture-Based Communication in Human-Computer Interaction, selected revised papers of the 5th International Workshop on Gesture and Sign Language based Human-Computer Interaction, GW 2003, Lecture Notes in Artificial Intelligence LNAI 2915 (subseries of Lecture Notes in Computer Science), Springer-Verlag, Berlin Heidelberg.
[2] T. Masuko, T. Kobayashi, M. Tamura, J. Masubuchi, K. Tokuda, Text-to-visual speech synthesis based on parameter generation from HMM, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 6, May 1998.
[3] W. Gao, L. Xu, B. Yin, Y. Liu, Y. Song, J. Yan, J. Zhou, H. Chen, A text-driven sign language synthesis system, Proceedings of CAD & Graphics 97, December 2-5, 1997, Shenzhen, China.
[4] M. A. Zliekha, S. Al-Moubayed, O. Al-Dakkak, N. Ghneim, Emotional audio visual Arabic text to speech, in Proceedings of Eusipco 2006.
[5] J. Beskow, Rule-based visual speech synthesis, ESCA, Eurospeech 95, Madrid, September 1995.
[6] E. Agelfors, J. Beskow, B. Granström, M. Lundeberg, G. Salvi, K. Spens, T. Öhman, Synthetic visual speech driven from auditory speech, in Proceedings of AVSP 99.
[7] G. Zoric, I. S. Pandzic, Real-time language independent lip synchronization method using a genetic algorithm, Signal Processing, Volume 86, Issue 12, December 2006.
[8] D. W. Massaro, J. Beskow, M. M. Cohen, C. L. Fry, T. Rodriguez, Picture my voice: Audio to visual speech synthesis using artificial neural networks, Proceedings of AVSP 99, Santa Cruz, California.
Conversation Starters: Using Spatial Context to Initiate Dialogue in First Person Perspective Games David B. Christian, Mark O. Riedl and R. Michael Young Liquid Narrative Group Computer Science Department
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationVisual CP Representation of Knowledge
Visual CP Representation of Knowledge Heather D. Pfeiffer and Roger T. Hartley Department of Computer Science New Mexico State University Las Cruces, NM 88003-8001, USA email: hdp@cs.nmsu.edu and rth@cs.nmsu.edu
More informationComputerized Adaptive Psychological Testing A Personalisation Perspective
Psychology and the internet: An European Perspective Computerized Adaptive Psychological Testing A Personalisation Perspective Mykola Pechenizkiy mpechen@cc.jyu.fi Introduction Mixed Model of IRT and ES
More informationLanguage-driven nonverbal communication in a bilingual. Conversational Agents
Language-driven nonverbal communication in a bilingual conversational agent Scott A. King, Alistair Knott and Brendan McCane Dept of Computer Science University of Otago PO Box 56 Dunedin New Zealand +64
More informationSIE: Speech Enabled Interface for E-Learning
SIE: Speech Enabled Interface for E-Learning Shikha M.Tech Student Lovely Professional University, Phagwara, Punjab INDIA ABSTRACT In today s world, e-learning is very important and popular. E- learning
More informationSpeech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines
Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,
More informationClass-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification
Class-Discriminative Weighted Distortion Measure for VQ-Based Speaker Identification Tomi Kinnunen and Ismo Kärkkäinen University of Joensuu, Department of Computer Science, P.O. Box 111, 80101 JOENSUU,
More informationAssignment 1: Predicting Amazon Review Ratings
Assignment 1: Predicting Amazon Review Ratings 1 Dataset Analysis Richard Park r2park@acsmail.ucsd.edu February 23, 2015 The dataset selected for this assignment comes from the set of Amazon reviews for
More informationSpeech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence
INTERSPEECH September,, San Francisco, USA Speech Synthesis in Noisy Environment by Enhancing Strength of Excitation and Formant Prominence Bidisha Sharma and S. R. Mahadeva Prasanna Department of Electronics
More informationThe 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, / X
The 9 th International Scientific Conference elearning and software for Education Bucharest, April 25-26, 2013 10.12753/2066-026X-13-154 DATA MINING SOLUTIONS FOR DETERMINING STUDENT'S PROFILE Adela BÂRA,
More informationInternational Journal of Advanced Networking Applications (IJANA) ISSN No. :
International Journal of Advanced Networking Applications (IJANA) ISSN No. : 0975-0290 34 A Review on Dysarthric Speech Recognition Megha Rughani Department of Electronics and Communication, Marwadi Educational
More informationWHEN THERE IS A mismatch between the acoustic
808 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 3, MAY 2006 Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Member,
More informationSAM - Sensors, Actuators and Microcontrollers in Mobile Robots
Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2017 230 - ETSETB - Barcelona School of Telecommunications Engineering 710 - EEL - Department of Electronic Engineering BACHELOR'S
More informationBody-Conducted Speech Recognition and its Application to Speech Support System
Body-Conducted Speech Recognition and its Application to Speech Support System 4 Shunsuke Ishimitsu Hiroshima City University Japan 1. Introduction In recent years, speech recognition systems have been
More informationAn Architecture to Develop Multimodal Educative Applications with Chatbots
International Journal of Advanced Robotic Systems ARTICLE An Architecture to Develop Multimodal Educative Applications with Chatbots Regular Paper David Griol 1,* and Zoraida Callejas 2 1 Department of
More informationCWIS 23,3. Nikolaos Avouris Human Computer Interaction Group, University of Patras, Patras, Greece
The current issue and full text archive of this journal is available at wwwemeraldinsightcom/1065-0741htm CWIS 138 Synchronous support and monitoring in web-based educational systems Christos Fidas, Vasilios
More informationRendezvous with Comet Halley Next Generation of Science Standards
Next Generation of Science Standards 5th Grade 6 th Grade 7 th Grade 8 th Grade 5-PS1-3 Make observations and measurements to identify materials based on their properties. MS-PS1-4 Develop a model that
More informationInternational Journal of Computational Intelligence and Informatics, Vol. 1 : No. 4, January - March 2012
Text-independent Mono and Cross-lingual Speaker Identification with the Constraint of Limited Data Nagaraja B G and H S Jayanna Department of Information Science and Engineering Siddaganga Institute of
More informationCourse Law Enforcement II. Unit I Careers in Law Enforcement
Course Law Enforcement II Unit I Careers in Law Enforcement Essential Question How does communication affect the role of the public safety professional? TEKS 130.294(c) (1)(A)(B)(C) Prior Student Learning
More informationAnalysis of Hybrid Soft and Hard Computing Techniques for Forex Monitoring Systems
Analysis of Hybrid Soft and Hard Computing Techniques for Forex Monitoring Systems Ajith Abraham School of Business Systems, Monash University, Clayton, Victoria 3800, Australia. Email: ajith.abraham@ieee.org
More informationTest Effort Estimation Using Neural Network
J. Software Engineering & Applications, 2010, 3: 331-340 doi:10.4236/jsea.2010.34038 Published Online April 2010 (http://www.scirp.org/journal/jsea) 331 Chintala Abhishek*, Veginati Pavan Kumar, Harish
More informationXinyu Tang. Education. Research Interests. Honors and Awards. Professional Experience
Xinyu Tang Parasol Laboratory Department of Computer Science Texas A&M University, TAMU 3112 College Station, TX 77843-3112 phone:(979)847-8835 fax: (979)458-0425 email: xinyut@tamu.edu url: http://parasol.tamu.edu/people/xinyut
More informationRadius STEM Readiness TM
Curriculum Guide Radius STEM Readiness TM While today s teens are surrounded by technology, we face a stark and imminent shortage of graduates pursuing careers in Science, Technology, Engineering, and
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production
More informationUnvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition
Unvoiced Landmark Detection for Segment-based Mandarin Continuous Speech Recognition Hua Zhang, Yun Tang, Wenju Liu and Bo Xu National Laboratory of Pattern Recognition Institute of Automation, Chinese
More informationELA/ELD Standards Correlation Matrix for ELD Materials Grade 1 Reading
ELA/ELD Correlation Matrix for ELD Materials Grade 1 Reading The English Language Arts (ELA) required for the one hour of English-Language Development (ELD) Materials are listed in Appendix 9-A, Matrix
More informationExperiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling
Experiments with SMS Translation and Stochastic Gradient Descent in Spanish Text Author Profiling Notebook for PAN at CLEF 2013 Andrés Alfonso Caurcel Díaz 1 and José María Gómez Hidalgo 2 1 Universidad
More informationGuru: A Computer Tutor that Models Expert Human Tutors
Guru: A Computer Tutor that Models Expert Human Tutors Andrew Olney 1, Sidney D'Mello 2, Natalie Person 3, Whitney Cade 1, Patrick Hays 1, Claire Williams 1, Blair Lehman 1, and Art Graesser 1 1 University
More informationRule discovery in Web-based educational systems using Grammar-Based Genetic Programming
Data Mining VI 205 Rule discovery in Web-based educational systems using Grammar-Based Genetic Programming C. Romero, S. Ventura, C. Hervás & P. González Universidad de Córdoba, Campus Universitario de
More informationPhonetic- and Speaker-Discriminant Features for Speaker Recognition. Research Project
Phonetic- and Speaker-Discriminant Features for Speaker Recognition by Lara Stoll Research Project Submitted to the Department of Electrical Engineering and Computer Sciences, University of California
More informationCHANCERY SMS 5.0 STUDENT SCHEDULING
CHANCERY SMS 5.0 STUDENT SCHEDULING PARTICIPANT WORKBOOK VERSION: 06/04 CSL - 12148 Student Scheduling Chancery SMS 5.0 : Student Scheduling... 1 Course Objectives... 1 Course Agenda... 1 Topic 1: Overview
More informationThe Perception of Nasalized Vowels in American English: An Investigation of On-line Use of Vowel Nasalization in Lexical Access
The Perception of Nasalized Vowels in American English: An Investigation of On-line Use of Vowel Nasalization in Lexical Access Joyce McDonough 1, Heike Lenhert-LeHouiller 1, Neil Bardhan 2 1 Linguistics
More informationSpeaker Recognition. Speaker Diarization and Identification
Speaker Recognition Speaker Diarization and Identification A dissertation submitted to the University of Manchester for the degree of Master of Science in the Faculty of Engineering and Physical Sciences
More information