Towards an Evolutionary Computational Approach to Articulatory Vocal Synthesis with PRAAT
Jared Drayton and Eduardo Miranda
Interdisciplinary Centre for Computer Music Research, Plymouth University, UK

Abstract. This paper presents our current work on developing an evolutionary computing approach to articulatory speech synthesis. Specifically, we implement genetic algorithms to find optimised parameter combinations for the re-synthesis of a vowel using the articulatory synthesiser PRAAT. Our framework analyses the target sound using a Fast Fourier Transform (FFT) to obtain formant information, which is then used in a fitness function applied to a real-valued genetic algorithm with a population of 75 sounds evolved over 50 generations. We present three differently configured genetic algorithms (GAs) and compare their suitability for raising the average fitness of the re-synthesised sounds.

Keywords: Articulatory Vocal Synthesis, Vocal Synthesis, Evolutionary Computing, Speech, PRAAT, Genetic Algorithms

1 Introduction

Computing technology has advanced at a rapid pace over the last eighty years. As computers become more ubiquitous in our everyday lives, the need to communicate with our technology is increasing. Speech synthesis is the artificial production of human speech, and it features in an increasing number of our digital devices. We can see the use of speech synthesis in a wide span of technologies, ranging from car GPS navigation to video games.

Currently, there are three main approaches to artificially producing speech: concatenative synthesis, formant synthesis and articulatory synthesis. Of these three, concatenative synthesis is the approach that dominates. Concatenative speech synthesis is a sound synthesis approach in which small units of pre-recorded speech are selected from a database and sequenced together to produce a target sound or sound sequence.
This approach currently offers the highest degree of intelligibility and naturalness when compared to the other techniques available. Because the technique relies on arranging sound recordings from human speakers, it bypasses some of the drawbacks inherent in other methods, for example the unnatural timbre of formant synthesis or the imperfect physical model used in articulatory synthesis.

However, concatenative synthesis systems have a number of limitations that result from their reliance on pre-recorded speech. The corpus of sounds that concatenative synthesis relies on is finite, and the segments themselves cannot be modified extensively without negatively impacting the quality and naturalness of the sound. This severely limits the capacity to modify prosody in relation to the text given. Different types of prosody must therefore be accounted for when the original corpus is created.

Articulatory synthesis is widely considered to have the greatest potential of all current speech synthesis techniques [1]. However, as it stands, articulatory speech synthesis is largely unexploited and undeveloped. This is mainly attributed to the difficulty of producing a robust articulatory Text To Speech (TTS) system that can perform on a par with existing concatenative solutions, which in turn is due to the highly complex and non-linear relationship between the parameters and the resultant sound. A number of different approaches have been attempted for extracting vocal tract area functions or articulatory movements. These range from imaging the vocal apparatus during speech (using machines such as an X-Ray [2] or MRI [3]) to attaching sensors to the articulators themselves. Inversion of parameters from the original audio has also been attempted [4].

In this paper we present a framework for developing an evolutionary computing approach to articulatory speech synthesis, together with some initial results. The primary motivation for this research is to explore approaches to automatically obtaining vocal tract area functions from recorded speech data. This is highly desirable for furthering the field of articulatory synthesis and the field of speech synthesis in general.
2 Background

2.1 Articulatory Synthesis

Articulatory synthesis is a physical modelling approach to sound synthesis. These physical models emulate the physiology of the human vocal apparatus. They simulate how air exhaled from the lungs is modified by the glottis and larynx, then propagated through the vocal tract and further modified by articulators such as the tongue and lips. Control of this synthesis method is achieved by passing numerical values to parameters that correspond to individual muscles or muscle groupings. Any set of parameter values can therefore be thought of as describing an articulatory configuration or articulatory movement, i.e. a vocal tract area function.

There are a number of different approaches to the design of articulatory synthesisers. Some synthesisers favour the use of a simplified periodic signal in place of physically modelling the larynx. This allows the fundamental frequency or pitch to be defined manually, and decreases the complexity of the model. However, by not attempting to simulate the lungs and larynx, the realism in terms of phonetic quality is reduced. Breathing patterns also have a great impact on prosody, and are essential for the accurate modelling of fricative and plosive speech sounds.

Fig. 1. Mid-Sagittal View of the Human Vocal Apparatus

2.2 Evolutionary Computation

Within the field of evolutionary computing, there is a group of heuristic search techniques known as evolutionary algorithms that draw inspiration from the neo-Darwinian paradigm of evolution and survival of the fittest. These evolutionary algorithms work by iteratively generating solutions, testing their suitability using a fitness function, and then combining genetic material from the fittest candidates. This is done using genetic operators such as selection, crossover and mutation.

Genetic Algorithms were developed by John Holland and put forward in the seminal text Adaptation in Natural and Artificial Systems [5]. They have been employed in a variety of different optimisation tasks [6], especially in tasks where the search space is large and not well understood.

Using evolutionary computing for non-linear sound synthesis applications is not a new concept, and has been explored by a number of researchers. Several different EC techniques for musical applications are given in Evolutionary Computer Music [7], with chapters 5-7 specifically implementing GAs. Parameter matching with Frequency Modulation (FM) synthesis has also been explored [8], [9].
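The iterative process described in Section 2.2 can be sketched generically as follows. This is a minimal illustration on a toy objective, not the configuration used in this paper: the function names, the truncation-style parent selection, the single-gene mutation and the default sizes are all assumptions made for the example.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=30, seed=0):
    """Generic evolutionary loop: evaluate, select, recombine, mutate.

    `fitness` maps a list of floats to a score where higher is better.
    Requires n_genes >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates from fittest to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]   # assumed: fitter half become parents
        children = [ranked[0]]             # carry the best candidate over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)          # crossover point
            child = a[:cut] + b[cut:]                # combine genetic material
            i = rng.randrange(n_genes)               # mutate one gene
            child[i] = max(-1.0, min(1.0, child[i] + rng.gauss(0.0, 0.1)))
            children.append(child)
        population = children
    return max(population, key=fitness)

def toy_fitness(genes):
    """Toy objective: every gene should be as close to 0.5 as possible."""
    return -sum((g - 0.5) ** 2 for g in genes)

best = evolve(toy_fitness, n_genes=5)
```

Because the best candidate is copied unchanged into each new generation, the best fitness in this sketch can never decrease from one generation to the next.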
3 Methods

The articulatory synthesiser used in this project is the PRAAT articulatory synthesiser [10]. PRAAT is a multi-functional software package, developed by Paul Boersma and David Weenink [11], with tools for a large range of speech analysis and synthesis tasks. As well as a fully-fledged graphical user interface, PRAAT provides its own scripting language, which allows the majority of operations to be executed autonomously. This functionality allows a genetic algorithm to be implemented in conjunction with PRAAT without the extensive retrofitting that would be required with other available synthesisers such as VTDemo or Vocal Tract Lab 2. Additionally, the provided speech analysis tools make the integration of an appropriate fitness function highly convenient and minimise the need for external tools.

The physical model constraints are configured to use an adult male speaker. The synthesiser is controlled by passing it a configuration file that contains a list of all parameters for the model. The model used in PRAAT has 29 parameters that can be specified. The encoding approach taken in this GA is a real-valued representation, with each individual stored as a vector of 28 numbers (the remaining parameter, the lungs, is fixed as described below). Each number in the vector represents a parameter and can take any value $x$ in the range $-1 \le x \le 1$, where $p_1$ = Interarytenoid, $p_2$ = Cricothyroid, $p_3$ = Vocalis, $p_4$ = Thyroarytenoid, etc.

$[\, p_1, p_2, p_3, p_4, \ldots, p_n \,]$

A randomly generated individual may therefore be initialised with parameter values such as Interarytenoid = 0.82, Cricothyroid = -0.2, Vocalis = -0.48, Thyroarytenoid = 0.1, etc.

$[\, 0.82, -0.2, -0.48, 0.1, \ldots, p_n \,]$

Only one parameter uses prior knowledge. The lungs are set to a predefined value of 0.15 at the beginning of the articulation, then 0.0 at 0.1 seconds. PRAAT automatically interpolates values between these two discrete settings.
The reasoning behind this choice is that, unlike the other parameters, the lungs parameter needs to change over time to provide the energy, or excitation of the vocal folds, necessary for phonation. This parameter is kept the same for every individual generated and is not altered by any GA operations, which is why individuals are represented as vectors of length 28 and not 29.

The fitness function is implemented using an FFT to analyse features of the target sound. Four frequencies are extracted from each sound. The first is the fundamental frequency, or pitch, of the sound. The next three are the first three formants produced. This analysis is performed on the target vowel sound, and then on each individual sound in every generation. Fitness is based on the differences between the four frequencies in the target sound and the respective frequencies in each individual. A penalty is introduced for each formant that is not present in a candidate solution: the difference in frequency is replaced with a large arbitrary value (10,000).
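The encoding and the fitness calculation described above could be sketched as follows. This is a hedged illustration rather than the implementation used in the experiments: the function names, the uniform random initialisation, the use of a summed absolute difference, the use of `None` to mark a frequency the analysis could not find, and the example frequency values are all assumptions. Only the vector length of 28, the range of -1 to 1 and the penalty of 10,000 come from the text.

```python
import random

N_PARAMS = 28                # evolved parameters (lungs excluded, as in the text)
MISSING_PENALTY = 10_000.0   # penalty quoted in the text for an absent formant

def random_individual(rng):
    """Return one candidate: 28 real values drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(N_PARAMS)]

def raw_fitness(target, candidate):
    """Sum of absolute differences over (F0, F1, F2, F3) in Hz; lower is better.

    `target` is a complete 4-tuple of frequencies. In `candidate`, None marks
    a frequency that was not found, which is charged the fixed penalty instead
    of a frequency difference.
    """
    total = 0.0
    for t, c in zip(target, candidate):
        if c is None:
            total += MISSING_PENALTY
        else:
            total += abs(t - c)
    return total

# Hypothetical analysis results, for illustration only.
target = (120.0, 700.0, 1200.0, 2600.0)
candidate = (118.0, 650.0, None, 2500.0)
score = raw_fitness(target, candidate)   # 2 + 50 + 10000 + 100 = 10152.0
```

With a penalty this large, any candidate missing a formant scores far worse than one whose formants are all present but poorly placed, which is presumably the intended effect.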
The natural state of the fitness function in this application is a minimisation function, as the goal is to minimise the differences between features of two sounds rather than to maximise some sort of profit. It is therefore necessary to scale the fitness of each individual in order to implement a fitness proportional selection scheme. This is achieved by taking the reciprocal of each candidate's fitness, that is, dividing one by it.

Because Genetic Algorithms inherently rely on stochastic processes, any analysis of results must take this into account. Each experiment is run multiple times to ensure that the results are not biased by any single run. Pseudorandom numbers are used for all stochastic processes and are provided by Python's built-in random module, which uses the Mersenne Twister algorithm. Mutation of allele values is done using a Gaussian distribution where $\mu = 0$ and $\sigma$ = [...]. Any mutation that results in an allele value outside the constraints $-1 \le x \le 1$ is set to the nearest bound, 1.0 or -1.0.

4 Results

Three differently configured genetic algorithms are presented, with the results displayed using performance graphs, as shown in Figs. 2, 3 and 4. These graphs show the average fitness of the population at every generation, and the fitness of the best candidate from each population at every generation. All experiments are carried out with the same population size and number of generations: a population of 75 evolved over 50 generations. Generation 0 on each performance graph is the initial randomly generated population, before any genetic operations have been performed. This corresponds to a set of random voice configurations.

4.1 Experiment 1 - Elitism Operator

The first run is not strictly a genetic algorithm, as there is no exchange of genetic material between the individuals in each population. It is more akin to a hybrid parallel evolutionary strategy.
This first experiment was a proof of concept to demonstrate that the fitness function worked as intended and could differentiate between candidates. As displayed in Fig. 2, the average fitness of each population increases steadily. This exploratory measure confirms the basic viability of the approach and its behaviour in this domain.

Fig. 2. Performance Graph of Using Only an Elitism Operator

4.2 Experiment 2 - Fitness Proportional Selection with One Point Crossover

This experiment sees the introduction of fitness proportional selection (FPS), combined with one point crossover for the exchange of genetic material between candidate solutions. An example of one point crossover is shown below. Two parent candidates are selected by making two calls to the FPS function, which returns one candidate per call. A random crossover point is generated and used to combine values from both parents before and after this point. For this example a shortened range of 10 parameters is used.

Parent 1 [ 0.25, 0.32, 0.97, 0.6, 0.23, 0.5, 0.31, 0.89, 0.4, 0.93 ]
Parent 2 [ 0.4, 0.24, 0.64, 0.35, 0.51, 0.7, 0.93, 0.19, 0.83, 0.18 ]

With a crossover point of three, an offspring or child candidate is created by combining the first three allele values from parent 1 with the last seven values from parent 2. This creates a new solution containing genetic material from both parent candidates.

Offspring [ 0.25, 0.32, 0.97, 0.35, 0.51, 0.7, 0.93, 0.19, 0.83, 0.18 ]

This process is then repeated until a new population has been generated. Fig. 3 shows that the average fitness converges much more quickly than when using elitism alone. However, genetic diversity stagnates from around the 23rd generation, after which both the average fitness and the best candidate of each generation show very little change.

4.3 Experiment 3 - Fitness Proportional Selection with One Point Crossover and Mutation

Here, the FPS operator is kept for selection. A mutation operator is also implemented, with each allele value having a probability of 0.1 of mutating.
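The operators used in Experiments 2 and 3 can be sketched in a few lines. The code below is an illustrative assumption of how they might look, not the code used for the experiments: the function names are invented, the small guard against division by zero is an addition, and the mutation standard deviation `sigma=0.2` is a placeholder, since the value used in the experiments is not available here. The reciprocal scaling, the mutation probability of 0.1 and the bounds of -1 and 1 follow the text.

```python
import random

def selection_weight(raw_fitness):
    """Scale a minimisation score for FPS by taking its reciprocal.

    The max() guard against a zero score is an assumption of this sketch.
    """
    return 1.0 / max(raw_fitness, 1e-9)

def select(population, weights, rng):
    """Fitness proportional (roulette wheel) selection of one parent."""
    return rng.choices(population, weights=weights, k=1)[0]

def one_point_crossover(parent1, parent2, point):
    """Child takes parent1's alleles before `point`, parent2's from it on."""
    return parent1[:point] + parent2[point:]

def mutate(individual, rng, rate=0.1, sigma=0.2):
    """Add zero-mean Gaussian noise to each allele with probability `rate`.

    Values that leave [-1, 1] are set to the nearest bound.
    `sigma` is a placeholder value, not taken from the paper.
    """
    mutated = []
    for x in individual:
        if rng.random() < rate:
            x = max(-1.0, min(1.0, x + rng.gauss(0.0, sigma)))
        mutated.append(x)
    return mutated

# The crossover example from Section 4.2, with a crossover point of three.
parent1 = [0.25, 0.32, 0.97, 0.6, 0.23, 0.5, 0.31, 0.89, 0.4, 0.93]
parent2 = [0.4, 0.24, 0.64, 0.35, 0.51, 0.7, 0.93, 0.19, 0.83, 0.18]
child = one_point_crossover(parent1, parent2, 3)
# child == [0.25, 0.32, 0.97, 0.35, 0.51, 0.7, 0.93, 0.19, 0.83, 0.18]
```

In a full run, `select` would be called twice per offspring with weights computed by `selection_weight`, and a fresh crossover point would be drawn for each pair of parents.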
Fig. 3. Performance Graph Using Fitness Proportional Selection and One Point Crossover

As can be observed in Fig. 4, there is a rapid improvement in average fitness, after which it settles down into smaller fluctuations. The fitness of the best candidate improves more slowly early on, but more consistently. The voice model is clearly converging towards the target sound.

5 Discussions

5.1 Observations

It is clear that the elitism operator, because there is no combination of candidates, causes the average of each population to move steadily towards the target, but will not generate new candidates outside of a random search. Removing the elitism operator and replacing it with a fitness proportional selection operator, as done in Experiment 2, causes a rapid increase in the average fitness of the population but leads to a stagnation of genetic diversity. With the incorporation of the mutation operator in Experiment 3, the average fitness fluctuated more than in the previous experiments, but this configuration also produced the best results with respect to the fittest candidate in each population. In general, GAs seem to be able to optimise the PRAAT parameters.
Fig. 4. Performance graph using Fitness Proportional Selection, One Point Crossover and Mutation operators.

5.2 Future Work

As this is early work in progress, several areas have been identified for extensive improvement and future research. These will focus on the following.

Fitness Function: The fitness function is a crucial aspect of any genetic algorithm, especially when the fitness function is multimodal. As it is the only metric that can guide the search process, it is imperative that it accurately represents the suitability of a candidate solution. If the fitness of an individual is misrepresented, then regions of the search space may be exploited that are not conducive to finding good solutions. A number of shortcomings are clear in the current fitness function. For example, the current analysis takes an FFT using a window size equal to the entire length of the sound. This does not account for things such as vocal breaks, intensity of phonation, modulation of pitch, etc.

Genetic Operator Additions: The fitness proportional selection scheme, when applied to other optimisation tasks, has been found in certain cases to be an inferior operator. A rank-based selection scheme will therefore be implemented to change the selection pressure. Different crossover operators, such as uniform crossover and two point crossover, will also be trialled.

Genetic Algorithm Parameters: The number of generations, the population size and the mutation rate will all have a large impact on convergence and population diversity. As the synthesis of each individual is computationally expensive, minimising the total number of individuals in a run is desirable. Further experiments with different mutation rates, population sizes and numbers of generations need to be undertaken to ascertain optimal values.

To conclude, our results indicate that GAs are a viable method of optimising the parameters of PRAAT, and therefore of articulatory synthesis. A substantial number of improvements have been identified which, when implemented, may improve the robustness and effectiveness of the genetic algorithm for use in mapping sounds to articulatory configurations.

References

1. Shadle, C., Damper, R.: Prospects for articulatory synthesis: A position paper. In: 4th ISCA Tutorial and Research Workshop. (2002)
2. Schroeter, J., Sondhi, M.: Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing 2(1) (1994)
3. Kim, Y.c., Kim, J., Proctor, M., Toutios, A., Nayak, K., Lee, S., Narayanan, S.: Toward Automatic Vocal Tract Area Function Estimation from Accelerated Three-dimensional Magnetic Resonance Imaging. In: ISCA Workshop on Speech Production in Automatic Speech Recognition, Lyon, France (2013)
4. Busset, J., Laprie, Y., Cnrs, L., Botanique, J.: Acoustic-to-articulatory inversion by analysis-by-synthesis using cepstral coefficients. In: ICA - 21st International Congress on Acoustics. (2013)
5. Holland, J.H.: Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. U Michigan Press (1975)
6. Goldberg, D.E., Others: Genetic algorithms in search, optimization, and machine learning. Volume 412. Addison-Wesley, Reading, Menlo Park (1989)
7. Miranda, E.R., Al Biles, J.: Evolutionary computer music. Springer (2007)
8. Horner, A., Beauchamp, J., Haken, L.: Machine tongues XVI: Genetic algorithms and their application to FM matching synthesis. Computer Music Journal (1993)
9. Mitchell, T.J.: An exploration of evolutionary computation applied to frequency modulation audio synthesis parameter optimisation. PhD thesis, University of the West of England (2010)
10. Boersma, P.: Praat, a system for doing phonetics by computer. Glot International 5(9/10) (2001)
11. Boersma, P.: Functional phonology: Formalizing the interactions between articulatory and perceptual drives. Holland Academic Graphics/IFOTT (1998)
More informationA comparison of spectral smoothing methods for segment concatenation based speech synthesis
D.T. Chappell, J.H.L. Hansen, "Spectral Smoothing for Speech Segment Concatenation, Speech Communication, Volume 36, Issues 3-4, March 2002, Pages 343-373. A comparison of spectral smoothing methods for
More informationLecture 1: Basic Concepts of Machine Learning
Lecture 1: Basic Concepts of Machine Learning Cognitive Systems - Machine Learning Ute Schmid (lecture) Johannes Rabold (practice) Based on slides prepared March 2005 by Maximilian Röglinger, updated 2010
More informationMajor Milestones, Team Activities, and Individual Deliverables
Major Milestones, Team Activities, and Individual Deliverables Milestone #1: Team Semester Proposal Your team should write a proposal that describes project objectives, existing relevant technology, engineering
More informationArtificial Neural Networks written examination
1 (8) Institutionen för informationsteknologi Olle Gällmo Universitetsadjunkt Adress: Lägerhyddsvägen 2 Box 337 751 05 Uppsala Artificial Neural Networks written examination Monday, May 15, 2006 9 00-14
More informationOn the Formation of Phoneme Categories in DNN Acoustic Models
On the Formation of Phoneme Categories in DNN Acoustic Models Tasha Nagamine Department of Electrical Engineering, Columbia University T. Nagamine Motivation Large performance gap between humans and state-
More informationA Neural Network GUI Tested on Text-To-Phoneme Mapping
A Neural Network GUI Tested on Text-To-Phoneme Mapping MAARTEN TROMPPER Universiteit Utrecht m.f.a.trompper@students.uu.nl Abstract Text-to-phoneme (T2P) mapping is a necessary step in any speech synthesis
More informationEGRHS Course Fair. Science & Math AP & IB Courses
EGRHS Course Fair Science & Math AP & IB Courses Science Courses: AP Physics IB Physics SL IB Physics HL AP Biology IB Biology HL AP Physics Course Description Course Description AP Physics C (Mechanics)
More informationISFA2008U_120 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM
Proceedings of 28 ISFA 28 International Symposium on Flexible Automation Atlanta, GA, USA June 23-26, 28 ISFA28U_12 A SCHEDULING REINFORCEMENT LEARNING ALGORITHM Amit Gil, Helman Stern, Yael Edan, and
More informationEvolutive Neural Net Fuzzy Filtering: Basic Description
Journal of Intelligent Learning Systems and Applications, 2010, 2: 12-18 doi:10.4236/jilsa.2010.21002 Published Online February 2010 (http://www.scirp.org/journal/jilsa) Evolutive Neural Net Fuzzy Filtering:
More informationWord Segmentation of Off-line Handwritten Documents
Word Segmentation of Off-line Handwritten Documents Chen Huang and Sargur N. Srihari {chuang5, srihari}@cedar.buffalo.edu Center of Excellence for Document Analysis and Recognition (CEDAR), Department
More informationOn-Line Data Analytics
International Journal of Computer Applications in Engineering Sciences [VOL I, ISSUE III, SEPTEMBER 2011] [ISSN: 2231-4946] On-Line Data Analytics Yugandhar Vemulapalli #, Devarapalli Raghu *, Raja Jacob
More informationSoftprop: Softmax Neural Network Backpropagation Learning
Softprop: Softmax Neural Networ Bacpropagation Learning Michael Rimer Computer Science Department Brigham Young University Provo, UT 84602, USA E-mail: mrimer@axon.cs.byu.edu Tony Martinez Computer Science
More informationSpecification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments
Specification and Evaluation of Machine Translation Toy Systems - Criteria for laboratory assignments Cristina Vertan, Walther v. Hahn University of Hamburg, Natural Language Systems Division Hamburg,
More informationDocument number: 2013/ Programs Committee 6/2014 (July) Agenda Item 42.0 Bachelor of Engineering with Honours in Software Engineering
Document number: 2013/0006139 Programs Committee 6/2014 (July) Agenda Item 42.0 Bachelor of Engineering with Honours in Software Engineering Program Learning Outcomes Threshold Learning Outcomes for Engineering
More informationThe NICT/ATR speech synthesis system for the Blizzard Challenge 2008
The NICT/ATR speech synthesis system for the Blizzard Challenge 2008 Ranniery Maia 1,2, Jinfu Ni 1,2, Shinsuke Sakai 1,2, Tomoki Toda 1,3, Keiichi Tokuda 1,4 Tohru Shimizu 1,2, Satoshi Nakamura 1,2 1 National
More informationTD(λ) and Q-Learning Based Ludo Players
TD(λ) and Q-Learning Based Ludo Players Majed Alhajry, Faisal Alvi, Member, IEEE and Moataz Ahmed Abstract Reinforcement learning is a popular machine learning technique whose inherent self-learning ability
More informationUnit purpose and aim. Level: 3 Sub-level: Unit 315 Credit value: 6 Guided learning hours: 50
Unit Title: Game design concepts Level: 3 Sub-level: Unit 315 Credit value: 6 Guided learning hours: 50 Unit purpose and aim This unit helps learners to familiarise themselves with the more advanced aspects
More informationModeling function word errors in DNN-HMM based LVCSR systems
Modeling function word errors in DNN-HMM based LVCSR systems Melvin Jose Johnson Premkumar, Ankur Bapna and Sree Avinash Parchuri Department of Computer Science Department of Electrical Engineering Stanford
More informationIntegrating simulation into the engineering curriculum: a case study
Integrating simulation into the engineering curriculum: a case study Baidurja Ray and Rajesh Bhaskaran Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, New York, USA E-mail:
More informationMASTER OF SCIENCE (M.S.) MAJOR IN COMPUTER SCIENCE
Master of Science (M.S.) Major in Computer Science 1 MASTER OF SCIENCE (M.S.) MAJOR IN COMPUTER SCIENCE Major Program The programs in computer science are designed to prepare students for doctoral research,
More informationAlpha provides an overall measure of the internal reliability of the test. The Coefficient Alphas for the STEP are:
Every individual is unique. From the way we look to how we behave, speak, and act, we all do it differently. We also have our own unique methods of learning. Once those methods are identified, it can make
More informationLearning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models
Learning Structural Correspondences Across Different Linguistic Domains with Synchronous Neural Language Models Stephan Gouws and GJ van Rooyen MIH Medialab, Stellenbosch University SOUTH AFRICA {stephan,gvrooyen}@ml.sun.ac.za
More informationSoftware Maintenance
1 What is Software Maintenance? Software Maintenance is a very broad activity that includes error corrections, enhancements of capabilities, deletion of obsolete capabilities, and optimization. 2 Categories
More informationCourses in English. Application Development Technology. Artificial Intelligence. 2017/18 Spring Semester. Database access
The courses availability depends on the minimum number of registered students (5). If the course couldn t start, students can still complete it in the form of project work and regular consultations with
More informationGACE Computer Science Assessment Test at a Glance
GACE Computer Science Assessment Test at a Glance Updated May 2017 See the GACE Computer Science Assessment Study Companion for practice questions and preparation resources. Assessment Name Computer Science
More informationMandarin Lexical Tone Recognition: The Gating Paradigm
Kansas Working Papers in Linguistics, Vol. 0 (008), p. 8 Abstract Mandarin Lexical Tone Recognition: The Gating Paradigm Yuwen Lai and Jie Zhang University of Kansas Research on spoken word recognition
More informationCircuit Simulators: A Revolutionary E-Learning Platform
Circuit Simulators: A Revolutionary E-Learning Platform Mahi Itagi Padre Conceicao College of Engineering, Verna, Goa, India. itagimahi@gmail.com Akhil Deshpande Gogte Institute of Technology, Udyambag,
More informationLip reading: Japanese vowel recognition by tracking temporal changes of lip shape
Lip reading: Japanese vowel recognition by tracking temporal changes of lip shape Koshi Odagiri 1, and Yoichi Muraoka 1 1 Graduate School of Fundamental/Computer Science and Engineering, Waseda University,
More informationLevel 6. Higher Education Funding Council for England (HEFCE) Fee for 2017/18 is 9,250*
Programme Specification: Undergraduate For students starting in Academic Year 2017/2018 1. Course Summary Names of programme(s) and award title(s) Award type Mode of study Framework of Higher Education
More informationSpecification of the Verity Learning Companion and Self-Assessment Tool
Specification of the Verity Learning Companion and Self-Assessment Tool Sergiu Dascalu* Daniela Saru** Ryan Simpson* Justin Bradley* Eva Sarwar* Joohoon Oh* * Department of Computer Science ** Dept. of
More informationReducing Features to Improve Bug Prediction
Reducing Features to Improve Bug Prediction Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella University of California Santa Cruz {shiv,ejw,ram}@soe.ucsc.edu Sunghun Kim Hong Kong University of Science
More informationBluetooth mlearning Applications for the Classroom of the Future
Bluetooth mlearning Applications for the Classroom of the Future Tracey J. Mehigan, Daniel C. Doolan, Sabin Tabirca Department of Computer Science, University College Cork, College Road, Cork, Ireland
More informationSemi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration
INTERSPEECH 2013 Semi-Supervised GMM and DNN Acoustic Model Training with Multi-system Combination and Confidence Re-calibration Yan Huang, Dong Yu, Yifan Gong, and Chaojun Liu Microsoft Corporation, One
More informationAutomatic Pronunciation Checker
Institut für Technische Informatik und Kommunikationsnetze Eidgenössische Technische Hochschule Zürich Swiss Federal Institute of Technology Zurich Ecole polytechnique fédérale de Zurich Politecnico federale
More informationA SURVEY OF FUZZY COGNITIVE MAP LEARNING METHODS
A SURVEY OF FUZZY COGNITIVE MAP LEARNING METHODS Wociech Stach, Lukasz Kurgan, and Witold Pedrycz Department of Electrical and Computer Engineering University of Alberta Edmonton, Alberta T6G 2V4, Canada
More informationPRODUCT COMPLEXITY: A NEW MODELLING COURSE IN THE INDUSTRIAL DESIGN PROGRAM AT THE UNIVERSITY OF TWENTE
INTERNATIONAL CONFERENCE ON ENGINEERING AND PRODUCT DESIGN EDUCATION 6 & 7 SEPTEMBER 2012, ARTESIS UNIVERSITY COLLEGE, ANTWERP, BELGIUM PRODUCT COMPLEXITY: A NEW MODELLING COURSE IN THE INDUSTRIAL DESIGN
More informationLEGO MINDSTORMS Education EV3 Coding Activities
LEGO MINDSTORMS Education EV3 Coding Activities s t e e h s k r o W t n e d Stu LEGOeducation.com/MINDSTORMS Contents ACTIVITY 1 Performing a Three Point Turn 3-6 ACTIVITY 2 Written Instructions for a
More informationProblems of the Arabic OCR: New Attitudes
Problems of the Arabic OCR: New Attitudes Prof. O.Redkin, Dr. O.Bernikova Department of Asian and African Studies, St. Petersburg State University, St Petersburg, Russia Abstract - This paper reviews existing
More informationA GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING
A GENERIC SPLIT PROCESS MODEL FOR ASSET MANAGEMENT DECISION-MAKING Yong Sun, a * Colin Fidge b and Lin Ma a a CRC for Integrated Engineering Asset Management, School of Engineering Systems, Queensland
More informationDesigning a Computer to Play Nim: A Mini-Capstone Project in Digital Design I
Session 1793 Designing a Computer to Play Nim: A Mini-Capstone Project in Digital Design I John Greco, Ph.D. Department of Electrical and Computer Engineering Lafayette College Easton, PA 18042 Abstract
More informationSOFTWARE EVALUATION TOOL
SOFTWARE EVALUATION TOOL Kyle Higgins Randall Boone University of Nevada Las Vegas rboone@unlv.nevada.edu Higgins@unlv.nevada.edu N.B. This form has not been fully validated and is still in development.
More informationSARDNET: A Self-Organizing Feature Map for Sequences
SARDNET: A Self-Organizing Feature Map for Sequences Daniel L. James and Risto Miikkulainen Department of Computer Sciences The University of Texas at Austin Austin, TX 78712 dljames,risto~cs.utexas.edu
More informationExpressive speech synthesis: a review
Int J Speech Technol (2013) 16:237 260 DOI 10.1007/s10772-012-9180-2 Expressive speech synthesis: a review D. Govind S.R. Mahadeva Prasanna Received: 31 May 2012 / Accepted: 11 October 2012 / Published
More informationSpeech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines
Speech Segmentation Using Probabilistic Phonetic Feature Hierarchy and Support Vector Machines Amit Juneja and Carol Espy-Wilson Department of Electrical and Computer Engineering University of Maryland,
More informationAnalyzing the Usage of IT in SMEs
IBIMA Publishing Communications of the IBIMA http://www.ibimapublishing.com/journals/cibima/cibima.html Vol. 2010 (2010), Article ID 208609, 10 pages DOI: 10.5171/2010.208609 Analyzing the Usage of IT
More informationLearning Methods in Multilingual Speech Recognition
Learning Methods in Multilingual Speech Recognition Hui Lin Department of Electrical Engineering University of Washington Seattle, WA 98125 linhui@u.washington.edu Li Deng, Jasha Droppo, Dong Yu, and Alex
More informationSelf Study Report Computer Science
Computer Science undergraduate students have access to undergraduate teaching, and general computing facilities in three buildings. Two large classrooms are housed in the Davis Centre, which hold about
More informationUniversity of Groningen. Systemen, planning, netwerken Bosman, Aart
University of Groningen Systemen, planning, netwerken Bosman, Aart IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document
More information