Speech Synthesis: An Alternative Approach to a Different Problem

Hans Kull, Member, IEEE

Abstract: Current speech synthesis applications and tools are built to generate speech from text automatically, without the need for human intervention. While speaking to people in the multimedia and games industries, I became aware of a demand for speech synthesis applications in which users can manipulate the generated speech in various ways. This paper describes the design of the user interface and the current prototype developed to address these issues. The prototype shows how the user can set and change voices and manipulate the text, pronunciation and stress, prosody and volume of the speech generated. Furthermore, it shows how the user can modify segments of the speech to produce additional effects like echo or telephone. The software architecture with which this functionality was implemented is presented as well. The last chapter describes the speech engine under development for the definitive program version and presents ideas on how to express emotions like happiness, fear or anger.

Index Terms: Speech Synthesis, Text-to-Speech, Speech Processing, Acoustic Signal Processing

I. INTRODUCTION

IMAGINE you intend to produce a radio feature. For every role in your play you need to find an actor with the desired voice. You need to arrange a production time that suits all actors, you need to find a sound studio and a technician to operate it, and of course this date has to fit into your own schedule too, so you can direct the play. This sixty-year-old approach has become very expensive, which is the reason why so few new radio features are produced nowadays. The multimedia, games and other industries face similar problems when it comes to including speech in their products. This has led to a culture of avoiding, or at least minimising, the need for speech in multimedia applications. The games Myst and Riven (both trademarks of Cyan, Inc.), published by Brøderbund Software, Inc., are famous examples of this approach.

Imagine you could do all that on your computer. All you need is a software package in which you can set up your actors (voices), import the text, assign the text passages to the voices and start directing the play.

Fig. 1. Speech synthesiser user interface.

Hans Kull is with Informatic Technologies Pty Ltd, Geelong, Vic., Australia (kull@inmatic.com).

Figure 1 illustrates what is meant by directing the play. The user is presented with an editor that allows him to manipulate text, phonetics, prosody and volume. He can press the play button and check the generated speech. If he is not totally satisfied, he can modify whatever he needs to change, play it again, and so forth, until he has worked himself through the entire text, in a similar manner as he previously did with his actors.

There are more possibilities for manipulating the output of the program. For example, effects like telephone or echo can be added, and subsequent sections describe ways in which the user can specify emotions like happiness, anger and fear. However, before going into further detail, an overview of the components of the application is provided.

II. PROGRAM COMPONENTS

A. Overview

The concept behind the application is described in [1]. The most important parts of our speech synthesiser program are the speech editor and the speech synthesiser. If the user chooses to play a part of the text, the speech editor passes speaker parameters, prosody and pronunciation information to the speech synthesiser. This in turn generates the speech signal and directs it to the audio device of the computer or into a file, depending on the user's output options. For the editor to work properly, several helper modules are needed. The user is supported with as much default functionality as possible, so he need not deal with trivialities and can concentrate on more important tasks.

Fig. 2. Overall application architecture: Dictionary, Text Import, Speaker Editor, Prosody Generation, Effects and Mixer modules around the Speech Engine.

B. The Dictionary Tool

The dictionary tool allows the user to import dictionaries and to create new dictionaries, either from scratch or by modifying an existing one, and to change pronunciation and basic prosody for every word in the dictionary. For words with more than one pronunciation, like "read" or "lead", he can define additional pronunciation and prosody pairs, and he can store hints on the use of a particular pair in a given situation. The most common uses of the dictionary tool are to add new words to a given dictionary and to create a new dictionary from an existing one. To define effects like accents or dialects, the user can copy a dictionary and then modify the pronunciation of all the words in it by entering modification rules. As an overly simplified example, he could replace the English pronunciation of "th" (i.e. /θ/) by the pronunciation of "s" to generate a German accent; a sketch of such a rule follows below.

The user can modify the pronunciation and prosody of a single word directly in the editor, and he can either store his modification in the dictionary or keep it local to this particular instance in the text. An important function of the dictionary tool is to provide default pronunciation and prosody for every word entered in the editor. If the dictionary finds more than one pronunciation of a word, it has to decide which one to choose. If, on the other hand, it does not find a given word, it has to generate pronunciation and prosody based on rules, which are language-dependent. Therefore, every dictionary stores its base language and its modifiers.
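A minimal sketch of such a modification rule, assuming pronunciations are stored as lists of IPA phoneme symbols (the representation and function names here are illustrative, not the prototype's actual data model):

```python
# Sketch: rule-based pronunciation modification for accent dictionaries.
# Pronunciations are assumed to be lists of IPA phoneme strings.

def make_rule(old: str, new: str):
    """Return a rule replacing every occurrence of one phoneme."""
    def rule(phonemes: list[str]) -> list[str]:
        return [new if p == old else p for p in phonemes]
    return rule

def derive_dictionary(base: dict[str, list[str]], rules) -> dict[str, list[str]]:
    """Copy a dictionary and apply all modification rules to every entry."""
    derived = {}
    for word, phonemes in base.items():
        for rule in rules:
            phonemes = rule(phonemes)
        derived[word] = phonemes
    return derived

english = {"think": ["θ", "ɪ", "ŋ", "k"], "bath": ["b", "ɑː", "θ"]}
german_accent = derive_dictionary(english, [make_rule("θ", "s")])
print(german_accent)  # {'think': ['s', 'ɪ', 'ŋ', 'k'], 'bath': ['b', 'ɑː', 's']}
```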
C. The Speaker Tool

This tool is used to define and modify speakers. A speaker is defined by voice parameters (see chapter III), and a dictionary is also assigned to the speaker. To define a new speaker, the user defines a voice, usually by selecting a standard voice (child, young female, elderly female, young male or elderly male) and then modifying its parameters. The user can modify just the basic parameters like pitch, speed and volume, or he can go into the extended dialog and modify any parameter he wants. For good feedback, the user can play a standard text after every modification he makes. Extended mode is also helpful for creating unnatural voices, for example for comic figures, robots, computers or aliens.

A further extension of the speaker tool could be a tool which allows the creation of new voices from natural voices. The speaker would have to talk into a microphone for a certain amount of time. Depending on the type of speech synthesiser used, the recorded speech would then be analysed, the speaker parameters extracted and the results used to create a new speaker.

D. Prosody Generation

At the end of every sentence entered into the editor, the basic prosody of the sentence (given by the basic prosody of its words) is modified to produce the default prosody of the sentence. For this, standard techniques are used as described in the literature, e.g. in [2] and [3].

E. Text Import

The text import tool does not only import plain texts; it also helps to import texts that already contain speaker information, such as a play. In this case one can specify that a predefined name given at the beginning of a paragraph should translate into a speaker object with the same name; a sketch of this mapping follows below. As another example, the user can specify that all text in a given font should translate into headings. Headings are parts of the text that are not passed to the speech engine and therefore remain silent. However, if necessary, the editor can be asked to pass the heading information to the synthesiser too. A predefined speaker, the narrator, then speaks the headings.
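A minimal sketch of that speaker mapping, assuming a play-script convention in which a paragraph starts with an upper-case name followed by a colon (the convention, regex and class names are illustrative assumptions, not the prototype's actual import format):

```python
import re
from dataclasses import dataclass, field

# Sketch: importing a play script where "NAME: line" assigns a speaker.
SPEAKER_PREFIX = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")

@dataclass
class Speaker:
    name: str
    lines: list[str] = field(default_factory=list)

def import_play(text: str) -> dict[str, Speaker]:
    """Create one Speaker object per predefined name found in the script."""
    speakers: dict[str, Speaker] = {}
    for paragraph in text.splitlines():
        match = SPEAKER_PREFIX.match(paragraph.strip())
        if not match:
            continue  # paragraphs without a name prefix are left unassigned
        name, line = match.groups()
        speakers.setdefault(name, Speaker(name)).lines.append(line)
    return speakers

script = "HAMLET: To be, or not to be.\nOPHELIA: Good my lord."
print(sorted(import_play(script)))  # ['HAMLET', 'OPHELIA']
```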

F. The Editor

Only headings have no speaker assigned in the editor; to all other text a speaker is assigned first. As soon as a particular word knows its speaker, it knows the dictionary it belongs to. The dictionary then delivers the phonetic and prosodic information needed to complete the word's information. At the end of every sentence, the prosody generator is called; it modifies the basic prosody provided by the dictionary for every word.

After all assignments of speakers to text, the speech synthesiser can be provided with enough information to produce good-quality speech. But this is not good enough for the intended user of our application. He wants to make his mark on the spoken text and let the speaker give much more expression to parts of it than what can be generated automatically.

The user can edit not only the text but the phonetics and the prosody too. For words with more than one pronunciation, he can look the pronunciations up and simply select one of them. If he wants to modify the phonetics of a word, he can decide whether this pronunciation should be stored in the dictionary as a replacement of the existing one or as an additional one. If the word is not stored in the dictionary and its phonetics and prosody were generated by rule, he can store it in the dictionary too.

The user can change the prosody as well: he can change the pitch and duration of every note within a given range. Furthermore, the user can change the volume. In our prototype, volume is visualised by the font size, and italics stand for whispering. Prosody and volume control give the user the possibility to address problems that come from the different meanings of a statement like "I want this error to be fixed today!", which has a completely different meaning depending on whether you stress, for example, "this" or "today". Emotions are specified in a similar way. In our prototype they are visualised by a coloured background of the text: blue stands for happiness, green for jealousy, red for anger and yellow for fear.

G. Effects

Additional effects, like background noise, talk or music, echo, or a filter to emulate a telephone line, can be defined too. Although some of these effects could be added at a later stage, i.e. after generation of the speech signal, this functionality is provided in the editor to enable proper synchronisation of speech and effects.

H. Mixer

The mixer is a post-processing stage to the speech synthesiser. Effects like echo or filters are applied as post-processing stages to the speech signal, additional sources like background noise are added, and in the mixer tool it is possible to adjust their volume. Effects and the mixer are not implemented in the prototype; a standard tool readily available on the market could do their job. This functionality will nevertheless be provided to make sure everything is properly synchronised.

For Internet use, a stand-alone speech synthesiser is planned. This program will contain the mixer too, although with no user interface provided. The intent of this program is to provide the client with an application which turns the data stream passed to the speech synthesiser into audio output on the client's computer. Therefore additional effects and sound information, like background noise and its volume information, will be passed on to this stand-alone synthesiser along with the speech information.
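As an illustration of such a post-processing stage, here is a minimal echo effect as a feedback delay line over a mono sample buffer (a sketch only; the buffer layout, sampling rate and parameter names are assumptions, not the planned mixer's interface):

```python
import numpy as np

def add_echo(signal: np.ndarray, rate: int = 16000,
             delay_s: float = 0.25, decay: float = 0.4) -> np.ndarray:
    """Feedback delay line: each sample hears an attenuated copy of the
    output from delay_s seconds earlier."""
    delay = int(delay_s * rate)
    out = np.copy(signal).astype(np.float64)
    for i in range(delay, len(out)):
        out[i] += decay * out[i - delay]
    # Normalise to avoid clipping after the echo energy is added.
    return out / max(1.0, np.max(np.abs(out)))

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
echoed = add_echo(tone)
```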
III. THE SPEECH SYNTHESISER

A. Current Technologies

Currently, two principal technologies are used to generate automated speech: speech concatenation and formant synthesis. Speech synthesis based on concatenation uses recorded pieces of real speech. In text-to-speech applications, these recorded pieces are short utterances, usually containing combinations of two phonemes. Simply put, the synthesiser then, for example, concatenates the utterances for /ha/ and /at/ to create the utterance for /hat/. Arguably the best synthesiser based on this technology is AT&T's new text-to-speech synthesiser, see www.naturalvoices.com. Formant synthesis, on the other hand, uses a mathematical model of the human vocal tract to create speech. One of the well-known models is the Klatt synthesiser [4], on which, for example, the DECtalk speech synthesiser is based [5]. However, there is no longer a clear distinction between the two technologies, as we will see later.

B. Comparison

1) Basic Functionality: At first glance, both technologies seem suitable for our purpose. Both have the same pre-processing stages, consisting of text parsing, letter-to-sound translation and prosody generation. In our application, these steps are done in the editor to give the user maximum control over the speech generated. Existing speech synthesis software packages perform these processing steps automatically, and existing development kits like the Microsoft Speech SDK or the AT&T Speech SDK give little control over this process. This means that a specialised speech synthesiser has to be developed too, but the fact that these pre-processing steps are common to both technologies indicates that either technology can be used for the final synthesis steps.
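The shared pre-processing chain can be pictured as three stages feeding the synthesiser. The toy pipeline below wires them together; every stage body is a deliberate placeholder (the names, the one-entry dictionary and the fixed durations are assumptions), and only the data flow between the three stages reflects the text above:

```python
# Sketch: the pre-processing pipeline common to both technologies.
# Each stage is deliberately trivial; only the data flow matters here.

def parse_text(text: str) -> list[str]:
    """Text parsing: split into words, dropping punctuation."""
    return [w.strip(".,!?") for w in text.split()]

def letter_to_sound(words: list[str]) -> list[list[str]]:
    """Letter-to-sound translation: dictionary lookup with a rule fallback.
    Here the 'rule' is just spelling, one pseudo-phoneme per letter."""
    dictionary = {"hat": ["h", "a", "t"]}
    return [dictionary.get(w.lower(), list(w.lower())) for w in words]

def generate_prosody(phones: list[list[str]]) -> list[tuple[list[str], float]]:
    """Prosody generation: attach a default duration to every word."""
    return [(p, 0.08 * len(p)) for p in phones]

def preprocess(text: str):
    return generate_prosody(letter_to_sound(parse_text(text)))

print(preprocess("The hat."))
```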

2) Voice Creation: One important property the speech synthesiser has to provide in our application is the creation and modification of voices. In a speech concatenation synthesiser, recorded speech is used to extract the utterances needed. For AT&T's speech synthesiser it takes approximately 40 hours of recorded speech to reproduce a specific voice [6]. Although they hope this could be reduced to a few hours, it still requires too much effort. On the other hand, for our purpose it would in most cases be good enough if the new voice somehow resembles the original one, without any need for the listener to identify the original speaker. In a formant synthesiser, voices are stored as parameter sets, and new voices can be created by modifying an existing parameter set. This sounds easier than it is, because a voice is described by many parameters that are not completely independent; modifying just one parameter can lead to a very unnatural-sounding voice. Creating a completely new voice from recorded speech could perhaps be achieved with a modified version of a speaker identification algorithm as described in [7], which could be used to provide the parameters needed.

3) Emotions: Although our prototype currently allows the user to enter emotions, no functionality is implemented in the synthesiser yet to process this information. Part of the necessary modifications could be made in the pre-processing stage. Sadness or depression, for example, is expressed by a monotone, low voice; this is easily achieved by simply modifying prosody and volume. Beyond that, sadness or depression is characterised by the speaker letting his head hang down and almost speaking to himself, as opposed to standing straight and talking with a smile when he is happy. The positions of head and chest, or the smile, all change the speaker parameters. So if these changes are known, it is possible to modify the speaker parameters in formant synthesis. With a concatenation synthesiser (as used in the prototype), on the other hand, it could prove very difficult to generate the desired result.

However, the expression of emotions with the voice alone has its limitations. I believe it will work fine as long as the emotion expressed is supported by the spoken text. It will most likely still work if you want to tell a joke with a sad voice. Producing a paradoxical message, e.g. a sad text spoken with a happy voice, has its limitations, though, because in life such messages depend not only on the auditory information but on body language as well.

C. Hybrid Models

In a hybrid model, rather than storing phonemes or other parts of speech as a signal, the parameters of these signals are stored. These parameters are usually extracted from real spoken text. Then, instead of concatenating the speech signals, the signal is generated for every unit by means of its parameters (see Fig. 3), and rules are used to modify the parameters in the transitional steps. In an additional processing step after the parameter extraction, the parameters are normalised. Since the use of a hybrid model is intended, this normalisation process could prove crucial: only a good standardisation will allow the modification of the parameters with the speaker, prosody and other information.

D. Implementation with Sinusoidal Synthesis

As described in [7], in some respects sinusoidal synthesis has similarities to the filter bank representation used in the vocoder. However, since the use of the discrete Fourier transform (DFT) renders a highly adaptive filter bank, I prefer to use the basic idea of this method, with adaptations to our needs. Fig. 3 describes the basic architecture of the synthesiser we intend to implement. Parameters are generated for every frame from phonetic and prosody information, the speaker model, and from modifiers which, for example, express emotions. This step is described in more detail in the next chapter.

Fig. 3. Speech synthesiser architecture: parameter generation (from phonetics, prosody, the speaker model and modifiers) feeds a noise generator and n sine generators via frame-to-frame interpolation and phase unwrapping; their sum forms the speech output.

The parameters are then used to generate a set of n (phase, frequency, amplitude) triples. For vowels, the duration information obtained from the melody line is used to determine the length of the frame.
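A minimal sketch of that final summation stage of Fig. 3, assuming the per-frame triples are already available: one sine generator per (phase, frequency, amplitude) triple, summed into the frame signal. The sampling rate and the parameters being held static within a frame are simplifying assumptions; the interpolation the design actually calls for is discussed next.

```python
import numpy as np

def synthesise_frame(triples, frame_len: int, rate: int = 16000) -> np.ndarray:
    """Sum one sine generator per (phase, frequency, amplitude) triple."""
    t = np.arange(frame_len) / rate
    frame = np.zeros(frame_len)
    for phase, freq, amp in triples:
        frame += amp * np.sin(2 * np.pi * freq * t + phase)
    return frame

# A crude vowel-like frame: fundamental plus two harmonics.
frame = synthesise_frame([(0.0, 120, 1.0), (0.0, 240, 0.5), (0.0, 360, 0.25)],
                         frame_len=1600)
```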
For every frame and parameter triple, the information is unwrapped and interpolated. A sine wave generator receives the frequency and phase and generates a sine wave, which is amplified by the amplitude amount. Every triple thus defines a signal, and these signals are summed to produce the synthetic speech output signal.

To avoid discontinuities at the boundaries of the frames, some provisions must be made to interpolate the parameters smoothly from one frame to the next. For this purpose, sine wave tracks are established for the frequencies, where every frequency of the current frame is attached to the closest match in the previous one. If it is not possible to establish a good match, a track may die, and later a new one may be born; a sketch of this matching follows below. How these tracks are constructed is described, e.g., in [7], but there are other ways of interpolation, see for example [8] or [9]. A more difficult problem is the matching of the phases. Although computationally expensive, it is intended to use cubic phase interpolation, since this method produces the best results. These transition procedures are described in detail in [10].
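A minimal sketch of the frequency-track matching with birth and death, using a plain greedy nearest-neighbour rule within a tolerance (the threshold and the one-pass matching are illustrative simplifications of the procedure in [7]):

```python
# Sketch: greedy nearest-frequency matching of sine-wave tracks
# between two frames; unmatched old tracks die, unmatched new ones are born.

def match_tracks(prev: list[float], curr: list[float], tol: float = 30.0):
    """Return (continued, died, born) track frequencies."""
    continued, unmatched = [], list(curr)
    for f_prev in prev:
        if not unmatched:
            break
        best = min(unmatched, key=lambda f: abs(f - f_prev))
        if abs(best - f_prev) <= tol:
            continued.append((f_prev, best))
            unmatched.remove(best)
    died = [f for f in prev if f not in [p for p, _ in continued]]
    return continued, died, unmatched  # unmatched new frequencies are born

print(match_tracks([120.0, 240.0, 360.0], [125.0, 250.0, 500.0]))
# -> ([(120.0, 125.0), (240.0, 250.0)], [360.0], [500.0])
```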

E. Parameter Generation

1) The Normalisation Procedure: To understand the concept of parameter generation, I first have to explain how I intend to normalise the frame parameters for every frame. As mentioned earlier, normalisation is crucial to our synthesiser. Fig. 4 gives an example of the speech parameters extracted for a particular frame; the extraction process is described in detail, e.g., in [7]. For every peak value, a parameter triple is generated.

Fig. 4. Spectral magnitude of a speech element (amplitude over frequency; a parameter triple is generated for every peak).

Normalisation has to be such that the frames belonging to a speech element store the parameters in a form that is independent of the speaker. To do this, it is intended to record the speech elements from a natural speaker and then to normalise them by means of the speaker parameters. Furthermore, from the speaker parameters a standardisation window is created, see Fig. 5.

Fig. 5. Standardisation window (amplitude over frequency, with the formants F1, F2 and F3 marked).

For every frequency, the amplitude evaluated after recording is divided by the respective amplitude of the standardisation window. Then the frequencies are shifted so that the first formant F1 is at 100 Hz. Obviously, for the same speaker one gets the same frequencies and amplitudes back by shifting the frequencies so that the 100 Hz frequency goes back to the first formant, and afterwards multiplying, for every frequency, the amplitudes by the respective amplitude of the standardisation window.

2) Parameter Generation: As we have seen, the input to parameter generation consists of the standardised frame parameters, obtained via the phonetics from the frames table, the speaker model, the prosody and the modifiers. In a first step, the speaker parameters are modified for every frame. To match the frequency given by the prosody, the first formant is shifted to that frequency. The adaptation for emotions will be more complex, but will essentially be the modification of all the formants in frequency and amplitude, except for the first one, where the prosody takes precedence. Using the modified speaker parameters, the modifier window is built the same way as the standardisation window was built for normalisation. Then the frequencies are shifted and the amplitudes multiplied to get the final parameters.

There is no need to use random noise for standard speech synthesis, see [7], but other authors, for example [11], apply such a model, although they admit that it is not valid from a speech production point of view. I am still considering which model to use. In both cases, however, there will be some need for a noise generator to express whispering: there, the amplitude of all frequencies is reduced and random noise is added.
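A minimal sketch of this normalisation and its inverse, assuming the standardisation window is available as a function from frequency to amplitude and frames are lists of (phase, frequency, amplitude) triples. The toy window below is an illustrative stand-in, not the window derived from actual speaker parameters:

```python
import numpy as np

def normalise(triples, window, f1: float):
    """Divide amplitudes by the standardisation window and shift
    frequencies so that the first formant F1 sits at 100 Hz."""
    return [(ph, f - f1 + 100.0, a / window(f)) for ph, f, a in triples]

def denormalise(triples, window, f1: float):
    """Inverse: shift 100 Hz back to F1, multiply the amplitudes back."""
    return [(ph, f + f1 - 100.0, a * window(f + f1 - 100.0))
            for ph, f, a in triples]

window = lambda f: np.exp(-((f - 500.0) / 800.0) ** 2) + 0.1  # toy window
frame = [(0.0, 700.0, 0.9), (0.0, 1400.0, 0.4)]
restored = denormalise(normalise(frame, window, f1=700.0), window, f1=700.0)
print(restored)  # recovers the original frame up to rounding
```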
ACKNOWLEDGEMENTS

The author thanks Adam Pitts and Peter Brdar for their valuable input.

REFERENCES

[1] H. Kull, "Device and Method for Digital Voice Processing," PCT patent application, international publication number WO 00/16310, 2000.
[2] R. Linggard, Electronic Synthesis of Speech. Cambridge: Cambridge University Press, 1985.
[3] T. Dutoit, An Introduction to Text-to-Speech Synthesis. Dordrecht, Boston: Kluwer Academic Publishers, 1997.
[4] D. H. Klatt, "Software for a Cascade/Parallel Formant Synthesizer," Journal of the Acoustical Society of America, vol. 67, 1980.
[5] W. J. Hallahan, "DECtalk Software: Text-to-Speech Technology and Implementation," Digital Technical Journal, vol. 7, pp. 5-19.
[6] E. Vonderheld, "Speech Synthesis Offers Realism for Voices of Computers, Automobiles and, Yes, Even VCRs," The Institute, vol. 26, no. 3, March 2002.
[7] T. F. Quatieri, Discrete-Time Speech Signal Processing. Upper Saddle River, NJ: Prentice Hall, 2002.
[8] F. Valerio and O. Böffard, "A Hybrid Model for Text-to-Speech Synthesis," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 5, September 1998.
[9] M. Banbrook, S. McLaughlin and I. Mann, "Speech Characterisation and Synthesis by Nonlinear Methods," IEEE Trans. on Speech and Audio Processing, vol. 7, no. 1, pp. 1-17, January 1999.
[10] R. J. McAulay and T. F. Quatieri, "Speech Analysis-Synthesis Based on a Sinusoidal Representation," IEEE Trans. Acoustics, Speech and Signal Processing, vol. 34, no. 4, August 1986.
[11] Y. Stylianou, "Applying the Harmonic Plus Noise Model in Concatenative Speech Synthesis," IEEE Trans. on Speech and Audio Processing, vol. 9, no. 1, pp. 21-29, January 2001.
