Prosody, Phonology and Phonetics

Series Editors
Daniel J. Hirst, CNRS Laboratoire Parole et Langage, Aix-en-Provence, France
Qiuwu Ma, School of Foreign Languages, Tongji University, Shanghai, China
Hongwei Ding, School of Foreign Languages, Tongji University, Shanghai, China

The series will publish studies in the general area of Speech Prosody with a particular (but non-exclusive) focus on the importance of phonetics and phonology in this field. The topic of speech prosody is today a far larger area of research than is often realised. The number of papers on the topic presented at large international conferences such as Interspeech and ICPhS is considerable and steadily increasing. The proposed book series would be the natural place to publish extended versions of papers presented at the Speech Prosody Conferences, in particular the papers presented in Special Sessions at the conference. This could potentially involve the publication of 3 or 4 volumes every two years, ensuring a stable future for the book series. If such publications are produced fairly rapidly, they will in turn provide a strong incentive for the organisation of other special sessions at future Speech Prosody conferences.
More information about this series at http://www.springer.com/series/11951

Keikichi Hirose, Jianhua Tao (Editors)
Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis

Editors
Keikichi Hirose, Graduate School of Information Science and Technology, University of Tokyo, Tokyo, Japan
Jianhua Tao, Institute of Automation, Chinese Academy of Sciences, Beijing, China

ISSN 2197-8700
ISSN 2197-8719 (electronic)
Prosody, Phonology and Phonetics
ISBN 978-3-662-45257-8
ISBN 978-3-662-45258-5 (ebook)
DOI 10.1007/978-3-662-45258-5
Library of Congress Control Number: 2014955166
Springer Berlin Heidelberg Dordrecht London
© Springer-Verlag Berlin Heidelberg 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Contents

Part I Modeling of Prosody
1 ProZed: A Speech Prosody Editor for Linguists, Using Analysis-by-Synthesis
  Daniel J. Hirst
2 Degrees of Freedom in Prosody Modeling
  Yi Xu and Santitham Prom-on
3 Extraction, Analysis and Synthesis of Fujisaki model Parameters
  Hansjörg Mixdorff
4 Probabilistic Modeling of Pitch Contours Toward Prosody Synthesis and Conversion
  Hirokazu Kameoka

Part II Para- and Non-Linguistic Issues of Prosody
5 Communicative Speech Synthesis as Pan-Linguistic Prosody Control
  Yoshinori Sagisaka and Yoko Greenberg
6 Mandarin Stress Analysis and Prediction for Speech Synthesis
  Ya Li and Jianhua Tao
7 Expressivity in Interactive Speech Synthesis; Some Paralinguistic and Nonlinguistic Issues of Speech Prosody for Conversational Dialogue Systems
  Nick Campbell and Ya Li
8 Temporally Variable Multi-attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody
  Hideki Kawahara

Part III Control of Prosody in Speech Synthesis
9 Statistical Models for Dealing with Discontinuity of Fundamental Frequency
  Kai Yu
10 Use of Generation Process Model for Improved Control of Fundamental Frequency Contours in HMM-Based Speech Synthesis
  Keikichi Hirose
11 Tone Nucleus Model for Emotional Mandarin Speech Synthesis
  Miaomiao Wang
12 Emphasis, Word Prominence, and Continuous Wavelet Transform in the Control of HMM-Based Synthesis
  Martti Vainio, Antti Suni and Daniel Aalto
13 Exploiting Alternatives for Text-To-Speech Synthesis: From Machine to Human
  Nicolas Obin, Christophe Veaux and Pierre Lanchantin
14 Prosody Control and Variation Enhancement Techniques for HMM-Based Expressive Speech Synthesis
  Takao Kobayashi

Contributors

Daniel Aalto  University of Helsinki, Helsinki, Finland
Nick Campbell  Trinity College Dublin, The University of Dublin, Dublin, Ireland
Yoko Greenberg  Waseda University, Tokyo, Japan
Keikichi Hirose  The University of Tokyo, Tokyo, Japan
Daniel J. Hirst  CNRS & Aix-Marseille University, Aix-en-Provence, France/Tongji University, Shanghai, China
Hirokazu Kameoka  The University of Tokyo, Tokyo, Japan/NTT Communication Science Laboratories, Atsugi, Japan
Hideki Kawahara  Wakayama University, Wakayama, Japan
Takao Kobayashi  Tokyo Institute of Technology, Tokyo, Japan
Pierre Lanchantin  Cambridge University, Cambridge, UK
Ya Li  Institute of Automation, Chinese Academy of Sciences, Beijing, China/Trinity College Dublin, The University of Dublin, Dublin, Ireland
Hansjörg Mixdorff  Beuth-Hochschule für Technik Berlin, Berlin, Germany
Nicolas Obin  IRCAM, UMR STMS IRCAM-CNRS-UPMC, Paris, France
Santitham Prom-on  King Mongkut's University of Technology Thonburi, Thailand
Yoshinori Sagisaka  Waseda University, Tokyo, Japan
Antti Suni  University of Helsinki, Helsinki, Finland
Jianhua Tao  Institute of Automation, Chinese Academy of Sciences, Beijing, China
Martti Vainio  University of Helsinki, Helsinki, Finland
Christophe Veaux  Centre for Speech Technology Research, Edinburgh, UK
Miaomiao Wang  Toshiba China R&D Center, Beijing, China
Yi Xu  University College London, London, UK
Kai Yu  Shanghai Jiao Tong University, Shanghai, China