Resources for Indian languages


1 Resources for Indian languages
Arun Baby, Anju Leela Thomas, Nishanthi N L, and TTS Consortium
Indian Institute of Technology Madras, India
September 12, 2016

2 Roadmap
Outline
- The need for Indian language corpora
- Introduction
- Data collection
  - Text selection and correction
  - Speaker selection
  - Recording
  - Summary of the text corpus
- Voice building
  - Common Label Set
  - Parsing and unified parser
  - Hybrid segmentation
  - Pruning
  - HTS
- Android applications
- Conclusion and future work
- Acknowledgement
- References

3 The need for Indian language corpora
- Less work has been done in the speech domain for Indian languages than for other languages
- A database of speech audio files and corresponding text transcriptions
- Consortium effort

4 Introduction
- Creating a corpus for Indian languages is time-consuming, mainly because of the diversity of the languages and the lack of resources
- DeitY, Ministry of Information Technology, India, took the initiative to sponsor the development of TTS in regional languages
- Two voices (male and female) are recorded for each language
- 40 hours of data are collected per language

5 Data collection
- Text selection and correction
- Speaker selection
- Recording
- Summary of the text corpus

6 Text selection and correction
- Text in various Indian languages is collected from newspapers, websites, blogs, etc. with the help of web crawlers
- Text from different domains such as children's stories, literature, science, and tourism is also collected manually
- Manual correction removes transcription errors, if any
- The chosen text is easy to read, covers the most commonly used words and phrases in a language, and has maximum syllable coverage
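The "maximum syllable coverage" criterion can be approximated with a greedy selection that repeatedly picks the sentence contributing the most units not yet covered. The following is a minimal sketch under stated assumptions, not the consortium's actual selection procedure: the function names `select_sentences` and `char_bigrams` are hypothetical, and character bigrams stand in for a real, language-specific syllabifier.

```python
def char_bigrams(sentence):
    # Stand-in for a real syllabifier (assumption): character bigrams inside each word.
    return {w[i:i + 2] for w in sentence.split() for i in range(len(w) - 1)}


def select_sentences(candidates, units_of, budget):
    """Greedy coverage: at each step keep the sentence that adds the most unseen units."""
    covered = set()
    chosen = []
    remaining = list(candidates)
    while remaining and len(chosen) < budget:
        # Sentence with the largest number of units not covered so far.
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break  # nothing new can be added by any remaining sentence
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    picked, units = select_sentences(["abab", "abcd", "cd"], char_bigrams, budget=3)
    print(picked, sorted(units))
```

Greedy selection does not guarantee an optimal set, but it is a common, simple way to reach high unit coverage with few sentences.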

7 Speaker selection
- Two voice talents (one male and one female) are selected
- Single-speaker data limits variation and changes in voice quality
- A voice that is pleasant to listen to and amenable to signal processing is chosen

8 Recording
- Carried out in a special environment that is free from noise and echo
- Done by professional speakers (male and female) to maintain constant pitch and prevent the stress phenomenon
- To avoid speaker fatigue, a break is given every 45 minutes
- The recordings are split at the sentence level
- Recording is mono, with a sampling rate of 48 kHz and 16 bits per sample
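A recorded file can be checked against the stated format (mono, 48 kHz, 16 bits per sample) with Python's standard `wave` module. This is a small sketch that assumes the recordings are stored as uncompressed WAV files; the function name `check_recording` is hypothetical.

```python
import wave


def check_recording(path, rate=48000, sample_width_bytes=2, channels=1):
    """Return a list of mismatches between a WAV file and the expected format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(f"channels: {wav.getnchannels()} (expected {channels})")
        if wav.getframerate() != rate:
            problems.append(f"sampling rate: {wav.getframerate()} Hz (expected {rate})")
        # 16 bits per sample corresponds to a sample width of 2 bytes.
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"sample width: {wav.getsampwidth()} bytes (expected {sample_width_bytes})"
            )
    return problems
```

An empty list means the file matches the expected recording format.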

9 Summary of the text corpus
Table 1: Summary of the corpus
Columns: Languages; Female (English, Mono); Male (English, Mono)
Rows for each language: Duration in hours, Number of words, Number of sentences
Languages: Assamese, Bengali, Bodo, Gujarati, Hindi, Kannada
[Table values are missing from this copy.]

10 Summary of the text corpus
Table 2: Summary of the corpus
Columns: Languages; Female (English, Mono); Male (English, Mono)
Rows for each language: Duration in hours, Number of words, Number of sentences
Languages: Malayalam, Manipuri, Marathi, Odia, Rajasthani, Tamil, Telugu
[Table values are missing from this copy.]

11 Voice building
- Common Label Set
- Parsing and unified parser
- Hybrid segmentation
- Pruning
- HTS

12 Common Label Set
- Capitalizes on the acoustic similarity of Indian languages [1]
- Standardized representation for phonemes across different Indian languages
- Devised using the Latin-1 script

[1] B Ramani, S Lilly Christina, G Anushiya Rachel, V Sherlin Solomi, Mahesh Kumar Nandwana, Anusha Prakash, S Aswin Shanmugam, Raghava Krishnan, S Kishore, K Samudravijaya, et al. A common attribute based unified HTS framework for speech synthesis in Indian languages. In 8th ISCA Workshop on Speech Synthesis.

13 Parsing and unified parser
- The traditional parsing approach uses each language's own rules to parse a word into the corresponding phones [2]
- The unified approach uses the generic language structure of Indian languages
- It unifies the languages based on the Common Label Set
- It converts UTF-8 text to the Common Label Set, applies letter-to-sound rules, and generates the corresponding phoneme sequences

[2] Arun Baby, Nishanthi N L, Anju Leela Thomas, and Hema A Murthy. A unified parser for developing Indian language text to speech synthesizers. In International Conference on Text, Speech and Dialogue. Springer, 2016.
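The idea of mapping text in different scripts to one shared label sequence can be illustrated with a toy example. This is only a sketch under stated assumptions and not the unified parser itself: the label table below is hypothetical (the real Common Label Set is far larger and its labels may differ), the function `to_labels` and the flag `drop_final_schwa` are invented for illustration, and the only letter-to-sound rules shown are the inherent vowel and optional word-final schwa deletion.

```python
# Hypothetical label table (assumption): a handful of characters only.
CONSONANTS = {
    "\u0915": "k", "\u092e": "m", "\u0932": "l",  # Devanagari KA, MA, LA
    "\u0b95": "k", "\u0bae": "m", "\u0bb2": "l",  # Tamil KA, MA, LA
}
VOWEL_SIGNS = {
    "\u093e": "aa", "\u093f": "i",  # Devanagari vowel signs AA, I
    "\u0bbe": "aa", "\u0bbf": "i",  # Tamil vowel signs AA, I
}
VIRAMAS = {"\u094d", "\u0bcd"}  # Devanagari virama, Tamil pulli


def to_labels(word, drop_final_schwa=False):
    """Map a UTF-8 word to a list of shared labels using the toy table above."""
    labels = []
    pending_schwa = False  # True while the last consonant still owes its inherent vowel
    for ch in word:
        if ch in CONSONANTS:
            if pending_schwa:
                labels.append("a")  # previous consonant keeps its inherent vowel
            labels.append(CONSONANTS[ch])
            pending_schwa = True
        elif ch in VOWEL_SIGNS:
            labels.append(VOWEL_SIGNS[ch])  # explicit vowel replaces the inherent one
            pending_schwa = False
        elif ch in VIRAMAS:
            pending_schwa = False  # virama suppresses the inherent vowel
        else:
            raise ValueError(f"character not in toy table: {ch!r}")
    if pending_schwa and not drop_final_schwa:
        labels.append("a")
    return labels


if __name__ == "__main__":
    hindi = "\u0915\u093e\u0932"        # Devanagari KA + AA + LA
    tamil = "\u0b95\u0bbe\u0bb2\u0bcd"  # Tamil KA + AA + LA + pulli
    print(to_labels(hindi, drop_final_schwa=True))
    print(to_labels(tamil))
```

Both words map to the same label sequence, which is what lets later stages share models across languages.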

14 Hybrid segmentation
- Manual correction is a monotonous task
- Conventional segmentation uses three steps: flat-start initialization of monophone HMMs, embedded reestimation, and forced Viterbi alignment
- This model does not indicate the boundary positions
- Short-term energy (STE) is used as a measure to determine the syllable boundaries [3]
- The syllable boundaries are corrected with group delay and spectral flux

[3] S Aswin Shanmugam and Hema Murthy. A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation. Presentation in INTERSPEECH, 2014.
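Short-term energy is the sum of squared samples within each analysis frame, and dips in the energy contour are natural candidates for syllable boundaries. The sketch below computes STE and picks local minima; it is a simplification, since the group delay and spectral flux corrections mentioned above are not shown. The function names, frame length, and hop size are assumptions chosen for illustration.

```python
def short_term_energy(samples, frame_len, hop):
    """Sum of squared samples in each frame of length frame_len, advanced by hop."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def energy_minima(energy):
    """Indices of frames that are strict local minima of the energy contour."""
    return [
        i for i in range(1, len(energy) - 1)
        if energy[i] < energy[i - 1] and energy[i] < energy[i + 1]
    ]


if __name__ == "__main__":
    # Two loud stretches separated by silence.
    signal = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    ste = short_term_energy(signal, frame_len=4, hop=4)
    print(ste, energy_minima(ste))
```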

15 Pruning
- The process of discarding badly segmented units [4]
- Duration, average f0, and STE are the cues taken into consideration
- Helps correct segmentation errors and maintain acoustic continuity in the voice

[4] K Raghava Krishnan. Prosodic analysis of Indian languages and its application to text to speech synthesis. M S Thesis, Department of Electrical Engineering, IIT Madras, India, July.
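One simple way to act on these three cues is to discard units whose duration, average f0, or STE lies far from the mean of comparable units. The sketch below uses a z-score threshold, which is an assumption and not necessarily the rule used in the cited thesis; the function name `prune_units`, the dictionary keys, and the threshold `max_z` are hypothetical, and the input is assumed to hold instances of the same unit type.

```python
from statistics import mean, pstdev

CUES = ("duration", "f0", "ste")


def prune_units(units, max_z=2.0):
    """Keep units whose cues all lie within max_z standard deviations of the mean."""
    stats = {}
    for cue in CUES:
        values = [u[cue] for u in units]
        stats[cue] = (mean(values), pstdev(values))

    def within_range(unit):
        for cue, (mu, sd) in stats.items():
            # A zero standard deviation means every unit agrees on this cue.
            if sd > 0 and abs(unit[cue] - mu) > max_z * sd:
                return False
        return True

    return [u for u in units if within_range(u)]


if __name__ == "__main__":
    units = [{"duration": 0.20, "f0": 120.0, "ste": 1.0} for _ in range(9)]
    units.append({"duration": 1.00, "f0": 120.0, "ste": 1.0})  # suspiciously long unit
    print(len(prune_units(units)))
```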

16 HTS
- A statistical parametric approach [5]
- Parametric representation of speech by extracting the spectral and excitation features from the database

[5] Keiichi Tokuda, Takayoshi Yoshimura, Takashi Masuko, Takao Kobayashi, and Tadashi Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. In Acoustics, Speech, and Signal Processing, ICASSP00. Proceedings IEEE International Conference on, volume 3. IEEE.

17 Android applications
- Three Android applications were developed [6]
  - Tamil TTS app: Tamil text-to-speech synthesis
  - Hindi TTS app: Hindi text-to-speech synthesis
  - Indic TTS app: text-to-speech synthesis of 13 Indian languages
- The apps are available for download on the Indic TTS website

[6] IIT Madras. Indic TTS - Android apps. androidapp.php.

18 Conclusion and future work
- The data is hosted on the web
- It is available to all groups working on corpus generation and research activities
- Data is still being collected

19 Download statistics
Figure 1: Download statistics (as on 12th September, 2016)

20 Acknowledgement
- Funded by Department of Information Technology, Ministry of Communication and Technology, Government of India
Figure 2: Consortium members

21 References
- IIT Madras. Indic TTS.
- SS Agrawal, Sunita Arora, and Karunesh Arora. Towards design, development and standardization of speech corpora for developing Indian language TTS system. COCOSDA-2005, Dec, pages 68, 2005.
- Arun Baby, Nishanthi N L, Anju Leela Thomas, and Hema A Murthy. A unified parser for developing Indian language text to speech synthesizers. In International Conference on Text, Speech and Dialogue. Springer, 2016.
- S Aswin Shanmugam and Hema Murthy. A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation. Presentation in INTERSPEECH, 2014.

22 Questions

23 Thank you
