Gender Classification by Speech Analysis


BhagyaLaxmi Jena 1, Abhishek Majhi 2, Beda Prakash Panigrahi 3
1 Asst. Professor, Electronics & Tele-communication Dept., Silicon Institute of Technology
2,3 Students of Electronics & Tele-communication Branch, Silicon Institute of Technology

Abstract

This paper presents a comparative investigation of speech signals aimed at devising a gender classifier. Gender classification by speech analysis aims to predict the gender of the speaker by analyzing different parameters of the voice sample. The investigation concentrates on short-time analysis of the speech signals. The analysis compares the short-time average magnitude, short-time energy, short-time zero crossing rate and short-time auto-correlation values of male and female voice samples. This quantitative comparison is implemented in MATLAB. A database of voice samples collected from many students of our college, both male and female, was created. The short-time analysis was performed on all the collected voice samples, and the parameters were compared to establish a working principle for a gender classifier from speech.

Keywords: short-time average magnitude, short-time energy, short-time zero crossing rate, short-time auto-correlation

1. Introduction

At a linguistic level, speech can be viewed as a sequence of basic sound units called phonemes. A phoneme is a sound, or a group of different sounds, perceived to have the same function by the speakers of a language. An example of a phoneme is the /k/ sound in the words "kit" and "skill". The same phoneme may give rise to many different sounds, or allophones, at the acoustic level, depending on the phonemes that surround it. Different speakers producing the same string of phonemes convey the same information yet sound different as a result of differences in dialect and in vocal tract length and shape.
1.1 Speech Analysis

The techniques used to process speech signals can be broadly classified as either time-domain or frequency-domain analysis. In time-domain analysis, the measurements are performed directly on the speech signal to extract information. In frequency-domain analysis, the information is extracted after the frequency content of the speech signal has been computed to form the spectrum.

1.2 Gender Classifier

A gender classifier from speech can form part of an automatic speech recognition system, where it enhances speaker adaptability, and part of an automatic speaker recognition system. The need for gender classification from speech also arises in situations such as sorting telephone calls by gender for gender-sensitive surveys. It is also a part of modern voice password technology.

2. Short-Time Analysis

The properties of a speech signal change relatively slowly with time. This allows a short-time duration of speech to be examined in order to extract parameters that are assumed to remain the same over that duration, which forms the basis of short-time analysis. The speech signal is divided into many sub-signals of short duration by means of a windowing technique. After the long signal has been split into analysis frames with appropriate windows, each frame is analyzed and a cumulative result is obtained.

In the expressions below, x(m) is the speech signal and w(n - m) is the analysis window positioned at sample n.

2.1 Short-Time Average Magnitude

Short-Time Average Magnitude (STAM) [7] is used for detecting the start point and end point of the speech signal. The Short-Time Average Magnitude of a speech signal is given by

$$M_n = \sum_{m=-\infty}^{\infty} |x(m)| \, w(n-m)$$

where $M_n$ is the Short-Time Average Magnitude.

2.2 Short-Time Energy

Short-Time Energy (STE) [4] is the energy associated with the signal in the time domain. The Short-Time Energy of a speech signal is given by

$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m) \, w(n-m) \right]^2$$

where $E_n$ is the Short-Time Energy.

2.3 Short-Time Zero Crossing Rate

Zero Crossing Rate (ZCR) [4] is the rate of sign changes along a signal. The Short-Time Zero Crossing Rate of a speech signal is given by

$$Z_n = \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \, w(n-m)$$

where $Z_n$ is the Short-Time Zero Crossing Rate and

$$\operatorname{sgn}[x(m)] = \begin{cases} 1, & x(m) \ge 0 \\ -1, & x(m) < 0 \end{cases}$$

2.4 Short-Time Auto-correlation

The Short-Time Auto-correlation [1] of a speech signal is given by

$$R_n(k) = \sum_{m=-\infty}^{\infty} \left[ x(m) \, w(n-m) \right] \left[ x(m+k) \, w(n-m-k) \right]$$

where $R_n(k)$ is the Short-Time Auto-correlation and $k$ is the sample time (lag) at which the auto-correlation is calculated.
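The four short-time parameters of Section 2 can be sketched for a single frame as follows. This is a minimal illustration in Python rather than the MATLAB code used for the paper, which is not reproduced here. It assumes a rectangular window of unit height, so that the window simply selects the samples of one frame, and it evaluates the sums directly instead of through FFT/IFFT. The function names, the frame length and hop arguments, and the 1/(2N) scaling of the zero crossing rate (a common convention that yields crossings per sample) are assumptions made for the example.

```python
def split_into_frames(x, frame_len, hop):
    """Cut signal x into frames of frame_len samples, advancing hop samples
    each time. Equivalent to applying a unit-height rectangular window."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_average_magnitude(frame):
    """M_n: sum of |x(m)| over the samples selected by the window."""
    return sum(abs(s) for s in frame)


def short_time_energy(frame):
    """E_n: sum of squared samples selected by the window."""
    return sum(s * s for s in frame)


def sgn(s):
    """Sign function as defined in Section 2.3: +1 for s >= 0, otherwise -1."""
    return 1 if s >= 0 else -1


def short_time_zcr(frame):
    """Z_n with an assumed window weight of 1/(2N): each sign change
    contributes |sgn - sgn| = 2, so the result is crossings per sample."""
    n = len(frame)
    changes = sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, n))
    return changes / (2 * n)


def short_time_autocorrelation(frame, k):
    """R_n(k): sum of products of windowed samples that lie k samples apart."""
    return sum(frame[m] * frame[m + k] for m in range(len(frame) - k))


# Toy usage on an alternating signal (hypothetical data, not a voice sample).
signal = [1.0, -1.0] * 8
for frame in split_into_frames(signal, frame_len=4, hop=4):
    features = (
        short_time_average_magnitude(frame),
        short_time_energy(frame),
        short_time_zcr(frame),
        short_time_autocorrelation(frame, 1),
    )
```

At lag 0 the auto-correlation reduces to the short-time energy, which gives a quick consistency check between the two functions.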

3. Simulation & Results

In this comparative investigation, we used voice samples from our database, which contains voice samples of many students of our college, both male and female. Each voice sample contains one predefined sentence ("Oh My God"), spoken in English by a single speaker, with no background sounds. The comparative short-time analysis of male and female voice samples was done using MATLAB.

It was observed that the average short-time energy value for female voice samples was greater than that of male voice samples for almost all the voice samples in our database. The STE plots for a randomly selected female and male voice sample are shown in figure 1.

Figure-1(a): Short-Time Energy plot of female voice sample
Figure-1(b): Short-Time Energy plot of male voice sample

It was also observed that the average short-time zero crossing rate value for female voice samples was higher than that of male voice samples throughout our database. The short-time ZCR plots for a randomly selected female and male voice sample are shown in figure 2.

Figure-2(a): Short-Time Zero Crossing Rate plot of female voice sample
Figure-2(b): Short-Time Zero Crossing Rate plot of male voice sample

There was a significant difference in the short-time average magnitude and short-time auto-correlation plots of the male and female voice samples. The STAM plots for a randomly selected female and male voice sample are shown in figure 3.
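The averaged comparison of short-time zero crossing rates described above can be illustrated on synthetic data. The sketch below is in Python and does not use the paper's database or its MATLAB code. The two signals are pure sine tones, and the 120 Hz and 220 Hz frequencies, the 8000 Hz sampling rate, and the frame and hop sizes are assumed values chosen only for the example. It shows that, for a pure tone, a higher frequency gives a higher average short-time ZCR; whether this accounts for the difference reported for real voice samples is not established here.

```python
import math


def sgn(s):
    """Sign function: +1 for s >= 0, otherwise -1."""
    return 1 if s >= 0 else -1


def average_short_time_zcr(x, frame_len, hop):
    """Mean over all frames of the per-frame zero crossing rate.
    Each frame uses a rectangular window with an assumed 1/(2N) weight."""
    rates = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        changes = sum(
            abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, frame_len)
        )
        rates.append(changes / (2 * frame_len))
    return sum(rates) / len(rates)


def tone(freq_hz, sample_rate, duration_s):
    """Synthetic sine tone standing in for a voiced sound (not real speech)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # Hz, assumed for the example

lower_tone = tone(120.0, SAMPLE_RATE, 0.5)   # hypothetical lower-pitched signal
higher_tone = tone(220.0, SAMPLE_RATE, 0.5)  # hypothetical higher-pitched signal

zcr_lower = average_short_time_zcr(lower_tone, frame_len=240, hop=120)
zcr_higher = average_short_time_zcr(higher_tone, frame_len=240, hop=120)

# The higher-frequency tone changes sign more often per sample.
higher_has_more_crossings = zcr_higher > zcr_lower
```

The same averaging could be applied to short-time energy or average magnitude, so that each voice sample is summarized by one number per parameter before samples are compared.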

Figure-3: Short-Time Average Magnitude plots of female and male voice samples

Figure-4(a): Short-Time Autocorrelation plot of female voice sample
Figure-4(b): Short-Time Autocorrelation plot of male voice sample

All the convolutions computed during this analysis were based on the FFT/IFFT algorithm [6] implemented in MATLAB. Appropriate rectangular windows [3] were designed and used for the analysis.

4. Conclusion

Comparing the parameters obtained by short-time analysis of the male and female voice samples shows that the parameters differ sufficiently between the two groups. This difference can be used as the working principle of a gender classifier that predicts the gender of the speaker in a voice signal by analyzing it. Our long-term goal is to implement a gender classifier that automatically predicts the gender of the speaker based on the above investigation.

References

[1] H. Harb, L. Chen, J. Auloge, Speech/Music/Silence and Gender Detection Algorithm
[2] Vinay K. Ingle, John G. Proakis, Digital Signal Processing Using MATLAB
[3] Chiu Ying Lay, Ng Hian James, Gender Classification from Speech
[4] Thomas F. Quatieri, Discrete-Time Speech Signal Processing
[5] John G. Proakis, Dimitris G. Manolakis, D. Sharma, Digital Signal Processing: Principles, Algorithms and Applications
[6] Douglas O'Shaughnessy, Speech Communications: Human and Machine
[7] Lawrence R. Rabiner, Biing-Hwang Juang, Fundamentals of Speech Recognition
[8] Joseph Mariani, Language and Speech Processing
[9] Tanja Schultz, Speaker Characteristics
[10] Thomas F. Quatieri, Discrete-Time Speech Signal Processing
[11] Christian Muller, Speaker Classification: Fundamentals, Features and Methods
[12] E. Parris, M. Carey, Language Independent Gender Identification