On Low-level Cognitive Components of Speech

Ling Feng
Intelligent Signal Processing, Informatics and Mathematical Modelling, Technical University of Denmark
www.imm.dtu.dk/~lf/

Outline
- Introduction
  - Cognitive Component Analysis: a definition
  - Machine learning tools
- Cognitive components: examples
  - Text analysis
  - Music genre
  - Speech (phoneme and speaker)
- Summary and outlook

Cognitive Component Analysis

What is cognition? "Cognition is the process involved in knowing, or the act of knowing, including perception and judgment. It includes every mental process that can be described as an experience of knowing as distinguished from an experience of feeling or of willing." [Encyclopædia Britannica]

What is Cognitive Component Analysis (COCA)? COCA is the process of unsupervised grouping of data such that the ensuing group structure is well aligned with that resulting from human cognitive activity. [L.K. Hansen, P. Ahrendt, and J. Larsen, "Towards cognitive component analysis," in AKRR'05, International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning, June 2005.]

Cognitive Component Analysis
COCA is an intermediate-level tool:
- Source separation: low level
- COCA: intermediate level
- Content detection: high level
Main theoretical point: the relation between supervised and unsupervised learning, which is related to the discussion of the utility of unlabeled samples in supervised learning.

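Machine Learning Tools
Unsupervised learning: there is no separation of the training set into input-output pairs ("self-organization").
- LSA: eigendecomposition $\Sigma = U \Gamma U^T$, where $U$ holds the eigenvectors and $\Gamma$ the eigenvalues of $\Sigma$, followed by projection onto the first $L$ eigenvectors, $Z = U_L^T X$.
- ICA: $X_{j,t} = \sum_{k=1}^{K} A_{j,k} S_{k,t}$, i.e. the observations $X$ are linear mixtures, through the mixing matrix $A$, of $K$ independent sources $S$.
Supervised learning: the model includes mediating variables between the inputs x and the outputs y.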
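The LSA equations can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: it assumes $X$ is stored as a $d \times N$ matrix with one sample per column, and it takes $\Sigma$ to be the uncentered second-moment matrix $XX^T/N$ (no mean removal, so the origin of the projected space stays at the data's origin). The function name `lsa_project` is hypothetical.

```python
import numpy as np

def lsa_project(X: np.ndarray, L: int) -> np.ndarray:
    """Project the columns of X (d features x N samples) onto the L leading
    eigenvectors of Sigma = X X^T / N, i.e. Z = U_L^T X."""
    N = X.shape[1]
    sigma = X @ X.T / N                      # uncentered second-moment matrix (assumption)
    gamma, U = np.linalg.eigh(sigma)         # Sigma = U Gamma U^T, eigenvalues ascending
    order = np.argsort(gamma)[::-1]          # reorder eigenpairs by decreasing eigenvalue
    U_L = U[:, order[:L]]                    # d x L matrix of leading eigenvectors
    return U_L.T @ X                         # L x N latent coordinates Z

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 200))               # toy data: 16-d features, 200 frames
Z = lsa_project(X, L=2)
print(Z.shape)                               # (2, 200)
```

Because the energy of row $i$ of $Z$ equals $N$ times the $i$-th largest eigenvalue, the rows come out in decreasing order of energy.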

Text Analysis
- Vector space representation
- LSA: a sparse linear mixture of independent topics appears in term-document scatter plots
- ICA: less than 10% classification error rate
If the structure in the feature space is well aligned with the label structure, we expect high utility of unlabeled data.

Music Genre
- Feature: 13-d MFCC, frame = 30 ms, overlap = 10 ms
- LSA: a sparse linear mixture of independent contexts from a music database
If the structure in the feature space is well aligned with the label structure, we expect high utility of unlabeled data.
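The frame settings above translate into samples as follows. This is a minimal sketch assuming a 16 kHz sample rate (the slide does not give one) and reading "overlap = 10 ms" as a 20 ms hop between consecutive 30 ms frames. `frame_signal` is a hypothetical helper, and the MFCC computation that would follow on each frame is not shown.

```python
import numpy as np

def frame_signal(x: np.ndarray, sr: int, frame_ms: float = 30.0,
                 overlap_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    frame_len = int(round(sr * frame_ms / 1000))          # 30 ms -> 480 samples at 16 kHz
    hop = frame_len - int(round(sr * overlap_ms / 1000))  # 30 ms - 10 ms = 20 ms hop
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len)
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

sr = 16000                    # assumed sample rate
x = np.zeros(sr)              # one second of (silent) audio
frames = frame_signal(x, sr)
print(frames.shape)           # (49, 480)
```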

Speech Feature
- Short-term feature: frame size in [10 ms, 40 ms]
- Long-term feature: "To reveal the semantic meaning of a signal, analysis over a much longer period is necessary, usually from one second to several tens of seconds" [Wang, Liu, Huang, 2000].
Energy-Based Sparsification (EBS) and ATTENTION!
- EBS: retain the upper ?% of normalized magnitudes
- Attention: "the process which gives rise to conscious awareness" [Braisby, Gellatly, 2005]. Attention appears to have a surprising similarity with the development of invariant features.
References
- Wang, Y., Liu, Z., Huang, J.-C., "Multimedia content analysis," IEEE Signal Processing Magazine, pp. 12-36, Nov. 2000.
- Braisby, N., Gellatly, A., Cognitive Psychology, Oxford University Press, 2005.
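EBS can be sketched as a percentile threshold on feature magnitudes. This is an assumed reading of the slide, not the author's implementation: each feature dimension is scaled to unit variance ("normalized"), and every coefficient whose magnitude falls below the $(100 - p)$th percentile of all magnitudes is set to zero. The names `energy_based_sparsify` and `keep_percent` are hypothetical, and the original work may normalize or threshold differently (for example per frame rather than per coefficient).

```python
import numpy as np

def energy_based_sparsify(F: np.ndarray, keep_percent: float) -> np.ndarray:
    """F: (n_frames, n_dims) features. Returns a copy in which only the
    keep_percent % largest-magnitude normalized coefficients are non-zero."""
    std = F.std(axis=0, keepdims=True)
    Fn = F / np.where(std > 0, std, 1.0)             # unit variance per dimension (assumed normalization)
    mag = np.abs(Fn)
    thresh = np.percentile(mag, 100 - keep_percent)  # magnitude fractile over all coefficients
    return np.where(mag >= thresh, Fn, 0.0)

rng = np.random.default_rng(0)
F = rng.normal(size=(1000, 16))                      # toy 16-d cepstral frames
S = energy_based_sparsify(F, keep_percent=35)
print(np.count_nonzero(S) / S.size)                  # about 0.35
```

With `keep_percent=35` this mirrors the phoneme setting ("upper 35% magnitude fractile"); with `keep_percent=1` it mirrors the speaker setting.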

Speech: Phoneme
Phoneme: "the class of sounds that are consistently perceived as representing a certain minimal linguistic unit" [Deller, Hansen, Proakis, 2000].
COCA on phonemes:
- Feature: 16-d cepstral coefficients, frame = 20 ms, overlap = 10 ms
- Energy-based sparsification: retain the upper 35% magnitude fractile
- LSA (PCA) on the sparsified features

[Figure: phoneme example, labels S, O, F, A]

Speech: Speaker
Features:
- Basic feature: 12-d MFCC
- Long-term feature: 1 s
- Energy-based sparsification: retain the upper 1% magnitude fractile
Data source: three speakers, F1, F2 and M1, from ELSDSR
- Text-dependent
- Text-independent
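One way to turn 12-d MFCC frames into 1 s long-term features is to pool non-overlapping one-second blocks. The slide does not say how the long-term feature is formed, so averaging is purely an assumption here, as are the 50 frames-per-second rate and the hypothetical helper name `long_term_features`.

```python
import numpy as np

def long_term_features(mfcc: np.ndarray, frames_per_second: int) -> np.ndarray:
    """Average non-overlapping 1 s blocks of short-term frames.
    mfcc: (n_frames, 12) -> (n_seconds, 12); a trailing partial second is dropped."""
    n_blocks = mfcc.shape[0] // frames_per_second
    trimmed = mfcc[: n_blocks * frames_per_second]
    blocks = trimmed.reshape(n_blocks, frames_per_second, mfcc.shape[1])
    return blocks.mean(axis=1)

rng = np.random.default_rng(1)
mfcc = rng.normal(size=(525, 12))    # placeholder for 10.5 s of 12-d MFCC frames
lt = long_term_features(mfcc, frames_per_second=50)   # 50 frames/s = 20 ms hop (assumed)
print(lt.shape)                      # (10, 12)
```

The resulting one-per-second vectors would then be sparsified (upper 1% fractile) before the unsupervised projection.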

Text-Dependent Speaker Recognition
Training set: 52.5 s; test set: 35.5 s
Phenomena:
- A phoneme-like sparse linear structure
- An offset between the training and test sets
This indicates an interaction between the text content and the speaker identity!

Text-Independent Speaker Recognition
Training set: 32 s; test set: 20 s
Phenomena: generalizable ray structures of independent identities emanating from the origin of the coordinate system, without offsets.
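A hypothetical diagnostic for the "ray without offset" observation: for one speaker's projected points, measure the share of uncentered energy captured by the best single direction through the origin. Points on a ray from the origin score 1.0, while the same line shifted away from the origin scores lower. Both the score and the name `ray_score` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def ray_score(Z: np.ndarray) -> float:
    """Z: (dims, n) projected points of one speaker. Returns the share of the
    uncentered energy captured by the best single direction through the origin."""
    M = Z @ Z.T                         # uncentered scatter; no mean removal
    eig = np.linalg.eigvalsh(M)         # ascending eigenvalues
    return float(eig[-1] / eig.sum())

t = np.linspace(0.1, 2.0, 50)
on_ray = np.vstack([t, 0.5 * t])          # points on a ray from the origin
offset = np.vstack([t, 0.5 * t + 1.0])    # same slope, shifted off the origin
print(ray_score(on_ray), ray_score(offset))   # close to 1.0, then noticeably lower
```

Skipping mean removal is the point of the design: centering would hide exactly the offset that separates the text-dependent from the text-independent case.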

Summary and Outlook
Summary: cognitive components can be found by unsupervised learning!
- Generalizable features for phonemes with short-term features
- Generalizable speaker-specific sparse components with long-term features
Outlook: are the ray structures likely to be based on independence?
- (Labeled) mixture of factor analysis: $p(x, y) = \sum_{k=1}^{K} p(x \mid k)\, p(y \mid k)\, p(k)$
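Under the labeled mixture, the label posterior follows from Bayes' rule: $p(y \mid x) = \sum_k p(k \mid x)\, p(y \mid k)$ with $p(k \mid x) \propto p(x \mid k)\, p(k)$. The sketch below assumes each $p(x \mid k)$ is a Gaussian factor analyser with covariance $W_k W_k^T + \mathrm{diag}(\psi_k)$; the function names and toy parameters are hypothetical and only illustrate the formula.

```python
import numpy as np

def gauss_logpdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2 * np.pi) + logdet + maha)

def label_posterior(x, priors, means, loadings, psis, p_y_given_k):
    """p(y | x) = sum_k p(k | x) p(y | k), with p(x | k) = N(mu_k, W_k W_k^T + diag(psi_k))."""
    log_joint = np.array([
        np.log(priors[k])
        + gauss_logpdf(x, means[k], loadings[k] @ loadings[k].T + np.diag(psis[k]))
        for k in range(len(priors))
    ])
    resp = np.exp(log_joint - log_joint.max())   # p(x|k) p(k), rescaled for stability
    resp /= resp.sum()                           # p(k | x)
    return resp @ p_y_given_k                    # (n_labels,) vector p(y | x)

# Toy two-component model in 2-D with one-dimensional factor loadings.
priors = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
loadings = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]
psis = [np.array([0.1, 0.1]), np.array([0.1, 0.1])]
p_y_given_k = np.eye(2)                          # component k carries label k
post = label_posterior(np.array([0.2, -0.1]), priors, means, loadings, psis, p_y_given_k)
print(post)                                      # nearly all mass on label 0
```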