
Deep Ensemble Learning ABDELHAK LEMKHENTER 07/03/2017

Presentation Outline 2 Ensemble Learning: Stacking; Boosting. Simple Deep Ensemble Learning: a heterogeneous stack. More Advanced Deep Ensemble Learning: Multi-Resolution Stacking; Deep Incremental Boosting.

Ensemble Learning 3 Ensemble learning is the practice of training multiple estimators and combining them into one robust estimator. For ensemble learning to be effective, the set of learners should be as diverse as possible: this allows each learner to capture a different pattern. Diversity can be obtained by: using different hyperparameters with the same base learner; subsampling the training data (useful when we have too little data, or too much); using different algorithms.
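
The idea in code, as a minimal sketch assuming scikit-learn is available; the synthetic dataset and the three base learners are illustrative placeholders (not from the talk), chosen to show diversity through different algorithms:

```python
# Minimal ensemble sketch: diversity through different algorithms.
# The synthetic dataset and the specific models are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three different algorithms, so each learner can capture a different pattern.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",  # average the predicted class probabilities
)
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```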

Different ensemble techniques 4 Ensemble learning includes various techniques. The most commonly used are the following: stacking and boosting.

Stacking 5 In stacking, we train a meta-learner to combine our base learners. The base learners are different machine learning algorithms. [1] [Diagram: the input is fed to Models 1 through n; their outputs are combined by a meta-learner that produces the final output.]
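
A minimal stacking sketch, assuming scikit-learn (StackingClassifier, available since version 0.22); the base learners and the logistic-regression meta-learner are illustrative choices, not the ones used later in the talk:

```python
# Stacking sketch: a meta-learner is trained on the base learners' outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
    cv=5,  # meta-learner is trained on 5-fold cross-validated base predictions
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```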

Boosting 6 In boosting, we iteratively combine a set of weak estimators, all using the same machine learning algorithm, into a strong learner. A weak learner only needs to be slightly better than random guessing. [2] Gradient boosting is a common example.

Adaptive Boosting 7 At each iteration step: we train a weak learner using a sampling distribution D_i; we update D_i by giving more weight to mislabeled data points.
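
A compact sketch of this weight-update loop, assuming binary labels in {-1, +1} and scikit-learn decision stumps as the weak learner; this is standard AdaBoost rather than any specific variant from the talk:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=10):
    """y must contain labels in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                       # initial distribution D_1 is uniform
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=D)          # train a weak learner on D_i
        pred = stump.predict(X)
        err = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)
        D *= np.exp(-alpha * y * pred)            # up-weight mislabeled points
        D /= D.sum()                              # renormalize into D_{i+1}
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.array(alphas)

def predict(learners, alphas, X):
    return np.sign(sum(a * clf.predict(X) for a, clf in zip(alphas, learners)))
```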

Deep Learning and Ensemble Learning 8 The two fields share some similar guidelines (symmetry breaking ~ increasing diversity). Deep neural networks have various architectures and many hyperparameters, which makes them good candidates for creating diverse sets of learners.

Ensemble Deep Learning for Speech Recognition [3] 9 A simple ensemble model built by stacking three types of neural nets.

Evaluation of the model 10 Evaluation on the TIMIT phone recognition task: training set: 462 speakers; dev set: 50 speakers; test set: 24 speakers.

Monaural Speech Separation 11 The task of separating the speech signal of a target speaker from background noise or an interfering speech signal, using data from a single microphone. We will focus on three approaches: a masking method using a DNN; a mapping method using a DNN; Multi-Resolution Stacking.

Masking-based DNN 12 We are trying to predict the Ideal Ratio Mask, where each T-F unit encodes the ratio of the target signal over the mixed signal.
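
One common definition of the ideal ratio mask (exact forms vary across papers; the exponent β is an assumption here, often set to 0.5), where |S| and |N| are the magnitudes of the target and the interference in T-F unit (t, f):

```latex
\mathrm{IRM}(t,f) = \left( \frac{|S(t,f)|^{2}}{|S(t,f)|^{2} + |N(t,f)|^{2}} \right)^{\beta},
\qquad 0 < \beta \le 1
```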

Mapping-based DNN 13 In this approach, we are trying to learn how to directly map the mixed signal to the target signal.

Multi-resolution stacking 14 [Diagram: Input → Preprocessing → Module 1 → … → Module n → Postprocessing → Output.]

Preprocessing and post-processing 15 [Diagram: preprocessing applies an STFT to the mixed signal to obtain its T-F representation; postprocessing combines the estimated ratio mask with the phase of the mixed signal and applies an inverse STFT to recover the estimated target signal.]
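
A minimal sketch of this pre/post-processing pipeline, assuming SciPy for the STFT and a placeholder estimate_mask function standing in for the trained DNN (the frame length and sample rate are illustrative):

```python
import numpy as np
from scipy.signal import istft, stft

def separate(mixture, estimate_mask, fs=16000, nperseg=512):
    # Preprocessing: STFT of the mixed signal.
    _, _, Y = stft(mixture, fs=fs, nperseg=nperseg)
    magnitude, phase = np.abs(Y), np.angle(Y)

    # The trained DNN predicts a ratio mask from the mixture magnitude spectra.
    mask = estimate_mask(magnitude)               # same shape as Y, values in [0, 1]

    # Postprocessing: apply the mask, reuse the phase of the mixed signal,
    # and invert the STFT to obtain the estimated target signal.
    target_tf = mask * magnitude * np.exp(1j * phase)
    _, target = istft(target_tf, fs=fs, nperseg=nperseg)
    return target
```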

A learning module 16 [Diagram: the output of the previous module plus the spectra of the mixed signal are expanded at resolutions R1, R2, ..., Rp; each expanded feature set feeds its own DNN (DNN 1, ..., DNN p), producing ratio-mask estimates RM 1, ..., RM p. The last module only has one DNN.]

Feature expansion 17 For a given resolution R and for each frame m, we expand the input with a window of size 2*R+1 centered around frame m. This is done for each RM passed down from the previous module and for the magnitude spectra of the STFT of the mixed signal.
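
A sketch of this expansion, assuming the frames are stored as an (n_frames, n_features) NumPy array and that frames near the edges are padded by repeating the boundary frame (the padding strategy is an assumption):

```python
import numpy as np

def expand_features(frames, R):
    """frames: shape (n_frames, n_features); returns shape (n_frames, (2*R+1) * n_features)."""
    padded = np.pad(frames, ((R, R), (0, 0)), mode="edge")
    # Concatenate the 2*R+1 frames centered on each frame m.
    windows = [padded[m:m + 2 * R + 1].ravel() for m in range(frames.shape[0])]
    return np.stack(windows)
```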

Model evaluation 18 Training and test sets are generated using the SSC, TIMIT and IEEE-TIMIT datasets. Three different settings are used: same target and interfering speakers with different SNR levels; same target and interfering speakers with a randomly chosen SNR level; same target but a different interfering speaker.

Results 1/3 19 [Result tables on SSC and TIMIT; not reproduced in the transcription.]

Results 2/3 20 [Result tables on SSC and TIMIT; not reproduced in the transcription.]

Results 3/3 21 [Result figures; not reproduced in the transcription.]

Deep Incremental Boosting 22 Deep ensemble learning requires training more neural nets, which is costly. DIB is a combination of deep learning, transfer learning and ensemble learning, suggested to tackle this issue. [5]

Application of Transfer Learning 23

DIB 24
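
A high-level sketch of the DIB loop in the spirit of Mosca and Magoulas [5], assuming tf.keras and a generic classification dataset; the layer sizes, number of epochs, the fresh softmax layer per round, and the SAMME-style weight update are illustrative simplifications rather than the paper's exact recipe:

```python
import numpy as np
import tensorflow as tf

def deep_incremental_boosting(X, y, n_classes, n_rounds=3, epochs=2):
    n = len(y)
    D = np.full(n, 1.0 / n)                       # boosting distribution over samples
    hidden = [tf.keras.layers.Dense(128, activation="relu")]
    ensemble, alphas = [], []
    for _ in range(n_rounds):
        # Rebuild the network, reusing the already-trained hidden layers
        # (transfer learning) and appending one fresh layer plus a fresh output.
        inputs = tf.keras.Input(shape=(X.shape[1],))
        h = inputs
        for layer in hidden:
            h = layer(h)
        outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(h)
        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
        model.fit(X, y, sample_weight=D * n, epochs=epochs, verbose=0)

        # AdaBoost-style (SAMME) update of the sampling distribution.
        pred = model.predict(X, verbose=0).argmax(axis=1)
        err = np.clip(np.sum(D[pred != y]), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
        D *= np.exp(alpha * (pred != y))
        D /= D.sum()

        ensemble.append(model)
        alphas.append(alpha)
        hidden.append(tf.keras.layers.Dense(128, activation="relu"))  # grow the net
    return ensemble, np.array(alphas)
```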

Benchmark 25 [Plots of the mislabeling ratio and the training time.]

DIB for Spoken digit recognition 26 Dataset: Training set: 10 digits x 10 utterances x 66 speakers (male and female). Test set: 10 digits x 10 utterances x 33 speakers (male and female).

Architecture and results 27 Architecture: Conv2D: 64 2x2; MaxPooling: 1x2; Conv2D: 128 2x2; MaxPooling: 1x2; for rounds 0 to 8: Conv2D: 64 2x2; fully connected layer: 128; softmax output layer. Results, with 2 epochs per training round: one CNN: 0.932272727056; DIB: 0.971818181818; a single CNN given the equivalent time (40 epochs): 0.979545454545.
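
A sketch of the base CNN described above, assuming tf.keras, ReLU activations, an input of shape (frames, features, 1) and 10 output classes; the input shape is a placeholder, and the marked Conv2D block is the kind of layer DIB inserts at each of its rounds 0 to 8:

```python
import tensorflow as tf

def build_base_cnn(input_shape=(99, 13, 1), n_classes=10):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, (2, 2), activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),
        tf.keras.layers.Conv2D(128, (2, 2), activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=(1, 2)),
        tf.keras.layers.Conv2D(64, (2, 2), activation="relu"),  # layer type DIB grows each round
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```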

Thank you for your attention 28

Reference 29
[1] Wolpert, D. H. (1992). Stacked Generalization. Neural Networks, 5(2), 241-259.
[2] Freund, Y. and Schapire, R. E. (1999). A Short Introduction to Boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1401-1406.
[3] Deng, L. and Platt, J. (2014). Ensemble Deep Learning for Speech Recognition. Interspeech 2014.
[4] Zhang, X.-L. and Wang, D. (2016). A Deep Ensemble Learning Method for Monaural Speech Separation. IEEE/ACM Transactions on Audio, Speech and Language Processing, 24(5), 967-977.
[5] Mosca, A. and Magoulas, G. (2016). Deep Incremental Boosting. In Benzmuller, C., Sutcliffe, G. and Rojas, R. (eds.), GCAI 2016, 2nd Global Conference on Artificial Intelligence, EPiC Series in Computing, vol. 41, pp. 293-302. EasyChair.