Deep Ensemble Learning ABDELHAK LEMKHENTER 07/03/2017



2 Presentation Outline
- Ensemble Learning
  - Stacking
  - Boosting
- Simple Deep Ensemble Learning
  - A heterogeneous stack
- More Advanced Deep Ensemble Learning
  - Multi-Resolution Stacking
  - Deep Incremental Boosting

3 Ensemble Learning
Ensemble learning is the practice of training multiple estimators and combining them into one robust estimator. For an ensemble to be effective, the set of learners should be as diverse as possible, so that each learner can capture a different pattern. Diversity can be obtained by:
- using different hyperparameters with the same base learner;
- subsampling the training data (useful when we have too little data, or too much data);
- using different algorithms.
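As a minimal sketch of two of these diversity sources (not taken from the slides), the Python example below fits polynomial regressors of different degrees, which plays the role of a hyperparameter, on random subsamples of the training data and averages their predictions. The function names, the polynomial base learner, the toy data and all parameter values are illustrative assumptions.

```python
import numpy as np

def train_diverse_ensemble(x, y, degrees=(1, 3, 5), n_per_degree=5, frac=0.7, seed=0):
    """Fit polynomials of several degrees, each on its own random subsample."""
    rng = np.random.default_rng(seed)
    n_sub = int(frac * len(x))
    members = []
    for degree in degrees:  # diversity source 1: different hyperparameters
        for _ in range(n_per_degree):
            # diversity source 2: subsampling the training data
            idx = rng.choice(len(x), size=n_sub, replace=False)
            members.append(np.polyfit(x[idx], y[idx], degree))
    return members

def member_predictions(members, x):
    """Predictions of every member, shape (n_members, n_points)."""
    return np.stack([np.polyval(coeffs, x) for coeffs in members])

def ensemble_predict(members, x):
    """Combine the members by simple averaging."""
    return member_predictions(members, x).mean(axis=0)

# Toy data (assumed for illustration): a noisy sine curve.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 80)
y = np.sin(3.0 * x) + 0.2 * rng.standard_normal(x.size)

members = train_diverse_ensemble(x, y)
prediction = ensemble_predict(members, x)
```

For squared error, the error of the averaged prediction is never larger than the average error of the individual members, which is one reason averaging diverse estimators is attractive.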

4 Different ensemble techniques
Ensemble learning includes various techniques. The most commonly used are:
- Stacking
- Boosting

5 Stacking
In stacking, we train a meta-learner to combine our base learners. The base learners are different machine learning algorithms. [1]
[Diagram: the input feeds Model 1, Model 2, Model 3, ..., Model n; their predictions feed the meta-learner, which produces the output.]
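The stacking idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the setup of [1]: the two base learners (linear least squares and k-nearest neighbours), the linear meta-learner, the single hold-out split and every function name are choices made here for brevity.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with an intercept; returns a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor; returns a predict function."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def fit_stack(X, y, seed=0):
    """Train base learners on one half, the meta-learner on the other half."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    part_a, part_b = perm[: len(X) // 2], perm[len(X) // 2 :]
    base = [fit_linear(X[part_a], y[part_a]), fit_knn(X[part_a], y[part_a])]
    # The meta-learner sees only base-learner predictions on held-out data.
    meta_features = np.column_stack([model(X[part_b]) for model in base])
    meta = fit_linear(meta_features, y[part_b])
    return lambda Z: meta(np.column_stack([model(Z) for model in base]))
```

Training the meta-learner on held-out predictions, rather than on the data the base learners were fitted on, keeps it from simply trusting whichever base learner overfits the most.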

6 Boosting
In boosting, we iteratively combine a set of weak estimators, all built with the same machine learning algorithm, into a strong learner. A weak learner only needs to be slightly better than random guessing. [2] Gradient boosting is one example.

7 Adaptive Boosting
At each iteration step:
- we train a weak learner using a sampling distribution D_i;
- we update D_i by giving more weight to misclassified data points.
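The two steps above can be sketched as follows. This is a minimal illustration, assuming binary labels in {-1, +1} and decision stumps as the weak learner (the slides do not name one). Instead of drawing samples from D_i, the sketch uses D_i directly as weights when scoring each stump, a common simplification. The learner weight alpha follows the standard AdaBoost rule described in [2].

```python
import numpy as np

def stump_predict(X, feature, threshold, sign):
    """Predict -sign at or below the threshold and +sign above it."""
    return np.where(X[:, feature] <= threshold, -sign, sign)

def best_stump(X, y, D):
    """Exhaustively pick the stump with the lowest D-weighted error."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for sign in (1, -1):
                pred = stump_predict(X, feature, threshold, sign)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def adaboost(X, y, n_rounds=20):
    n = len(y)
    D = np.full(n, 1.0 / n)  # start from the uniform distribution
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, sign = best_stump(X, y, D)
        err = np.clip(err, 1e-12, 1 - 1e-12)  # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feature, threshold, sign)
        # Misclassified points (y * pred = -1) gain weight, the others lose it.
        D = D * np.exp(-alpha * y * pred)
        D /= D.sum()
        ensemble.append((alpha, feature, threshold, sign))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of all weak learners."""
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

A useful toy case is a one-dimensional problem where the positive class is an interval: no single stump can separate it, but a weighted vote of a few stumps can.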

8 Deep Learning and Ensemble Learning
The two fields share some similar guidelines (symmetry breaking ~ increasing diversity). Deep neural networks come in various architectures and have many hyperparameters, which makes them good candidates for creating diverse sets of learners.

9 Ensemble Deep Learning for Speech Recognition [3]
A simple ensemble model built by stacking three types of neural networks.

10 Evaluation of the model
Evaluation on the TIMIT phone recognition task:
- Training set: 462 speakers
- Dev set: 50 speakers
- Test set: 24 speakers

11 Monaural Speech Separation
The task of separating a target speech signal from background noise or an interfering speech signal, using data from a single microphone. We will focus on three approaches:
- a masking method using a DNN;
- a mapping method using a DNN;
- multi-resolution stacking.

12 Masking-based DNN
We try to predict the ideal ratio mask (IRM), where each time-frequency (T-F) unit encodes the ratio of the target signal to the mixed signal.
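A minimal sketch of how such a training target could be computed from magnitude spectrograms. It assumes the magnitude of the mixture can be approximated by the sum of the target and interference magnitudes, so the mask is target / (target + interference). The exact definition used in [4] may differ, and the function name and toy values are illustrative.

```python
import numpy as np

def ideal_ratio_mask(target_mag, interference_mag, eps=1e-8):
    """Ratio mask per T-F unit, between 0 and 1 (assumed definition)."""
    # eps avoids dividing by zero in silent T-F units.
    return target_mag / (target_mag + interference_mag + eps)

# Toy magnitude spectrograms (frames x frequency bins), assumed for illustration.
target = np.array([[1.0, 0.0], [3.0, 2.0]])
interference = np.array([[1.0, 2.0], [0.0, 2.0]])
mask = ideal_ratio_mask(target, interference)
```

Units dominated by the target get a mask value close to 1, units dominated by the interference get a value close to 0, and the DNN is trained to predict these values from the mixed signal alone.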

13 Mapping-based DNN
In this approach, we try to learn how to directly map the mixed signal to the target signal.

14 Multi-resolution stacking
[Diagram: Input -> Preprocessing -> Module 1 -> ... -> Module n -> Postprocessing -> Output]

15 Preprocessing and post-processing
[Diagram. Preprocessing: mixed signal, STFT. Postprocessing: estimated RM (y_n), phase of the mixed signal, target signal in TF domain, inverse STFT, target signal.]
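One plausible reading of these two stages, sketched in Python: the mixed signal is transformed with an STFT, the estimated ratio mask scales its magnitude, the phase of the mixed signal is reused, and an inverse STFT returns to the time domain. The frame length, hop size, Hann window and all function names are assumptions made for this sketch, not details given in the slides.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Preprocessing: windowed frames -> complex spectrogram (frames x bins)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def apply_mask(mixture_spec, mask):
    """Scale the mixture magnitude by the mask and keep the mixture phase."""
    magnitude = np.abs(mixture_spec)
    phase = np.angle(mixture_spec)
    return mask * magnitude * np.exp(1j * phase)

def istft(spec, n_fft=256, hop=128):
    """Postprocessing: inverse FFT of each frame, then overlap-add."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
    return out
```

With a periodic Hann window and a hop of half the frame length, the overlapping windows sum to one, so a mask of all ones reproduces the mixed signal everywhere except the first and last half frame.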

16 A Learning module
[Diagram. Input: output of the previous module + spectra of the mixed signal. The input is expanded in resolutions R1, R2, ..., Rp; each expansion feeds its own DNN (DNN 1, DNN 2, ..., DNN p), which outputs a ratio mask (RM 1, RM 2, ..., RM p).]
The last module only has one DNN.

17 Feature expansion
For a given resolution R and each frame m, we expand the input with a window of size 2R+1 centered on frame m. This is done for each RM passed down from the previous module and for the magnitude spectra of the STFT of the mixed signal.
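The expansion described above can be sketched as a sliding-window concatenation. The slides do not say how frames near the start and end of the utterance are handled, so repeating the edge frame is an assumption here, as is the function name.

```python
import numpy as np

def expand_features(frames, R):
    """Concatenate frames m-R .. m+R for every frame m.

    frames: array of shape (n_frames, n_bins)
    returns: array of shape (n_frames, (2 * R + 1) * n_bins)
    """
    n_frames = frames.shape[0]
    # Assumed edge handling: repeat the first and last frame R times.
    padded = np.pad(frames, ((R, R), (0, 0)), mode="edge")
    return np.stack([padded[m : m + 2 * R + 1].reshape(-1)
                     for m in range(n_frames)])
```

In a learning module, the same input would be expanded once per resolution, for example `[expand_features(frames, R) for R in (1, 2, 4)]` with hypothetical resolution values, and each expanded version would feed its own DNN.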

18 Model evaluation
Training and test sets are generated using the SSC, TIMIT and IEEE-TIMIT datasets. Three different settings are used:
- Same target and interfering speakers, with:
  - different SNR levels
  - a randomly chosen SNR level
- Same target but a different interfering speaker

19 Results 1/3
[Figure panels: SSC, TIMIT]

20 Results 2/3
[Figure panels: SSC, TIMIT]

21 Results 3/3

22 Deep Incremental Boosting
Deep ensemble learning requires training several neural networks instead of one. Deep Incremental Boosting (DIB), a combination of deep learning, transfer learning and ensemble learning, was suggested to tackle this issue.

23 Application of Transfer Learning

24 DIB

25 Benchmark
[Figure panels: mislabeling ratio, training time]

26 DIB for Spoken Digit Recognition
Data set:
- Training set: 10 digits x 10 utterances x 66 speakers (male and female)
- Test set: 10 digits x 10 utterances x 33 speakers (male and female)

27 Architecture and results
Architecture:
- Conv2D: 64, 2x2
- MaxPooling: 1x2
- Conv2D: 128, 2x2
- MaxPooling: 1x2
- For 0 to 8: Conv2D: 64, 2x2
- Fully connected layer: 128
- Softmax output layer
Results, for 2 epochs per training:
- One CNN:
- DIB:
- If we use equivalent time (40 epochs) for the single CNN:

28 Thank you for your attention

29 References
[1] D. H. Wolpert. Stacked Generalization. Neural Networks, 5, 241, 1992.
[2] Y. Freund and R. E. Schapire. A short introduction to boosting. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 1999.
[3] L. Deng and J. Platt. Ensemble Deep Learning for Speech Recognition. Interspeech.
[4] X.-L. Zhang and D. Wang. A Deep Ensemble Learning Method for Monaural Speech Separation. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24.5, 2016. Web. 23 Jan.
[5] A. Mosca and G. Magoulas. Deep incremental boosting. In C. Benzmuller, G. Sutcliffe, and R. Rojas (eds.), GCAI: Global Conference on Artificial Intelligence, volume 41 of EPiC Series in Computing, EasyChair, 2016.
