Sparse-coded Net Model and Applications


Transcription:

Sparse-coded Net Model and Applications
Y. Gwon, M. Cha, W. Campbell, H. T. Kung, C. Dagli
IEEE International Workshop on Machine Learning for Signal Processing (MLSP), September 16, 2016
This work is sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Outline: Background on Sparse Coding; Semi-supervised Learning with Sparse Coding; Sparse-coded Net; Experimental Evaluation; Conclusions and Future Work

Background: Sparse Coding
Unsupervised method to learn a representation of data: decompose each data vector into a sparse linear combination of learned basis vectors.
Domain transform: raw data → feature vectors. Sparse coding then represents the feature data X with a learned feature dictionary D and sparse codes Y, i.e., X ≈ DY.
[Slide illustration: a feature vector x is reconstructed from only a few dictionary atoms, e.g., x ≈ 0.8·d101 + 0.3·d208 + 0.5·d263, so its sparse code has only the entries y101 = 0.8, y208 = 0.3, y263 = 0.5 nonzero.]
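For concreteness, the decomposition sketched on this slide can be written out as follows (notation follows the slide: x is a feature vector, D = [d_1, ..., d_K] the learned dictionary, y its sparse code; the specific indices are just the slide's example):

```latex
% Sparse decomposition of a single feature vector x with dictionary D
x \;\approx\; D y \;=\; \sum_{k=1}^{K} y_k\, d_k ,
\qquad \|y\|_0 \ll K
% e.g., only three nonzero coefficients:
% x \approx 0.8\, d_{101} + 0.3\, d_{208} + 0.5\, d_{263}
```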

Background: Sparse Coding (cont.)
Popularly solved as an L1-regularized optimization (LASSO/LARS):
$\min_{D,y} \; \|x - Dy\|_2^2 + \lambda \|y\|_1$
which is the convex relaxation of the L0 problem
$\min_{D,y} \; \|x - Dy\|_2^2 + \lambda \|y\|_0$
Optimizing the L0 pseudo-norm directly is intractable; a greedy L0 algorithm (OMP) can be used instead.
(X: data, D: feature dictionary, Y: sparse codes.)
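As a concrete illustration of the two solvers named on this slide, here is a minimal sketch (not from the paper) that encodes data against a fixed, randomly generated dictionary using scikit-learn; sparse_encode supports both a LARS-based LASSO solver and OMP. The dictionary, sizes, and regularization values are illustrative assumptions.

```python
# Minimal sketch of sparse coding with scikit-learn (illustrative only; the
# dictionary here is random, whereas the paper learns it from unlabeled data).
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 64))                # 5 feature vectors x, 64-dim
D = normalize(rng.standard_normal((128, 64)))   # hypothetical dictionary D: 128 unit-norm atoms

# L1-regularized coding (LASSO solved by LARS): min_y ||x - Dy||_2^2 + lambda*||y||_1
Y_lars = sparse_encode(X, D, algorithm="lasso_lars", alpha=0.1)

# Greedy L0 coding (orthogonal matching pursuit) with a fixed sparsity budget
Y_omp = sparse_encode(X, D, algorithm="omp", n_nonzero_coefs=10)

print(np.count_nonzero(Y_lars, axis=1), np.count_nonzero(Y_omp, axis=1))
```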

Outline: Background on Sparse Coding; Semi-supervised Learning with Sparse Coding; Sparse-coded Net; Experimental Evaluation; Conclusions and Future Work

Semi-supervised Learning with Sparse Coding
Semi-supervised learning:
- Unsupervised stage: learn a feature representation using unlabeled data
- Supervised stage: optimize the task objective using the learned feature representations of labeled data
Semi-supervised learning with sparse coding:
- Unsupervised stage: sparse coding and dictionary learning with unlabeled data
- Supervised stage: train a classifier/regressor on the sparse codes of labeled data
[Pipeline diagram: unsupervised stage: raw data (unlabeled) → preprocessing (optional) → sparse coding & dictionary learning → D (learned dictionary); supervised stage: raw data (labeled) → preprocessing (optional) → sparse coding with D → feature pooling → classifier/regression. A code sketch of this pipeline follows below.]
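A minimal sketch of the two-stage pipeline under stated assumptions: MiniBatchDictionaryLearning stands in for the paper's dictionary learner, the per-patch feature pooling step is omitted (each sample is coded as a single vector), and logistic regression stands in for the supervised model. Names and sizes are illustrative, not the authors' implementation.

```python
# Sketch of the semi-supervised pipeline: dictionary learning on unlabeled
# data, then a classifier trained on sparse codes of the labeled data.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((1000, 64))    # unlabeled feature vectors
X_labeled = rng.standard_normal((100, 64))       # labeled feature vectors
y_labels = rng.integers(0, 10, size=100)         # 10-class toy labels

# Unsupervised stage: learn dictionary D from unlabeled data
dict_learner = MiniBatchDictionaryLearning(n_components=128, alpha=0.1, random_state=0)
dict_learner.fit(X_unlabeled)
D = dict_learner.components_                     # shape (128, 64)

# Supervised stage: sparse-code the labeled data with D, then train a classifier
Y_codes = sparse_encode(X_labeled, D, algorithm="lasso_lars", alpha=0.1)
clf = LogisticRegression(max_iter=1000).fit(Y_codes, y_labels)
print("train accuracy:", clf.score(Y_codes, y_labels))
```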

Outline: Background on Sparse Coding; Semi-supervised Learning with Sparse Coding; Sparse-coded Net; Experimental Evaluation; Conclusions and Future Work

Sparse-coded Net: Motivations
- Semi-supervised learning with sparse coding cannot jointly optimize representation learning and the task objective: the sparse codes used as feature vectors for the task cannot be modified to induce correct data labels
- No supervised dictionary learning: the sparse coding dictionary is learned using only unlabeled data

Sparse-coded Net (SCN)
Feedforward model with sparse coding, pooling, and softmax layers
- Pretrain: semi-supervised learning with sparse coding
- Finetune: SCN backpropagation
[Architecture diagram, bottom to top: inputs x(1), ..., x(M) are each sparse-coded with the shared dictionary D into codes y(1), ..., y(M); the codes are pooled (with nonlinear rectification) into z; a softmax layer produces p(l | z). A code sketch of this feedforward path follows below.]
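A minimal numpy sketch of that feedforward path, assuming max pooling as the nonlinear rectification and a dictionary and softmax weights that are already trained; it illustrates the layer ordering on the slide, not the authors' code.

```python
# Sketch of the SCN feedforward path: sparse coding -> max pooling -> softmax.
import numpy as np
from sklearn.decomposition import sparse_encode

def scn_forward(X_patches, D, W, b):
    """X_patches: (M, n) input vectors for one example; D: (K, n) dictionary;
    W: (K, L) softmax weights; b: (L,) softmax bias. Returns p(l | z)."""
    Y = sparse_encode(X_patches, D, algorithm="lasso_lars", alpha=0.1)  # (M, K) sparse codes
    z = np.max(np.abs(Y), axis=0)          # pooled sparse code z (assumed: abs-max pooling)
    logits = z @ W + b
    e = np.exp(logits - logits.max())      # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
D = rng.standard_normal((128, 64)); D /= np.linalg.norm(D, axis=1, keepdims=True)
W, b = rng.standard_normal((128, 10)) * 0.01, np.zeros(10)
print(scn_forward(rng.standard_normal((8, 64)), D, W, b).round(3))
```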

SCN Backpropagation
- When the predicted output does not match the ground truth, hold the softmax weights constant, rewrite the softmax loss as a function of the pooled sparse code z, and adjust z by gradient descent: z → z*
- Adjust the individual sparse codes from the adjusted pooled code by putback: z* → Y*
- Adjust the sparse coding dictionary by rank-1 updates or gradient descent: D → D*
- Redo the feedforward path with the adjusted dictionary and retrain the softmax
- Repeat until convergence
A high-level sketch of one such iteration follows below.
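The sketch below spells out one iteration of this loop at a very high level, under loud assumptions: cross-entropy loss, a single gradient step on z, a "putback" that copies each adjusted pooled value back to the code that won the max pool (one plausible reading, not necessarily the paper's rule), and a rank-1 dictionary update toward reconstructing x from the adjusted codes. All names and step sizes are hypothetical.

```python
# One illustrative SCN backpropagation iteration (toy, single training example).
import numpy as np

def scn_backprop_step(x, Y, D, W, b, label, lr_z=0.5, lr_d=0.1):
    """x: (n,) input; Y: (M, K) sparse codes; D: (K, n) dictionary;
    W: (K, L), b: (L,) softmax parameters; label: ground-truth class index."""
    # Feedforward: max pooling and softmax, remembering which code won each pool unit.
    arg = np.argmax(np.abs(Y), axis=0)
    z = Y[arg, np.arange(Y.shape[1])]
    logits = z @ W + b
    p = np.exp(logits - logits.max()); p /= p.sum()

    # Hold softmax weights constant; gradient of cross-entropy loss w.r.t. z.
    grad = p.copy(); grad[label] -= 1.0            # dL/dlogits
    z_star = z - lr_z * (W @ grad)                 # adjusted pooled sparse code z*

    # Putback: write each adjusted pooled value back into the winning sparse code.
    Y_star = Y.copy()
    Y_star[arg, np.arange(Y.shape[1])] = z_star

    # Rank-1 style dictionary update toward reconstructing x from the mean adjusted code.
    y_bar = Y_star.mean(axis=0)
    residual = x - y_bar @ D
    D_star = D + lr_d * np.outer(y_bar, residual)
    return Y_star, D_star

rng = np.random.default_rng(0)
Y_new, D_new = scn_backprop_step(rng.standard_normal(64), rng.standard_normal((8, 128)),
                                 rng.standard_normal((128, 64)), rng.standard_normal((128, 10)),
                                 np.zeros(10), label=3)
```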

Outline: Background on Sparse Coding; Semi-supervised Learning with Sparse Coding; Sparse-coded Net; Experimental Evaluation; Conclusions and Future Work

Experimental Evaluation
Audio and Acoustic Signal Processing (AASP):
- 30-second WAV files recorded at 44.1 kHz, 16-bit stereo
- 10 classes such as bus, busy street, office, and open-air market
- 10 labeled examples per class
CIFAR-10:
- 60,000 32x32 color images
- 10 classes such as airplane, automobile, cat, and dog
- We sample 2,000 images to form the train and test datasets
Wikipedia:
- 2,866 documents annotated with 10 categorical labels
- Each document is represented by 128 LDA topic features

Results: AASP Sound Classification
Sound classification performance on the AASP dataset:

  Method                                       Accuracy
  Semi-supervised via sparse coding (LARS)     73.0%
  Semi-supervised via sparse coding (OMP)      69.0%
  GMM-SVM                                      61.0%
  Deep SAE NN (4 layers)                       71.0%
  Sparse-coded net (LARS)                      78.0%
  Sparse-coded net (OMP)                       75.0%

The sparse-coded net with LARS achieves the best accuracy of 78%, comparable to the best AASP scheme (79%) and significantly better than the AASP baseline (57%).
D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, "Detection and Classification of Acoustic Scenes and Events," IEEE Trans. on Multimedia, vol. 17, no. 10, 2015.

Results: CIFAR-10 Image Classification
Image classification performance on CIFAR-10:

  Method                                       Accuracy
  Semi-supervised via sparse coding (LARS)     84.0%
  Semi-supervised via sparse coding (OMP)      81.3%
  GMM-SVM                                      76.8%
  Deep SAE NN (4 layers)                       81.9%
  Sparse-coded net (LARS)                      87.9%
  Sparse-coded net (OMP)                       85.5%

Again, the sparse-coded net with LARS achieves the best accuracy, 87.9%, superior to the RBM and CNN pipelines evaluated by Coates et al.
A. Coates, A. Ng, and H. Lee, "An Analysis of Single-layer Networks in Unsupervised Feature Learning," in AISTATS, 2011.

Results: Wikipedia Category Classification
Text classification performance on the Wikipedia dataset:

  Method                                       Accuracy
  Semi-supervised via sparse coding (LARS)     69.4%
  Semi-supervised via sparse coding (OMP)      61.1%
  Deep SAE NN (4 layers)                       67.1%
  Sparse-coded net (LARS)                      70.2%
  Sparse-coded net (OMP)                       62.1%

We achieve the best accuracy of 70.2% with the sparse-coded net on LARS, superior to the 60.5-68.2% reported by existing approaches [1, 2].
[1] K. Duan, H. Zhang, and J. Wang, "Joint learning of cross-modal classifier and factor analysis for multimedia data classification," Neural Computing and Applications, vol. 27, no. 2, 2016.
[2] L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, and B. Du, "Ensemble Manifold Regularized Low-rank Approximation for Multi-view Feature Embedding," Pattern Recognition, vol. 48, no. 10, 2015.

Outline: Background on Sparse Coding; Semi-supervised Learning with Sparse Coding; Sparse-coded Net; Experimental Evaluation; Conclusions and Future Work

Conclusions and Future Work
Conclusions:
- Introduced the sparse-coded net model, which jointly optimizes sparse coding and dictionary learning with a supervised task at the output layer
- Proposed the SCN backpropagation algorithm, which can handle the mixing of feature vectors caused by the pooling nonlinearity
- Demonstrated superior classification performance on sound (AASP), image (CIFAR-10), and text (Wikipedia) data
Future Work:
- More realistic, larger-scale experiments are necessary
- Generalize hyperparameter optimization techniques for various datasets (e.g., audio, video, text)