Feature Transfer and Knowledge Distillation in Deep Neural Networks
1 Feature Transfer and Knowledge Distillation in Deep Neural Networks (Two Interesting Papers at NIPS 2014)
LU Yangyang, KERE Seminar, Dec. 31, 2014
2 Deep Learning F4 (at NIPS 1)
From left to right: Yann LeCun, Geoffrey Hinton, Yoshua Bengio, Andrew Ng
1 Advances in Neural Information Processing Systems
3 Outline
- How transferable are features in deep neural networks? NIPS 14: Introduction; Generality vs. Specificity Measured as Transfer Performance; Experiments and Discussion
- Distilling the Knowledge in a Neural Network. NIPS 14 DL workshop: Introduction; Distillation; Experiments: Distilled Models and Specialist Models; Discussion
5 Authors
How transferable are features in deep neural networks? NIPS 14
Jason Yosinski 1, Jeff Clune 2, Yoshua Bengio 3, and Hod Lipson 4
1 Dept. of Computer Science, Cornell University
2 Dept. of Computer Science, University of Wyoming
3 Dept. of Computer Science & Operations Research, University of Montreal
4 Dept. of Mechanical & Aerospace Engineering, Cornell University
7 Introduction
A common phenomenon in many deep neural networks trained on natural images: on the first layer they learn features similar to Gabor filters and color blobs. This occurs not only across different datasets, but even with very different training objectives. (Figure: a 2-dimensional Gabor filter.)
Features at different layers of a neural network:
- First-layer features: general. Finding standard features on the first layer seems to occur regardless of the exact cost function and natural-image dataset.
- Last-layer features: specific. The features computed by the last layer of a trained network must depend greatly on the chosen dataset and task.
If first-layer features are general and last-layer features are specific, then there must be a transition from general to specific somewhere in the network.
9 Introduction (cont.)
First-layer features (general) → transition → last-layer features (specific)
- Can we quantify the degree to which a particular layer is general or specific?
- Does the transition occur suddenly at a single layer, or is it spread out over several layers?
- Where does this transition take place: near the first, middle, or last layer of the network?
We are interested in these questions because, to the extent that features within a network are general, we can use them for transfer learning.
Transfer learning 2: first train a base network on a base dataset and task, then repurpose (transfer) the learned features to a second target network trained on a target dataset and task. This process tends to work if the features are general, i.e., suitable to both base and target tasks, rather than specific to the base task.
2 jspan/surveytl.htm, qyang/publications.html
10 Introduction (cont.)
The usual transfer-learning approach: copy the first n layers; should they be fine-tuned or frozen? The choice depends on the size of the target dataset and the number of parameters in the first n layers.
This paper compares results from fine-tuned and frozen features to answer: how transferable are features in deep neural networks?
11 Generality vs. Specificity Measured as Transfer Performance
This paper defines the degree of generality of a set of features learned on task A as the extent to which the features can be used for another task B.
Tasks A and B: image classification.
- Randomly split the 1,000 ImageNet classes into 500 (task A) + 500 (task B).
- Train one N-layer (N = 8 here) convolutional network on A (baseA) and another on B (baseB).
- Define new networks for n ∈ {1, 2, ..., 7}:
  - selffer networks BnB: first n layers copied from baseB and frozen; higher (N - n) layers randomly initialized
  - transfer networks AnB: first n layers copied from baseA and frozen; higher (N - n) layers randomly initialized
  - selffer networks BnB+: like BnB, but all layers are fine-tuned
  - transfer networks AnB+: like AnB, but all layers are fine-tuned
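To make the four treatments concrete, here is a minimal PyTorch sketch of how a selffer or transfer network could be assembled. It assumes the base networks expose their layers as an `nn.ModuleList` named `layers`, and `init_weights` is a hypothetical re-initialization helper; neither name comes from the paper.

```python
import copy
import torch.nn as nn

def init_weights(m):
    # Hypothetical random re-initialization for the upper layers.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, std=0.01)
        nn.init.zeros_(m.bias)

def make_treatment(base_net, n, fine_tune):
    """First n layers copied from base_net; higher (N - n) layers
    randomly initialized. If fine_tune is False, the copied layers
    are frozen (BnB / AnB); otherwise everything trains (BnB+ / AnB+)."""
    net = copy.deepcopy(base_net)
    for i, layer in enumerate(net.layers):   # assumes net.layers: nn.ModuleList
        if i >= n:
            layer.apply(init_weights)        # upper layers are re-learned
        elif not fine_tune:
            for p in layer.parameters():
                p.requires_grad = False      # freeze the transferred layers
    return net

# BnB  = make_treatment(baseB, n, fine_tune=False)  # selffer, frozen
# BnB+ = make_treatment(baseB, n, fine_tune=True)   # selffer, fine-tuned
# AnB  = make_treatment(baseA, n, fine_tune=False)  # transfer, frozen
# AnB+ = make_treatment(baseA, n, fine_tune=True)   # transfer, fine-tuned
# All four are then trained on dataset B.
```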
12 Overview of the Experimental Treatments and Controls
CNN for image classification: implemented using Caffe 3
3 Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding. Proceedings of the ACM International Conference on Multimedia. ACM, 2014.
14 Three Sets of Experiments
Hypothesis: if A and B are similar, the authors expect transferred features to perform better than when A and B are less similar.
Similar datasets: random A/B splits (see the sketch after this slide)
- ImageNet contains clusters of similar classes, particularly dogs and cats. 4
- On average, A and B will each contain approximately 6 or 7 of these felid classes, meaning that base networks trained on each dataset will have features at all levels that help classify some types of felids.
Dissimilar datasets: man-made/natural splits
- ImageNet also provides a hierarchy of parent classes. The authors create a special split of the dataset into two halves that are as semantically different from each other as possible:
  - A: only man-made entities (551 classes)
  - B: only natural entities (449 classes)
Random weights: to ask whether the nearly optimal performance of random filters reported on small networks 5 carries over to a deeper network trained on a larger dataset.
4 {tabby cat, tiger cat, Persian cat, Siamese cat, Egyptian cat, mountain lion, lynx, leopard, snow leopard, jaguar, lion, tiger, cheetah}
5 Jarrett K, Kavukcuoglu K, Ranzato M, et al. What is the best multi-stage architecture for object recognition? Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
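A sketch of the random A/B split in plain Python. The man-made/natural split instead walks the WordNet hierarchy of parent classes, which is not shown here.

```python
import random

classes = list(range(1000))     # the 1,000 ImageNet class indices
rng = random.Random(0)          # fixed seed makes the split reproducible
rng.shuffle(classes)
task_A, task_B = sorted(classes[:500]), sorted(classes[500:])
# With 13 felid classes in ImageNet, each half receives roughly 6-7 of
# them on average, so both base networks learn some felid-relevant features.
```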
15 Similar Datasets (Random A/B Splits)
- BnB: performance drops when n = 4, 5. The original network contained fragile co-adapted features on successive layers, i.e., features that interact with each other in a complex or fragile way such that the co-adaptation could not be relearned by the upper layers alone.
- BnB+: fine-tuning prevents the performance drop seen in BnB.
- AnB: the combination of the drop from lost co-adaptation and the drop from features that are less and less general.
- AnB+: transferring features plus fine-tuning generalizes better than directly training on the target dataset.
16 Dissimilar Datasets (Man-made/Natural splits) & Random Weights
18 Summary
If first-layer features are general and last-layer features are specific, then there must be a transition from general to specific somewhere in the network.
Experiments: fine-tuned vs. frozen features on selffer and transfer networks.
- On similar datasets (random A/B splits): performance degrades when using transferred features without fine-tuning, due to (i) the specificity of the features themselves and (ii) fragile co-adaptation between neurons on neighboring layers. Initializing a network with transferred features from almost any number of layers can boost generalization performance after fine-tuning on a new dataset.
- On dissimilar datasets (man-made/natural splits): the more dissimilar the base and target tasks, the more performance drops.
- On random weights: performance on the relatively large ImageNet dataset is lower than previously reported for smaller datasets when using features computed from random lower-layer weights instead of trained weights.
19 Outline
- How transferable are features in deep neural networks? NIPS 14
- Distilling the Knowledge in a Neural Network. NIPS 14 DL workshop: Introduction; Distillation; Experiments: Distilled Models and Specialist Models; Discussion
20 Authors
Distilling the Knowledge in a Neural Network. NIPS 14 Deep Learning and Representation Learning Workshop
Geoffrey Hinton 1,2 *, Oriol Vinyals 1 *, Jeff Dean 1
1 Google Inc. (Mountain View)
2 University of Toronto and the Canadian Institute for Advanced Research
* equal contribution
22 Introduction
The story, for analogy: insects have a larval form optimized for extracting energy and nutrients from the environment, and an adult form optimized for the very different requirements of traveling and reproduction.
In large-scale machine learning (e.g., speech and object recognition):
- the training stage extracts structure from very large, highly redundant datasets; it need not run in real time and can use a huge amount of computation
- the deployment stage has much more stringent requirements on latency and computational resources
We should be willing to train very cumbersome models if that makes it easier to extract structure from the data.
24 Introduction (cont.)
Cumbersome models:
- an ensemble of separately trained models
- a single very large model trained with a very strong regularizer
Distillation: once the cumbersome model has been trained, use a different kind of training to transfer its knowledge to a small model that is more suitable for deployment.
Previous work 7 demonstrated convincingly that the knowledge acquired by a large ensemble of models can be transferred to a single small model.
Knowledge in a trained model: a learned mapping from input vectors to output vectors.
7 Bucilua C, Caruana R, Niculescu-Mizil A. Model compression. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006.
25 Introduction (cont.)
Models are usually trained to optimize performance on the training data, when the real objective is to generalize well to new data. Information about the correct way to generalize is not normally available.
Distillation:
- transfer the generalization ability of the cumbersome model to a small model
- use the class probabilities produced by the cumbersome model as soft targets for training the small model 8
- use the same training set or a separate transfer set for the transfer stage
8 Caruana et al., SIGKDD 06: using the logits (the inputs to the final softmax) for transferring
26 Distillation
Neural networks for multi-class classification use a softmax output layer:
q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
- z_i: the logit, i.e., the input to the softmax layer
- q_i: the class probability computed by the softmax layer
- T: a temperature, normally set to 1
Cumbersome model → distilled model: train the distilled model on a transfer set, using for each case a soft target distribution produced by the cumbersome model with a high temperature in its softmax. The same high temperature is used when training the distilled model; after training, it uses a temperature of 1.
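A small numeric sketch of the temperature-scaled softmax in NumPy; the max subtraction is standard numerical hygiene, not part of the formula.

```python
import numpy as np

def softmax_T(z, T=1.0):
    """q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(z, dtype=float) / T
    z -= z.max()              # for numerical stability; cancels in the ratio
    e = np.exp(z)
    return e / e.sum()

logits = [10.0, 5.0, 1.0]
print(softmax_T(logits, T=1))   # nearly one-hot: the hard prediction
print(softmax_T(logits, T=5))   # much softer: relative differences between
                                # the wrong classes become visible
```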
27 Distillation (cont.)
When the correct labels are known for all or some of the transfer set, this method can be significantly improved by also training the distilled model to produce the correct labels: use the correct labels (hard targets) alongside the soft targets, simply taking a weighted average of two different objective functions.
- Objective 1: the cross entropy with the soft targets (cumbersome and distilled models use the same high temperature)
- Objective 2: the cross entropy with the correct labels (using exactly the same logits in the softmax of the distilled model, but at a temperature of 1)
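A hedged PyTorch sketch of the weighted objective; the `alpha` weighting and the function name are illustrative, while the T^2 factor on the soft term is the gradient rescaling the paper recommends so the relative contributions of the two objectives stay roughly constant as T changes.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Objective 1: cross entropy with the soft targets; both the cumbersome
    # (teacher) and distilled (student) softmaxes use the same high T.
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=1)
    log_q = F.log_softmax(student_logits / T, dim=1)
    soft_loss = -(soft_targets * log_q).sum(dim=1).mean()
    # Objective 2: cross entropy with the correct (hard) labels, using the
    # very same student logits but at a temperature of 1.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Soft-target gradients scale as 1/T^2, so the soft term is multiplied
    # by T^2 to keep the two objectives on a comparable scale.
    return alpha * (T * T) * soft_loss + (1.0 - alpha) * hard_loss
```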
30 Distilled Models: on MNIST 10
A single large neural net: 2 hidden layers, 1200 rectified linear hidden units per layer, trained on all 60,000 training cases, strongly regularized using dropout and weight constraints 9 → 67 test errors
A small model: 2 hidden layers, 800 hidden units per layer, no regularization → 146 test errors
- additionally matching the soft targets of the large net at T = 20 → 74 test errors
- with 300 or more units per hidden layer: T > 8 gives fairly similar results
- with 30 units per hidden layer: T ∈ [2.5, 4] works significantly better than higher or lower temperatures
Omitting all examples of the digit 3 from the transfer set → 206 test errors (133 of them on the 1,010 threes)
- after fine-tuning the bias for the 3 class → 109 test errors (14 on the 1,010 threes); see the sketch after this slide
Transfer set containing only the digits 7 and 8 from the training set → 47.3% test errors
- after fine-tuning the biases for the 7 and 8 classes → 13.2% test errors
9 G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
10 the MNIST handwriting recognition dataset
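The bias fix above amounts to a one-dimensional search over an additive offset on the missing class's output bias. This is a sketch with hypothetical helpers (`val_errors`, a final layer called `fc`), not code from the paper.

```python
def tune_class_bias(model, class_idx, val_errors, offsets):
    """Try each additive offset on one output bias and keep the one that
    minimizes validation errors. val_errors(model) is a hypothetical
    helper returning the error count on a held-out set."""
    best_off, best_err = 0.0, val_errors(model)
    for off in offsets:
        model.fc.bias.data[class_idx] += off   # assumes the last layer is model.fc
        err = val_errors(model)
        model.fc.bias.data[class_idx] -= off   # undo before the next trial
        if err < best_err:
            best_off, best_err = off, err
    model.fc.bias.data[class_idx] += best_off  # commit the best offset
    return best_off, best_err
```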
31 Distilled Models: on Speech Recognition
The objective of automatic speech recognition (ASR): 11
A single large neural net: 8 hidden layers, 2560 rectified linear units per layer; a final softmax layer with 14,000 labels (HMM targets h_t); about 85M parameters in total; training set: 2000 hours of spoken English data, about 700M training examples.
Distilled models: distilled at temperatures in {1, 2, 5, 10}, using a relative weight of 0.5 on the cross entropy for the hard targets.
A single model distilled from an ensemble of models works significantly better than a model of the same size learned directly from the same training data.
11 map a (short) temporal context of features derived from the waveform to a probability distribution over the discrete states of a Hidden Markov Model (HMM)
32 Specialist Models: on Image Annotation
Training an ensemble of models:
- An ensemble requires too much computation at test time; this can be dealt with by using distillation.
- If the individual models are large neural networks and the dataset is very large, the amount of computation required at training time is excessive, even though it is easy to parallelize.
Learning specialist models:
- to show how learning specialist models that each focus on a different confusable subset of the classes can reduce the total amount of computation required to learn an ensemble
- to show how the overfitting of training specialist models may be prevented by using soft targets
33 Specialist Models: on Image Annotation (cont.)
JFT: an internal Google dataset; 100 million labeled images, 15,000 labels.
A generalist model: Google's baseline model for JFT, a deep CNN that had been trained for about six months using asynchronous stochastic gradient descent on a large number of cores, using two types of parallelism.
Specialist models: trained on data highly enriched in examples from a very confusable subset of the classes (e.g., different types of mushroom).
- The softmax of this type of specialist can be made much smaller by combining all of the classes it does not care about into a single dustbin class (see the sketch after this slide).
- To reduce overfitting and share the work of learning lower-level feature detectors: each specialist model is initialized with the weights of the generalist model, and its training examples are drawn 1/2 from its special subset and 1/2 sampled at random from the remainder of the training set.
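A sketch of the two specialist-specific pieces, the dustbin label mapping and the 50/50 sampling, in plain Python; all names are illustrative.

```python
import random

def make_specialist_labeler(special_classes):
    """Map full-dataset labels onto the specialist's smaller softmax:
    each special class keeps its own index, and every other class
    collapses into a single dustbin index."""
    index = {c: i for i, c in enumerate(sorted(special_classes))}
    dustbin = len(special_classes)
    return lambda y: index.get(y, dustbin)

def sample_example(special_pool, general_pool, rng=random):
    """Half the training examples come from the enriched special subset,
    half are sampled at random from the rest of the training set."""
    pool = special_pool if rng.random() < 0.5 else general_pool
    return rng.choice(pool)
```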
34 Soft Targets as Regularizers
Using soft targets instead of hard targets: a lot of helpful information can be carried in soft targets that could not possibly be encoded with a single hard target.
Fitting the 85M parameters of the baseline speech model with far less data: soft targets allow a new model to generalize well from only 3% of the training set.
Using soft targets to prevent specialists from overfitting:
- If specialists use a full softmax over all classes, soft targets may be a much better way to prevent overfitting than early stopping.
- If a specialist is initialized with the weights of the generalist, we can make it retain nearly all of its knowledge about the non-special classes by training it with soft targets for the non-special classes in addition to the hard targets.
36 Summary
The training stage and the deployment stage have different requirements.
Distillation: cumbersome models → a small, distilled model, transferring knowledge by matching soft targets (and hard targets).
- On MNIST: works well even when the transfer set used to train the distilled model lacks any examples of one or more of the classes.
- On speech recognition: nearly all of the improvement achieved by training an ensemble of deep neural nets can be distilled into a single neural net of the same size, which is far easier to deploy.
- On image annotation (specialist models): the performance of a single really big net that has been trained for a very long time can be significantly improved by learning a large number of specialist nets. The authors have not yet shown that the knowledge in the specialists can be distilled back into the single large net.
Related: distillation vs. mixture of experts, transfer learning.
37 Appendix jspan/surveytl.htm
38 Appendix (cont.)
39 Appendix (cont.)