Semi-Supervised Learning with Ladder Networks
CSE 5290 Artificial Intelligence, Group 2
Tapas Joshi, Atefeh Mahdavi, Chandan Patil

1. Introduction

In this modern era of autonomous cars and deep learning, pure supervised learning is widely popular. In supervised learning, an agent is fed many labeled examples (training data) from which it learns features; after the learning process, test data is given to the agent to examine how accurate it is. The key point is that the data must be labeled for the agent to make sense of it and to store useful knowledge that it can later apply to new challenges. In general, supervised learning is used mostly for the traditional classification problem. However, there are learning approaches that require only partial labeling, or none at all. Unsupervised learning, in contrast, gives the agent no labeled data: the training data is completely unlabeled. The agent learns by finding patterns, assigning higher weight to more probable outcomes, and storing the resulting knowledge for the test-data challenge. Unsupervised learning is mostly used for clustering problems, where data is grouped into similar bunches, and it is widely applied in machine learning, pattern recognition, image analysis, computer graphics and more. It is not very efficient for classification, because the groups still need to be labeled after clustering. These approaches therefore either work well given a lot of labeled data, or solve only clustering problems given none. For supervised learning, labeled data is expensive to collect, and this is one of its biggest drawbacks.

Deep learning was inspired by the human brain [1], where several million neurons combine to form a single thought. The saying "practice makes perfect" reflects this: repeating a task makes the same millions of neurons fire together. At first the activity is cluttered and unorganized, but performing the task over and over fires those specific neurons so many times that the connections between them strengthen, until we can perform the task with ease within seconds. Deep learning tries to mimic this aspect of the brain, and from it neural networks are born.

Another problem with supervised learning is that humans did not evolve to learn this way. In earlier times we discovered new things, and discovery is not associated with supervised learning; it falls somewhere between supervised and unsupervised learning, which is exactly where semi-supervised learning sits. That is why Hinton says, "We expect unsupervised learning to become far more important in the longer term. Human and animal learning is largely unsupervised: we discover the structure of the world by observing it, not by being told the name of every object." [2]

In semi-supervised learning, the agent is provided with very few labeled examples in addition to a large amount of unlabeled data as the training set, and it then classifies the test data. This approach beats supervised learning on the labeling front: we only need to label a few

hundred examples rather than millions of them. Semi-supervised learning gets the best of both worlds, supervised and unsupervised learning, while still solving the classification problem. It can be combined with several approaches, and in this project we combine semi-supervised learning with ladder networks. Valpola (2015) proposed the ladder network, which combines supervised and unsupervised learning and trains both simultaneously, rather than in the traditional way where unsupervised learning is used for pre-training followed by pure supervised learning [3]. It minimizes the sum of the supervised and unsupervised cost functions using back-propagation, which avoids the need for pre-training. In this project, we follow in the footsteps of the authors of the paper "Semi-supervised learning with ladder networks" [3] and show how efficient semi-supervised learning is compared to supervised learning. We also study the implementation of this paper in Python.

2. Overview

As stated above, semi-supervised learning uses a very small amount of labeled data and a very large amount of unlabeled data for training. But how does it use unlabeled data for classification? Let's look at an example.

Figure 2.1: Example 1, classification with semi-supervised learning (before and after)

Assume we have only two labeled data points, one black dot and one white dot, clearly visible in Figure 2.1 (before). There are also many grey dots, which are the unlabeled data we will use for training. Under the smoothness assumption, labels are homogeneous in densely populated regions of the space: data points that are near each other can be considered to belong to the same group. If we take this assumption and iterate it over and over, we eventually train our classifier to be almost perfect, as shown in Figure 2.1 (after); a small illustrative sketch of this iterative labeling is given at the end of this section. Using this assumption, we can reach high accuracy even though most of our data is unlabeled. In the real world there is a lot of unlabeled data and very little labeled data, because labeling is expensive. The ladder network uses this approach and combines a supervised and an unsupervised network to obtain even higher accuracy and a smaller loss, training both simultaneously rather than in the traditional way where unsupervised learning is used for pre-training followed by pure supervised learning [3].
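The iterative use of the smoothness assumption described above can be illustrated with a small toy example. This sketch is purely illustrative and not part of the project code; the two clusters, the array names and the greedy nearest-neighbor rule are our own simplifications.

    import numpy as np

    # Toy illustration of the smoothness assumption: two labeled points,
    # many unlabeled "grey dots"; labels spread to nearby points one at a time.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))            # unlabeled points
    X[:100] += (-2.0, 0.0)                   # first dense cluster
    X[100:] += (2.0, 0.0)                    # second dense cluster
    y = np.full(200, -1)                     # -1 means "unlabeled"
    y[0], y[100] = 0, 1                      # one black dot and one white dot

    while (y == -1).any():
        labeled = np.where(y != -1)[0]
        unlabeled = np.where(y == -1)[0]
        # distance from every unlabeled point to every labeled point
        d = np.linalg.norm(X[unlabeled][:, None, :] - X[labeled][None, :, :], axis=2)
        i, j = np.unravel_index(d.argmin(), d.shape)
        # the closest unlabeled point takes the label of its nearest labeled neighbor
        y[unlabeled[i]] = y[labeled[j]]

    print("labels assigned:", np.bincount(y))

Each pass labels the single unlabeled point closest to the already-labeled set, so the two labels gradually flood their respective dense regions, which is exactly the intuition behind Figure 2.1.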

We used the MNIST dataset for our testing and achieved remarkable accuracy after full training. Three key factors make ladder networks attractive:

1. Compatibility. Ladder networks are compatible with any supervised learning method and can be added to any existing feedforward neural network. They can also be extended to recurrent neural networks.

2. Scalability resulting from local learning. Ladder networks scale to very deep neural networks because the model has unsupervised learning targets on each layer in addition to the traditional supervised learning target.

3. Computational efficiency. Including the decoder roughly triples the computation per step, but it does not necessarily increase the training time, because the available information is used more effectively.

3. Algorithm

To start with the basics, the algorithm breaks down into four steps; a sketch of how the resulting cost is assembled follows the notation below.

a. As we are working with an encoder-decoder model, we use encoder and decoder functions for the input-output mapping. First we take a feedforward neural network, which supports the supervised learning, as the encoder. We use two encoder paths: one clean and one corrupted.

b. Next, we add a decoder that inverts the mapping on each layer of the encoder. The decoder uses a denoising function (Denoising Source Separation) to remove noise. Because it is also used for reconstruction, the difference between the reconstructed output and the clean source is the unsupervised cost.

c. Then we calculate the supervised cost from the difference between the target output and the corrupted encoder output. The final cost is the sum of the supervised and unsupervised costs:

final cost = supervised cost + unsupervised cost

d. The last step is to train the whole network with standard optimization techniques, such as stochastic gradient descent, to minimize this cost.

Let's dive deeper into the technical implementation of the algorithm. First, let's represent the labeled and unlabeled data. Labeled examples come as pairs (x, y), while unlabeled examples have only x; the labeled data is scarce and the unlabeled data is plentiful:

Labeled data: {x_t, y_t}, 1 ≤ t ≤ N
Unlabeled data: {x_t}, N + 1 ≤ t ≤ M
Often labeled data is scarce and unlabeled data is plentiful: N ≪ M
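As a minimal sketch of steps (c) and (d), the final cost can be assembled as below. The array shapes, the per-layer weights lam and the placeholder activations are our own illustrative assumptions, loosely following the notation of [3]; in the real implementation these quantities come from the encoder and decoder passes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical quantities produced by the network for one batch:
    y_corr = rng.random((100, 10))                       # corrupted-encoder softmax outputs (labeled batch)
    y_corr /= y_corr.sum(axis=1, keepdims=True)
    targets = np.eye(10)[rng.integers(0, 10, size=100)]  # one-hot labels for the labeled batch
    z_clean = [rng.random((100, d)) for d in (784, 500, 10)]        # clean activations per layer
    z_hat = [z + 0.01 * rng.normal(size=z.shape) for z in z_clean]  # decoder reconstructions per layer
    lam = [1000.0, 10.0, 0.1]                            # per-layer denoising weights (assumed values)

    # supervised cost: cross-entropy on the labeled examples only
    supervised_cost = -np.mean(np.sum(targets * np.log(y_corr + 1e-9), axis=1))

    # unsupervised cost: denoising (reconstruction) error on every layer, for all examples
    unsupervised_cost = sum(l * np.mean((zh - zc) ** 2)
                            for l, zh, zc in zip(lam, z_hat, z_clean))

    # final cost = supervised cost + unsupervised cost, minimized with SGD or Adam
    final_cost = supervised_cost + unsupervised_cost
    print(final_cost)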

Then we corrupt the input x_t by adding some Gaussian noise, obtaining the corrupted input x̃_t.

Encoder function f: x̃ → h
Decoder function g: h → x̂
Reconstruction: x̂

Figure 3.1: Encoder and decoder functions

We then use a denoising autoencoder (DAE) for the reconstruction and denoising function. As shown in Figure 3.1 (b), when the corrupted input is fed in, the network generates an output that is robust to the noise and reconstructs the input as precisely as possible.

Encoder: h = f(x̃) = φ(W^(1) x̃ + b^(1))
Decoder: x̂ = g(h) = W^(2) h + b^(2)

Figure 3.2: Combining DSS and DAE

In Figure 3.2, we combine Denoising Source Separation (DSS) and the denoising autoencoder (DAE), which forms a ladder network of degree 1. C_x is the checking function; it is not used for training but in the testing phase. A small numeric sketch of this degree-1 encoder/decoder pair is given below.
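To make the degree-1 encoder/decoder pair concrete, here is a minimal numpy sketch. The hidden width of 256, the ReLU choice for φ and the noise level are our own illustrative assumptions, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(784)                          # a clean input, e.g. a flattened MNIST image
    x_tilde = x + 0.3 * rng.normal(size=784)     # corrupted input x~ (Gaussian noise added)

    W1 = rng.normal(0.0, 0.01, size=(256, 784))  # encoder weights W(1)
    b1 = np.zeros(256)                           # encoder bias b(1)
    W2 = rng.normal(0.0, 0.01, size=(784, 256))  # decoder weights W(2)
    b2 = np.zeros(784)                           # decoder bias b(2)
    phi = lambda a: np.maximum(a, 0.0)           # nonlinearity phi (ReLU assumed)

    h = phi(W1 @ x_tilde + b1)                   # encoder: h = f(x~) = phi(W(1) x~ + b(1))
    x_hat = W2 @ h + b2                          # decoder: x^ = g(h) = W(2) h + b(2)

    # unsupervised (denoising) cost: how far the reconstruction is from the clean input
    reconstruction_cost = np.mean((x_hat - x) ** 2)
    print(reconstruction_cost)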

Figure 3.3: Ladder network architecture

In Figure 3.3, we see the whole degree-3 ladder network. Here an image of the digit 7 is passed as input and Gaussian noise is added to it. The encoder functions are applied while climbing up the ladder, and the decoder functions while stepping back down. In parallel, every layer is checked with the comparison function C, which is used for testing and for increasing accuracy. The skeleton of this up-and-down pass is sketched below.
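Here is a minimal numpy sketch of the up-and-down pass just described. The layer sizes, the simple linear decoder step and the mean-squared comparison C are our own simplifications; the actual ladder network of [3] uses batch normalization and a learned denoising combinator at every rung.

    import numpy as np

    rng = np.random.default_rng(0)
    sizes = [784, 500, 250, 10]                                                     # a small degree-3 ladder (assumed sizes)
    W = [rng.normal(0.0, 0.01, size=(sizes[i + 1], sizes[i])) for i in range(3)]    # encoder weights
    V = [rng.normal(0.0, 0.01, size=(sizes[i], sizes[i + 1])) for i in range(3)]    # decoder weights
    phi = lambda a: np.maximum(a, 0.0)

    x = rng.random(784)                          # e.g. a flattened image of the digit 7
    x_tilde = x + 0.3 * rng.normal(size=784)     # Gaussian noise added to the input

    # climb up the ladder with the corrupted encoder, keeping every layer's activation
    z_tilde = [x_tilde]
    for Wl in W:
        z_tilde.append(phi(Wl @ z_tilde[-1]))

    # a second, clean pass (no noise) provides the targets the comparison function uses
    z_clean = [x]
    for Wl in W:
        z_clean.append(phi(Wl @ z_clean[-1]))

    # step back down the ladder with the decoder, reconstructing layer by layer
    z_hat = [None] * 4
    z_hat[3] = z_tilde[3]
    for l in (2, 1, 0):
        z_hat[l] = V[l] @ z_hat[l + 1]

    # C: per-layer comparison of the reconstruction with the clean activation
    C = [np.mean((z_hat[l] - z_clean[l]) ** 2) for l in range(4)]
    print(C)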

4. Results

We tested the code on the MNIST database and obtained a maximum accuracy of 98.9%. We also tested on various machines and found some interesting results and patterns. Some code snippets are attached below.

Figure 4.1: Code snippet, initialization

In Figure 4.1, we can see that we are using the TensorFlow API from Google. For the labeled data passed to the encoder, we use only 100 images out of the 60,000 images in the MNIST training set. We also apply batch normalization to normalize our data and define the layer sizes. During training, we started at approximately 11% accuracy and reached 98.55% on the CPU, as seen in Figure 4.2; a sketch of this kind of initialization is given below.

Figure 4.2: Output results
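Since Figure 4.1 itself is not reproduced here, the following is only a rough sketch of the kind of initialization it describes, in TensorFlow 1.x style. The variable names are our own; the layer sizes follow the MNIST architecture used in [3], and the noise level is an assumption.

    import tensorflow as tf   # TensorFlow 1.x style API, matching the era of this project

    layer_sizes = [784, 1000, 500, 250, 250, 250, 10]   # input, hidden layers, 10 digit classes
    num_examples = 60000                                # full MNIST training set
    num_labeled = 100                                   # only 100 labeled images are used
    noise_std = 0.3                                     # std of the Gaussian corruption (assumed)

    inputs = tf.placeholder(tf.float32, shape=(None, layer_sizes[0]))    # flattened images
    outputs = tf.placeholder(tf.float32, shape=(None, layer_sizes[-1]))  # one-hot labels
    training = tf.placeholder(tf.bool)   # toggles batch normalization between train and test mode

From here, the corrupted and clean encoder paths and the decoder would be built on top of these placeholders.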

In Figure 4.3, we can see the comparison across different machines: the machine with the most powerful GPU trained in the least time and with the maximum accuracy. The CPU took almost 7 hours to train, the base GPU around 2 hours, and the best GPU (an NVIDIA 770) only about 45 minutes, roughly ten times faster than the CPU.

Figure 4.3: Results, time comparison

5. Conclusion

To put everything in a nutshell: supervised learning addresses the classification problem, in which a training set is given and the agent hypothesizes a function that maps input to output based on observed input-output pairs. Unsupervised learning addresses the clustering problem: given a completely unlabeled set, the agent tries to learn its structure, and if we then divide the inputs into groups based on the learned parameters, the process becomes a classification. In our implementation of the ladder network, we found the performance of this method very impressive. The method is simple and easy to implement with many architectures, such as feedforward and recurrent neural networks. Training is based on back-propagation from a simple cost function. The method is compatible with supervised methods and, moreover, has a local unsupervised learning target on every layer, which makes it suitable for very deep neural networks.

Using the decoder roughly triples the computation during training, but the same result can be reached faster through better utilization of the available information.

6. References

[1] Plunkett, K., and J. L. Elman. "Exercises in Rethinking Innateness: A Handbook for Connectionist Experiments." 1997.
[2] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521 (2015): 436-444.
[3] Rasmus, Antti, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. "Semi-supervised learning with ladder networks." In Advances in Neural Information Processing Systems, pp. 3546-3554. 2015.