Introduction to Deep Learning

Arijit Mondal
Dept. of Computer Science & Engineering
Indian Institute of Technology Patna (IIT Patna)
arijit@iitp.ac.in

Course structure
- Introduction to the big data problem & representation learning
- Overview of linear algebra and probability
- Basics of feature engineering
- Neural network
- Introduction to open-source tools
- Deep learning network
- Regularization
- Optimization
- Advanced topics
- Practical applications

Evaluation policy
- Mid-sem: 20%
- Project: 40%-60%
- End-sem: 20%-40%
- Paper presentation: 10% (depending on class size)

Project & Presentation
- Group-wise project
- A group can have 2-3 students (depending on class size)
- Each group will be assigned papers for presentation in the class
- Presentation duration: 30 minutes

Books
- Deep Learning - Ian Goodfellow, Yoshua Bengio, Aaron Courville
- The Elements of Statistical Learning - Jerome H Friedman, Robert Tibshirani, Trevor Hastie
- Reinforcement Learning: An Introduction - Richard S Sutton, Andrew G Barto

Acknowledgement
- Deep Learning Book by Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Presentations by Yann LeCun, Geoff Hinton, Yoshua Bengio
- Various websites for images
- Dr. Jacob Minz (Synopsys)
- IIT KGP Batch of 2001: Joydeep Acharya (Hitachi), Sanjeev Kumar (Liv.AI), Mithun Dasgupta (Microsoft), Amit Kumar (Avnera), Mrinmoy Ghosh (Facebook), Animesh Datta (Qualcomm), Bhaskar Saha (PARC), Banit Agrawal (Facebook)

Introduction

Problem Solving Strategies for Big Data
- Need to solve problems efficiently and accurately when the input data is huge (on the order of GB or TB)
- Finding a deterministic algorithm is difficult
- Need to find out features
- Requires significant effort for model building
- Need to have domain knowledge
- Statistical inference is found to be suitable
- Feature selection is not crucial
- Model will learn from past data

Applications: Computer vision
- 2D to 3D conversion
- Street view generation
- Image classification
- Image segmentation
Image source: Internet

Applications: Activity Recognition
- Recognize activities like walking, running, cooking, etc. from still image or video data
Image source: Internet

Applications: Image Captioning
- Automated caption generation for a given image
Image source: Internet

Applications: Object Identification
- Identify objects in a still image or in a video stream
Image source: Internet

Applications: Automated Car
- Self-driving car
Image source: Internet

Applications: Drones & Robots
- Managing the movement of robots or drones
Image source: Internet

Applications: Natural Language Processing
- Recommender system
- Sentiment analysis
- Question answering
- Information extraction from websites
- Automated email reply
Image source: Internet

Applications: Speech processing
- Conversion of speech into text
- Generation of a particular voice for the given text
Image source: Internet

Other possible applications
- Write a story/text and generate a video/image of it
- Conversion of speech from one language to another in real time
- Weather prediction
- Genomics
- Drug discovery
- Particle physics

Issue of Representation
- Representing data in an efficient, structured manner is crucial for solving problems effectively
- Searching for a set of elements in a given list (sorted/unsorted)
- Arithmetic operations on Arabic and Roman numerals
- Primality test of n when n is represented in unary as 111...1 (n ones)
- Structured representation can help in predicting future values
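The sorted/unsorted search example can be sketched as follows. This is a minimal illustration, not part of the original slides: the function names and the sample data are hypothetical. It shows that the same membership question needs a full scan on an unsorted list but only a logarithmic number of comparisons once the data is kept sorted.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Scan every element; works on any list, O(n) comparisons."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Requires sorted input; O(log n) comparisons via bisect_left."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Hypothetical sample data, chosen only for illustration.
unsorted_data = [42, 7, 19, 3, 88, 61]
sorted_data = sorted(unsorted_data)

found_linear = linear_search(unsorted_data, 19)
found_binary = binary_search(sorted_data, 19)
print(found_linear, found_binary)
```

The only difference between the two calls is how the data is represented, which is the point the slide makes.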

Learning representation/feature
- Traditional approaches
- Pattern recognition
- Input, output of the problem
- End-to-end learning
- System automatically learns internal representation

AI-ML Tasks
- Heavily depend on features
- Require good domain knowledge
- Feature extraction is not an easy job
- Example: identify a car
- How to describe a wheel
- Shadow/brightness
- Obscuring elements

Representation Learning
- Learned representations often result in better performance than hand-designed ones
- Allows the system to rapidly adapt to new tasks
- Need to discover a good set of features
- Manual design of features is nearly impossible

Design of Features
- Goal is to separate out the factors of variation
- These factors are separate sources of influence
- They may exist as unobserved objects or unobserved forces that affect observable quantities
- Speech: factors are age, sex, accent, etc.
- Image: position, color, brightness, etc.

Deep Learning
- Tries to address the problem of representation learning
- Representations are expressed in terms of other, simpler representations
- Develops complex concepts from simpler concepts

Simple to Complex Features
Image source: Deep Learning Book

Simple to Complex Features
Image source: Deep Learning Tutorial by Yann LeCun and Marc'Aurelio Ranzato, ICML, 2013

Conventional Machine Learning
Image source: Deep Learning by Yann LeCun, Yoshua Bengio & Geoffrey Hinton

Deep Learning Model
- Feed-forward deep network or multilayer perceptron
- Mathematical functions that map input to output
- Composed of simpler functions
- Each layer provides a new representation
- Learning the right representation
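The idea that a feed-forward network is a function composed of simpler functions can be sketched in a few lines. This is an illustrative sketch only: the layer sizes, the weights, and the choice of ReLU as the nonlinearity are assumptions made for the example, not values from the slides.

```python
def dense(weights, biases, inputs):
    """One affine map: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Elementwise nonlinearity (an assumed choice for this sketch)."""
    return [max(0.0, v) for v in values]


def layer1(x):
    # Hypothetical weights: maps the 2-d input to a new 2-d representation.
    return relu(dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.25], x))


def layer2(h):
    # Hypothetical weights: maps the hidden representation to one output.
    return dense([[1.0, 2.0]], [0.1], h)


def network(x):
    """The whole model is just the composition layer2(layer1(x))."""
    return layer2(layer1(x))


hidden = layer1([1.0, 0.5])
output = network([1.0, 0.5])
print(hidden, output)
```

Here `hidden` is the new representation produced by the first layer, and `output` is the mapping from that representation to the final result.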

Representation learning
- Rule-based system: Input -> Hand-designed program -> Output
- Classic machine learning: Input -> Hand-designed features -> Mapping from features -> Output
- Representation learning: Input -> Features -> Mapping from features -> Output
- Deep learning: Input -> Features -> Abstract features -> Mapping from features -> Output

Depth of network
- Number of sequential instructions that must be executed to evaluate the architecture
- Length of the longest path
- Depth of the model
Image source: Deep Learning Book
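One way to read "length of the longest path" is as the longest chain of dependent operations in a computational graph. The sketch below assumes a small hypothetical graph for sigmoid(w1 * x1 + b); the node names and the graph itself are illustrative and do not come from the slides.

```python
from functools import lru_cache

# Hypothetical computational graph: each node maps to the nodes it feeds.
graph = {
    "x1": ["mul"],
    "w1": ["mul"],
    "mul": ["add"],
    "b": ["add"],
    "add": ["sigmoid"],
    "sigmoid": [],
}


@lru_cache(maxsize=None)
def longest_path_from(node):
    """Number of edges on the longest path starting at `node`."""
    successors = graph[node]
    if not successors:
        return 0
    return 1 + max(longest_path_from(s) for s in successors)


# Depth = longest chain of operations that must run one after another.
depth = max(longest_path_from(node) for node in graph)
print(depth)
```

For this graph the longest chain is x1 -> mul -> add -> sigmoid, so three operations must be executed sequentially.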

History
- Has had many names and viewpoints
- Cybernetics (1940-1960)
- Connectionism (1980-1990) (neural nets)
- Deep learning (2006+)
- More useful as the amount of data has increased
- Models have grown in size with the increase in computing resources
- Solving complex problems with increasing accuracy

Learning Algorithm
- Early learning algorithms
- How does learning happen in the brain?
- Computational model of biological learning
- Neural perspective of DL
- Brains provide a proof by example
- Reverse engineer the computational principles behind the brain and duplicate its functionality

History of basic model
- The first learning machine: the Perceptron
- Built at Cornell, 1960
- The perceptron was a simple linear classifier on top of a simple feature extractor
- Most of the practical applications of ML today use glorified linear classifiers or glorified template matching
- Significant effort is required from the expert for identifying relevant features
- Typically it will solve $y = \mathrm{sign}\left(\sum_{i=1}^{N} w_i f_i(X) + b\right)$
[Figure: perceptron with inputs x1 and x2 (weights w1 and w2), a constant input 1 (weight b), and a 0/1 output]
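The decision rule y = sign(sum of w_i f_i(X) + b) can be sketched as below. The feature extractor, the weights, and the convention that sign(0) is +1 are all assumptions made for illustration; the slides do not specify them.

```python
def sign(value):
    """Sign function; treating 0 as +1 is an assumed convention."""
    return 1 if value >= 0 else -1


def extract_features(raw_input):
    """Hypothetical hand-designed features f_1, f_2: mean and range."""
    mean = sum(raw_input) / len(raw_input)
    spread = max(raw_input) - min(raw_input)
    return [mean, spread]


def perceptron(features, weights, bias):
    """Linear classifier on top of the extracted features."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return sign(activation)


# Hypothetical weights and bias, chosen only for the example.
weights = [1.0, -2.0]
bias = 0.1

label_a = perceptron(extract_features([0.2, 0.8, 0.5]), weights, bias)
label_b = perceptron(extract_features([0.9, 0.9, 0.9]), weights, bias)
print(label_a, label_b)
```

The classifier itself is only a weighted sum and a threshold; all of the problem-specific work sits in `extract_features`, which is where the expert effort mentioned above goes.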

Broad Categories of Problem
- Regression
- Classification
[Figure: one plot of y against x for each category]

Regression
- Regression (linear)
- Regression (non-linear)
[Figure: one plot of y against x for each case]

Classification
- Linear
- Non-linear
[Figure: one plot of y against x for each case]

Artificial Neural Network
- A simple model
[Figure: a small feed-forward network with inputs x1 and x2, constant bias inputs 1, layer-indexed weights w, and outputs out0 and out1]

Example NN: AND gate
[Figure: single neuron with inputs x1 and x2 (weights w1 = 1, w2 = 1), a constant input 1 (weight b, shown as 1.5), and a 0/1 output; beside it, the input points plotted in the x1-x2 plane]
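A single threshold neuron with both weights equal to 1 can be checked against the AND truth table. This sketch assumes the bias is -1.5 (the extracted slide shows only "1.5", so the sign is an assumption) and that the output is 1 when the weighted sum is positive.

```python
def step(value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if value > 0 else 0


def and_gate(x1, x2, w1=1.0, w2=1.0, b=-1.5):
    """Single neuron; b = -1.5 is an assumed reading of the slide's 1.5."""
    return step(w1 * x1 + w2 * x2 + b)


truth_table = {(x1, x2): and_gate(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for inputs, output in truth_table.items():
    print(inputs, output)
```

Only the input (1, 1) gives a positive sum (1 + 1 - 1.5 = 0.5); every other input leaves the sum at -0.5 or -1.5, so the neuron outputs 0.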

Example NN: XOR gate
[Figure: the XOR input points plotted in the x1-x2 plane]
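XOR is not linearly separable, so a single threshold neuron cannot compute it, but a network with one hidden layer can. The sketch below is one possible hand-set construction (an OR unit and an AND unit feeding an output unit); the weights are assumptions chosen for illustration and are not taken from the slides.

```python
def step(value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if value > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_gate(x1, x2):
    # Hidden layer (hypothetical weights): one OR unit and one AND unit.
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)
    # Output fires when OR is on but AND is off.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)


xor_table = {(x1, x2): xor_gate(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
for inputs, output in xor_table.items():
    print(inputs, output)
```

The hidden layer re-represents the four input points so that the output unit faces a linearly separable problem.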

Distributed representation
- Each input should be represented by many features
- Each feature should be involved in the representation of many possible inputs
- Example: objects (car, flower, bird) and colors (red, green, blue)
- One neuron for each combination of color and object: 9 neurons
- Distributed neurons: 3 neurons for color + 3 neurons for object = 6 neurons in total
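The neuron counts in the example can be reproduced with two small encoders. This is a sketch under the assumption that each neuron is a 0/1 unit; the function names are hypothetical.

```python
objects = ["car", "flower", "bird"]
colors = ["red", "green", "blue"]


def local_code(color, obj):
    """One neuron per (color, object) combination: 9 units, one active."""
    index = colors.index(color) * len(objects) + objects.index(obj)
    size = len(colors) * len(objects)
    return [1 if i == index else 0 for i in range(size)]


def distributed_code(color, obj):
    """3 color neurons followed by 3 object neurons: 6 units, two active."""
    color_part = [1 if c == color else 0 for c in colors]
    object_part = [1 if o == obj else 0 for o in objects]
    return color_part + object_part


red_car_local = local_code("red", "car")
red_car_distributed = distributed_code("red", "car")
print(len(red_car_local), red_car_local)
print(len(red_car_distributed), red_car_distributed)
```

In the distributed code the "red" neuron is shared by red cars, red flowers, and red birds, so each feature takes part in representing many inputs. The gap also widens with scale: n colors and m objects need n * m local units but only n + m distributed units.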

Popularization of Neural Network
- Most of the theory of neural networks was developed in the 1980s
- Started gaining popularity around 4-5 years ago
- In 2012, Geoffrey Hinton and Alex Krizhevsky won the ImageNet competition, beating the nearest competitor by a huge margin
Image source: Deep Residual Learning by Kaiming He, et al.

Popularity
- Increased data size
- Computing resources are available
- Acceptable performance with about 5,000 labeled examples per category; about 10 million for human-level performance
- Increasing model size
- Increasing accuracy, complexity, real-world impact
- Used by many companies: Google, Microsoft, Facebook, IBM, Baidu, Apple, Adobe, Nvidia, NEC, etc.
- Availability of good commercial & open-source tools: Theano, Torch, DistBelief, Caffe, TensorFlow, Keras, etc.