DEEP LEARNING AND ITS APPLICATION NEURAL NETWORK BASICS


Argument on AI 1. Symbolism 2. Connectionism 3. Actionism Kai Yu. SJTU Deep Learning Lecture. 2

Argument on AI 1. Symbolism. Origin: mathematical logic. Cognitive element: the symbol. Core of AI: knowledge and knowledge-based theoretical systems. Based on: the hypothesis of a symbol operation system (the physical symbol system) and the principle of limited rationality. Representatives: Newell, Shaw, Simon and Nilsson. Kai Yu. SJTU Deep Learning Lecture. 3

Argument on AI 2. Connectionism. Origin: bionics. Cognitive element: the neuron. Core of AI: the brain's working mode. Based on: neural networks, connectionism, and learning algorithms between NNs. Representatives: McCulloch-Pitts, Hopfield and Rumelhart. Kai Yu. SJTU Deep Learning Lecture. 4

Argument on AI 3. Actionism. Origin: cybernetics. Cognitive element: perception and action. Core of AI: the perception-action working mode. Based on: cybernetics and the control principle of perception-action. Representatives: Wiener, Brooks. Kai Yu. SJTU Deep Learning Lecture. 5

Biological Neuron. About 10 billion neurons in the human brain. Summation of input stimuli: spatial (signals) and temporal (pulses). Threshold over composed inputs; constant firing strength. About 1,000,000 billion synapses in the human brain. Chemical transmission and modulation of signals; inhibitory synapses and excitatory synapses. Kai Yu. SJTU Deep Learning Lecture. 6

Biological Neural Networks. About 100,000 synapses per neuron. Computational power comes from connectivity. Plasticity: new connections form and the strength of existing connections is modified. Kai Yu. SJTU Deep Learning Lecture. 7

Neural Dynamics. [Plot of membrane potential (mV) versus time (ms), showing rest, activation, the action potential and the refractory period.] Action potential: ~100 mV. Threshold potential: -20 to -30 mV. Rest potential: -65 mV. Spike time: 1-2 ms. Refractory time: 10-20 ms. Kai Yu. SJTU Deep Learning Lecture. 8

Connectionist Model Kai Yu. SJTU Deep Learning Lecture. 9

What is an Artificial Neural Network? An artificial neural network is a network of many simple processors (neurons, units) Units are linked by connections Each connection has a weight associated with it Units operate only locally on their weights and the inputs received through connections Kai Yu. SJTU Deep Learning Lecture. 10

What is an Artificial Neural Network? An ANN is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects: knowledge is acquired by the network from its environment through a learning process, and interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. Kai Yu. SJTU Deep Learning Lecture. 11

First Generation NN 1943, McCulloch and Pitts developed basic models of neurons. Perceptron with no hidden layer. Kai Yu. SJTU Deep Learning Lecture. 12

First Generation NN 1948, Wiener: cybernetics. 1949, Hebb: learning rule (Hebb's rule). 1958, Rosenblatt: perceptron model and perceptron convergence algorithm. 1960, Widrow-Hoff: least mean square algorithm. 1969, Minsky-Papert: limitations of the perceptron (it cannot solve problems that are not linearly separable). Kai Yu. SJTU Deep Learning Lecture. 13

Second Generation NN Multi-Layer Perceptron (MLP) (1980s-1990s). Back-Propagation. Kai Yu. SJTU Deep Learning Lecture. 14

Second Generation NN 1980s, Stephen Grossberg: adaptive resonance theory. 1982, Hopfield: energy function, recurrent network model. 1982, Kohonen: self-organizing maps. 1986, Rumelhart, Hinton et al.: back-propagation. 1990s: decline. Requires experience and skills; easy to over-train or to get trapped in a local optimum; hard to go deep. Kai Yu. SJTU Deep Learning Lecture. 15

Renaissance of NN 2006, Geoffrey Hinton invented Deep Belief Networks (DBN) to allow fast and effective deep neural network learning. Pre-train each layer from the bottom up; each pair of layers is a Restricted Boltzmann Machine (RBM); jointly fine-tune all layers using back-propagation. Kai Yu. SJTU Deep Learning Lecture. 16

Perceptron: The base for ANN. Input variables: $x \in \mathbb{R}^n$. Output variable: $y = f(w^T x + w_0)$, where $f$ is the activation function. Weights: $w \in \mathbb{R}^n$ and bias $w_0$. Kai Yu. SJTU Deep Learning Lecture. 17

Activation Functions. Hard limit (step function). Kai Yu. SJTU Deep Learning Lecture. 18
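
As a concrete illustration of the unit defined on the last two slides, here is a minimal sketch (not from the lecture) of a perceptron with a hard-limit activation; the AND weights at the end are illustrative assumptions and anticipate the Boolean-function examples that follow.

```python
import numpy as np

def step(a):
    # Hard-limit / step activation: 1 if the summed input reaches the threshold, else 0
    return 1 if a >= 0 else 0

def perceptron_forward(x, w, w0):
    # y = f(w^T x + w0): weighted sum of the inputs plus bias, passed through the step
    return step(np.dot(w, x) + w0)

# Illustrative weights implementing AND on 0/1 inputs (solid circle = 1, hollow = 0)
w, w0 = np.array([1.0, 1.0]), -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_forward(np.array(x, dtype=float), w, w0))
```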

Decision Surface of a Perceptron. A perceptron represents a decision surface in a d-dimensional space as a hyperplane. It works only for sets of examples that are linearly separable. Many Boolean functions can be represented by a perceptron: AND, OR, NAND, NOR. Kai Yu. SJTU Deep Learning Lecture. 19

Example: decision surfaces for AND, NAND, OR and NOR (solid circle: 1, hollow circle: 0). Kai Yu. SJTU Deep Learning Lecture. 20

Example Kai Yu. SJTU Deep Learning Lecture. 21

Error Gradient Descent. Given a loss function $E(X, t, w)$. Ideal approach: a closed-form solution of $\nabla_w E(X, t, w) = 0$; solving for $w$ will be troublesome if not impossible. Practical approach: gradient descent. Start at some value of the weights and update them iteratively using $w^n = w^{n-1} - \eta \nabla_w E(X, t, w)$, where $\eta$ is the learning rate. If there is only one local optimum, GD is guaranteed to converge. Kai Yu. SJTU Deep Learning Lecture. 22

Gradient Descent Example Kai Yu. SJTU Deep Learning Lecture. 23

Stochastic Gradient Descent. Gradient descent (one update with all data): $w^n = w^{n-1} - \eta \nabla_w E(X, t, w) = w^{n-1} - \eta \sum_{m=1}^{M} \nabla_w E(x_m, t_m, w)$. Stochastic gradient descent (one update with a randomly selected single sample): $w^n = w^{n-1} - \eta \nabla_w E(x_m, t_m, w)$. SGD is much faster than GD. Kai Yu. SJTU Deep Learning Lecture. 24
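
The two update rules above differ only in how much data each step uses. Below is a minimal sketch (not from the slides; the least-squares data, learning rate, and iteration counts are arbitrary assumptions) contrasting one batch-GD step over all samples with single-sample SGD steps.

```python
import numpy as np

# Batch GD vs. SGD on a least-squares loss E(X, t, w) = (1/M) * sum_m 0.5*(w^T x_m - t_m)^2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # M = 100 samples, n = 3 features
true_w = np.array([1.0, -2.0, 0.5])
t = X @ true_w + 0.01 * rng.normal(size=100)
eta = 0.1                                         # learning rate

def grad(w, x, y):
    return (x @ w - y) * x                        # gradient of 0.5*(w^T x - y)^2

# Batch gradient descent: one update uses all data
w_gd = np.zeros(3)
for _ in range(200):
    w_gd -= eta * np.mean([grad(w_gd, x, y) for x, y in zip(X, t)], axis=0)

# Stochastic gradient descent: one update uses a single randomly chosen sample
w_sgd = np.zeros(3)
for _ in range(2000):
    i = rng.integers(len(X))
    w_sgd -= eta * grad(w_sgd, X[i], t[i])

print(w_gd, w_sgd)   # both should approach true_w = [1.0, -2.0, 0.5]
```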

Perceptron Algorithm. Consider a perceptron with n inputs (vector input $x_i \in \mathbb{R}^n$) and n+1 weights ($w$, $w_0$), with targets $t_i \in \{+1, -1\}$. For a linearly separable data set $X = \{(x_1, t_1), \dots, (x_M, t_M)\}$: how can we find $w$ and $w_0$ under the criterion $E(X, t, w) = \sum_{m=1}^{M} E(x_m, t_m, w) = -\sum_{m \in N_{err}} t_m (w^T x_m + w_0)$, where $N_{err}$ is the set of misclassified samples? Kai Yu. SJTU Deep Learning Lecture. 25

Perceptron Convergence Theorem. Stochastic gradient descent on $E(x_m, t_m, w) = -t_m (w^T x_m + w_0)$ for $m \in N_{err}$: $w^n = w^{n-1} - \nabla_w E(x_m, t_m, w) = w^{n-1} + t_m x_m$ and $w_0^n = w_0^{n-1} - \nabla_{w_0} E(x_m, t_m, w) = w_0^{n-1} + t_m$ (learning rate set to 1). If the training data is linearly separable, the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps. Kai Yu. SJTU Deep Learning Lecture. 26

Perceptron Algorithm. Initialize weights and learning rate; compute the perceptron outputs; apply SGD to update the weights; repeat until the outputs equal the targets, then output the learned weights. Kai Yu. SJTU Deep Learning Lecture. 27
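
A short sketch (not from the slides) of that loop, using the update $w \leftarrow w + \eta\, t_m x_m$, $w_0 \leftarrow w_0 + \eta\, t_m$ on misclassified samples; the toy AND data set and the function names are illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, t, eta=1.0, max_epochs=100):
    """Perceptron learning: X is (M, n), t is (M,) with entries in {+1, -1}."""
    M, n = X.shape
    w, w0 = np.zeros(n), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x_m, t_m in zip(X, t):
            if t_m * (w @ x_m + w0) <= 0:      # misclassified (or on the boundary)
                w += eta * t_m * x_m           # SGD step on the perceptron criterion
                w0 += eta * t_m
                errors += 1
        if errors == 0:                        # outputs == targets: stop
            break
    return w, w0

# Toy linearly separable example: the AND function with targets in {+1, -1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, +1])
w, w0 = train_perceptron(X, t)
print(w, w0, np.sign(X @ w + w0))              # predictions match t
```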

Perceptron example Training set: Initial Weights Learning rate is set to 1. Kai Yu. SJTU Deep Learning Lecture. 28

Perceptron example First Iteration: Kai Yu. SJTU Deep Learning Lecture. 29

Perceptron example Second Iteration: Kai Yu. SJTU Deep Learning Lecture. 30

Perceptron example Check: Output: W, b Kai Yu. SJTU Deep Learning Lecture. 31

Example: XOR? There is no way to get a solution with a single perceptron; the constraints are self-contradictory! More non-linear layers are needed. Kai Yu. SJTU Deep Learning Lecture. 32

Hidden Units: Multi-Layer NN Multi-Layer Perceptron (MLP) Kai Yu. SJTU Deep Learning Lecture. 33

Expressive Capabilities of NNs. Boolean functions: every Boolean function can be represented by a network with a single hidden layer, but this might require exponentially many (in the number of inputs) hidden units. Continuous functions: every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]. Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]. Kai Yu. SJTU Deep Learning Lecture. 34

Expressive Capabilities of NNs. Rough proof for Boolean functions: how to construct such a 2-layer MLP. Given a truth table over (x1, x2, x3) with y = 1 for (0,0,1), (0,1,0) and (1,1,1) and y = 0 otherwise, i.e. y = [(NOT x1) AND (NOT x2) AND x3] OR [(NOT x1) AND x2 AND (NOT x3)] OR [x1 AND x2 AND x3], use one hidden "selector cell" per input pattern with output 1 (e.g. w11 = [0, 0, 1] for the pattern (0, 0, 1); w12, w13 for the other patterns), and an output "OR cell" with weights w21 = [1, 1, 1]. Kai Yu. SJTU Deep Learning Lecture. 35
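
A sketch of the construction just described; the specific thresholds and the 0/1 encoding are my own assumptions, but the structure (one hidden selector cell per positive row of the truth table, feeding an OR output cell) follows the slide.

```python
import numpy as np

def step(a):
    return (a >= 0).astype(float)          # hard-limit activation

# Positive rows of the truth table: y = 1 for these input patterns, 0 otherwise
patterns = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [1, 1, 1]], dtype=float)

# Hidden "selector" cells: unit k fires only on pattern k.
# Weights 2*p - 1 (maps 0 -> -1, 1 -> +1) with bias -(sum(p) - 0.5) give an
# activation >= 0 exactly when the input equals the pattern.
W1 = 2 * patterns - 1
b1 = -(patterns.sum(axis=1) - 0.5)

# Output "OR" cell: fires if any selector fired
w2 = np.ones(len(patterns))
b2 = -0.5

def mlp(x):
    h = step(W1 @ x + b1)                  # selector layer
    return step(w2 @ h + b2)               # OR layer

for x in [(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)]:
    print(x, int(mlp(np.array(x, dtype=float))))
```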

Non-linear Activation Functions (Hidden Layer). Sigmoid: $\sigma(a) = 1 / (1 + e^{-a})$. Hyperbolic tangent: $\tanh(a)$. When the activation function is non-linear, a two-layer neural network can be proven to be a universal function approximator. Kai Yu. SJTU Deep Learning Lecture. 36

Output function: Logistic Regression. The structure of logistic regression (post-processing outside the NN): training data $x^{(i)} \in \mathbb{R}^n$, $y^{(i)} \in \{0, 1\}$. The logistic (sigmoid) function denotes the confidence of the target class. Kai Yu. SJTU Deep Learning Lecture. 37

Output function: Softmax. Training data: $x^{(i)} \in \mathbb{R}^n$, $y^{(i)} \in \{1, \dots, K\}$. The softmax function constructs a probability distribution over the K-dimensional output (K classes): $p(y = k \mid x) = \exp(z_k) / \sum_{j=1}^{K} \exp(z_j)$. Kai Yu. SJTU Deep Learning Lecture. 38
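
A small sketch (not from the slides) of the two output functions; the max-subtraction inside softmax is a standard numerical-stability detail rather than something stated on the slide.

```python
import numpy as np

def sigmoid(z):
    # Logistic output: confidence of the target class for binary problems
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax output: probability distribution over K classes.
    # Subtracting max(z) does not change the result but avoids overflow.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(softmax(z))            # sums to 1; the largest score gets the largest probability
print(sigmoid(np.array([0.0, 2.0, -2.0])))

# With K = 2, softmax([z, 0]) reproduces the logistic output sigmoid(z)
z0 = 1.3
print(softmax(np.array([z0, 0.0]))[0], sigmoid(z0))
```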

NN Output Function Summary. Linear output: $y_i = \sum_j w_{ij} x_j + b_i$, where j is the input feature index and i is the output class index. Logistic regression: denotes the confidence of each class. Softmax: denotes the probability of each class. When K = 2, softmax is similar to logistic regression. Kai Yu. SJTU Deep Learning Lecture. 39

NN Loss Function (Criterion for Param. Est.). Consider M output targets and N data samples. Sum of square error: $E = \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{M} (y_{nm} - t_{nm})^2$. Cross-entropy: $E = -\sum_{n=1}^{N} \sum_{m=1}^{M} t_{nm} \log y_{nm}$. Kai Yu. SJTU Deep Learning Lecture. 40
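
A minimal sketch (not from the slides) of the two criteria in NumPy; the 0.5 factor, the eps guard, and the variable names are my own choices.

```python
import numpy as np

def sum_of_squares(Y, T):
    # Y, T: (N, M) arrays of network outputs and targets
    return 0.5 * np.sum((Y - T) ** 2)

def cross_entropy(Y, T, eps=1e-12):
    # Y: (N, M) predicted probabilities (e.g. softmax outputs); T: one-hot targets
    return -np.sum(T * np.log(Y + eps))

# Example with N = 2 samples, M = 3 classes
T = np.array([[1, 0, 0], [0, 0, 1]], dtype=float)
Y = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
print(sum_of_squares(Y, T), cross_entropy(Y, T))
```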

Matching Output Function with Loss Function. Regression: linear output, sum-of-square-error loss. Binary classification: logistic (sigmoid) or softmax output, cross-entropy loss. Multi-class classification: softmax output, cross-entropy loss. Q: For binary classification, what is the loss function for the logistic and the softmax output, respectively? Kai Yu. SJTU Deep Learning Lecture. 41

Error Back-Propagation for Multi-Layer NN. While numeric gradient computation can be used to estimate the gradient and thereby adjust the weights of the neural net, doing so is not very efficient. A more efficient, if slightly more confusing, method of computing the gradient is to use back-propagation. Back-propagation (BP) is the most widely used parameter update approach for multi-layer neural networks. Kai Yu. SJTU Deep Learning Lecture. 42

Back-propagation Algorithm (1). Review of multi-layer neural networks: the feed-forward operation is a chain of function calculations.

Back-propagation Algorithm (2). Loss function example: square error, $E = \frac{1}{2}(y - t)^2$. NN example: a simple one-layer linear model, $y = w^T x$. So the derivative of the loss function (single sample) is $\frac{\partial E}{\partial w} = (y - t)\, x$.

Back-propagation Algorithm (3). General unit activation in a multilayer network: $a_j = \sum_i w_{ji} z_i$, with activation function $z_j = h(a_j)$ (the input/output of a hidden layer). Forward propagation: calculate $a_j$ and $z_j$ for each unit. The loss L depends on $w_{ji}$ only through $a_j$: the error signal is $\delta_j \equiv \partial L / \partial a_j$.

Back-propagation Algorithm (4). Output unit with linear output function: $\delta_k = y_k - t_k$. Hidden unit j which sends inputs to units k: check all nodes connected to j and apply the chain rule, giving $\delta_j = h'(a_j) \sum_k w_{kj} \delta_k$. Update the weights (learning rate $\eta$): $w_{ji} \leftarrow w_{ji} - \eta\, \delta_j z_i$. Kai Yu. SJTU Deep Learning Lecture. 46

Back-propagation Algorithm (5). The BP algorithm for a multi-layer NN can be decomposed into the following four steps: I. Feed-forward computation. II. Back-propagation to the output layer. III. Back-propagation to the hidden layer. IV. Weight updates. Kai Yu. SJTU Deep Learning Lecture. 47
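
A compact sketch (not from the slides) of those four steps for a two-layer MLP with a sigmoid hidden layer, a linear output, and square-error loss; the layer sizes, XOR toy data, learning rate, and epoch count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer MLP: x -> sigmoid hidden layer -> linear output, square-error loss
n_in, n_hid, n_out, eta = 2, 8, 1, 0.1
W1 = rng.normal(scale=0.5, size=(n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(n_out, n_hid)); b2 = np.zeros(n_out)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy data: XOR with targets 0/1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.0], [1.0], [1.0], [0.0]])

for epoch in range(5000):
    for x, t in zip(X, T):
        # I. Feed-forward computation
        a1 = W1 @ x + b1; z1 = sigmoid(a1)
        y = W2 @ z1 + b2                       # linear output
        # II. Back-propagation to the output layer (error signal)
        delta2 = y - t
        # III. Back-propagation to the hidden layer (chain rule)
        delta1 = (W2.T @ delta2) * z1 * (1 - z1)
        # IV. Weight updates (SGD)
        W2 -= eta * np.outer(delta2, z1); b2 -= eta * delta2
        W1 -= eta * np.outer(delta1, x);  b1 -= eta * delta1

preds = [(W2 @ sigmoid(W1 @ x + b1) + b2).item() for x in X]
print(np.round(preds, 2))                      # should approach [0, 1, 1, 0]
```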

Example of BP: Sigmoid. Consider a 2-dimensional neuron (inputs x and weights w) that uses the sigmoid activation function, $f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$. Differentiate the sub-functions in the expression. Kai Yu. SJTU Deep Learning Lecture. 48

Example of BP: Sigmoid. The inputs are [x0, x1] and the (learnable) weights are [w0, w1, w2]. The forward pass computes values from inputs to output (green); the backward pass then performs back-propagation to compute the gradients (red). Annotated circuit values include -1.0 = 1.00 * (-1), 1.37 = 0.37 + 1, -0.53 = 1.00 * (-1/1.37^2), and -0.20 = -0.53 * exp(-1). If the learning rate is 1, the updated weights are w0 = 1.8, w1 = -3.39, w2 = -2.8. Kai Yu. SJTU Deep Learning Lecture. 49
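
The slide does not state the initial inputs and weights; the values in the sketch below are assumptions chosen so that the arithmetic reproduces the intermediate numbers and updated weights shown above, purely as a worked illustration of differentiating the sub-functions.

```python
import math

# Sketch of the forward/backward pass for f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))).
# The initial values below are ASSUMPTIONS chosen so the intermediate numbers match
# those on the slide (1.37, -0.53, -0.20, and updated weights 1.8, -3.39, -2.8).
w0, w1, w2 = 2.0, -3.0, -3.0
x0, x1 = -1.0, -2.0

# Forward pass
dot = w0 * x0 + w1 * x1 + w2        #  1.00
neg = -dot                          # -1.00  (= 1.00 * (-1))
e = math.exp(neg)                   #  0.37
denom = e + 1.0                     #  1.37  (= 0.37 + 1)
f = 1.0 / denom                     #  0.73

# Backward pass (chain rule through each sub-function, starting from df/df = 1)
d_denom = 1.0 * (-1.0 / denom**2)   # -0.53  (= 1.00 * -1/(1.37^2))
d_e = d_denom * 1.0                 # -0.53  (unchanged through the +1 gate)
d_neg = d_e * math.exp(neg)         # -0.20  (= -0.53 * exp(-1))
d_dot = d_neg * (-1.0)              #  0.20
d_w0, d_w1, d_w2 = d_dot * x0, d_dot * x1, d_dot * 1.0   # -0.20, -0.39, 0.20

# Update with learning rate 1, same sign convention as the slide (gradient is added)
lr = 1.0
print(w0 + lr * d_w0, w1 + lr * d_w1, w2 + lr * d_w2)    # ~1.8, -3.39, -2.8
```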

Patterns in Backward Propagation. The add gate distributes the gradient equally to all of its inputs. The max gate routes the gradient unchanged to exactly the one input with the highest forward value. The multiply gate: its local gradients are the switched input values, multiplied by the gradient on its output during the chain rule. Kai Yu. SJTU Deep Learning Lecture. 50
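
A tiny numeric illustration (the example values are mine) of the three gate patterns:

```python
# Suppose an upstream gradient dL/dout = 2.0 arrives at each gate.
g = 2.0
x, y = 3.0, -4.0

# Add gate: out = x + y  ->  d(out)/dx = d(out)/dy = 1, so both inputs receive g
add_dx, add_dy = g * 1.0, g * 1.0                                # 2.0, 2.0

# Max gate: out = max(x, y) -> gradient is routed to the larger input only
max_dx, max_dy = (g if x >= y else 0.0), (g if y > x else 0.0)   # 2.0, 0.0

# Multiply gate: out = x * y -> local gradients are the *switched* inputs
mul_dx, mul_dy = g * y, g * x                                    # -8.0, 6.0

print(add_dx, add_dy, max_dx, max_dy, mul_dx, mul_dy)
```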

Computational Efficiency. The back-propagation algorithm is computationally more efficient than standard numerical minimization. Suppose that W is the total number of weights and biases in the network. Back-propagation: the evaluation is O(W) for large W, as there are many more weights than units. Standard approach: perturb each weight in turn and forward propagate to compute the change in the error; each perturbation requires O(W) computations, so the total complexity is O(W^2). Kai Yu. SJTU Deep Learning Lecture. 51

Application Classify points Task: use a 2-Layer MLP to classify 3 classes of 2-dimensional points Kai Yu. SJTU Deep Learning Lecture. 52

Application: Classify points. Structure of the NN; training settings: number of epochs, learning rate. DEMO. Kai Yu. SJTU Deep Learning Lecture. 53
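
The demo itself is not part of the transcript; the following stand-in sketch (all data, sizes, and hyper-parameters are assumptions) trains a 2-layer MLP with a tanh hidden layer and a softmax/cross-entropy output on three classes of 2-D points.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three Gaussian blobs in 2-D, one per class (toy stand-in for the demo data)
means = np.array([[0, 2], [-2, -1], [2, -1]], dtype=float)
X = np.vstack([rng.normal(m, 0.5, size=(100, 2)) for m in means])
y = np.repeat(np.arange(3), 100)
T = np.eye(3)[y]                                   # one-hot targets

n_hid, eta, epochs = 16, 0.5, 1000
W1 = rng.normal(scale=0.5, size=(n_hid, 2)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.5, size=(3, n_hid));  b2 = np.zeros(3)

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

for _ in range(epochs):
    # forward: tanh hidden layer, softmax output
    Z1 = np.tanh(X @ W1.T + b1)
    Y = softmax(Z1 @ W2.T + b2)
    # backward: gradient of cross-entropy w.r.t. the softmax inputs is (Y - T)
    d2 = (Y - T) / len(X)
    d1 = (d2 @ W2) * (1 - Z1**2)
    W2 -= eta * d2.T @ Z1; b2 -= eta * d2.sum(axis=0)
    W1 -= eta * d1.T @ X;  b1 -= eta * d1.sum(axis=0)

pred = np.argmax(softmax(np.tanh(X @ W1.T + b1) @ W2.T + b2), axis=1)
print("training accuracy:", (pred == y).mean())    # should be close to 1.0
```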

Application: Approximate y=sin(x). Task: use a 2-Layer MLP to approximate y = sin(x). Kai Yu. SJTU Deep Learning Lecture. 54

Application: Approximate y=sin(x). Structure of the NN; training settings: number of epochs, learning rate. DEMO. Kai Yu. SJTU Deep Learning Lecture. 55
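
Likewise, a brief stand-in sketch (architecture and hyper-parameters assumed) fitting y = sin(x) with a tanh hidden layer, a linear output, and square-error loss.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
t = np.sin(x)

n_hid, eta, epochs = 20, 0.1, 10000
W1 = rng.normal(scale=1.0, size=(n_hid, 1)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(1, n_hid)); b2 = np.zeros(1)

for _ in range(epochs):
    Z = np.tanh(x @ W1.T + b1)                # hidden layer
    y = Z @ W2.T + b2                         # linear output
    d2 = (y - t) / len(x)                     # square-error gradient
    d1 = (d2 @ W2) * (1 - Z**2)
    W2 -= eta * d2.T @ Z; b2 -= eta * d2.sum(axis=0)
    W1 -= eta * d1.T @ x; b1 -= eta * d1.sum(axis=0)

pred = np.tanh(x @ W1.T + b1) @ W2.T + b2
print("max abs error:", np.abs(pred - t).max())   # should shrink as training proceeds
```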

Types of NNs: DNN (Deep neural networks). A fancy playground: http://playground.tensorflow.org/ Kai Yu. SJTU Deep Learning Lecture. 56

Types of NNs: CNN (Convolutional neural networks). Kai Yu. SJTU Deep Learning Lecture. 57

Types of NNs: RNN (Recurrent neural networks). Kai Yu. SJTU Deep Learning Lecture. 58

DL Assignments. https://github.com/caodi0207/deep-learning-Course-2017 New assignments will be uploaded to this repo. Assignment submission: file name pattern StudentID-YourName-AssignmentID.zip, e.g. 12345-小明-as2.zip. Upload your zip file to ftp://202.120.38.125 (public account: dl2016/dl2016). Be careful to upload to the corresponding folder. Kai Yu. SJTU Deep Learning Lecture. 59

DL Assignments Kai Yu. SJTU Deep Learning Lecture. 60

DL Assignments Discuss and Q&A Kai Yu. SJTU Deep Learning Lecture. 61

DL Assignments. Discuss and Q&A. If you encounter any trouble or find any bugs, feel free to discuss and help others in the repo's issues. Contact TAs: caodi0207@sjtu.edu.cn (曹迪). Kai Yu. SJTU Deep Learning Lecture. 62

dl_assignment1 Softmax Two-layer MLP Kai Yu. SJTU Deep Learning Lecture. 63