Introduction to Deep Learning

On the one hand, this is unsurprising given DNNs' status as arbitrary function approximators: specific network weights and nonlinearities allow DNNs to easily adapt to various narratives. On the other hand, they are not unique in permitting multiple interpretations; one can view standard, simpler algorithms through various lenses as well (for example, one can derive ...). Given the breadth of possible interpretations, some interesting points begin to emerge. For one, there appears to be a limitless number of interpretations for DNNs, apparently constrained only by the lens through which the mathematical operations are viewed. Physics interpretations stem from researchers with a physics background. Connections to sparsity and wavelets come from researchers who have made important contributions to those fields. Ultimately, the interpretation of DNNs appears to be a type of Rorschach test, a psychological test wherein subjects interpret a set of ambiguous ink-blots [101] (see Figure 1). Rorschach tests depend not only on what a subject sees in the ink-blots but also on the reasoning (the methods used) behind that perception, thus making the analogy particularly apropos.

Figure 1: What do you see? DNNs can be viewed in many ways. 1a. Stylistic example of a DNN with an input layer (red), output layer (blue), and two hidden layers (green); an example for DNN theory. 1b. Example (normalized) ink blot from the Rorschach test.

Is it a question?

Given training data with categories A and B, say, well drilling sites with different outcomes.

Question: How do we classify the remaining points? That is, where should we propose a new drilling site for the desired outcome?

AI via Machine Learning

1. AI via machine learning has advanced radically over the past 10 years.
2. ML algorithms now achieve human-level performance or better on tasks such as:
   - face recognition
   - optical character recognition
   - speech recognition
   - object recognition
   - playing the game of Go (in fact, defeating human champions)
3. Deep learning has become the centerpiece of the ML toolbox.

Deep Learning

Deep learning = a multilayered artificial neural network (ANN).

Figure 3: A simple ANN with four layers: layer 1 (the input layer), layers 2 and 3 (hidden layers), and layer 4 (the output layer). Since the input data has the form $x \in \mathbb{R}^2$ and layer two has two neurons, the weights and biases for layer two may be represented by a matrix $W^{[2]} \in \mathbb{R}^{2 \times 2}$ and a vector $b^{[2]} \in \mathbb{R}^2$.

Deep Learning

An ANN in mathematical terms:

$$F(x) = \sigma\left( W^{[4]} \, \sigma\left( W^{[3]} \, \sigma\left( W^{[2]} x + b^{[2]} \right) + b^{[3]} \right) + b^{[4]} \right),$$

where $p := \{ (W^{[2]}, b^{[2]}),\, (W^{[3]}, b^{[3]}),\, (W^{[4]}, b^{[4]}) \}$ are the parameters to be trained/computed from the training data, and $\sigma(\cdot)$ is an activation function, say the sigmoid

$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$
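To make the formula concrete, here is a minimal NumPy sketch of this forward map. The layer widths (2, 2, 3, 2) and the parameter names are illustrative assumptions, chosen only to be consistent with $W^{[2]} \in \mathbb{R}^{2 \times 2}$ from the figure caption; this is not the slides' own code.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def F(x, p):
    """Forward pass of the four-layer ANN above.

    p holds hypothetical parameter arrays:
      W2 (2x2), b2 (2,), W3 (3x2), b3 (3,), W4 (2x3), b4 (2,)
    (the hidden-layer widths are illustrative assumptions).
    """
    a2 = sigmoid(p["W2"] @ x + p["b2"])     # layer 2
    a3 = sigmoid(p["W3"] @ a2 + p["b3"])    # layer 3
    return sigmoid(p["W4"] @ a3 + p["b4"])  # layer 4 (output)

# Random initialization of the hypothetical parameters.
rng = np.random.default_rng(0)
p = {"W2": rng.normal(size=(2, 2)), "b2": rng.normal(size=2),
     "W3": rng.normal(size=(3, 2)), "b3": rng.normal(size=3),
     "W4": rng.normal(size=(2, 3)), "b4": rng.normal(size=2)}
print(F(np.array([0.5, -0.2]), p))  # maps a point in R^2 to an output in R^2
```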

Deep Learning

The objective of training is to minimize a properly defined cost function, say

$$\min_p \, \mathrm{Cost}(p), \qquad \mathrm{Cost}(p) := \frac{1}{m} \sum_{i=1}^{m} \left\| F(x^{(i)}) - y^{(i)} \right\|_2^2,$$

where $\{ (x^{(i)}, y^{(i)}) \}$ are the training data.

Steepest/gradient descent:

$$p \leftarrow p - \tau \, \nabla \mathrm{Cost}(p),$$

where $\tau$ is known as the learning rate.

The underlying operations of DL are stunningly simple, mostly matrix-vector products, but the computation is extremely intensive.
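The training loop is just this update rule applied repeatedly. Below is a self-contained sketch on a hypothetical four-point toy dataset; to stay short it estimates $\nabla \mathrm{Cost}(p)$ by central finite differences rather than by backpropagation, which is what practical DL software actually uses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy training set: points in R^2 with one-hot labels for A and B.
X = np.array([[0.1, 0.1], [0.3, 0.4], [0.7, 0.7], [0.9, 0.6]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Parameter shapes for the 2-2-3-2 network sketched earlier.
shapes = [("W2", (2, 2)), ("b2", (2,)), ("W3", (3, 2)),
          ("b3", (3,)), ("W4", (2, 3)), ("b4", (2,))]

def unpack(theta):
    """Split the flat parameter vector theta into named weight/bias arrays."""
    p, k = {}, 0
    for name, shape in shapes:
        n = int(np.prod(shape))
        p[name] = theta[k:k + n].reshape(shape)
        k += n
    return p

def F(x, p):
    a2 = sigmoid(p["W2"] @ x + p["b2"])
    a3 = sigmoid(p["W3"] @ a2 + p["b3"])
    return sigmoid(p["W4"] @ a3 + p["b4"])

def cost(theta):
    p = unpack(theta)
    return np.mean([np.sum((F(x, p) - y) ** 2) for x, y in zip(X, Y)])

def grad(theta, h=1e-6):
    """Central finite-difference gradient (real DL code uses backpropagation)."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = h
        g[j] = (cost(theta + e) - cost(theta - e)) / (2 * h)
    return g

n_params = sum(int(np.prod(s)) for _, s in shapes)
theta = np.random.default_rng(0).normal(scale=0.5, size=n_params)
tau = 0.5  # learning rate
for step in range(2000):
    theta -= tau * grad(theta)  # p <- p - tau * grad Cost(p)
print("final cost:", cost(theta))
```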

Experiment 1

Given training data with categories A and B, say, well drilling sites with different outcomes.

Question for DL: How do we classify the remaining points? That is, where should we propose a new drilling site for the desired outcome?

Experiment 1

Classification after 90 seconds of training on my desktop.

Experiment 1

The value of $\mathrm{Cost}(W^{[\cdot]}, b^{[\cdot]})$ during training.
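The classification figures themselves are not reproduced in this transcription. As a rough stand-in for how such a decision map is produced, here is a hedged sketch using synthetic two-cluster data and scikit-learn's MLPClassifier in place of the slides' own (unavailable) code; the cluster centers, network size, and training settings are all assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical stand-in data: two clusters of "drilling sites" in the unit square.
rng = np.random.default_rng(0)
A = rng.normal([0.3, 0.3], 0.08, size=(20, 2))  # category A sites
B = rng.normal([0.7, 0.7], 0.08, size=(20, 2))  # category B sites
X = np.vstack([A, B])
y = np.array([0] * 20 + [1] * 20)

# A small sigmoid network, loosely mirroring the four-layer ANN above.
clf = MLPClassifier(hidden_layer_sizes=(2, 3), activation="logistic",
                    solver="sgd", learning_rate_init=0.5, max_iter=5000,
                    random_state=0).fit(X, y)

# Classify a grid of candidate sites: this is the decision map the figures show.
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
grid = np.column_stack([xx.ravel(), yy.ravel()])
zz = clf.predict(grid).reshape(xx.shape)
print("grid cells labelled A:", (zz == 0).sum(), "| B:", (zz == 1).sum())
```

Plotting zz with matplotlib's contourf over the scattered training points would reproduce the kind of figure the slides show.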

Experiment 2

Given training data with categories A and B, say, well drilling sites with different outcomes.

Question for DL: How do we classify the remaining points? That is, where should we propose a new drilling site for the desired outcome?

Experiment 2

Classification after 90 seconds of training on my desktop.

Experiment 2

The value of $\mathrm{Cost}(W^{[\cdot]}, b^{[\cdot]})$ during training.

Experiment 3

Given training data with categories A and B, say, well drilling sites with different outcomes.

Question for DL: How do we classify the remaining points? That is, where should we propose a new drilling site for the desired outcome?

Experiment 3

Classification snapshots after 16, 38, 46, 62, 83, and 156 seconds of training on my desktop.

Experiment 3

The value of $\mathrm{Cost}(W^{[\cdot]}, b^{[\cdot]})$ after 16, 38, 46, 62, 83, and 156 seconds of training.

Experiment 4

Given training data with categories A and B, say, well drilling sites with different outcomes.

Question for DL: How do we classify the remaining points? That is, where should we propose a new drilling site for the desired outcome?

Experiment 4

Classification after 90 seconds of training on my desktop.

Experiment 4

The value of $\mathrm{Cost}(W^{[\cdot]}, b^{[\cdot]})$ during training.

Perfect Storm

1. The recent success of ANNs in ML, despite their long history, can be attributed to a perfect storm of:
   - large labeled datasets;
   - improved hardware;
   - clever parameter constraints;
   - advancements in optimization algorithms;
   - more open sharing of stable, reliable code implementing the latest methods.
2. The ANN is simultaneously one of the simplest and most complex methods:
   - it learns both the model and its parameterization, and is capable of self-enhancement;
   - it is a generic computation architecture, executable on local HPC systems and in the cloud;
   - it is broadly applicable, but requires a good understanding of the underlying problems and algorithms.