Deep Learning Introduction


Deep Learning Introduction. Christian Szegedy, Geoffrey Irving. Google Research.

Machine Learning: Supervised Learning. Task: assume a ground truth G, a model architecture f, a prediction metric σ, and training samples S. Find model parameters m ∈ M such that the expected error E_{s ∈ S}[σ(f(s, m), G(s))] is minimized.

Machine Learning: Unsupervised Learning. A set of tasks that work on uncurated data: predict properties that are inherently present in the data alone.

Machine Learning: Generative Learning. Task: ℙ : Ω(D) → [0, 1] is the input space D with a probability measure; f : [0, 1]^n × M → D is the generative model architecture. Find model parameters m ∈ M such that ℙ(f(s, m)) ~ ℙ(S), i.e. the distribution of generated samples matches the data distribution.

Machine Learning: Supervised Learning as Marginal Computation. ℙ : Ω(D × P) → [0, 1] is the expanded input space (inputs paired with predictions); f : [0, 1]^n × D × M → P is a conditional generative model. Find model parameters m ∈ M such that ℙ(f(s, d, m) | d) ~ ℙ(S | d), i.e. the conditional distribution of the model output given the input d matches the conditional data distribution.
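As a compact restatement of the last two slides, one way to write the two objectives side by side is given below; the uniform distribution on the random seed s is my own assumption, since the slides only give its domain [0, 1]^n.

```latex
% Generative learning: the pushforward of the seed distribution matches the data.
% Supervised learning: the same statement, conditioned on the observed input d.
% (Assumption: the seed s is drawn uniformly from [0,1]^n.)
\begin{align*}
  \text{generative learning:} \quad & f(s, m) \sim \mathbb{P}(S) \\
  \text{supervised learning:} \quad & f(s, d, m) \mid d \;\sim\; \mathbb{P}(S \mid d)
\end{align*}
```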

Deep versus Shallow Learning. Traditional machine learning: Data → hand-crafted features → predictor. Deep learning: Data → learned features → predictor.


Deep versus Shallow Learning. Traditional machine learning: Data → hand-crafted features → predictor; mostly convex, provably tractable; special-purpose solvers; non-layered architectures. Deep learning: Data → learned features → predictor; mostly NP-hard; general-purpose solvers; hierarchical models.

Provably Tractable Deep Learning Approaches. Sum-product networks [by Hoifung Poon and Pedro Domingos]: can learn generative models; hierarchical structure; automated learning of low-level features; tractable training/inference under certain conditions; practical implementations. Provable Bounds for Learning Some Deep Representations [Sanjeev Arora, Aditya Bhaskara, Rong Ge and Tengyu Ma]: can learn generative models; hierarchical structure; automated learning of low-level features; provably tractable for extremely sparse graphs; creates deep and sparse artificial neural networks; based on the polynomial-time solvable graph square root problem.

Classical Feed-Forward Artificial Neural Networks: the multilayer perceptron [Frank Rosenblatt, 1961]. Each sample is a vector v. The network alternates affine layers Wx + b with the element-wise nonlinearity tanh(x), i.e. ... tanh(W2 tanh(W1 v + b1) + b2) ..., ending in a loss (e.g. an SVM loss). Minimize the loss over the training set.

In today's networks, tanh is increasingly replaced by max(x, 0) (rectified linear units, or ReLUs).

The loss is a highly nonlinear function of the parameters, and the training objective is a huge sum that ranges over all training examples.
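As a concrete illustration of the feed-forward computation above, here is a minimal numpy sketch of a two-layer perceptron forward pass; the layer sizes and random weights are arbitrary choices for illustration, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each sample is a vector v; weights W and biases b are the model parameters.
v = rng.normal(size=4)                      # input vector
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def tanh_layer(W, b, x):
    """Affine map Wx + b followed by the element-wise nonlinearity tanh."""
    return np.tanh(W @ x + b)

def relu_layer(W, b, x):
    """Same affine map, but with the ReLU nonlinearity max(x, 0)."""
    return np.maximum(W @ x + b, 0.0)

# Classical form: alternate affine layers with tanh.
hidden = tanh_layer(W1, b1, v)
output = W2 @ hidden + b2                   # final affine layer feeding the loss
print("network output:", output)

# Modern variant: the same network with ReLU in place of tanh.
print("ReLU variant:  ", W2 @ relu_layer(W1, b1, v) + b2)
```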

Optimizing the Neural Network Parameters. Minimize the total training loss with respect to the model parameters. Use gradient descent in the parameter space: repeatedly move the parameters in the direction of the negative gradient of the loss.

Stochastic Gradient Descent. With learning rate α and a randomly sampled minibatch B_i, update the parameters as M ← M − α ∇_M Σ_{v ∈ B_i} loss(N(v)).

Backpropagation algorithm [Rumelhart et al., 1986]. Compute derivatives via the chain rule. A forward pass propagates function values through the layers (each sample is a vector v; affine layers Wx + b alternate with the element-wise nonlinearity tanh, ending in a loss such as an SVM loss). A backward pass then propagates gradients recursively, multiplying by the local gradient of each function involved.

Sketch of Deep Artificial Neural Network Training. Maintain the network parameters M. Repeatedly: sample a batch B_i of training examples; compute the network output N(v) for each training example v; compute the loss loss(N(v)) of each prediction; use backpropagation to compute the gradient g of the total loss with respect to the model parameters; update M by subtracting αg.
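Below is a minimal, self-contained numpy sketch of this training loop, combining the SGD update and the backpropagation pass from the previous slides; the toy regression data, layer sizes, and squared-error loss are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: targets are a fixed nonlinear function of the inputs.
X = rng.normal(size=(512, 4))
Y = np.sin(X @ rng.normal(size=(4, 2)))

# Network parameters M: a two-layer tanh network.
W1, b1 = 0.5 * rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = 0.5 * rng.normal(size=(2, 8)), np.zeros(2)

alpha, batch_size = 0.05, 32                # learning rate and minibatch size

for step in range(2000):
    # Sample a minibatch B_i of training examples.
    idx = rng.integers(0, len(X), size=batch_size)
    v, y = X[idx], Y[idx]

    # Forward pass: compute the network output N(v) and the loss.
    h = np.tanh(v @ W1.T + b1)              # hidden activations
    out = h @ W2.T + b2                     # network output N(v)
    loss = 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))

    # Backward pass (chain rule): propagate gradients layer by layer.
    d_out = (out - y) / batch_size          # dL/d(out)
    gW2 = d_out.T @ h
    gb2 = d_out.sum(axis=0)
    d_h = d_out @ W2 * (1.0 - h ** 2)       # through tanh: local gradient 1 - tanh^2
    gW1 = d_h.T @ v
    gb1 = d_h.sum(axis=0)

    # SGD update: subtract alpha times the gradient from each parameter.
    W1 -= alpha * gW1; b1 -= alpha * gb1
    W2 -= alpha * gW2; b2 -= alpha * gb2

    if step % 500 == 0:
        print(f"step {step:4d}  minibatch loss {loss:.4f}")
```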

Real Life Deep Network Training: data collection, preprocessing and input encoding; choosing a suitable framework that can do automatic differentiation; designing a suitable network architecture; using more sophisticated optimizers; implementation optimization (hardware acceleration, especially GPUs; distributed training using multiple model replicas); choosing hyperparameters such as the learning rate and the weights of auxiliary losses.

Convolutional Networks: spatial parameter sharing (image credit: Yann LeCun). Neocognitron [K. Fukushima, 1980]; Convolutional Neural Network, Yann LeCun et al. (1988).
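To make "spatial parameter sharing" concrete, here is a small numpy sketch of a 2D convolution in which the same 3×3 filter weights are applied at every spatial position; the filter and image are arbitrary examples, not from the slides.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide one shared kernel over every position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The same kernel weights are reused at every (i, j): parameter sharing.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(8, 8))
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # a simple vertical-edge detector

print(conv2d(image, edge_filter).shape)          # (6, 6): one response per position
# A dense layer on an 8x8 input would need 64 weights per output unit;
# the convolution reuses just 9 shared weights across all 36 output positions.
```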

Deep versus Shallow Learning (recap). Traditional machine learning: Data → hand-crafted features → predictor. Deep learning: Data → learned features → predictor.

Low-level features learned by vision networks. ImageNet Classification with Deep Convolutional Neural Networks [Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012].

DeepDream visualization of internal feature representations. Starting from a white-noise image, backpropagate the gradient from a trained network to the image pixels and try to maximize the response of various feature outputs. [Alexander Mordvintsev, Christopher Olah, Mike Tyka, 2015]
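The procedure amounts to gradient ascent on the input rather than on the parameters. In the sketch below, the tiny random "feature extractor" is purely a stand-in for one feature of a trained network; only the update rule mirrors the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one internal feature of a *trained* network (illustrative only):
# feature(img) = sum(relu(W @ img.flatten())).
W = rng.normal(size=(16, 8 * 8))

def feature_and_grad(img):
    """Response of the feature and its gradient with respect to the image pixels."""
    pre = W @ img.ravel()
    response = np.maximum(pre, 0.0).sum()
    grad = ((pre > 0).astype(float) @ W).reshape(img.shape)   # chain rule through ReLU
    return response, grad

img = rng.normal(scale=0.01, size=(8, 8))     # start from (near) white noise
step = 0.1

for it in range(100):
    response, grad = feature_and_grad(img)
    img += step * grad                        # ascend: maximize the feature response

print("final feature response:", feature_and_grad(img)[0])
```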

Cambrian Explosion of Deep Vision Research: Zeiler-Fergus network (ILSVRC winner 2013); Inception-v1 (GoogLeNet, ILSVRC winner 2014).

ImageNet classification progress. Task: 1000 fine-grained classes, including the difference between an Eskimo dog and a Siberian husky. Progression of approaches: Fisher vectors + hand-crafted features → convolutional networks → Inception-v1 convolutional network → residual convolutional network, reaching better-than-human performance.

Example images from the ImageNet dataset: Siberian husky and Eskimo dog (ImageNet Large Scale Visual Recognition Challenge, IJCV 2015, Russakovsky et al.).

Object Detection. VOC benchmark: detecting objects of 20 different categories (persons, cars, cats, birds, potted plants, bottles, chairs, etc.). State of the art: pre-deep-learning model in 2013 (Deformable Parts), 36% mAP; deep learning in 2015, 78% mAP.

Stylistic Transfer using Deep Neural Features Source: Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artwork, nucl.ai Conference 2016 by Alex J. Champandard http://arxiv.org/pdf/1603.01768v1.pdf [2016]

Real Life Applications of Deep Vision Networks: Google Image and Photo Search (Inception-v2); face detection and tagging in Google Photos; PlaNet (identifying the location where an image was taken); StreetView privacy protection; Google Visual Translate; Nvidia's DriveNet. All of the above applications use variants of the Inception network architecture.

Recurrent Neural Networks: parameter sharing over time. LSTM: long short-term memory [Sepp Hochreiter, Jürgen Schmidhuber, 1997] (image credit: Christopher Olah).
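A minimal numpy sketch of what "parameter sharing over time" means: the same weight matrices are applied at every time step of the sequence. A plain tanh RNN cell is shown here rather than the full LSTM, purely for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim, seq_len = 4, 8, 6

# One set of parameters, shared across all time steps.
W_xh = rng.normal(size=(hidden_dim, input_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

xs = rng.normal(size=(seq_len, input_dim))    # an input sequence
h = np.zeros(hidden_dim)                      # initial hidden state

for t in range(seq_len):
    # The same W_xh, W_hh, b_h are reused at every step t.
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
    print(f"step {t}: hidden state norm {np.linalg.norm(h):.3f}")
```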

Generative Models of Text [Andrej Karpathy 2016]

Some real-life applications of recurrent networks: voice transcription in phones (Siri, OK Google); video captioning in YouTube; Google Translate; house-number transcription from StreetView to Google Maps.

Open Source Deep Learning Frameworks: Torch (http://torch.ch). Lua API; long history; GPU backend (via cuDNN); most control over dynamic execution; no support for distributed training.

Open Source Deep Learning Frameworks: Theano (http://deeplearning.net/software/theano). Python API; University of Montreal project; fast GPU backend (via cuDNN); less control over dynamic execution than Torch; no support for distributed training.

Open Source Deep Learning Frameworks: TensorFlow (https://www.tensorflow.org). Python and C++ APIs; used and maintained by Google; fast GPU backend (via cuDNN); less control over dynamic execution than Torch; support for distributed training now in open source.
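For a flavor of the automatic differentiation these frameworks provide, here is a minimal TensorFlow sketch. It uses the current tf.GradientTape API, which postdates this talk, and the tiny least-squares problem is my own example.

```python
import tensorflow as tf

# A tiny least-squares problem: fit y = w * x + b to synthetic data.
x = tf.constant([[1.0], [2.0], [3.0], [4.0]])
y = 3.0 * x + 1.0

w = tf.Variable(0.0)
b = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.05)

for step in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x + b - y))   # forward pass
    grads = tape.gradient(loss, [w, b])                   # automatic differentiation
    opt.apply_gradients(zip(grads, [w, b]))               # gradient descent step

print("fitted w, b:", float(w.numpy()), float(b.numpy()))
```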

Deep learning for lemma selection. A collaboration between Josef Urban's group and Google Research. Input from the Mizar corpus: a set of known lemmas and a proposition to prove. Goal: pick a small subset of lemmas to give to E Prover.

Deep learning for lemma selection. Simplified goal: rank lemmas by usefulness for a given conjecture. Embed the lemma into a vector space using an LSTM; embed the conjecture using a different LSTM; combine the two embeddings to estimate usefulness (conjecture → LSTM → FC, lemma → LSTM → FC, combined → softmax).
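A hedged sketch of what such a two-tower model could look like in Keras; the vocabulary size, embedding and hidden dimensions, and the two-way softmax over useful/not-useful are assumptions of mine for illustration, not the actual model used in the collaboration.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 1000      # assumed token vocabulary size (illustrative)
embed_dim = 64         # assumed embedding dimension
hidden = 128           # assumed LSTM / FC width

conj_in = layers.Input(shape=(None,), dtype="int32", name="conjecture_tokens")
lemma_in = layers.Input(shape=(None,), dtype="int32", name="lemma_tokens")

# Separate LSTMs embed the conjecture and the lemma into fixed-size vectors.
conj_vec = layers.LSTM(hidden)(layers.Embedding(vocab_size, embed_dim)(conj_in))
lemma_vec = layers.LSTM(hidden)(layers.Embedding(vocab_size, embed_dim)(lemma_in))

# Fully connected layers combine the two embeddings; softmax over {not useful, useful}.
combined = layers.Concatenate()([conj_vec, lemma_vec])
combined = layers.Dense(hidden, activation="relu")(combined)
usefulness = layers.Dense(2, activation="softmax")(combined)

model = tf.keras.Model([conj_in, lemma_in], usefulness)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```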

Thank you!