CS60010: Deep Learning


CS60010: Deep Learning Sudeshna Sarkar Spring 2018 8 Jan 2018

INTRODUCTION

Milestones: Digit Recognition LeNet (1989): recognized zip codes; built by Yann LeCun, Bernhard Boser and others; ran live in the US Postal Service.

Milestones: Image Classification Convolutional NNs: AlexNet (2012), trained on 200 GB of ImageNet data. Human performance: 5.1% error.

Milestones: Speech Recognition Recurrent Nets: LSTMs (1997):

Milestones: Language Translation Sequence-to-sequence models with LSTMs and attention. Source: Luong, Cho, Manning, ACL Tutorial 2016.

Milestones: Deep Reinforcement Learning In 2013, DeepMind's arcade player bests human experts on six Atari games. DeepMind was acquired by Google in 2014. In 2016, DeepMind's AlphaGo defeats former world champion Lee Sedol.

Learning about Deep Neural Networks Yann LeCun: DNNs require "an interplay between intuitive insights, theoretical modeling, practical implementations, empirical studies, and scientific analyses", i.e. there isn't a framework or core set of principles that explains everything (c.f. graphical models for machine learning).

This Course Goals: Introduce deep learning. Review principles and techniques for understanding deep networks. Develop skill at designing networks for applications.

This Course Times: Mon 12-1, Tue 10-12, Thu 8-9. Assignments (pre-midterm): 20%. Post-midterm assignments / Project: 20%. Midterm: 30%. Endterm: 30%. TAs: Ayan Das, Alapan Kuila, Aishik Chakraborty, Ravi Bansal, Jeenu Grover. Moodle: DL Deep Learning. Course Home Page: cse.iitkgp.ac.in (TBD).

Prerequisites: knowledge of calculus and linear algebra; probability and statistics; machine learning; programming in Python.

Logistics: 3 hours of lecture, 1 hour of programming / tutorial. Attendance is compulsory.

Phases of Neural Network Research 1940s-1960s: Cybernetics: brain-like electronic systems; morphed into modern control theory and signal processing. 1960s-1980s: Digital computers, automata theory, computational complexity theory: simple shallow circuits are very limited. 1980s-1990s: Connectionism: complex, non-linear networks, backpropagation. 1990s-2010s: Computational learning theory, graphical models: learning is computationally hard, simple shallow circuits are very limited. 2006-present: Deep learning: end-to-end training, large datasets, explosion in applications.

Citations of the LeNet paper Recall that LeNet was a modern visual classification network that recognized digits for zip codes. Its citation counts over time (figure annotated: second phase, Deep Learning Winter, third phase) tell the story: the 2000s were a golden age for machine learning and marked the ascent of graphical models, but not so for neural networks.

Why the success of DNNs is surprising From both complexity and learning theory perspectives, simple networks are very limited: you can't compute parity with a small network, and it is NP-hard to learn even simple function classes such as 3-SAT formulae; indeed, training a DNN is itself NP-hard.

Why the success of DNNs is surprising The most successful DNN training algorithm is a version of gradient descent, which will only find local optima. In other words, it's a greedy algorithm. Backprop just applies the chain rule to a composed loss: loss = f(g(h(y))), so d loss/dy = f'(g(h(y))) * g'(h(y)) * h'(y). Greedy algorithms are even more limited in what they can represent and how well they learn. If a problem has a greedy solution, it's regarded as an easy problem.
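
To make the chain rule concrete, here is a minimal NumPy sketch of the forward and backward pass for a composed loss; the particular scalar functions f, g and h are illustrative assumptions, not taken from the lecture.

import numpy as np

# Illustrative composition loss = f(g(h(y))); f, g, h are assumed for the example.
def h(y):  return y ** 2
def g(u):  return np.sin(u)
def f(v):  return np.exp(v)

def dh(y): return 2 * y
def dg(u): return np.cos(u)
def df(v): return np.exp(v)

y = 0.7
u = h(y)            # forward pass, innermost function first
v = g(u)
loss = f(v)

# backward pass: product of local derivatives (the chain rule)
dloss_dy = df(v) * dg(u) * dh(y)

# sanity check against a finite-difference approximation
eps = 1e-6
numeric = (f(g(h(y + eps))) - f(g(h(y - eps)))) / (2 * eps)
print(dloss_dy, numeric)  # the two values should agree closely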

Why the success of DNNs is surprising In graphical models, values in a network represent random variables, and have a clear meaning. The network structure encodes dependency information, i.e. you can represent rich models. In a DNN, node activations encode nothing in particular, and the network structure only encodes (trivially) how they derive from each other.

Why the success of DNNs is obvious Hierarchical representations are ubiquitous in AI. Computer vision:

Why the success of DNNs is obvious Natural language:

Why the success of DNNs is obvious Human learning is deeply layered.

Why the success of DNNs is obvious What about greedy optimization? Less obvious, but it looks like many learning problems (e.g. image classification) are actually easy, i.e. they have reliable steepest-descent paths to a good model. Source: Ian Goodfellow, ICLR 2015 Tutorial.

Representations Matter (Figure: the same dataset plotted in Cartesian coordinates (x, y) and in polar coordinates (r, θ).)
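
A small NumPy sketch of why the representation matters: a two-ring dataset that no single threshold on x or y separates in Cartesian coordinates becomes separable by one threshold on r after converting to polar coordinates. The dataset and the threshold value are illustrative assumptions, not from the lecture.

import numpy as np

# Illustrative data (assumed for this example): two classes forming concentric rings.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.concatenate([rng.normal(1.0, 0.1, 100),   # class 0: inner ring
                        rng.normal(3.0, 0.1, 100)])  # class 1: outer ring
x, y = radii * np.cos(angles), radii * np.sin(angles)
labels = np.concatenate([np.zeros(100), np.ones(100)])

# Change of representation: Cartesian (x, y) -> polar (r, theta)
r = np.sqrt(x ** 2 + y ** 2)
theta = np.arctan2(y, x)

# In polar coordinates a single threshold on r separates the two classes.
pred = (r > 2.0).astype(float)
print("accuracy with a threshold on r:", (pred == labels).mean())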

Representation Learning Use machine learning to discover not only the mapping from representation to output but also the representation itself. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also enable AI systems to rapidly adapt to new tasks with minimal human intervention.

Depth (Figure: a deep network builds up the output layer by layer: visible layer (input pixels) → 1st hidden layer (edges) → 2nd hidden layer (corners and contours) → 3rd hidden layer (object parts) → output (object identity: CAR, PERSON, ANIMAL).)

(Figure: how the components of each approach compose, from input to output.)
Rule-based systems: Input → Hand-designed program → Output.
Classic machine learning: Input → Hand-designed features → Mapping from features → Output.
Representation learning: Input → Features → Mapping from features → Output.
Deep learning: Input → Simple features → Additional layers of more abstract features → Mapping from features → Output.

ML BASICS

Definition Mitchell (1997) A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Linear Regression In the case of linear regression, the output is a linear function of the input. Let ŷ be the value that our model predicts y should take on. We define the output to be ŷ = w^T x. Performance is measured by the mean squared error on the test set, MSE_test = (1/m) Σ_i (ŷ^(test) - y^(test))_i^2.
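
A minimal NumPy sketch of the prediction and the test-set MSE defined above; the synthetic data and the fixed weight vector are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)
m, d = 50, 3                       # m test examples, d input features
X_test = rng.normal(size=(m, d))   # each row is one input x
w = np.array([2.0, -1.0, 0.5])     # an assumed weight vector for illustration
y_test = X_test @ w + rng.normal(scale=0.1, size=m)  # noisy targets

# prediction y_hat = w^T x for every test example
y_hat = X_test @ w

# MSE_test = (1/m) * sum_i (y_hat_i - y_i)^2
mse_test = np.mean((y_hat - y_test) ** 2)
print("MSE_test:", mse_test)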

Normal Equations
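
Minimizing the training MSE over w in closed form gives the normal equations, w = (X^T X)^{-1} X^T y. Below is a minimal NumPy sketch on synthetic data (the data and the true weight vector are assumptions for illustration); np.linalg.lstsq solves the same least-squares problem in a numerically safer way.

import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X_train = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])   # assumed ground-truth weights for the example
y_train = X_train @ true_w + rng.normal(scale=0.1, size=n)

# Normal equations: w = (X^T X)^{-1} X^T y, here solved as a linear system.
w_closed = np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

# Equivalent least-squares solution via a numerically safer routine.
w_lstsq, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

print(w_closed)  # both should recover approximately [2.0, -1.0, 0.5]
print(w_lstsq)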