Welcome to CSCE 496/896: Deep Learning!

Transcription:

Welcome to CSCE 496/896: Deep Learning! Please check off your name on the roster, or write your name if you're not listed. Indicate if you wish to register or sit in.

Policy on sit-ins: You may sit in on the course without registering, but not at the expense of resources needed by registered students. Don't expect to get homework, etc. graded. If there are no open seats, you may have to surrender yours to someone who is registered.

You should have two handouts: the syllabus and copies of the slides.

Override Policy. Option 1: Priority is given to undergraduate CSE majors graduating in May or December and to CSE graduate students who need the course for research. Option 2: If you want an override, fill out the sheet with your name, NUID, major, which course (496 vs. 896), and why this course is necessary for you.

What is Machine Learning? (Introduction to Machine Learning, Stephen Scott.) Building machines that automatically learn from experience; a sub-area of artificial intelligence. A (very) small sampling of applications: detection of fraudulent credit card transactions; filtering spam email; autonomous vehicles driving on public highways; self-customizing programs (a web browser that learns what you like/where you are and adjusts; autocorrect); applications we can't program by hand, e.g., speech recognition. You've used it today already!

What is Learning? Does Memorization = Learning? Many different answers, depending on the field you're considering and whom you ask: artificial intelligence vs. psychology vs. education vs. neurobiology vs. ... Test #1: Thomas learns his mother's face. [Figure: the faces he sees; but will he recognize the face when it looks different?]

Does Memorization = Learning? (cont'd.) Test #2: Nicholas learns about trucks. [Figure: the trucks he sees.] Thus he can generalize beyond what he's seen! But will he recognize others?

What is Machine Learning? (cont'd.) So learning involves the ability to generalize from labeled examples. In contrast, memorization is trivial, especially for a computer.

When do we use machine learning? When human expertise does not exist (navigating on Mars); when humans are unable to explain their expertise (speech recognition; face recognition; driving); when the solution changes over time (routing on a computer network; browsing history; driving); when the solution needs to be adapted to particular cases (biometrics; speech recognition; spam filtering). In short, when one needs to generalize from experience in a non-obvious way.

What is Machine Learning? (cont'd.) When do we not use machine learning? Calculating payroll; sorting a list of words; a web server; word processing; monitoring CPU usage; querying a database. In short, when we can definitively specify how all cases should be handled.

More Formal Definition. From Tom Mitchell's 1997 textbook: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." There are wide variations in how T, P, and E manifest.

One Type of Task T: Classification. Given several labeled examples of a concept, e.g., trucks vs. non-trucks (binary) or height (real-valued). This is the experience E. Examples are described by features, e.g., number-of-wheels (int), relative-height (height divided by width), hauls-cargo (yes/no). A machine learning algorithm uses these examples to create a hypothesis (or model) that will predict the label of new (previously unseen) examples; a minimal code sketch follows below.

Classification (cont'd.) [Figure: pipeline. Labeled training data (labeled examples with features) feed a machine learning algorithm, which produces a hypothesis; the hypothesis maps unlabeled data (unlabeled examples) to predicted labels.] Hypotheses can take on many forms.
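To make the classification setup concrete, here is a minimal sketch (not from the slides; the feature values, labels, and the use of scikit-learn's DecisionTreeClassifier are illustrative assumptions):

```python
# Hedged sketch: experience E is labeled examples described by features;
# the learning algorithm outputs a hypothesis h that predicts labels for
# previously unseen examples. All feature values below are made up.
from sklearn.tree import DecisionTreeClassifier

# Features: [number-of-wheels, relative-height (height/width), hauls-cargo (1=yes)]
X_train = [
    [18, 1.4, 1],   # truck
    [6,  1.2, 1],   # truck
    [4,  0.6, 0],   # non-truck
    [4,  0.7, 0],   # non-truck
]
y_train = ["truck", "truck", "non-truck", "non-truck"]

# The learning algorithm turns experience E into a hypothesis h
h = DecisionTreeClassifier().fit(X_train, y_train)

# The hypothesis predicts the label of a new, previously unseen example
print(h.predict([[10, 1.3, 1]]))  # likely ['truck']
```

Conveniently, the hypothesis learned here is a decision tree, the hypothesis type discussed next.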

Example Hypothesis Type: Decision Tree. Very easy for humans to comprehend, and compactly represents if-then rules. [Figure: a decision tree for the truck concept. The root tests hauls-cargo (no/yes); yes leads to a test on num-of-wheels (< 4 vs. >= 4), then on relative-height (< 1 vs. >= 1), with "truck" at the appropriate leaf.]

Our Focus: Artificial Neural Networks. Designed to simulate brains: neurons (processing units) communicate via connections, each with a numeric weight. Learning comes from adjusting the weights.

Artificial Neural Networks (cont'd.) ANNs are the basis of deep learning; "deep" refers to the depth of the architecture: more layers => more processing of inputs. Each input to a node is multiplied by a weight, and the weighted sum S is sent through an activation function, e.g.:
- Rectified linear: max(0, S)
- Sigmoid: tanh(S) or 1/(1 + exp(-S))
- Convolutional + pooling: the weights represent a (e.g.) 3x3 convolutional kernel that identifies features in (e.g.) images that are translation invariant
Such networks are often trained via stochastic gradient descent.

Small Sampling of Deep Learning Examples: image recognition, speech recognition, document analysis, game playing, ... (see "8 Inspirational Applications of Deep Learning").

Example Performance Measures P. Let X be a set of labeled instances.
- Classification error: the number of instances of X that hypothesis h predicts incorrectly, divided by |X|.
- Squared error: the sum of (y_i - h(x_i))^2 over all x_i in X. If the labels are from {0,1}, this is the same as classification error; it is useful when labels are real-valued.
- Cross-entropy: -sum over all x_i in X of [y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i))]. Generalizes to more than two classes; effective when h predicts probabilities.
(These measures are sketched in code below.)

Another Type of Task T: Unsupervised Learning. E is now a set of unlabeled examples, still described by features. We still want to infer a model of the data, but instead of predicting labels we want to understand its structure, e.g., clustering, density estimation, feature extraction.
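A minimal sketch of the activation functions and performance measures P listed above (not from the slides; the use of NumPy and the example labels and probabilities are illustrative assumptions):

```python
# Hedged sketch: activation functions and performance measures P from the
# slides, written out with NumPy. Example values below are made up.
import numpy as np

def relu(S):               # rectified linear: max(0, S)
    return np.maximum(0.0, S)

def sigmoid(S):            # logistic sigmoid: 1 / (1 + exp(-S))
    return 1.0 / (1.0 + np.exp(-S))

def classification_error(y, y_pred):
    # fraction of instances the hypothesis predicts incorrectly
    return np.mean(y != y_pred)

def squared_error(y, h):
    # sum of (y_i - h(x_i))^2; with {0,1} labels and {0,1} outputs
    # it coincides with the count of misclassifications
    return np.sum((y - h) ** 2)

def cross_entropy(y, h):
    # -sum [y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i))], for h in (0,1)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

y = np.array([1, 0, 1, 1])
h = np.array([0.9, 0.2, 0.8, 0.6])                      # predicted probabilities
print(classification_error(y, (h >= 0.5).astype(int)))  # 0.0
print(squared_error(y, h))                              # 0.25
print(cross_entropy(y, h))                              # ~1.06
```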

Clustering Examples. [Figure: examples of flat and hierarchical clusterings.]

Feature Extraction via Autoencoding. Can train an ANN with unlabeled data: the goal is to have the output x̂ (the reconstruction) match the input x, which results in an embedding z of the input x. Can pre-train a network to identify features and later replace the decoder with a classifier (semisupervised learning); a minimal sketch appears at the end of this block.

Another Type of Task T: Semisupervised Learning. E is now a mixture of both labeled and unlabeled examples; we cannot afford to label all of it (e.g., images from the web). The goal is to infer a classifier, but to leverage the abundant unlabeled data in the process: pre-train in order to identify relevant features, or actively purchase labels for a small subset. One could also use transfer learning from one task to another.

Another Type of Task T: Reinforcement Learning. An agent A interacts with its environment. At each step, A perceives the state s of its environment and takes action a; the action results in some reward r and changes the state to s'. This is a Markov decision process (MDP). The goal is to maximize expected long-term reward. Applications: backgammon, Go, video games, self-driving cars.

Reinforcement Learning (cont'd.) RL differs from the previous tasks in that the feedback (reward) is typically delayed: it often takes several actions before a reward is received (e.g., no reward in checkers until the game ends), so we need to decide how much each action contributed to the final reward (the credit assignment problem). Also, limited sensing ability can make distinct states look the same: a partially observable MDP (POMDP).

Issue: Model Complexity. In classification and regression it is possible to find a hypothesis that perfectly classifies all the training data. But should we necessarily use it?
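Returning to the autoencoding slide above, here is a minimal sketch (not from the slides; the dimensions, learning rate, and use of a plain-NumPy linear autoencoder are illustrative assumptions):

```python
# Hedged sketch: a tiny linear autoencoder trained so the output xhat
# matches the input x, yielding an embedding z. All sizes are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # 200 unlabeled examples, 8 features

d, k, lr = 8, 3, 0.05                  # input dim, embedding dim, step size
W_enc = rng.normal(scale=0.1, size=(d, k))   # encoder: z = x @ W_enc
W_dec = rng.normal(scale=0.1, size=(k, d))   # decoder: xhat = z @ W_dec

for epoch in range(500):
    Z = X @ W_enc                      # embeddings z of the inputs x
    Xhat = Z @ W_dec                   # reconstructions
    err = Xhat - X                     # want output to match input
    # gradient descent on the mean squared reconstruction error
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

# mean squared reconstruction error, much lower than at initialization
print(np.mean((X - (X @ W_enc) @ W_dec) ** 2))
# After pre-training, the decoder W_dec could be replaced by a classifier
# that takes the embedding z as input (semisupervised learning).
```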

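And for the model-complexity question just raised, a hedged sketch of why the perfectly fitting hypothesis can be the wrong choice (not from the slides; the synthetic dataset, the tree depths, and the use of scikit-learn are illustrative assumptions):

```python
# Hedged sketch: an unrestricted decision tree fits noisy training data
# perfectly, yet typically generalizes worse than a simpler tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic labeled data with 10% label noise (made up for illustration)
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, None):   # None lets the tree fit the training data perfectly
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, h.score(X_tr, y_tr), h.score(X_te, y_te))
# Typically the unrestricted tree reaches ~1.0 training accuracy but lower
# test accuracy than a moderate-depth tree: training accuracy must be
# balanced against simplicity.
```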
Model Complexity (cont'd.) [Figure: labeled photos; label: football player?] To generalize well, we need to balance training accuracy with simplicity.

Relevant Disciplines. Artificial intelligence: learning as a search problem; using prior knowledge to guide learning. Probability theory: computing probabilities of hypotheses. Computational complexity theory: bounds on the inherent complexity of learning. Control theory: learning to control processes to optimize performance measures. Philosophy: Occam's razor (everything else being equal, the simplest explanation is best). Psychology and neurobiology: practice improves performance; biological justification for artificial neural networks. Statistics: estimating generalization performance.

Conclusions. The idea of intelligent machines has been around a long time, but early on it was primarily of academic interest. Over the past few decades, improvements in processing power plus very large data sets have allowed highly sophisticated (and successful!) approaches. Machine learning is now prevalent in modern society: you've probably used it several times today. There is no single best approach for any problem; the choice depends on the requirements, the type of data, and the volume of data.