Andrew Ng. Nuts and bolts of building AI applications using Deep Learning.

Trend #1: Scale driving Deep Learning progress

Trend #2: The rise of end-to-end learning
Learning with integer or real-valued outputs, and learning with complex (e.g., string-valued) outputs.

Major categories of DL models
1. General neural networks
2. Sequence models (1D sequences): RNN, GRU, LSTM, CTC, attention models, ...
3. Image models: 2D and 3D convolutional networks
4. Advanced/future tech: unsupervised learning (sparse coding, ICA, SFA, ...), reinforcement learning, ...

End-to-end learning: Speech recognition
Traditional model vs. end-to-end learning: end-to-end works well here given enough labeled (audio, transcript) data.

End-to-end learning: Autonomous driving
Traditional model vs. end-to-end learning: given the safety-critical requirement of autonomous driving, and thus the need for extremely high levels of accuracy, a pure end-to-end approach is still challenging to get to work. End-to-end works only when you have enough (x, y) data to learn a function of the needed level of complexity.

Machine Learning Strategy
Often you will have a lot of ideas for how to improve an AI system. What do you do? Good strategy will help avoid months of wasted effort.

Traditional train/dev/test and bias/variance
Say you want to build a human-level speech recognition system. You split your data into train/dev/test: Training (60%), Dev (20%), Test (20%). For example:

Human-level error ........ 1%
Training set error ....... 5%
Dev set error ............ 8%

Compared to earlier eras, we still talk about bias and variance, but somewhat less about the tradeoff between them.
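To make this bookkeeping concrete, here is a minimal Python sketch of the traditional 60/20/20 split and the bias/variance readout. The function names are illustrative, not from the slides; the percentages plugged in mirror the example above.

import numpy as np

def split_train_dev_test(examples, seed=0):
    # Shuffle, then split 60% train / 20% dev / 20% test.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    n_train = int(0.6 * len(examples))
    n_dev = int(0.2 * len(examples))
    train = [examples[i] for i in idx[:n_train]]
    dev = [examples[i] for i in idx[n_train:n_train + n_dev]]
    test = [examples[i] for i in idx[n_train + n_dev:]]
    return train, dev, test

def bias_variance(human_err, train_err, dev_err):
    # Gap to human level ~ avoidable bias; train -> dev gap ~ variance.
    return {"avoidable_bias": train_err - human_err,
            "variance": dev_err - train_err}

# The numbers from the example above: 1% human, 5% training, 8% dev.
print(bias_variance(0.01, 0.05, 0.08))
# -> avoidable bias ~ 0.04, variance ~ 0.03: the bias gap is slightly larger here.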

Basic recipe for machine learning
Training error high? (Bias) -> Bigger model; train longer; new model architecture.
Dev error high? (Variance) -> More data; regularization; new model architecture.
Otherwise: Done!

Automatic data synthesis examples
- OCR: text against random backgrounds.
- Speech recognition: synthesize clean audio against different background noise.
- NLP (grammar correction): synthesize random grammatical errors.
Sometimes synthesized data that appears great to human eyes is actually very impoverished in the eyes of ML algorithms, and covers only a minuscule fraction of the actual distribution of data, e.g., images of cars extracted from video games.

Different training and test set distributions
Say you want to build a speech recognition system for a new in-car rearview mirror product. You have 50,000 hours of general speech data, and 10 hours of in-car data. How do you split your data?

A bad way to do it: take the training and dev sets from the general speech data (50,000 hours) and use the in-car data (10 hours) as the test set. Having mismatched dev and test distributions is not a good idea: your team may spend months optimizing for dev set performance only to find it doesn't work well on the test set.

A better way: make the dev and test sets come from the same distribution (a code sketch of this split follows after the error ladder below).
- Training (~50,000h): general speech data.
- Training-Dev (20h): held out from the general speech (training) distribution.
- Dev (5h) and Test (5h): in-car data.

With this split, the error ladder tells you where the problem is:
Human-level error ....... 1%
  (gap: avoidable bias)
Training error .......... 1.1%
  (gap: overfitting of training set)
Training-Dev error ...... 1.5%
  (gap: data mismatch)
Dev set error ........... 8%
  (gap: overfitting of dev set)
Test set error .......... 8.5%
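A minimal sketch of the better split in Python. The function name and the convention that each list element stands for one hour of audio are assumptions for illustration; the hour counts mirror the slide above.

import random

def make_splits(general_speech, in_car, train_dev_hours=20, dev_hours=5, seed=0):
    # Dev and test come from the in-car distribution we actually care about;
    # a small training-dev set is carved out of the general (training) distribution.
    rng = random.Random(seed)
    general = list(general_speech)   # ~50,000 elements, one per hour of general speech
    in_car = list(in_car)            # ~10 elements, one per hour of in-car speech
    rng.shuffle(general)
    rng.shuffle(in_car)
    train_dev = general[:train_dev_hours]   # gap to training error shows overfitting of the training set
    train = general[train_dev_hours:]       # bulk of the training data
    dev = in_car[:dev_hours]                # gap to training-dev error shows data mismatch
    test = in_car[dev_hours:]               # gap to dev error shows overfitting of the dev set
    return train, train_dev, dev, test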

New recipe for machine learning
Training error high? (Bias) -> Bigger model; train longer.
Training-Dev error high? (Variance) -> More data; regularization.
Dev error high? (Train-test data mismatch) -> Make training data more similar to test data; data synthesis; domain adaptation.
Test error high? (Overfit dev set) -> Get more dev set data.
Otherwise: Done! (A code sketch of this recipe appears at the end of this section.)

General Human/Bias/Variance analysis
- Performance of humans: human-level error (for the in-car data, carry out human evaluation to measure it).
- Performance on examples you've trained on: training error on the general speech data (50,000 hours); insert some in-car data into the training set to measure it for in-car speech (10 hours).
- Performance on examples you haven't trained on: training-dev error for the general speech data, and dev/test error for the in-car data.
The gaps between these levels give avoidable bias, variance, data mismatch, and the degree of overfitting of the dev set.

Human-level performance
You'll often see the fastest performance improvements on a task while the ML system is still performing worse than humans. Human-level performance is a proxy for Bayes optimal error, which we can never surpass. While below human level, you can rely on human intuition: (i) have humans provide labeled data; (ii) do error analysis to understand how humans got examples right; (iii) estimate bias/variance.

E.g., on an image recognition task, training error = 8% and dev error = 10%. What do you do? Two cases:

Human-level error ....... 1%
Training set error ...... 8%
Dev set error ........... 10%
-> Focus on bias.

Human-level error ....... 7.5%
Training set error ...... 8%
Dev set error ........... 10%
-> Focus on variance.

Quiz: Medical imaging
Suppose that on an image labeling task:
Typical human .................. 3% error
Typical doctor ................. 1% error
Experienced doctor ............. 0.7% error
Team of experienced doctors .... 0.5% error
What is "human-level error"? Answer: for the purpose of driving ML progress, 0.5% is the best answer, since it is closest to Bayes error.
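As a minimal sketch, the new recipe can be written as a small diagnostic over the error ladder. The function name, the tolerance, and the idea of sorting the gaps are assumptions for illustration; the gaps and suggested fixes follow the slides above.

def new_recipe(human_err, train_err, train_dev_err, dev_err, test_err, tol=0.005):
    # Each gap in the error ladder points at a different problem and a different fix.
    gaps = {
        "bias: bigger model, train longer": train_err - human_err,
        "variance: more data, regularization": train_dev_err - train_err,
        "data mismatch: make training data more like test data, data synthesis, domain adaptation": dev_err - train_dev_err,
        "overfit dev set: get more dev set data": test_err - dev_err,
    }
    # Report the gaps larger than the tolerance, biggest first.
    return sorted(((gap, issue) for issue, gap in gaps.items() if gap > tol), reverse=True)

# The in-car speech numbers from above: 1%, 1.1%, 1.5%, 8%, 8.5%.
print(new_recipe(0.010, 0.011, 0.015, 0.080, 0.085))
# The ~6.5-point training-dev -> dev gap is by far the largest: focus on data mismatch.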

AI Product Management
The availability of new supervised DL algorithms means we're rethinking the workflow of how teams collaborate to build applications using DL. A Product Manager (PM) can help an AI team prioritize the most fruitful ML tasks. E.g., should you improve speech performance with car noise or café noise, for low-bandwidth audio, or for accented speech? Or should you improve latency, reduce binary size, or something else?

What can AI do today? Some heuristics for PMs:
- If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.
- For any concrete, repeated event that we observe (e.g., whether a user clicks on an ad; how long it takes to deliver a package; ...), we can reasonably try to predict the outcome of the next event (e.g., whether the user clicks on the next ad).

How should PMs and AI teams work together? Here's one default split of responsibilities:

Product Manager (PM) responsibility:
- Provide dev/test sets, ideally drawn from the same distribution.
- Provide the evaluation metric for the learning algorithm (accuracy, F1, etc.).
This is a way for the PM to express what ML task they think will make the biggest difference to users.

AI Scientist/Engineer responsibility:
- Acquire training data.
- Develop a system that does well according to the provided metric on the dev/test data.

Machine Learning Yearning
Book on AI/ML technical strategy. Sign up at http://mlyearning.org

Thank you for coming to this tutorial!