Introduction to AI Math in Machine Learning seminar (MiML) McGill Math and Stats (McMaS)

Background: AI. Artificial Intelligence is loosely defined as intelligence exhibited by machines. Operationally: R&D in CS academic sub-disciplines: Computer Vision, Natural Language Processing (NLP), Robotics, etc.

Artificial General Intelligence (AGI). AI: specific tasks; AGI: general cognitive abilities. AGI is a small research area within AI: build machines that can successfully perform any task that a human might do. On account of this ambitious goal, AGI has high visibility, disproportionate to its size or present level of success, among futurists, science fiction writers, and the public.

Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DoD, taken from a study by JASON for the Department of Defense (DoD).

Historical Context. The term AI was coined in 1956. Perceptrons (1960) suggested machines could learn from data. Decline in 1969: the perceptron is no universal function approximator, only a linear discriminator, and can't learn XOR. 1980s: resurgence in AI via expert systems and learning rules; petered out. 1990s: academic AI in the doldrums. Improved computers led, in 1997, to IBM's Deep Blue beating champion Garry Kasparov in chess. Chess, once believed to require human intelligence, fell to a special-purpose, very fast search algorithm.

Don't confuse AI with AGI. 1997, NYT, in response to Deep Blue: to play a decent game of Go [requires human intelligence]; when that happens, it will be a sign that AI is as good as the real thing. 2016: the NYT was wrong. Google's AlphaGo beat world champion Lee Sedol. This did not involve an AGI breakthrough: it used a hybrid of DNNs with massively parallel tree search and Reinforcement Learning. DNNs require massive amounts of data, which can be found labelled on the internet, or in the databases of private companies like Facebook or Google, or generated by a fast computer playing a lifetime of games.

2010: Deep Learning Revolution. Neural Networks have been around for half a century, and were popular in the 1990s for solving simple tasks. Starting around 2010, new hardware, Graphics Processing Units (GPUs), became available, which allowed for much larger and deeper networks. Large labelled data sets also became available, which allowed for training.

2010: Deep Learning Revolution. The large data set ImageNet became available in 2009. In 2012 AlexNet, trained on GPUs, won the ImageNet competition with an error of 15.3%, more than 10% better than the runner-up. Canadian (U Toronto) team: Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever. Between 2011 and 2015, the error rate for image classification by computer fell from 25% to about 3%, better than the accepted human figure of 5%.

DNN and Images

DNNs exceed human performance in: some kinds of image recognition; spoken word recognition; the game of Go (long thought to require generalized human intelligence, AGI); self-driving cars, now more limited by policy than by technology.

Rapidly advancing areas: Reinforcement Learning; graphical and Bayes models, especially with probabilistic programming; generative models (creating artificial images). More likely, DL will become an essential building block of a hybrid approach.

Reinforcement Learning. Learn how to play Atari from raw image pixels. Learn how to beat Go champions (huge branching factor). Robots learning how to walk. Big in Montreal: Google DeepMind and Microsoft Research both work in this area.

Generative Models. A generative model takes as input random vectors and outputs realistic images (of a certain class).

Generative Models. Discriminator: has learned what a picture looks like. Generator: tries to generate a believable picture. The two are trained against each other, as in the sketch below.
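A minimal sketch of this adversarial setup. The framework (PyTorch), layer sizes, and learning rates are my illustrative choices, not from the talk:

# One adversarial training step: D learns to tell real from fake, G learns to fool D.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 784  # hypothetical sizes, e.g. flattened 28x28 images

# generator: maps random vectors to fake images
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
# discriminator: scores how real a picture looks
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(64, data_dim)   # stand-in for a mini-batch of real images
z = torch.randn(64, latent_dim)   # input random vectors
fake = G(z)

# discriminator step: push D(real) toward 1 and D(fake) toward 0
loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# generator step: try to make D label the fakes as real
loss_G = bce(D(fake), torch.ones(64, 1))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()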

Hardware. Both training (finding the best weights) and inference (evaluating the output of the network on a data point) are computationally intensive. Effort is measured in Joules. Hardware, software, and algorithms have evolved together. Currently, one needs to use Graphics Processing Units (GPUs) rather than CPUs.

Hardware. To do inference on mobile devices, we need custom architectures and custom hardware. Current engineering practice is to take trained networks and make them smaller. Custom chips are also built, together with their power source. Research problem: design and train architectures with efficient inference in mind.

Machine Learning vs DL. Traditional Machine Learning (ML) can't compete with the raw performance of Deep Learning. However, ML has performance guarantees, which are important in the many applications where errors are costly.

Error Estimates. Using probability (the Central Limit Theorem) and linear or parametric models, we can fit data and also estimate the probability of an error. Deep Learning models lack these error estimates! A toy illustration follows.
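A sketch of the kind of error estimate meant here: a least-squares linear fit with CLT-based confidence intervals on the coefficients. The data and noise level are synthetic, chosen for illustration:

# Fit y = a*x + c and report a 95% confidence interval for the slope.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)  # synthetic data

X = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
coef, res, *_ = np.linalg.lstsq(X, y, rcond=None)

# residual variance -> standard errors of the coefficients
sigma2 = res[0] / (len(y) - 2)
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))
print(f"slope = {coef[0]:.3f} +/- {1.96 * se[0]:.3f} (95% CI)")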

<latexit sha1_base64="3mrmvwbthhzx8q0rthpq9fwqi7o=">aaab/3icbvbns8naej3ur1q/ooixl4tfeiss9kixoedfywx7aw0im+2mxbqbhn2ngmip/huvhpti1b/hzx/jthxr1gcdj/dmmjkxjjwp7tifvmfpewv1rbhe2tjc2t6xd/eakk4loq0s81i2a6wozxftaky5bsesyhfw2gqglxo/duulynf0o7oeegl3ixyygrwrfpsgqxeoq1kb7nyg7k2dosbnvl12ks4u6ie486rcq2nua4c6b390ezfjby004vipjltntjdjqrnhdftqpoommaxxn3ymjbcgysun94/qsvf6kiylquijqfp7isdcquweplngpvdz3kt8z+ukojz3chylqayrms0ku450jczhob6tlgiegykjzozwrazyyqjnzcutwslli6rzrbhoxb02avrhhiicwhgcgatnuimrqemdcdzae7zaq/vopvtj623wwrc+z/bhd6z3l3yxld8=</latexit> <latexit sha1_base64="kifqgqw5gdu6inuwanqn5k7mjb0=">aaab/3icbvdlssnafj3uv62vqodgzwarbkek3ehgkajisoj9qbvczdjph85mwsxedbelf8wnc6w49qvcu/nvnlyi2nrgwugce7n3nibhvgnh+bqkc4tlyyvf1dla+sbmlr2901rxkjfp4jjfsh0grrgvpkgpzqsdsij4wegrgjyp/dynkyrg4lpncfe46gkauyy0kxx7l4nnsktsdm99cu9mhcpap75ddirobpchulokximjepeejuq+/deny5xyijrmskmow020lyopkwzkwoqmiiqid1cpdawvibpl5zp7h/dqkcgmymlkadhrf0/kicuv8cb0cqt7atybi/95nvrhp15orzjqivb0uzqyqgm4dgogvbkswwyiwpkawyhui4mwnpgvtahzl8+tzrxiohx3yqrrbvmuwt44aefabsegbi5bhtqabvfgetydf+vberjg1uu0twb9z+ycp7devgd8xje+</latexit> <latexit sha1_base64="kifqgqw5gdu6inuwanqn5k7mjb0=">aaab/3icbvdlssnafj3uv62vqodgzwarbkek3ehgkajisoj9qbvczdjph85mwsxedbelf8wnc6w49qvcu/nvnlyi2nrgwugce7n3nibhvgnh+bqkc4tlyyvf1dla+sbmlr2901rxkjfp4jjfsh0grrgvpkgpzqsdsij4wegrgjyp/dynkyrg4lpncfe46gkauyy0kxx7l4nnsktsdm99cu9mhcpap75ddirobpchulokximjepeejuq+/deny5xyijrmskmow020lyopkwzkwoqmiiqid1cpdawvibpl5zp7h/dqkcgmymlkadhrf0/kicuv8cb0cqt7atybi/95nvrhp15orzjqivb0uzqyqgm4dgogvbkswwyiwpkawyhui4mwnpgvtahzl8+tzrxiohx3yqrrbvmuwt44aefabsegbi5bhtqabvfgetydf+vberjg1uu0twb9z+ycp7devgd8xje+</latexit> <latexit sha1_base64="tlykp0jeujhmlgyozak1q58na7m=">aaab/3icbvdlssnafl2pr1pfucgnm8eicejjutgnuhdjsoj9qbvczdpph84kywaihtifv+lghsju/q13/o3tb6ktby4czrmxe+8jes6udpwvq7c0vlk6vlwvbwxube/yu3tnfaes0aajeszbavaus4g2nnocthnjsqg4bqxdy7hfuqvssti60vlcpyh7eqszwdpivn2qoqvuvaladz5d96zoueaz3y47fwcc9epcevkggeq+/dntxsqvnnkey6u6bjxrxo6lzottuambkppgmsr92je0woiql5/cp0lhrumhmjamio0m6u+jhaulmhgytoh1qm17y/e/r5pq8nzlwzskmkzkuihmodixgoebekxsonlmccasmvsrgwcjitarluwicy8vkma14jov99op16qzoipwcedwai6cqq2uoa4nipaat/acr9aj9wy9we/t1oi1m9mhp7a+vggnzzrx</latexit> <latexit sha1_base64="cvgmpzvx4hqlf7sowtvlzxz8suo=">aaaccxicbva9swnbej2lxzfqjfralayhaql3abqrajywfomyd0hi2ntskiw7d8funngeaw38kzywsrd1h9jz+vpcjcka+gdg8d4mm/pcgdolbfvdsiwtr6yujddtg5tb6e3mzm5n+aektep87sugixxlzknvztsnjubslfxo6+7wbolxb6huzpeudbtqtsb9j/uywdpinqy6q6eopvhf4ov4kl5ur7kop5eevs1fr3a+k8nabxsk9eocezitpcevtwaodzlvra5pqke9tthwqukua92osdsmcdpktujfa0ygue+bhnpyunwop5+m0kfruqjns1oerlp190smhvkrce2nwhqg5r2j+j/xdhxvpb0zlwg19chsus/ksptoegvqmkmj5pehmehmbkvkgcum2osxmiesvlxiaswcyxecikmjcdmkyr8oiacohemjzqemvsbwd4/wdc/wg/vkja3xwwvc+p7zgz+w3r4aidoamg==</latexit> <latexit sha1_base64="midjfsw67e+258dkov97re2uxtu=">aaaccxicbvdpswjbfj61x2zlwx2dgjjaiwtxs10couuhdhqtcmoyo446olo7zmxg2+kxlv0rxtou0rx/oft/q/9esxpr2gcppr7vpd57nxswkpvlfriphcwl5zx0amztfso7aw5t16qfckwc7dnfnfwkcamecrrvjdqcqrb3gam7w9per18tianvxaooig2o+h7tuyyuljomviunscvpn6or+ikco6n8vegkjm7y0afv6jg5q2hnah+iputy5ey4+nm3n650zpdw18chj57cdenzteubasdikiozgwvaosqbwkpuj01npcsjbmett0bwqctd2poflk/bifp7ikzcyoi7upmjnzczxil+5zvd1ttux9qlqku8pf3ucxlupkxigv0qcfys0grhqfwtea+qqfjp8di6hlmx50mtvlstol3vaztafgmwc/zbhtjgcjtbgagab2bwdx7bm3gxhowny2y8tlttxvfmdvgd4+0l/mmbma==</latexit> <latexit 
sha1_base64="midjfsw67e+258dkov97re2uxtu=">aaaccxicbvdpswjbfj61x2zlwx2dgjjaiwtxs10couuhdhqtcmoyo446olo7zmxg2+kxlv0rxtou0rx/oft/q/9esxpr2gcppr7vpd57nxswkpvlfriphcwl5zx0amztfso7aw5t16qfckwc7dnfnfwkcamecrrvjdqcqrb3gam7w9per18tianvxaooig2o+h7tuyyuljomviunscvpn6or+ikco6n8vegkjm7y0afv6jg5q2hnah+iputy5ey4+nm3n650zpdw18chj57cdenzteubasdikiozgwvaosqbwkpuj01npcsjbmett0bwqctd2poflk/bifp7ikzcyoi7upmjnzczxil+5zvd1ttux9qlqku8pf3ucxlupkxigv0qcfys0grhqfwtea+qqfjp8di6hlmx50mtvlstol3vaztafgmwc/zbhtjgcjtbgagab2bwdx7bm3gxhowny2y8tlttxvfmdvgd4+0l/mmbma==</latexit> <latexit sha1_base64="v8zsjk0tq8ctlw6tmbrjzh6dnco=">aaaccxicbva9swnben3zm8avu0ubxsakioeujtzcwmbcioqxbjiy9jz7yzldvwn3tzyptdb+frslrwz9b3b+g/esijr4yodx3gwz8/yiuaud58tawfxaxlnnrexxnza3tu2d3boky4mjh0mwyqapfgfuee9tzugzkgrxn5ggpzzl/mytkyqg4loneelw1bc0obhpi3vtea9pyvvrpkc36rw58ebfpjrjhn0vkyon1lulttkza/4qd5yuwbs1rv3z7ou45krozjbslbcs6u6kpkaykvg+hsssitxefdiyvcbovccdfzkch0bpwscuposgy/x3riq4ugn3tsdheqbmvuz8z2vfojjppfressyctxyfmym6hfksseclwzolhiasqbkv4ggscgstxt6empfypklxyq5tdi+dqruyjsmh9sebkaixhimqoac14aemhsateagv1qp1bl1z75pwbws6swf+wpr4bimzl/s=</latexit> Neural Network Architecture y = X w i x i + b i z = ReLU (y) = max(y, 0)

Convolutional Neural Nets. Deep NN: allows different weights everywhere. Convolutional NN: a special case, for images, where weights are nonzero only for nearby neighbors (at the input level and later). In addition, for each layer, the pattern of the weights is the same at every location. This significantly reduces the total number of weights per layer, allowing for much deeper networks; the sketch below counts the savings.
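A back-of-the-envelope comparison of the weight counts. The image and kernel sizes are my illustrative choices, not the talk's:

# Parameter counts for one layer on a 32x32 RGB image.
in_pixels = 32 * 32 * 3       # input size
out_units = 32 * 32 * 64      # one 64-channel output per pixel location

dense_weights = in_pixels * out_units   # fully connected: every output sees every input
conv_weights = 3 * 3 * 3 * 64           # 3x3 kernel, 3 in-channels, 64 filters,
                                        # shared across all locations
print(f"dense: {dense_weights:,} weights, conv: {conv_weights:,} weights")
# dense: 201,326,592 weights, conv: 1,728 weights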

More architecture

<latexit sha1_base64="bhdihtjtp20hlfmpsytytweynsc=">aaacdhicbvhlbtnafb2bvzgvfbysyhehrprivwrnqyu2lbqabysikbzsbfnj8xuyynhs5he3svif/b07pomnayypiwi50khnzrmpmxpzrnbtouih59+5e+/+g4ohwapht54+6x0+v9c1vqwnrba1usqprseltgw3aq8ahbtkbv7mi7onfrlepxktv5hvg2lfz5kxnfhjqkz37qnfooscggrhdcbdyxa0r6ehxa3cfrgclqving2tfhdawfx71petn3nokc/mxqw2yzbmg6timuvanxwateni8kvls0hkrrneabisbaumo0gigd00d52oytdmgcrp1utho2gb8afen0gf7om8631piprzcqvhgmo9jcenstuqdgcc10fintaulegmpw5kwqfou61pazhytaflrdyrbrbs3xudrbrevbnlrkiz65vahvyfnrwmpek7lhtrulldonikmdvsngafv8imwdlamelurcdm1llk3j4cz8ktl98gf+nrhi3iz+p+6dnejgpyirwlaxktd+sufctnzeiy+em99mb74/3yx/t9/2ix6nv7mhfkn/bhvwevwlf3</latexit> <latexit sha1_base64="bhdihtjtp20hlfmpsytytweynsc=">aaacdhicbvhlbtnafb2bvzgvfbysyhehrprivwrnqyu2lbqabysikbzsbfnj8xuyynhs5he3svif/b07pomnayypiwi50khnzrmpmxpzrnbtouih59+5e+/+g4ohwapht54+6x0+v9c1vqwnrba1usqprseltgw3aq8ahbtkbv7mi7onfrlepxktv5hvg2lfz5kxnfhjqkz37qnfooscggrhdcbdyxa0r6ehxa3cfrgclqving2tfhdawfx71petn3nokc/mxqw2yzbmg6timuvanxwateni8kvls0hkrrneabisbaumo0gigd00d52oytdmgcrp1utho2gb8afen0gf7om8631piprzcqvhgmo9jcenstuqdgcc10fintaulegmpw5kwqfou61pazhytaflrdyrbrbs3xudrbrevbnlrkiz65vahvyfnrwmpek7lhtrulldonikmdvsngafv8imwdlamelurcdm1llk3j4cz8ktl98gf+nrhi3iz+p+6dnejgpyirwlaxktd+sufctnzeiy+em99mb74/3yx/t9/2ix6nv7mhfkn/bhvwevwlf3</latexit> <latexit sha1_base64="bhdihtjtp20hlfmpsytytweynsc=">aaacdhicbvhlbtnafb2bvzgvfbysyhehrprivwrnqyu2lbqabysikbzsbfnj8xuyynhs5he3svif/b07pomnayypiwi50khnzrmpmxpzrnbtouih59+5e+/+g4ohwapht54+6x0+v9c1vqwnrba1usqprseltgw3aq8ahbtkbv7mi7onfrlepxktv5hvg2lfz5kxnfhjqkz37qnfooscggrhdcbdyxa0r6ehxa3cfrgclqving2tfhdawfx71petn3nokc/mxqw2yzbmg6timuvanxwateni8kvls0hkrrneabisbaumo0gigd00d52oytdmgcrp1utho2gb8afen0gf7om8631piprzcqvhgmo9jcenstuqdgcc10fintaulegmpw5kwqfou61pazhytaflrdyrbrbs3xudrbrevbnlrkiz65vahvyfnrwmpek7lhtrulldonikmdvsngafv8imwdlamelurcdm1llk3j4cz8ktl98gf+nrhi3iz+p+6dnejgpyirwlaxktd+sufctnzeiy+em99mb74/3yx/t9/2ix6nv7mhfkn/bhvwevwlf3</latexit> <latexit sha1_base64="bhdihtjtp20hlfmpsytytweynsc=">aaacdhicbvhlbtnafb2bvzgvfbysyhehrprivwrnqyu2lbqabysikbzsbfnj8xuyynhs5he3svif/b07pomnayypiwi50khnzrmpmxpzrnbtouih59+5e+/+g4ohwapht54+6x0+v9c1vqwnrba1usqprseltgw3aq8ahbtkbv7mi7onfrlepxktv5hvg2lfz5kxnfhjqkz37qnfooscggrhdcbdyxa0r6ehxa3cfrgclqving2tfhdawfx71petn3nokc/mxqw2yzbmg6timuvanxwateni8kvls0hkrrneabisbaumo0gigd00d52oytdmgcrp1utho2gb8afen0gf7om8631piprzcqvhgmo9jcenstuqdgcc10fintaulegmpw5kwqfou61pazhytaflrdyrbrbs3xudrbrevbnlrkiz65vahvyfnrwmpek7lhtrulldonikmdvsngafv8imwdlamelurcdm1llk3j4cz8ktl98gf+nrhi3iz+p+6dnejgpyirwlaxktd+sufctnzeiy+em99mb74/3yx/t9/2ix6nv7mhfkn/bhvwevwlf3</latexit> <latexit sha1_base64="qq9/liwmtihlk7bmx0smtir6pdy=">aaacc3icbvc7sgnbfl3rm8zx1njmsbaigbcbqm2eqboliwjmaxmss5njmmr2dpmznyqlvy2f4c/ywchi6w/y+tdoehfnphdhcm693hupf3kmtg1/wkvlk6tr64mn5obw9s5uam+/qojielohaq9k3cokcizortpnat2ufpsepzvvujr4tvsqfqvetr6ftoxjnmbdrra2kptkd9uxydljdi6gbyfyqkndgzocexy7q3szhr67qyydt6dap8szj5kionm4aycym/podgis+vrowrfsdacq6lampwae03gygskayjlapdowvgcfqly8/wwmjozsqd1amhiatdxfezh2lrr5nun0se6rew8i/uc1it09a8vmhjgmgswwdsoodiamwaaok5ropjiee8nmryj0screm/isjosflxdjtzb37lxzzdiowqwjoiq0zmgbuyjcbzshagtu4bge4cw6t56sv+tt1rpkfc8cwb9y71/wujp9</latexit> <latexit 
sha1_base64="srpjo9nk3lz6/ikjm3yxgf88ae0=">aaacc3icbvc7sgnbfj2nrxhfuuubiugibmjucruramkslbiwd0g2y+zsjbkyo7vmzbrikt7gt/axbbqusfuh7pwzczkiaokbc4dz7uxee9yqualm88nilc2vrk4l11mbm1vbo+ndvbomiofjdqcsee0xsciojzvffspnubdku4w03ef54jeuija04jdqfblbrz1ouxqjpsunnrl2yp63xvamdjsc5qgnha7bhlkmoun4krseoemswtcngd/emifzejy+qz5+xlec9hvbc3dke64wq1k2rgko7bgjrtej41q7kireeib6pkuprz6rdjz9zqwptelbbib0cqwn6u+jgplsjnxxd/pi9ew8nxh/81qr6p7amevhpajhs0xdieevwekw0kocymvgmiasql4v4j4sccsdx0qhspdyiqkxc5zzsko6jtkyiqkoqabkgavoqamcgwqoaqxuwd14as/grffgvbivs9ae8t2zd/7aepscqb+cmg==</latexit> <latexit sha1_base64="srpjo9nk3lz6/ikjm3yxgf88ae0=">aaacc3icbvc7sgnbfj2nrxhfuuubiugibmjucruramkslbiwd0g2y+zsjbkyo7vmzbrikt7gt/axbbqusfuh7pwzczkiaokbc4dz7uxee9yqualm88nilc2vrk4l11mbm1vbo+ndvbomiofjdqcsee0xsciojzvffspnubdku4w03ef54jeuija04jdqfblbrz1ouxqjpsunnrl2yp63xvamdjsc5qgnha7bhlkmoun4krseoemswtcngd/emifzejy+qz5+xlec9hvbc3dke64wq1k2rgko7bgjrtej41q7kireeib6pkuprz6rdjz9zqwptelbbib0cqwn6u+jgplsjnxxd/pi9ew8nxh/81qr6p7amevhpajhs0xdieevwekw0kocymvgmiasql4v4j4sccsdx0qhspdyiqkxc5zzsko6jtkyiqkoqabkgavoqamcgwqoaqxuwd14as/grffgvbivs9ae8t2zd/7aepscqb+cmg==</latexit> <latexit sha1_base64="purzvfojsyoixn1fec1w6f1xk1w=">aaacc3icbvdlssnafj34rpuvdelmabeqhzj0oxuh0i0lfxxsa9o0tcatduhkemymhhk6d+ovuhghift/wj1/4/sbaoubc4dz7uxee7yyuaks68tyw9/y3nro7er39/ypds2j45ameofje0cseh0pscioj01ffsodwbaueoy0vvf96rfviza04ndqhbmnranoa4qr0pjrftj+xsv2bf7btm9hgfrk5bdhkceqm8kbunrumkwrys0af4i9tipggyzrfvb8ccch4qozjgxxrsbkyzbqfdmyyfcsswker2hauppyfblpzlnfjvbmkz4miqglkzhtf09kkjryhhq6m0rqkje9qfif101ucolklmejihzpfwujgyqc02cgtwxbio01qvhqfsveqyqqvjq+va5h5evv0qpwbkti31rfwn0rrw6cggioartcgbq4bg3qbbg8gcfwal6nr+pzedpe561rxmlmbpyb8fentqmy6g==</latexit> Training Given data x i, labels y(x i )andnetworku(x; w) withweightsw min w L(w) 1 n X sum is over a large number (millions) of data points. Instead approximate sum by a random subset (mini-batch) of hundreds of data points. minimize over weights by stochastic gradient descent (SGD): taking a small step in the gradient direction. Step size is called learning rate. SGD has a faster version: Nesterov s Momentum, which adds a momentum term to the update. The gradient is computed (automatically by software) using the chain rule w n+1 = w n + dt n r w L(w) i `(u(x i ; w),y(x i ))

<latexit sha1_base64="7xgexthan5otpeedgk0uhkj/lx0=">aaacb3icbvbns0jbfl3pvsy+rjzbdelgbplew1sbqhat0cigp0dn5o2jds6b95izl4i6a9op6a+0avfe2/5cu/5no0audubyd+fcy8w9xsiz0rb9acxm5hcwl+llizxvtfwn5ozwuqwrjlraah7isocv5uzqgmaa03iokfy9tktejzfys7dukhaik90lac3hlcgajgbtphpy9zzdpucn6gludlgvhorxqkbbd1b3r916mmvn7dhqd3gmssqljh5uacbft35ugwgjfco04vipiuogutbhujpc6tbrjrqnmengfq0ykrbpva0/vmoi9o3sqm1amhiajdxfg33sk9xzptppy91w095i/m+rrlp5uuszeuaacjj5qblxpam0cgu1mkre854hmehm/opig0tmtikuyukyoxmwfn2my2ecs5ngdiaiww7sqrocoiysneeeckdgdh7hgv6se+vjerxejqmx63tng/7aev8cn8oyfg==</latexit> <latexit sha1_base64="6ay+e2rpliaoi69aik1hsh9eqbe=">aaacb3icbvdlsgmxfm3uv62vuzecbitqecrmlnsnuohgxeul9gfthtjp2ozmmkossdrpd278ch/ajqulupux3pkzytqkapxa5r7ouzfkhi9kvcrlejcsc/mli0vj5dtk6tr6hrm5vzzbjdap4yafouohsrjlpksoyqqacoj8j5gk182p/co1ezig/fl1q9lwuzvtfsviack1d88zvqn4ci/g7rdwssgpczgc9aauc+w4ztrkwhpab2lpknqoht0xrx83bdd8qzcdhpmek8yqldxbcvujrkjrzmgwvy8kcrhuojapacqrt2qjntwxhptaacjwihrxbsfqz40y+vl2fu9p+kh15kw3fv/zapfqntriysniey6nd7uibluax6hajhuek9bxbgfb9v8h7icbsnlrpxqif07+s8po1raydlgnkqdtjmeo2amzyinjkannoabkainb8acewmi4mx6nz+nlopowvna2ws8yr58kypqb</latexit> <latexit sha1_base64="6ay+e2rpliaoi69aik1hsh9eqbe=">aaacb3icbvdlsgmxfm3uv62vuzecbitqecrmlnsnuohgxeul9gfthtjp2ozmmkossdrpd278ch/ajqulupux3pkzytqkapxa5r7ouzfkhi9kvcrlejcsc/mli0vj5dtk6tr6hrm5vzzbjdap4yafouohsrjlpksoyqqacoj8j5gk182p/co1ezig/fl1q9lwuzvtfsviack1d88zvqn4ci/g7rdwssgpczgc9aauc+w4ztrkwhpab2lpknqoht0xrx83bdd8qzcdhpmek8yqldxbcvujrkjrzmgwvy8kcrhuojapacqrt2qjntwxhptaacjwihrxbsfqz40y+vl2fu9p+kh15kw3fv/zapfqntriysniey6nd7uibluax6hajhuek9bxbgfb9v8h7icbsnlrpxqif07+s8po1raydlgnkqdtjmeo2amzyinjkannoabkainb8acewmi4mx6nz+nlopowvna2ws8yr58kypqb</latexit> <latexit sha1_base64="vkhpjlkeqzrmybifhjqbizone+0=">aaacb3icbvdlsgmxfm3uv62vuzecbitqecrmbhqjflorcvhbpqadh0yatqgzzjbklgxanrt/xy0lrdz6c+78gzntew09clmhc+4lucepgjxksr6mznlyyupadj23sbm1vwpu7tvkgatmqjhkowj4sbjgoakqqhhprikgwgek7vflqv+/j0lskn+qyutcahu57vcmljy88/cqmdibf/a6baewrsjjwcjhaddyndvhm/nw0zoa/hb7nutbdbxp/gy1qxwhhcvmkjrn24mumychkgzkngvfkkqi91gxndxlkcdstsz3jogxvtqwewpdxmgj+nsjqyguw8dxkwfsptnvpej/xjnwnxm3otykfef4+lanzlcfma0ftqkgwlghjgglqv8kcq8jhjwolqddwdh5kdscom0v7rsrxyrp4sica3aecsagz6aelkefvaegd+ajvibx49f4nt6m9+loxpjt7im/md6+are0lus=</latexit> <latexit sha1_base64="awvnnrfn/h+ajixbwucdpoxt5fs=">aaacbxicbvc7sgnbfl0bxzg+opzadayhiotdfgojbnkiweqwd0iwoduzjenmz5ezwupyplhxk+xtlbsx9r/s/bsniyhgd1zu4zx7mbnhczlt2ry/rmtc/mliuni5tbk6tr6r3tyqqccshjzjwanz87cinala1kxzwgslxb7hadxrfcd+9yzkxqjxpqchdx3ceazncnzgaqz3z7p9a3sklsbtedvoqbgpbbr2h02nmc7yoxsc9e2cwzipokp7awaondpvjvzaip8ktthwqu7kq+3gwgpgob2lgpgiisy93kf1qwx2qxljyrujtg+ufmoh0ptqakl+3iixr9ta98ykj3vxzxpj8t+vhun2irszeuaacjj9qb1xpam0jgs1mkre84ehmehm/opif0tmtakuzul4c/jfusnnhdvnxjo0ijbfenzgd7lgwdeu4axkuayct/aat/bs3vmp1ov1oh1nwf872/al1tsnabax2q==</latexit> <latexit sha1_base64="uxo1olhgseo3jgo0njbjeeyaw0y=">aaacbxicbvdlsgmxfm3uv62vqktdbitqecpmf+pgkhqj4qif+4b2gdjppg3nziyky6ntbtz4fe7dufckw//bnt8jpq2ith643mm595lc44amsmwah0ziyxfpesw5mlpb39jcsm/vvguqcuwqogcbqltieky5qsiqgkmhgidfzatmdotjv3zdhkqbv1b9kng+anpquyyulpz0/mw2dwtp4dw4hcmmcsvlayed3scxnhtgzjktwb9izzjmaz48leeftyun/d5sbtjycveyiskbvj5udoyeopiryaozsrii3evt0tcui59io55cmyshwmlblxc6uiit9fdgjhwp+76rj32konlwg4v/ey1iewd2thkykclx9cevylafcbwjbffbsgj9trawvp8v4g4sccsdxeqhmhfypknmc5azs8o6jskyign2wahiagucggk4acvqarjcgufwdf6me+pjgbmv09ge8b2zc/7aepsc0w6z9g==</latexit> <latexit 
sha1_base64="uxo1olhgseo3jgo0njbjeeyaw0y=">aaacbxicbvdlsgmxfm3uv62vqktdbitqecpmf+pgkhqj4qif+4b2gdjppg3nziyky6ntbtz4fe7dufckw//bnt8jpq2ith643mm595lc44amsmwah0ziyxfpesw5mlpb39jcsm/vvguqcuwqogcbqltieky5qsiqgkmhgidfzatmdotjv3zdhkqbv1b9kng+anpquyyulpz0/mw2dwtp4dw4hcmmcsvlayed3scxnhtgzjktwb9izzjmaz48leeftyun/d5sbtjycveyiskbvj5udoyeopiryaozsrii3evt0tcui59io55cmyshwmlblxc6uiit9fdgjhwp+76rj32konlwg4v/ey1iewd2thkykclx9cevylafcbwjbffbsgj9trawvp8v4g4sccsdxeqhmhfypknmc5azs8o6jskyign2wahiagucggk4acvqarjcgufwdf6me+pjgbmv09ge8b2zc/7aepsc0w6z9g==</latexit> <latexit sha1_base64="qureyautr9ztow0pxyawnj0me+u=">aaacbxicbvdlsgmxfm34rpu16lixwsjuhdltjw6eqjciliryb7tdkekzbwgmgzkmpbtdupfx3lhqxk3/4m6/mdmw0dydl3s4516se4kyuaud58tawl5zxvvpbgq3t7z3du29/zosicskigutshegrrjlpkqpzqqrs4kigjf60cunfv2eseufv9odmhgr6naauoy0kxz76drfp4wx8cztz7bfykwz4hduh/mub+ecgjmb/chupmmbgsq+/dlqc5xehgvmkfjntxhrb4ikppircbavkbij3emd0jsuo4gobzi5ygxpjnkgozcmuiyt9ffgeevkdalatezid9w8l4r/ec1ehxfekpi40ytj6unhwqawmi0etqkkwlobiqhlav4kcrdjhlujlmtcwdh5kdskbdcpuldorlsexzebh+ay5ielzkejxiekqaimhsateagv1qp1bl1z79prjwu2cwd+wpr4bt/4lky=</latexit> Technical Details Training CNNs is buggy. Gradients can be zero, causing training to stall, or can blow up. Lots of hacks or heuristics used to help. J(w) =L(w)+ w 2 2 J(w) =L(w)+ w 1 Regularization: the loss function is nonconvex. So add an extra term to make it convex, and keep the weights small. Called weight decay. Data augmentation: cutout: change the images at each stage, keeping same labels Dropout: randomly set half the weights to zero at each iteration. (Heuristic which helps) batch normalization: normalize the input data to each neuron, to be mean zero var = 1.

Challenges for rigorous deep learning. "It is not clear that the existing AI paradigm is immediately amenable to any sort of software engineering validation and verification. This is a serious issue, and is a potential roadblock to DoD's use of these modern AI systems, especially when considering the liability and accountability of using AI in lethal systems." JASON report (italics mine)

Evolution of engineering discipline

Importance of -ilities: reliability, maintainability, accountability, verifiability, evolvability, attackability.

Challenge: Adversarial Examples. Goodfellow et al., "Explaining and Harnessing Adversarial Examples", 2015. A tiny, carefully chosen perturbation of an image can change the network's prediction.
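A sketch of that paper's fast gradient sign method (FGSM): perturb the input by epsilon times the sign of the input gradient of the loss. The framework (PyTorch), the stand-in classifier, and epsilon are my choices for illustration:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 784, requires_grad=True)   # a (fake) input image
y = torch.tensor([3])                        # its true label

loss = loss_fn(model(x), y)
loss.backward()                              # gradient of the loss w.r.t. the input x

eps = 0.1
x_adv = x + eps * x.grad.sign()              # tiny perturbation that increases the loss
x_adv = x_adv.clamp(0, 1)                    # keep pixel values valid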

Hot current topic: next time we will talk about our progress on it.