A Shallow Introduction to Deep Learning by Rafael Espericueta

Traditional AI vs Deep Learning

Deep learning is one form of machine learning, which in turn is part of the field of artificial intelligence. Deep learning basically refers to artificial neural networks, and what makes them deep is the presence of more than two layers: an input layer, one or more so-called hidden layers, and an output layer.

Traditional AI required a team of human experts as well as a team of expert programmers to create a program that was essentially hard-wired to deal with every conceivable situation. Such systems have had success in many realms, but they are limited to the logic programmed into them. They tend to be brittle and buggy, and they don't adjust well to minor changes in their inputs. Machine learning in traditional AI works thanks to the programmer's cleverness in selecting features in the data that can be used by various learning algorithms.

The newer deep learning approach requires a far smaller team. One creates a neural net architecture capable of learning the desired task, and then has it learn how to do so, given lots of labeled examples (supervised learning). A more subtle version of this, where the feedback isn't so immediate (reinforcement learning), is used in cases where one doesn't have sufficient labeled data. You tell the system what output you want for a given input, and it figures out how best to accomplish that task. The features needed for learning to take place are selected automatically in the early layers of the neural network, rather than needing to be hand-crafted by a clever monkey as in traditional AI. Any solution to the problem of general intelligence will require such an ability.

One great accomplishment of traditional AI was IBM's grandmaster-defeating chess program of the '90s, Deep Blue. Many years of effort on the part of programmers and chess masters alike were required to program in all the "if this is true, then that; else if this, then that", ad absurdum. This brute-force approach, requiring teams of experts and programmers working in tandem, has not been a successful strategy when applied to more difficult problems, like vision and the other unsolved problems in AI.

One of these more challenging problems has been to create a go-playing program that can defeat top human players. Go is an ancient game of strategy that originated in China thousands of years ago. In Trevanian's best-selling novel Shibumi it was remarked that Go is to chess as poetry is to double-entry accounting. In any case, despite the surprising simplicity of the game's rules, go has a game tree with a vastly higher branching factor than chess, and the number of possible go games has been estimated to exceed the number of possible chess games by an enormous factor. For decades a go program that could beat top professionals was a holy grail of AI, and most experts in the field didn't believe we would attain this goal for at least another decade. A legendary $1,000,000 prize was offered for the first computer program that could defeat a human professional go player; that prize went unclaimed by its expiration year, 2000. Nonetheless, a $1,000,000 prize was finally claimed by the Google DeepMind team in 2016, for the success of their program AlphaGo. In a match watched by millions around the world, AlphaGo defeated the 9-dan go master Lee Sedol in 4 out of 5 games in a televised and Internet-streamed match.
Almost all had been expecting the machine to lose to this top-ranked go master, as no other go-playing program had come close to the level of a professional human player.

As great an accomplishment as this was for the DeepMind team, more significant than the accomplishment itself was the way it was achieved: using deep learning. Rather than the tedious traditional AI method described above (which was so successful in the case of chess), the DeepMind team created a deep neural network with random initial weights that was capable of learning to play a strong game of go. It was first trained on about 100,000 strong human amateur games, learning to predict where a human would play; it then played itself many millions of games to hone its skills. Interestingly, fairly early in its training it could already easily defeat everyone who had programmed it. The same neural network architecture is capable of learning many other tasks besides playing go. The AlphaGo project took far fewer man-hours than the Deep Blue (chess) project did, and AlphaGo attained its mastery far faster than any human has ever reached a comparable skill level. AlphaGo has also improved significantly since its landmark victory.

After AlphaGo's historic victory, many in the AI world began to wonder how many other hitherto unsolved problems would yield to the power of deep learning. Indeed, the deep learning principles used in AlphaGo are generally applicable, and have already helped cross many problems off AI's unsolved-problems list. The rapidity with which this is happening is notable. Google recognized the importance of DeepMind's work back in 2014, purchasing the company for about half a billion dollars. More recently, Google open-sourced its own internal deep learning framework, TensorFlow, now (by far) the most popular deep learning platform. Putting this tool into the hands of thousands of researchers and knowledge engineers around the world seemed a better strategy than trying to do it all internally: there are so many conceivable applications that the more people exploring the possibilities, the better. And those who come up with an interesting application of deep learning may find their resulting start-up facing a buy-out offer from Google, along with a lucrative job offer.

Another of deep learning's recent accomplishments was in the field of computer vision. A deep neural network attained slightly better than human performance at recognizing the 1,000 object categories in the ImageNet dataset, outperforming both humans and previous AI attempts that used more traditional techniques. Advances in computer vision are leading to a plethora of AI applications in diverse areas, including autonomous cars, drones, and, more generally, robotics. In addition to the ImageNet example (and many other advances in computer vision), there have been comparable advances in other subfields of AI, including speech recognition and synthesis. Even automatic language translation has made huge strides using deep learning. Applications abound in the medical field, and promise to revolutionize the practice of medicine. Recently Google's DeepMind used deep learning to reduce the energy used for cooling at its large data centers by 40%, and it is now negotiating to apply this technology to the electrical grid of Great Britain. This one breakthrough alone holds great promise for significantly improving the efficiency of all the world's power grids. One begins to wonder if there's any area where deep learning techniques can't be fruitfully applied.
The History of Artificial Neural Networks

Artificial neural networks have been with us for about as long as digital computers. Many of the early pioneers of computer science were interested in this idea, since it is so suggestive of the way biological brains work. After all, our brains form a sort of existence proof that artificial neural networks might lead to a system capable of intelligent perception and cognition.

Despite researchers' early interest in neural networks, it is only recently that we've developed the techniques needed to make deep learning work. The main reason for the long delay concerns Moore's Law: roughly every decade, computers become about 1,000 times faster, and we simply had to wait until sufficient computing power was available. Once that tipping point was reached, the engineering of such networks underwent a rapid evolution. Thanks to computer gamers, GPUs (graphics processing units) were created that each contain thousands of compute cores. These allow certain computations, such as those needed for graphics processing, to be performed in parallel, and thus thousands of times faster than is possible on conventional CPUs. It turns out that GPUs can also be used to implement neural networks, and their general availability and low cost helped provide the computing power needed for successful neural network implementations.

In the 1990s, AI researchers believed that neural networks weren't practical (which was pretty much true, given what passed for computers back then), and as a result researchers in the neural network field had great difficulty publishing papers at all. The advances mentioned above, along with many others through the years, have now turned the tide. Today it's becoming difficult to obtain funding for AI research that doesn't involve deep learning.

GoogLeNet

The yearly ImageNet competition is a contest to automatically identify 1,000 object categories in images. In earlier years, only tiny improvements over the previous year's winning entry were needed to win, but in 2014 Google's entry used deep learning to defeat all its rivals by a healthy margin. The winning entry was a neural network architecture called GoogLeNet.

Figure 1: Schematic of the GoogLeNet artificial neural network.

The diagram in Figure 1 is actually a simplification of the real network: many of the rectangles represent large collections of parallel node layers. The network is capable of discerning a thousand different common objects that may appear in an image, for example flowers. There is one particular layer within GoogLeNet that is maximally excited whenever it sees flowers. When any image is input to the network, this flower-detecting layer tries to see flowers in the image. If one outputs that layer, one can see where the network was beginning to hallucinate flowers in the input image. I found that by feeding the image with the beginnings of flower hallucinations back into the input of the network, and again outputting the flower-detector results, the hallucinations became more vivid. After about five such feedback loops the hallucinations become quite vivid, and then there is little further change. In Figure 2 you can see a photo of my wife Julie (off the coast of New Zealand), along with 5 iterates of flower hallucinations.

Figure 2a: Original picture. Figure 2b: The hallucinations begin! Figure 2c: Hallucinations deepen... Figure 2d: And deepen... Figure 2e: The changes become less noticeable. Figure 2f: Further iterations change very little.

I created animations of over a hundred sequences such as the above (the above, animated, is here), exploring the various inception layers of GoogLeNet. Not all of these layers are as recognizable as the flower detector. Generally it's not individual layers that detect anything, but combinations of layers; using these layers as inputs, subsequent layers are able to accomplish their object-recognition tasks. To see more of these animations, click on a thumbnail below (excepting the first):

Notice how the thumbnail images above, each the result of 5 hallucination iterations, nonetheless resemble the original image at thumbnail size (if you squint!), which is a bit surprising. This shows that much of the information in the original image is preserved in each of its hallucinated versions.

The hallucinatory inception layers of GoogLeNet can be used for many purposes besides recognizing objects in images. If the last layers of the network are discarded, the earlier layers can serve as a starting point for other AI tasks. The vast amount of time Google spent training GoogLeNet on millions of images can thus be leveraged for more specialized tasks, for example recognizing the face of a particular person. One interesting application is termed style transfer: a neural network grafted onto the end of GoogLeNet (minus its later layers) can be trained to recognize the style of a particular artist. And what a network can recognize, it can also hallucinate. So with style transfer, one may input a photo and get an output that resembles a particular artist's rendition of that photo.

Deep learning has achieved comparable successes in the auditory realm as well as the visual, with the understanding of spoken language using recurrent neural networks. It's now possible to automate the captioning of video with better than human-level performance. Music generation has also recently achieved surprising successes via deep learning.

Doing the Math

Consider the simple neural network depicted in Figure 3. We're going to walk slowly through this example to introduce the basic concepts.

Figure 3: A simple deep neural network.

To see what this neural network does, suppose the input values are x1 and x2. The blue paths connecting the circular nodes (the neurons) have numeric weights, and each hidden node first computes the weighted sum of the inputs arriving along those paths. To get the values of the hidden-layer nodes h1 and h2, we then put these weighted sums through a simple nonlinear filter. For this example we'll use the so-called sigmoid function (other nonlinear functions could be used here as well):

    σ(z) = 1 / (1 + e^(−z)),

so that the hidden-layer neurons take on the values

    h1 = σ(w11·x1 + w12·x2),    h2 = σ(w21·x1 + w22·x2).

We do this again to obtain the output value, as a weighted sum of h1 and h2. The output node often also has a nonlinear function applied, but for this example we'll just output the weighted sum directly:

    y = v1·h1 + v2·h2.
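To make the feed-forward computation concrete, here is a minimal sketch in Python with NumPy. The network shape (two inputs, two sigmoid hidden nodes, one linear output) matches the description above, but the specific input values and weights are made-up placeholders, since the actual numbers from Figure 3 aren't reproduced in the text.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Placeholder values; the article's actual numbers come from Figure 3.
    x  = np.array([0.5, 0.2])              # inputs (x1, x2)
    W1 = np.array([[0.1, 0.4],             # input-to-hidden weights w_ij
                   [0.3, 0.7]])
    W2 = np.array([0.6, 0.9])              # hidden-to-output weights (v1, v2)

    h = sigmoid(W1 @ x)    # hidden-layer values (h1, h2)
    y = W2 @ h             # output value (no nonlinearity at the output here)
    print(h, y)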

The above process can be written more succinctly using matrix notation. If W1 = (w_ij) is the matrix of input-to-hidden weights and x = (x1, x2) is the input vector, then h = σ(W1 x). Similarly, with W2 = (v1, v2) the hidden-to-output weights, we have y = W2 h. This process is called feed-forward, and it is how the trained network makes its predictions.

Next we examine the learning part of the process. What exactly constitutes learning for such a network? The behavior of the network on given inputs is entirely determined by the weights along the paths, which can be gathered into the weight matrices W1 and W2. These weights are ordinarily initialized with small random values; for our network they were picked arbitrarily. To obtain a network that's useful, the weights need to be learned rather than given a priori.

In supervised learning we learn the weights using labeled training data. The training data consists of input pairs (x1, x2) along with corresponding labels: the output values observed (or desired) for those inputs. In the above example, if the label for our input were in fact 0.128196834 (the output value we observed above), then our network would have correctly computed this value, and the weights would already be right for this input/label pair. If the label were something else, the weights would need to be modified in such a way that the output comes closer to the label for that input. The actual learning takes place via a process called back-propagation, which propagates the observed error back through the network, adjusting all the weights so that the network, given that input again, would compute a value closer to the label. In this way, given a number of labeled inputs, the network can iteratively modify its weights, learning the mapping implied by the input/output pairs. Memorizing the input/output pairs isn't the point, though; we want the network to generalize from its training data and make reasonable predictions for inputs it hasn't encountered.

The back-propagation process amounts to minimizing a multivariate error function by moving the weights in small steps in the direction of the error function's negative gradient. A function's gradient points in the direction of the function's maximum increase; since we seek the function's minimum, we need to move in the direction of its maximum decrease, the negative of its gradient.
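Written out for our small network, the gradient-descent idea in the preceding paragraph looks like the following (a standard formulation; the squared-error loss E is the usual choice, though the article doesn't name it explicitly):

    E = \tfrac{1}{2}(t - y)^2, \qquad y = W_2\,\sigma(W_1 x),

    W_1 \leftarrow W_1 - \eta\,\frac{\partial E}{\partial W_1}, \qquad
    W_2 \leftarrow W_2 - \eta\,\frac{\partial E}{\partial W_2},

where t is the label for input x and η is a small learning rate (discussed below). Back-propagation is simply an efficient way of computing these partial derivatives layer by layer.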

Back-propagation starts with a training example. Suppose our initial training example pairs the input (x1, x2) used above with the label 3.0, keeping all the weights as above. This means we want the network to output 3.0 for that input, and we've already calculated the network's output (with the given weights) to be 0.128196834. The error at the output node is then the difference between the target and the actual output:

    Error = 3.0 − 0.128196834 = 2.871803166,

and we need to propagate this error backwards through the successive layers of the network, using the same weights as in the forward-propagation process. A small learning-rate multiplier η is used to take a small step in the right direction (without it, one might well overshoot the optimal solution). Often this learning rate is decayed as the learning process proceeds to help the iterates converge, but for this example we'll keep it constant.

We need to use the chain rule (from calculus) to correctly compute the gradient of the composition of matrix products and our nonlinear sigmoid function σ. So let's propagate the error back through the network, updating the path weights as we go. The weight on the path from the top hidden node to the output node is adjusted by η times the Error times that hidden node's output; similarly, the weight on the path from the bottom hidden node to the output node is adjusted by η times the Error times the bottom hidden node's output.

Next consider the sigmoid function. The chain rule requires us to multiply the values above by the derivative of σ. As it turns out (the proof is left as an exercise),

    σ'(z) = σ(z) (1 − σ(z)).

The values σ(z) appearing here are precisely the hidden-node outputs we obtained during forward propagation, so by saving those values we can now compute the derivative easily. Recall that the output from the top hidden node, just after σ was applied, was 0.9677045353. The corresponding derivative is 0.9677045353 × (1 − 0.9677045353) ≈ 0.0313, and the same computation applies to the lower hidden node using its saved output. As we back-propagate the Error to each hidden node, it is first multiplied by the original path weight from that hidden node to the output node, and then by the derivative we just computed.
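To make the back-propagated quantities explicit, here is one standard way to write the updates just described, assuming the squared-error loss as above. Here t is the target (3.0 in our example), y the output, h_i the saved hidden-node outputs, v_i the hidden-to-output weights (taken at their values before the update), w_ij the input-to-hidden weights, x_j the inputs, and η the learning rate:

    \delta_{\text{out}} = t - y, \qquad
    v_i \leftarrow v_i + \eta\,\delta_{\text{out}}\,h_i, \qquad
    \delta_i = \delta_{\text{out}}\,v_i\,h_i(1 - h_i), \qquad
    w_{ij} \leftarrow w_{ij} + \eta\,\delta_i\,x_j .

The factor h_i(1 − h_i) is exactly the sigmoid derivative computed from the saved forward-pass values, as described above.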

The same computation is carried out for the lower hidden node. Now we adjust the weights from the input layer to the hidden layer using these back-propagated errors: each input-to-hidden weight is nudged by η times the back-propagated error at its hidden node times the corresponding input value. It can be shown that with these new weights, our neural network will yield an output closer to the target value than our first attempt, and by iterating this process many times we can get ever closer to the target. The real power of the process, though, is that the network doesn't merely fit its training examples: once trained, it can generalize, producing accurate estimates for inputs it has never seen before.
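Pulling the whole walkthrough together, here is a minimal end-to-end sketch in Python of the feed-forward and back-propagation steps for this 2-2-1 network. The input, weights, target, and learning rate are placeholder values (the actual numbers from Figure 3 aren't reproduced in the text), and it assumes the usual squared-error loss, so it illustrates the procedure rather than reproducing the article's exact arithmetic.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Placeholder data; the article's actual values come from Figure 3.
    x = np.array([0.5, 0.2])        # inputs (x1, x2)
    t = 3.0                         # target label
    W1 = np.array([[0.1, 0.4],      # input-to-hidden weights w_ij
                   [0.3, 0.7]])
    W2 = np.array([0.6, 0.9])       # hidden-to-output weights (v1, v2)
    eta = 0.1                       # learning rate (kept constant here)

    for step in range(1000):
        # Feed forward.
        h = sigmoid(W1 @ x)         # hidden-layer outputs
        y = W2 @ h                  # linear output node

        # Back-propagate the error (squared-error loss E = 0.5 * (t - y)**2).
        err = t - y                             # error at the output node
        delta_h = err * W2 * h * (1.0 - h)      # error at each hidden node (chain rule)

        # Gradient-descent weight updates.
        W2 = W2 + eta * err * h
        W1 = W1 + eta * np.outer(delta_h, x)

    print(y)   # after training, y should be close to the target t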