Learning Blackjack Anne-Marie Bausch ETH, D-MATH May 31, 2016


Perceptron
A perceptron is the most basic artificial neuron (developed in the 1950s and 1960s). The input is X ∈ ℝⁿ, the weights are w_1, ..., w_n ∈ ℝ, and the output is Y ∈ {0, 1}. The output depends on some threshold value τ:

output = 0, if W·X = Σ_j w_j x_j ≤ τ,
output = 1, if W·X = Σ_j w_j x_j > τ.
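A minimal Python sketch of this thresholding rule (the function and variable names are illustrative, not from the slides):

def perceptron_output(x, w, threshold):
    """Return 0 if W·X = sum_j w_j x_j <= threshold, else 1."""
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum > threshold else 0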

Bias
Next, we introduce what is known as the perceptron's bias B, defined by B := −τ. This gives us a new formula for the output:

output = 0, if W·X + B ≤ 0,
output = 1, if W·X + B > 0.

Example: NAND gate.
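One standard choice of weights and bias for the NAND example (the specific numbers are an assumption, not stated on the slide) is W = (−2, −2) and B = 3:

def perceptron_with_bias(x, w, b):
    """Return 0 if W·X + B <= 0, else 1 (bias form, B = -tau)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_with_bias(x, (-2, -2), 3))
# prints 1, 1, 1, 0 -- the NAND truth table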

Sigmoid Neuron
Problem: a small change in the input can change the output a lot.
Solution: the sigmoid neuron. For input X ∈ ℝⁿ,

Output = σ(X·W + B) = (1 + exp(−X·W − B))⁻¹ ∈ [0, 1],

where σ(z) := 1 / (1 + exp(−z)) is called the sigmoid function.
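A small sketch of a sigmoid neuron under the same illustrative naming assumptions:

import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z))"""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    """Smooth output in (0, 1) instead of a hard 0/1 threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)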

Given an input X, as well as some training and testing data, we want to find a function f_{W,B} such that f_{W,B} : X ↦ Y, where Y denotes the output. How do we choose the weights and the bias?

Example: XOR Gate

Learning Algorithm
A learning algorithm chooses the weights and biases without interference from the programmer. Smoothness of σ:

Δoutput ≈ Σ_j (∂output/∂w_j) Δw_j + (∂output/∂B) ΔB.

How to update weights and bias
How does the learning algorithm update the weights (and the bias)? We solve

argmin_{W,B} ‖f_{W,B}(X) − Y‖².

One method to do this is gradient descent (a sketch follows below). Choose an appropriate learning rate! Example: digit recognition (1990s), YouTube video.
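A minimal gradient-descent sketch for a single sigmoid neuron with squared loss; the learning rate, number of epochs, and the tiny AND-gate data set are illustrative assumptions, not the setup used in the talk:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_neuron(X, Y, lr=0.5, epochs=2000):
    """Gradient descent on the squared loss ||f_{W,B}(X) - Y||^2."""
    n_samples, n_features = X.shape
    W, B = np.zeros(n_features), 0.0
    for _ in range(epochs):
        out = sigmoid(X @ W + B)               # forward pass
        grad_z = (out - Y) * out * (1 - out)   # chain rule through sigma
        W -= lr * (X.T @ grad_z) / n_samples   # update weights
        B -= lr * grad_z.mean()                # update bias
    return W, B

# Illustrative usage: learn the AND function on binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 0, 0, 1], dtype=float)
W, B = train_sigmoid_neuron(X, Y)

If the learning rate is too large the iteration overshoots; if it is too small, convergence is very slow, which is why the slide stresses choosing it appropriately.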

Example: one image consists of 28×28 pixels, which explains why the input layer has 784 neurons.

3 main types of learning
Supervised Learning (SL): learning some mapping from inputs to outputs. Example: classifying digits.
Unsupervised Learning (UL): given inputs and no outputs, what kinds of patterns can you find? Example: visual input is at first too complex, so the number of dimensions has to be reduced.
Reinforcement Learning (RL): the learning method interacts with its environment by producing actions a_1, a_2, ... that yield rewards or punishments r_1, r_2, .... Example: human learning.

Why was there a recent boost in the employment of neural networks?
The evolution of neural networks stagnated because training networks with more than 2 hidden layers proved to be too difficult. The main problems and their solutions are:
Huge amounts of data needed → Big Data.
Number of weights exceeded the capacity of computers → computing capacity improved (parallelism, GPUs).
Theoretical limits → difficult (see next slide).

Theoretical Limits
Back-propagated error signals either shrink rapidly (exponentially in the number of layers) or grow out of bounds. Three solutions:
(a) Unsupervised pre-training facilitates subsequent supervised credit assignment through back-propagation (1991).
(b) LSTM-like architectures (since 1997) avoid the problem through a special architecture.
(c) Today, fast GPU-based computers allow for propagating errors a few layers further down within reasonable time.
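A toy illustration of the shrinking case (the per-layer weight and pre-activation are assumptions made for the example): the error signal is multiplied by w·σ'(z) at every layer, and since σ'(z) ≤ 0.25 the signal vanishes exponentially with depth.

import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)

error = 1.0
for layer in range(30):
    w, z = 1.0, 0.0           # assumed weight and pre-activation per layer
    error *= w * sigmoid_prime(z)
print(error)                   # about 0.25**30, vanishingly small

With large weights the same product grows out of bounds instead, which is the exploding side of the problem.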

Main rules
Origin: ancient China, more than 2500 years ago.
Goal: gain the most points.
White gets 6.5 points for moving second.
You get points for territory at the end of the game.
You get points for prisoners.
A stone is captured if it has no more liberties (liberties are its "supply chains"); a small sketch of the liberty count follows this list.
You are not allowed to commit suicide.
Ko rule: you are not allowed to play such that the game position is the same as before.
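A small sketch of the liberty count behind the capture rule (the board encoding and the function name are illustrative assumptions):

def liberties(board, row, col):
    """Count the liberties (adjacent empty points) of the group of stones
    containing (row, col). board is a square grid of '.', 'B', or 'W'.
    A group with zero liberties is captured."""
    color, size = board[row][col], len(board)
    group, libs, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == '.':
                    libs.add((nr, nc))       # empty neighbor = liberty
                elif board[nr][nc] == color:
                    stack.append((nr, nc))   # same color = part of the group
    return len(libs)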

End of Game
The game is over when both players have passed consecutively. Prisoners are removed and the points are counted!

DeepMind was founded in 2010 as a startup in Cambridge.
Google bought DeepMind for $500M in 2014.
AlphaGo beat the European champion Fan Hui (2-dan) in October 2015.
AlphaGo beat Lee Sedol (9-dan), one of the best players in the world, in March 2016 (4 out of 5 games).
A victory of AI in Go was thought to be 10 years in the future.
1920 CPUs and 280 GPUs were used during the match against Lee Sedol.
This equals around $1M, not counting the electricity used for training and playing.
Next game attacked by Google DeepMind: StarCraft.

Difficulty: the search space of future moves is larger than the number of particles in the known universe. Hence the need for Monte Carlo Tree Search (MCTS).

Part 1: Multi-Layered Network, Supervised Learning (SL)
Goal: look at the board position and choose the next best move (it does not care about winning, just about the next move).
It is trained on millions of example moves made by strong human players on KGS (Kiseido Go Server).
It matches strong human players about 57% of the time (mismatches are not necessarily mistakes).

Part 2
Two additional versions of the policy network: a stronger move picker and a faster move picker.
The stronger version uses RL and is trained more intensively by playing games to the end (it is trained on millions of training games against previous editions of itself; it does no reading, i.e., it does not try to simulate any future moves). It is needed for creating enough training data for the value network.
The faster version is called the rollout network; it does not look at the entire board but only at a smaller window around the previous move. It is about 1000 times faster!

Multi-Layered Network: the value network
It estimates the probability of each player winning the game.
It is useful for speeding up reading: if a particular position is bad, we can skip any further moves along that line of play.
It is trained on millions of example board positions which were randomly picked from games between two copies of AlphaGo's strong move-picker.

MCTS accomplishes reading and exploring. The full-power AlphaGo system then uses all of its brains in the following way:
Choose a few possible next moves using the basic move picker (the stronger version made weaker!).
Evaluate each next move using the value network and a deeper Monte Carlo simulation (called a rollout, which uses the fast move picker).
This gives 2 independent guesses; a parameter is used to combine the 2 guesses (the optimal parameter is 0.5); a sketch follows below.
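A minimal sketch of combining the two guesses with a mixing parameter (function and argument names are illustrative; 0.5 is the value the slide calls optimal):

def evaluate_leaf(value_net_estimate, rollout_result, mixing=0.5):
    """Blend the value network's win-probability estimate with the
    outcome of a fast rollout into a single leaf evaluation."""
    return (1 - mixing) * value_net_estimate + mixing * rollout_result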

How the strength of AlphaGo varies

References
Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature, Volume 529, 2016.
http://neuralnetworksanddeeplearning.com/chap1.html
https://www.dcine.com/2016/01/28/alphago/
Wikipedia: Go (game)