This webinar will be recorded. Please engage, use the Questions function during the presentation!
MACHINE LEARNING WITH SAS
SAS NORDIC FANS WEBINAR, 21 MARCH 2017
Gert Nissen, Technical Client Manager
Georg Morsing, Senior Manager
Kaare Brandt Petersen, Education & Academic
INTRODUCTION GETTING STARTED
Agenda:
  Introduction
  What is Machine Learning?
  Advanced Models used in Machine Learning
  Unstructured data
Who am I: Nordic Director, Education & Academic; PhD, Mathematical Modelling
What about you?
INTRODUCTION WHY IS MACHINE LEARNING HOT?
1 The game Go: machine beats the human world champion. The AlphaGo team developed an algorithm that beat world champion Lee Sedol in spring 2016.
2 Speaking Chinese when you speak English. Former Kaggle president Jeremy Howard presented this example in his TED Talk: speech-to-text + translation + modulated text-to-speech.
3 Looking at pictures and understanding what you see. ImageNet example from Stanford, 2014: text formed by algorithm.
INTRODUCTION WHAT IS MACHINE LEARNING? Arthur Samuel (1901-1990), USA Pioneer in computer games First self-learning program playing checkers, 1959 [Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed
INTRODUCTION THEORY VS DATA
Theory: theory of what happened; function derived from theory; theory-based model fitted to data.
Data: data of what happened; function which can adapt to just about every data pattern; data-driven modelling.
In machine learning, data speaks louder than theory
ADVANCED MODELS A WAY TO DEAL WITH A COMPLEX REALITY
APPROACHES BE FLEXIBLE (ADAPTABLE TO MULTIPLE REALITIES)
ADVANCED MODELS OVERFITTING AND BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
[Figure: 3x3 grid of fitted functions, each shown with data points and the underlying process. Rows: model complexity (flexibility), from complex to simple. Columns: data amount, from small to large.]
Complex: Overfitting, Overfitting, Good fit
Potentially good models: Overfitting, Good fit, Good fit
Too simple models: Poor fit, Poor fit, Poor fit
ADVANCED MODELS DATA PARTITIONING IS A WAY TO FIND THE BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
Data set:
40% Training data: find the parameter values (given the flexibility)
30% Validation data: find the right level of flexibility
30% Test data: estimate performance
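The 40/30/30 split can be sketched in a few lines of Python. This is an illustration only, not how SAS performs partitioning; the function name `partition`, the fixed seed, and the plain random shuffle are assumptions made for the example.

```python
import random

def partition(rows, fractions=(0.4, 0.3, 0.3), seed=0):
    """Shuffle rows and split them into training, validation and test parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]                    # fit the parameter values
    valid = shuffled[n_train:n_train + n_valid]   # choose the flexibility
    test = shuffled[n_train + n_valid:]           # estimate performance
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 40 30 30
```

The three parts are disjoint, so the test data plays no role in fitting parameters or choosing flexibility.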
SOME MODELS USED IN MACHINE LEARNING
K-Nearest Neighbours: flexibility controlled by the number of neighbours included, K.
Decision Trees: flexibility controlled by the number of leaf nodes (boxes), which in turn is controlled by a number of options, such as performance on the validation set, minimum number of observations for splitting, etc.
Neural Networks: flexibility typically controlled by early stopping, that is, starting from small weights corresponding to a linear model, letting these grow and change, and stopping when the validation error starts increasing.
Support Vector Machines: flexibility controlled by the so-called kernel width, a parameter which determines a typical length scale of the data shape.
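As a small illustration of K as the flexibility handle for K-Nearest Neighbours, the sketch below picks K by validation error. The toy data (one numeric input, one deliberately mislabelled training point) and the helper names `knn_predict`, `error_rate` and `choose_k` are assumptions for the example, not part of the presentation.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x.

    train is a list of (value, label) pairs with a single numeric input.
    """
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def error_rate(train, data, k):
    """Fraction of (value, label) pairs in data that are misclassified."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)

def choose_k(train, valid, candidates):
    """Candidate k with the lowest validation error (smallest k wins ties)."""
    return min(candidates, key=lambda k: (error_rate(train, valid, k), k))

# Toy data: class "a" lies to the left, class "b" to the right,
# and (1.1, "b") is a noisy training point.
train = [(0.0, "a"), (1.0, "a"), (1.1, "b"), (2.0, "a"),
         (3.0, "b"), (4.0, "b"), (5.0, "b")]
valid = [(0.9, "a"), (1.2, "a"), (3.8, "b"), (4.2, "b")]

for k in (1, 3, 7):
    print(k, error_rate(train, valid, k))
best_k = choose_k(train, valid, [1, 3, 7])
```

With K = 1 the model follows the noisy point (too flexible), with K = 7 it always predicts the majority class (too simple), and the validation data favours the value in between.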
SOME MODELS USED IN MACHINE LEARNING
Ensemble Learning: flexibility first and foremost controlled by the individual model handles, but the ensemble approach itself (the bagging) is a regularizer, so there may in fact be a need for an overall flexibility adjustment; this is in some cases handled by the number of submodels.
Bagging example, Random Forests: flexibility controlled by the number of trees and the individual flexibility of the trees (the number of leaf nodes of the trees).
Boosting example, Adaptive Boosting: flexibility controlled by the number of boosting steps (T).
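The bagging idea can be sketched as follows: each submodel is fitted to a bootstrap resample of the data, and the ensemble averages their predictions. This is a simplified stand-in for a random forest, assuming trees with only two leaf nodes and a single input; the names `fit_stump` and `bagged_model` and the step-shaped toy data are hypothetical.

```python
import random
from statistics import mean

def fit_stump(points):
    """Fit a regression tree with one split (two leaf nodes) to (x, y) pairs."""
    best = None
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring x values
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x identical: fall back to a constant model
        m = mean(y for _, y in points)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagged_model(points, n_models, seed=0):
    """Average n_models stumps, each fitted to a bootstrap resample."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(points) for _ in points])
              for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)

# Toy data: y steps from 0 to 1 at x = 5.
points = [(x, 0 if x < 5 else 1) for x in range(10)]
predict = bagged_model(points, n_models=25)
print(predict(0), predict(4.5), predict(9))
```

Here the two handles from the slide appear as `n_models` (the number of trees) and the fixed two-leaf structure of each stump (the individual flexibility).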
HOW TO IN SAS MACHINE LEARNING METHODS IN SAS ENTERPRISE MINER
HOW TO IN SAS COURSE
Machine Learning with SAS: 2-day course, hands-on using SAS Enterprise Miner
Next: Copenhagen, April 25-26; Stockholm, May 9-10
UNSTRUCTURED DATA AND DEEP LEARNING
SOUND SOME SOUND WHAT CAN YOU HEAR?
This is what sound looks like for an algorithm:
44.1 kHz sampling = 44,100 numbers per second
3 minutes equals 7,938,000 numbers
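The arithmetic behind the 3-minute figure, as a short Python check. The function name `samples_in` and the single-channel (mono) default are assumptions for the example.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz sampling

def samples_in(duration_seconds, sample_rate_hz=SAMPLE_RATE_HZ, channels=1):
    """Number of values an algorithm sees for a clip of the given length."""
    return duration_seconds * sample_rate_hz * channels

print(samples_in(3 * 60))  # 7938000
```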
IMAGES THE MNIST DATA SET
MNIST data set: handwritten digits, famous ML benchmark data set
70,000 images, 28x28 grayscale = 784 values per image
Table: 70,000 rows, 785 columns in total (784 input + 1 target)
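How one 28x28 image becomes one table row of 785 columns can be sketched like this. The helper name `image_to_row` and the blank placeholder image are assumptions; no real MNIST data is loaded.

```python
def image_to_row(image, target):
    """Flatten a 28x28 grid of grayscale values and append the target digit."""
    row = [pixel for line in image for pixel in line]  # 784 input values
    row.append(target)                                 # + 1 target column
    return row

blank_image = [[0] * 28 for _ in range(28)]  # placeholder, not real data
row = image_to_row(blank_image, target=7)
print(len(row))  # 785
```

Stacking 70,000 such rows gives the 70,000 x 785 table described on the slide.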
IMAGES THESE ARE IMAGES OF HANDWRITTEN DIGITS
IMAGES TRADITIONAL APPROACH TO IMAGES
[Figure: image no. 21355 (28x28 = 784 values) passes through feature extraction into a table with one row per image (1 to N) and feature columns 1 to 10.]
Feature extraction: 10 key values to represent the image content
DEEP LEARNING WHAT IS DEEP LEARNING? Geoffrey Hinton (1947-*), Godfather of Deep Learning Born in England, Lives in Canada University of Toronto [Deep] learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification
DEEP LEARNING DEEP LEARNING OVER-SIMPLIFIED INTO ONE SLIDE
1 Unsupervised part for finding the optimal representation: input and output must match (as best possible). The middle layer then acts as a compressed representation of the full image.
2 Supervised learning on the optimal representation = Alive
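The unsupervised step, where input and output must match through a narrow middle layer, can be illustrated with a toy linear autoencoder. This is a sketch under strong assumptions (two inputs, one middle unit, no nonlinearity, plain gradient descent, made-up data lying on a line), far smaller than any real deep network; all names are hypothetical.

```python
def reconstruct(w, v, x):
    """Encode x into one number z (the middle layer), then decode it again."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return z, [vj * z for vj in v]

def mean_loss(w, v, data):
    """Average squared difference between input and reconstructed output."""
    total = 0.0
    for x in data:
        _, out = reconstruct(w, v, x)
        total += sum((o - xi) ** 2 for o, xi in zip(out, x))
    return total / len(data)

def train(data, steps=500, lr=0.1):
    w = [0.1, 0.1]  # encoder weights: 2 inputs -> 1 middle unit
    v = [0.1, 0.1]  # decoder weights: 1 middle unit -> 2 outputs
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z, out = reconstruct(w, v, x)
            err = [o - xi for o, xi in zip(out, x)]
            back = sum(2 * e * vj for e, vj in zip(err, v))  # dLoss/dz
            for j in range(2):
                grad_v[j] += 2 * err[j] * z / len(data)
                grad_w[j] += back * x[j] / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
    return w, v

# Toy data: every point lies on the line y = 2x, so one number can describe it.
data = [(0.5, 1.0), (-0.5, -1.0), (0.25, 0.5), (-0.25, -0.5)]
start = mean_loss([0.1, 0.1], [0.1, 0.1], data)
w, v = train(data)
final = mean_loss(w, v, data)
print(start, final)
```

After training, the single middle value `z` is the compressed representation; a supervised model would then be trained on `z` instead of the raw inputs.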
DEEP LEARNING THE CAT PROBLEM
Extracting image features of a cat, but cats have many forms
Gross list of 1,000,000,000 images
Amazon Mechanical Turk:
* 48,940 persons categorizing and sorting
* 15,000,000 images in 22,000 categories
* 62,000 images of cats
Convolutional neural networks (Hinton et al.): 24 million nodes, 140 million parameters, 15,000 million connections
Source: Fei-Fei Li, Director of Stanford AI & Vision Lab, TED Talk 2015
CONCLUSIONS sas.com
HOW TO IN SAS MACHINE LEARNING IN SAS VIYA (AND MANY ADVANCED METHODS COMING UP IN 2017) More info: SAS User Forum in the Nordics, May & June Source: http://video.sas.com/detail/videos/#category/videos/sas-viya-data-mining-and-machine-learning
SAS COMMUNITY NORDIC HTTP://COMMUNITIES.SAS.COM/NORDIC Get the presentation from today and continue your learning Join the Nordic SAS Online Community and receive regular activity updates
NORDIC WEBINAR SERIES SIGN UP AT WWW.SAS.COM/NORDIC-USERS
Month | Date | Title | Area
January | 5.1. | News in SAS 9.4 M4 | All
February | 2.2. | Efficient SAS programming | Programming
February | 7.2. | SAS Studio version 3.6 | Programming
February | 28.2. | Calculating values and creating parameters in SAS Visual Analytics | Visual Analytics
March | 17.3. | SAS Environment Manager | Administration, Data Management
March | 21.3. | Machine Learning with SAS | Analytics
April | 20.4. | News from SAS Global Forum | All
April | 26.4. | Graph Builder and Maps with SAS Visual Analytics | Visual Analytics
May | 10.5. | New versions of SAS Visual Analytics | Visual Analytics
Note: Date and topics are preliminary. Changes can occur.