MACHINE LEARNING WITH SAS

This webinar will be recorded. Please engage: use the Questions function during the presentation!
MACHINE LEARNING WITH SAS
SAS NORDIC FANS WEBINAR 21. MARCH 2017
Gert Nissen, Technical Client Manager
Georg Morsing, Senior Manager
Kaare Brandt Petersen, Education & Academic

INTRODUCTION GETTING STARTED
Agenda
* Introduction
* What is Machine Learning?
* Advanced Models used in Machine Learning
* Unstructured data
Who-am-I
* Nordic Director, Education & Academic
* Ph.D. Mathematical Modelling
What-about-you?

INTRODUCTION WHY IS MACHINE LEARNING HOT?
1 The Game Go: machine beats the human world champion. The AlphaGo team developed an algorithm that beat the world champion Lee Sedol in spring 2016.
2 Speaking Chinese when you speak English. Former Kaggle president Jeremy Howard presented this example in his TED Talk: speech-to-text + translation + text-to-speech, modulated.
3 Looking at pictures and understanding what you see. ImageNet example from Stanford 2014: text formed by algorithm.

INTRODUCTION WHAT IS MACHINE LEARNING?
Arthur Samuel (1901-1990), USA
* Pioneer in computer games
* First self-learning program playing checkers, 1959
"[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed"

INTRODUCTION THEORY VS DATA
Theory of what happened -> Function derived from theory -> Theory-based model fitted to data
Data of what happened -> Function which can adapt to just about every data pattern -> Data-driven modelling

In machine learning, data speaks louder than theory

ADVANCED MODELS A WAY TO DEAL WITH A COMPLEX REALITY

APPROACHES BE FLEXIBLE (ADAPTABLE TO MULTIPLE REALITIES)

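ADVANCED MODELS OVERFITTING AND BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
[Figure: grid of fits. Vertical axis: model complexity (flexibility), from simple to complex. Horizontal axis: data amount, from small to large. Legend: underlying process, fitted function, data point.]
Complex models, small to large data amount: overfitting, overfitting, good fit
Potentially good models, small to large data amount: overfitting, good fit, good fit
Too simple models, small to large data amount: poor fit, poor fit, poor fit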

ADVANCED MODELS DATA PARTITIONING IS A WAY TO FIND THE BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
Data set
* 40% Training data: find the parameter values (given the flexibility)
* 30% Validation data: find the right level of flexibility
* 30% Test data: estimate performance
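The 40/30/30 partitioning on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how SAS Enterprise Miner implements it; the function name `partition`, the random shuffle and the `seed` parameter are assumptions made for the example.

```python
import random

def partition(rows, fractions=(0.4, 0.3, 0.3), seed=0):
    """Shuffle the rows and split them into training, validation and test sets.

    The default fractions follow the slide: 40% training, 30% validation,
    30% test. Shuffling with a fixed seed is an assumption of this sketch.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]                    # fit parameter values
    valid = shuffled[n_train:n_train + n_valid]   # choose the level of flexibility
    test = shuffled[n_train + n_valid:]           # estimate performance
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 40 30 30
```

The test set takes whatever rows remain after the first two slices, so every row ends up in exactly one partition.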

SOME MODELS USED IN MACHINE LEARNING
* K-Nearest Neighbours: Flexibility is controlled by the number of neighbours included, K.
* Decision Trees: Flexibility is controlled by the number of leaf nodes (boxes), which in turn is controlled by a number of options, such as performance on the validation set, the minimum number of observations for splitting, etc.
* Neural Networks: Flexibility is typically controlled by early stopping, that is, starting from small weights corresponding to a linear model, then letting these grow and change, but stopping when the validation error starts increasing.
* Support Vector Machines: Flexibility is controlled by the so-called kernel width, a parameter which determines a typical length scale of the data shape.
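As a small illustration of K acting as the flexibility handle for K-Nearest Neighbours, the sketch below picks K by validation error. It is a toy, assuming one-dimensional inputs, squared error and made-up data; the function names are hypothetical and unrelated to any SAS procedure.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, valid, k):
    """Mean squared error on the validation set for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

def choose_k(train, valid, candidates):
    """Pick the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))

# Made-up data: (input, target) pairs scattered around the line y = x.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 5.0)]
valid = [(0.5, 0.5), (2.5, 2.5), (4.5, 4.5)]

best_k = choose_k(train, valid, candidates=[1, 2, 3, 6])
print(best_k)
```

A small K follows individual data points closely (high flexibility), while K equal to the training set size predicts the overall mean everywhere (low flexibility); the validation set arbitrates between the two.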

SOME MODELS USED IN MACHINE LEARNING
* Ensemble Learning: Flexibility is first and foremost controlled by the individual model handles, but the ensemble approach itself (the bagging) is a regularizer, so there may in fact be a need for an overall flexibility adjustment. This is in some cases handled by the number of submodels.
* Bagging example, Random Forests: Flexibility is controlled by the number of trees and the individual flexibility of the trees (the number of leaf nodes of the trees).
* Boosting example, Adaptive Boosting: Flexibility is controlled by the number of boosting steps (T).
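The bagging idea mentioned above can be sketched as follows: fit the same base model on bootstrap resamples of the training data and average the predictions. The base model here is a 1-nearest-neighbour lookup rather than a decision tree, purely to keep the example short, and all names (`bagged_predict`, `n_models`) are assumptions of the sketch.

```python
import random

def nearest_neighbour(sample, x):
    """Base model: return the target of the training point closest to x."""
    return min(sample, key=lambda point: abs(point[0] - x))[1]

def bagged_predict(train, x, n_models=25, seed=0):
    """Average the base model's predictions over bootstrap resamples of train."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(train) points from train with replacement.
        bootstrap = [rng.choice(train) for _ in train]
        predictions.append(nearest_neighbour(bootstrap, x))
    return sum(predictions) / n_models

# Made-up (input, target) pairs.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 5.0)]
print(bagged_predict(train, 2.4))
```

Each resample omits some points, so the individual submodels disagree slightly; averaging them smooths the prediction, which is the regularizing effect the slide refers to, with `n_models` playing the role of the number of submodels.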

HOW TO IN SAS MACHINE LEARNING METHODS IN SAS ENTERPRISE MINER

HOW TO IN SAS COURSE Machine Learning with SAS 2 day course Hands-on using SAS Enterprise Miner Next: Copenhagen, April 25-26 Stockholm, May 9-10

UNSTRUCTURED DATA AND DEEP LEARNING

SOUND SOME SOUND WHAT CAN YOU HEAR?
This is what sound looks like for an algorithm
* 44.1 kHz sampling
* 44,100 numbers per second
* 3 minutes equals 7,938,000 numbers
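The arithmetic on this slide is easy to check. The snippet below assumes a single (mono) channel, which matches the slide's figure of 44,100 numbers per second; the `channels` parameter is an addition for illustration.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz sampling

def samples_in(seconds, sample_rate=SAMPLE_RATE_HZ, channels=1):
    """Number of values needed to represent a recording of the given length."""
    return seconds * sample_rate * channels

print(samples_in(3 * 60))  # 7938000
```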

IMAGES THE MNIST DATA SET
MNIST data set
* Handwritten digits
* Famous ML benchmark data set
* 70,000 images
* 28x28 grayscale = 784 values per image
Table
* 70,000 rows
* 785 columns in total (784 input + 1 target)
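To make the table layout concrete, the sketch below flattens one 28x28 image into a single row of 784 input values plus 1 target. The blank image and the label 7 are placeholders, not real MNIST data, and `image_to_row` is a hypothetical helper.

```python
def image_to_row(image, label):
    """Flatten a 2-D grid of pixel values row by row and append the target."""
    pixels = [value for line in image for value in line]
    return pixels + [label]

blank_image = [[0] * 28 for _ in range(28)]  # placeholder 28x28 grayscale image
row = image_to_row(blank_image, label=7)
print(len(row))  # 785
```

Stacking one such row per image gives the 70,000 x 785 table described on the slide.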

IMAGES THESE ARE IMAGES OF HANDWRITTEN DIGITS

IMAGES TRADITIONAL APPROACH TO IMAGES
[Figure: image no. 21355 (28x28 = 784 values) passes through feature extraction into a table of features 1 to 10.]
Feature extraction: 10 key values to represent the image content

DEEP LEARNING WHAT IS DEEP LEARNING?
Geoffrey Hinton (1947-), Godfather of Deep Learning
* Born in England, lives in Canada
* University of Toronto
"[Deep] learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification"

DEEP LEARNING DEEP LEARNING OVER-SIMPLIFIED INTO ONE SLIDE
1 Unsupervised part for finding the optimal representation. Input and output must match (as well as possible). Then the middle layer acts as a compressed representation of the full image.
2 Supervised learning on the optimal representation = Alive

DEEP LEARNING THE CAT PROBLEM
Extracting image features of a cat, but cats have many forms
Gross list of 1,000,000,000 images
Amazon Mechanical Turk:
* 48,940 persons categorizing and sorting
* 15,000,000 images in 22,000 categories
* 62,000 images of cats
Convolutional neural networks (Hinton et al.)
* 24 million nodes
* 140 million parameters
* 15,000 million connections
Source: Fei-Fei Li, Director of Stanford AI & Vision Lab, TED Talk 2015

CONCLUSIONS sas.com

HOW TO IN SAS MACHINE LEARNING IN SAS VIYA (AND MANY ADVANCED METHODS COMING UP IN 2017) More info: SAS User Forum in the Nordics, May & June Source: http://video.sas.com/detail/videos/#category/videos/sas-viya-data-mining-and-machine-learning

SAS COMMUNITY NORDIC HTTP://COMMUNITIES.SAS.COM/NORDIC Get the presentation from today and continue your learning Join the Nordic SAS Online Community and receive regular activity updates

NORDIC WEBINAR SERIES SIGN UP AT WWW.SAS.COM/NORDIC-USERS

Month     Date   Title                                                               Area
January   5.1.   News in SAS 9.4 M4                                                  All
February  2.2.   Efficient SAS programming                                           Programming
February  7.2.   SAS Studio version 3.6                                              Programming
February  28.2.  Calculating values and creating parameters in SAS Visual Analytics  Visual Analytics
March     17.3.  SAS Environment Manager                                             Administration, Data Management
March     21.3.  Machine Learning with SAS                                           Analytics
April     20.4.  News from SAS Global Forum                                          All
April     26.4.  Graph Builder and Maps with SAS Visual Analytics                    Visual Analytics
May       10.5.  New versions of SAS Visual Analytics                                Visual Analytics

Note: Dates and topics are preliminary. Changes can occur.