Machine Learning with MATLAB Antti Löytynoja Application Engineer


Machine Learning with MATLAB
Antti Löytynoja, Application Engineer
© 2014 The MathWorks, Inc.

Goals
- Overview of machine learning
- Machine learning models & techniques available in MATLAB
- MATLAB as an interactive environment for evaluating and choosing the best algorithm

What is Machine Learning?
- Algorithms and techniques used for data analytics (think data analysis)
- Obtain valuable information from the data
Why is it called learning?
- Systems learn from initial training data
- Use resulting model (or knowledge) to predict outcomes or classes of new samples
[Figure: scatter-plot matrix of MPG, Acceleration, Displacement, Weight, and Horsepower]

Machine Learning: Characteristics and Examples
Characteristics
- Lots of data (many variables)
- System too complex to know the governing equation (e.g., black-box modeling)
Examples
- Pattern recognition (speech, images)
- Financial algorithms (credit rating, algorithmic trading)
- Energy forecasting (load, price)
- Biology (tumour detection, drug discovery)
[Table accompanying the credit rating example; ratings AAA to D on both axes, each row sums to about 100%]
        AAA      AA       A        BBB      BB       B        CCC      D
AAA     93.68%   5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA      2.44%    92.60%   4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A       0.14%    4.18%    91.02%   3.90%    0.60%    0.08%    0.00%    0.08%
BBB     0.03%    0.23%    7.49%    87.86%   3.78%    0.39%    0.06%    0.16%
BB      0.03%    0.12%    0.73%    8.27%    86.74%   3.28%    0.18%    0.64%
B       0.00%    0.00%    0.11%    0.82%    9.64%    85.37%   2.41%    1.64%
CCC     0.00%    0.00%    0.00%    0.37%    1.84%    6.24%    81.88%   9.67%
D       0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    100.00%

Challenges: Machine Learning
- Significant technical expertise required
- No one-size-fits-all solution
- Locked into black-box solutions
- Time required to conduct the analysis

Overview: Machine Learning
Type of Learning and Categories of Algorithms
- Unsupervised Learning: group and interpret data based only on input data
  - Clustering
- Supervised Learning: develop predictive model based on both input and output data
  - Classification
  - Regression

Unsupervised Learning
- k-means, Fuzzy C-Means
- Hierarchical Clustering
- Neural Networks
- Gaussian Mixture
- Hidden Markov Model
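As a rough illustration of one technique from this list, the sketch below runs hierarchical clustering on unlabeled measurements. It assumes the Statistics Toolbox is installed and uses the `fisheriris` sample data set as a stand-in; the cosine distance, average linkage, and the choice of three clusters are illustrative assumptions, not settings taken from the slides.

```matlab
% Hierarchical clustering sketch (assumes Statistics Toolbox).
% fisheriris is a sample data set shipped with MATLAB; only the inputs are used.
load fisheriris                      % meas: 150x4 matrix of measurements

D = pdist(meas, 'cosine');           % pairwise cosine distances between samples
Z = linkage(D, 'average');           % build the cluster tree with average linkage
T = cluster(Z, 'maxclust', 3);       % cut the tree into (at most) 3 clusters

counts = accumarray(T, 1);           % number of samples assigned to each cluster
disp(counts');
```

Calling `dendrogram(Z)` afterwards would display the cluster tree for visual inspection.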

Supervised Learning
Regression
- Neural Networks
- Decision Trees
- Ensemble Methods
- Non-linear Reg. (GLM, Logistic)
- Linear Regression
Classification
- Support Vector Machines
- Discriminant Analysis
- Naive Bayes
- Nearest Neighbor
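To make the regression branch concrete, here is a minimal linear regression sketch. It assumes the Statistics Toolbox and the `carsmall` sample data set, which contains the same kind of variables as the scatter-plot matrix shown earlier (MPG, Weight, Horsepower); the choice of predictors and the query point are assumptions made for illustration.

```matlab
% Linear regression sketch (assumes Statistics Toolbox, R2013b or later for fitlm).
load carsmall                        % sample data: Weight, Horsepower, MPG, ...

X = [Weight Horsepower];             % predictors (input data)
mdl = fitlm(X, MPG);                 % fit MPG ~ 1 + Weight + Horsepower; rows with NaN are skipped

% Predict MPG for a hypothetical car: 3000 lb, 120 hp
mpgPred = predict(mdl, [3000 120]);
fprintf('Predicted MPG: %.1f\n', mpgPred);
```

Displaying `mdl` at the command line prints the coefficient table and fit statistics.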

Supervised Learning - Workflow
- Data: Import Data, Explore Data, Prepare Data
- Select Model
- Train the Model: Known data + Known responses -> Model
- Use for Prediction: New Data -> Model -> Predicted Responses
- Measure Accuracy
- Speed up Computations
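The workflow steps above can be sketched in a few lines. This is only an illustration under assumptions: it requires the Statistics Toolbox (R2014a or later for `fitctree`), uses the `fisheriris` sample data set in place of imported data, and picks a decision tree and a 30% hold-out split arbitrarily.

```matlab
% Supervised learning workflow sketch (assumes Statistics Toolbox).
rng(1);                                    % fix the random seed for repeatability
load fisheriris                            % meas: 150x4 predictors, species: 150x1 labels

% Prepare data: hold out 30% of the samples for testing
cv = cvpartition(species, 'HoldOut', 0.3);
Xtrain = meas(training(cv), :);   Ytrain = species(training(cv));
Xtest  = meas(test(cv), :);       Ytest  = species(test(cv));

% Train the model on known data and known responses
model = fitctree(Xtrain, Ytrain);

% Use the model for prediction on new data
Ypred = predict(model, Xtest);

% Measure accuracy as the fraction of correctly predicted test labels
accuracy = mean(strcmp(Ypred, Ytest));
fprintf('Test accuracy: %.1f%%\n', 100 * accuracy);
```

Swapping `fitctree` for another fitting function leaves the rest of the script unchanged, which is what makes it easy to try several models on the same split.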

Demo: Bank Marketing Campaign
Goal: Predict if customer would subscribe to bank term deposit based on different attributes
Approach:
- Import historical data
- Divide data into training and testing sets
- Train a classifier using different models
- Measure accuracy and compare models
[Figure: Bank Marketing Campaign Misclassification Rate. Percentage of "No" and "Yes" samples misclassified for Neural Net, Logistic Regression, Discriminant Analysis, k-nearest Neighbors, Naive Bayes, Support VM, Decision Trees, TreeBagger, Reduced TB]
Data set downloaded from UCI Machine Learning repository http://archive.ics.uci.edu/ml/datasets/bank+marketing
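The demo's approach of training several classifiers and comparing per-class misclassification rates might look roughly like the sketch below. It is not the demo's code: it assumes the Statistics Toolbox, substitutes the built-in `ionosphere` sample data set (classes 'b' and 'g') for the bank marketing data, and compares only three of the model types named in the figure, with arbitrarily chosen settings.

```matlab
% Model comparison sketch (assumes Statistics Toolbox; ionosphere stands in for the bank data).
rng(1);                                    % fix the random seed for repeatability
load ionosphere                            % X: 351x34 predictors, Y: 351x1 labels 'b' / 'g'

% Divide data into training and testing sets
cv  = cvpartition(Y, 'HoldOut', 0.3);
Xtr = X(training(cv), :);   Ytr = Y(training(cv));
Xte = X(test(cv), :);       Yte = Y(test(cv));

% Train a classifier using different models
names  = {'Decision Tree', 'k-nearest Neighbors', 'TreeBagger'};
models = {fitctree(Xtr, Ytr), ...
          fitcknn(Xtr, Ytr, 'NumNeighbors', 5), ...
          TreeBagger(50, Xtr, Ytr)};

% Measure the misclassification rate per true class and compare models
classes  = {'b', 'g'};
misclass = zeros(numel(models), numel(classes));
for i = 1:numel(models)
    Ypred = predict(models{i}, Xte);
    C = confusionmat(Yte, Ypred, 'Order', classes);        % rows: true class, columns: predicted
    misclass(i, :) = 100 * (1 - diag(C)' ./ sum(C, 2)');   % percent of each class misclassified
    fprintf('%-20s b: %5.1f%%   g: %5.1f%%\n', names{i}, misclass(i, 1), misclass(i, 2));
end
```

Calling `bar(misclass)` would produce a grouped bar chart similar in spirit to the misclassification figure on the slide.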

Demo: Bank Marketing Campaign
- Numerous predictive models with rich documentation
  - Also available: decision trees, neural networks, naïve Bayes, etc.
- Interactive visualizations and apps to aid discovery
- Quick prototyping; focus on modeling, not programming
There's more:
- Methods to simplify the model
- Integrate algorithms into enterprise applications
[Figure: Bank Marketing Campaign Misclassification Rate, as on the previous slide]

Clustering: What MATLAB has to offer
- Numerous clustering functions with rich documentation
  - Hierarchical, k-means, Gaussian Mixture, Hidden Markov
- Interactive visualizations to aid discovery
- Automatically determine the correct number of clusters (R2013b): evalclusters
- Viewable source; not a black box
- Rapid exploration & development
[Figure: two panels, Hierarchical Clustering and k-means Clustering, both axes labeled Data Point #; distance metrics shown: spearman, cosine]
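A short sketch of how `evalclusters` might be used to suggest a cluster count is given below. It assumes the Statistics Toolbox (R2013b or later) and the `fisheriris` sample data set; the Calinski-Harabasz criterion and the candidate range 2 to 6 are illustrative choices, not values from the slides.

```matlab
% Choosing the number of clusters with evalclusters (assumes Statistics Toolbox, R2013b+).
rng(1);                                    % k-means starts from random centroids
load fisheriris                            % meas: 150x4 matrix of measurements

% Evaluate k-means solutions for k = 2..6 with the Calinski-Harabasz criterion
eva = evalclusters(meas, 'kmeans', 'CalinskiHarabasz', 'KList', 2:6);
fprintf('Suggested number of clusters: %d\n', eva.OptimalK);

% Cluster the data with the suggested k, restarting 5 times to avoid poor local minima
idx = kmeans(meas, eva.OptimalK, 'Replicates', 5);
```

Other criteria accepted by `evalclusters`, such as `'silhouette'` or `'DaviesBouldin'`, can suggest a different count, so it is worth comparing more than one.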

Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
- Data Driven Fitting with MATLAB
- Classification with MATLAB
- Regression with MATLAB
- Multivariate Classification in the Life Sciences
- Electricity Load and Price Forecasting
- Credit Risk Modeling with MATLAB

Questions and answers