Machine Learning with MATLAB
Antti Löytynoja, Application Engineer
2014 The MathWorks, Inc.
Goals
- Overview of machine learning
- Machine learning models & techniques available in MATLAB
- MATLAB as an interactive environment for evaluating and choosing the best algorithm
What is Machine Learning?
- Algorithms and techniques used for data analytics (think data analysis)
- Obtain valuable information from the data
Why is it called learning?
- Systems learn from initial training data
- Use resulting model (or knowledge) to predict outcomes or classes of new samples
[Figures: scatter plot; scatter-plot matrix of MPG, Acceleration, Displacement, Weight, Horsepower]
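A minimal sketch of "learn from training data, then predict new samples". It assumes the Statistics Toolbox and its bundled carsmall data set, which contains the variables named in the figure (MPG, Weight, Horsepower). The choice of a linear model and the new-sample values are illustrative assumptions, not part of the original slides.

```matlab
% Sketch: learn a model from training data, then predict a new sample.
% Assumes Statistics Toolbox and its bundled carsmall data set.
load carsmall                          % provides MPG, Weight, Horsepower, ...

X = [Weight, Horsepower];              % training inputs (predictors)
y = MPG;                               % known responses

mdl = fitlm(X, y);                     % rows containing NaN are left out of the fit

% Hypothetical new sample: Weight = 3000, Horsepower = 130 (illustrative values)
newCar = [3000, 130];
mpgPredicted = predict(mdl, newCar);
fprintf('Predicted MPG: %.1f\n', mpgPredicted);
```

The fitted object mdl plays the role of the "resulting model (or knowledge)" on the slide; predict applies it to data that was not part of training.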
Machine Learning: Characteristics and Examples
Characteristics
- Lots of data (many variables)
- System too complex to know the governing equation (e.g., black-box modeling)
Examples
- Pattern recognition (speech, images)
- Financial algorithms (credit rating, algorithmic trading)
- Energy forecasting (load, price)
- Biology (tumour detection, drug discovery)

Credit rating matrix shown on the slide (rows and columns AAA to D):

        AAA      AA       A        BBB      BB       B        CCC      D
AAA     93.68%   5.55%    0.59%    0.18%    0.00%    0.00%    0.00%    0.00%
AA      2.44%    92.60%   4.03%    0.73%    0.15%    0.00%    0.00%    0.06%
A       0.14%    4.18%    91.02%   3.90%    0.60%    0.08%    0.00%    0.08%
BBB     0.03%    0.23%    7.49%    87.86%   3.78%    0.39%    0.06%    0.16%
BB      0.03%    0.12%    0.73%    8.27%    86.74%   3.28%    0.18%    0.64%
B       0.00%    0.00%    0.11%    0.82%    9.64%    85.37%   2.41%    1.64%
CCC     0.00%    0.00%    0.00%    0.37%    1.84%    6.24%    81.88%   9.67%
D       0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    100.00%
Challenges: Machine Learning
- Significant technical expertise required
- No one-size-fits-all solution
- Locked into black-box solutions
- Time required to conduct the analysis
Overview: Machine Learning
Type of Learning and Categories of Algorithms
- Unsupervised Learning: group and interpret data based only on input data
  - Clustering
- Supervised Learning: develop predictive model based on both input and output data
  - Classification
  - Regression
Unsupervised Learning
- k-means, Fuzzy C-Means
- Hierarchical Clustering
- Neural Networks
- Gaussian Mixture
- Hidden Markov Model
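As a rough illustration of two of the listed techniques, the sketch below runs k-means and hierarchical clustering on the same inputs. It assumes the Statistics Toolbox and its bundled fisheriris data set; the choice of 3 clusters, average linkage, and cosine distance are assumptions made for the example only. Note that the class labels in the data set are never used, since unsupervised methods work from input data alone.

```matlab
% Sketch: two clustering techniques applied to the same unlabeled inputs.
% Assumes Statistics Toolbox and its bundled fisheriris data set.
load fisheriris                        % provides meas (150x4); labels are ignored here
X = meas;
rng(1);                                % make the k-means initialization repeatable

% k-means: partition the rows of X into 3 clusters, keeping the best of 5 runs
idxK = kmeans(X, 3, 'Replicates', 5);

% Hierarchical clustering: build a linkage tree, then cut it into at most 3 clusters
Z    = linkage(X, 'average', 'cosine');   % average linkage, cosine distance
idxH = cluster(Z, 'maxclust', 3);

% Cross-tabulate the two assignments to see where the methods agree
disp(crosstab(idxK, idxH));
```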
Supervised Learning
- Regression: Linear Regression, Non-linear Reg. (GLM, Logistic)
- Regression and Classification: Neural Networks, Decision Trees, Ensemble Methods
- Classification: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor
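The difference between the two supervised tasks can be sketched with decision trees, which handle both. The example assumes the Statistics Toolbox (R2014a or later for fitrtree and fitctree) and its bundled carsmall and fisheriris data sets; the new-sample values are hypothetical.

```matlab
% Sketch: the same model family used for regression and for classification.
% Assumes Statistics Toolbox (R2014a or later) and its bundled data sets.

% Regression: the response is a continuous number (MPG)
load carsmall
regTree = fitrtree([Weight, Horsepower], MPG);
mpgHat  = predict(regTree, [3000, 130]);           % hypothetical new car

% Classification: the response is a class label (species)
load fisheriris
clsTree = fitctree(meas, species);
label   = predict(clsTree, [5.9, 3.0, 5.1, 1.8]);  % hypothetical new flower

fprintf('Predicted MPG: %.1f\n', mpgHat);
fprintf('Predicted species: %s\n', label{1});
```

The regression tree returns a number, while the classification tree returns one of the class names seen during training.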
Supervised Learning - Workflow
- Data: Import Data, Explore Data, Prepare Data
- Select Model
- Train the Model: known data + known responses -> model
- Use for Prediction: model + new data -> predicted responses
- Measure Accuracy
- Speed up Computations
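The workflow steps above can be sketched end to end in a few lines. This is only one possible arrangement: it assumes the Statistics Toolbox and its bundled fisheriris data set, a 30% hold-out split, and a decision tree as the (arbitrarily chosen) model.

```matlab
% Sketch of the supervised learning workflow on a bundled data set.
% Assumes Statistics Toolbox (R2014a or later for fitctree).
load fisheriris                               % import data: meas, species
rng(1);                                       % repeatable partition

% Prepare data: hold out 30% of the samples for testing
c = cvpartition(species, 'HoldOut', 0.3);
Xtrain = meas(training(c), :);  Ytrain = species(training(c));
Xtest  = meas(test(c), :);      Ytest  = species(test(c));

% Select and train a model (a decision tree, chosen arbitrarily here)
mdl = fitctree(Xtrain, Ytrain);

% Use for prediction on data the model has not seen
Ypred = predict(mdl, Xtest);

% Measure accuracy: fraction correct, plus a per-class confusion matrix
accuracy = mean(strcmp(Ypred, Ytest));
confMat  = confusionmat(Ytest, Ypred);
fprintf('Test accuracy: %.2f\n', accuracy);
disp(confMat);
```

Swapping the model only changes the training line, which is what makes it practical to try several algorithms on the same split.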
Demo: Bank Marketing Campaign
Goal: Predict if customer would subscribe to bank term deposit based on different attributes
Approach:
- Import historical data
- Divide data into training and testing sets
- Train a classifier using different models
- Measure accuracy and compare models
[Figure: "Bank Marketing Campaign Misclassification Rate", bar chart of percentage misclassified (No Misclassified, Yes Misclassified) for Neural Net, Logistic Regression, Discriminant Analysis, k-nearest Neighbors, Naive Bayes, Support VM, Decision Trees, TreeBagger, Reduced TB]
Data set downloaded from UCI Machine Learning repository: http://archive.ics.uci.edu/ml/datasets/bank+marketing
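The approach above (split, train several models, compare per-class misclassification) might look roughly like the following. The bank marketing data is not bundled with MATLAB, so this sketch substitutes the two-class ionosphere data set that ships with the Statistics Toolbox; the model settings (5 neighbors, 50 trees) are arbitrary assumptions, and only four of the models from the chart are included.

```matlab
% Sketch: compare several classifiers by per-class misclassification rate.
% Assumes Statistics Toolbox (R2014a or later); ionosphere is a stand-in
% for the bank marketing data, which is not bundled with MATLAB.
load ionosphere                 % X: predictors, Y: labels 'b' or 'g'
rng(1);                         % repeatable split and ensemble
c = cvpartition(Y, 'HoldOut', 0.3);
Xtr = X(training(c), :);  Ytr = Y(training(c));
Xte = X(test(c), :);      Yte = Y(test(c));

names    = {'Decision Tree', 'k-nearest Neighbors', 'Support VM', 'TreeBagger'};
trainers = {@(P, R) fitctree(P, R), ...
            @(P, R) fitcknn(P, R, 'NumNeighbors', 5), ...
            @(P, R) fitcsvm(P, R, 'Standardize', true), ...
            @(P, R) TreeBagger(50, P, R, 'Method', 'classification')};

isG  = strcmp(Yte, 'g');        % test samples whose true class is 'g'
errG = zeros(1, numel(trainers));
errB = zeros(1, numel(trainers));
for k = 1:numel(trainers)
    mdl   = trainers{k}(Xtr, Ytr);        % train on the training set only
    pred  = predict(mdl, Xte);            % cell array of predicted labels
    wrong = ~strcmp(pred, Yte);
    errG(k) = 100 * mean(wrong(isG));     % percent of true 'g' misclassified
    errB(k) = 100 * mean(wrong(~isG));    % percent of true 'b' misclassified
    fprintf('%-20s g: %5.1f%%  b: %5.1f%%\n', names{k}, errG(k), errB(k));
end
```

Reporting the error separately for each class mirrors the "No Misclassified" and "Yes Misclassified" bars in the chart, which matters when one class is much rarer than the other.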
Demo: Bank Marketing Campaign
- Numerous predictive models with rich documentation
  - Also available: decision trees, neural networks, naïve Bayes, etc.
- Interactive visualizations and apps to aid discovery
- Quick prototyping; focus on modeling, not programming
There's more:
- Methods to simplify the model
- Integrate algorithms into enterprise applications
[Figure: "Bank Marketing Campaign Misclassification Rate" bar chart, same models and legend as in the demo]
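One possible reading of "methods to simplify model" is to rank predictors by importance and refit on a smaller set. The sketch below does this with a single decision tree on the bundled ionosphere data set; the slides do not say how the "Reduced TB" model was built, so the technique, the data set, and the choice of keeping 5 predictors are all assumptions.

```matlab
% Sketch: simplify a model by keeping only the most important predictors.
% Assumes Statistics Toolbox (R2014a or later); not the method from the demo.
load ionosphere                         % X: 34 predictors, Y: labels 'b' or 'g'
rng(1);
c = cvpartition(Y, 'HoldOut', 0.3);
Xtr = X(training(c), :);  Ytr = Y(training(c));
Xte = X(test(c), :);      Yte = Y(test(c));

% Train on all predictors and estimate how much each one contributes
fullTree = fitctree(Xtr, Ytr);
imp = predictorImportance(fullTree);    % one importance value per predictor

% Keep the 5 highest-ranked predictors (5 is an arbitrary choice)
[~, order] = sort(imp, 'descend');
keep = order(1:5);
smallTree = fitctree(Xtr(:, keep), Ytr);

% Compare test-set loss of the full and the reduced model
errFull  = loss(fullTree,  Xte, Yte);
errSmall = loss(smallTree, Xte(:, keep), Yte);
fprintf('Full model loss:    %.3f (34 predictors)\n', errFull);
fprintf('Reduced model loss: %.3f (5 predictors)\n', errSmall);
```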
Clustering: What MATLAB has to offer
- Numerous clustering functions with rich documentation
  - Hierarchical, k-means, Gaussian Mixture, Hidden Markov
- Interactive visualizations to aid discovery
- Automatically evaluate and select the number of clusters (R2013b): evalclusters
- Viewable source; not a black box
- Rapid exploration & development
[Figures: "Hierarchical Clustering" and "k-means Clustering", pairwise distance maps with Data Point # on both axes; Dist Metric: spearman and Dist Metric: cosine]
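A short sketch of how evalclusters might be used to pick the number of clusters. It assumes the Statistics Toolbox (R2013b or later) and its bundled fisheriris data set; the candidate range 2 to 6 and the two criteria shown are illustrative choices. Different criteria can suggest different cluster counts, so the result is a recommendation rather than a ground truth.

```matlab
% Sketch: let evalclusters suggest the number of clusters.
% Assumes Statistics Toolbox (R2013b or later) and the fisheriris data set.
load fisheriris                        % provides meas (150x4)
rng(1);                                % repeatable k-means initialization

% Evaluate k-means solutions for 2 to 6 clusters with the silhouette criterion
evaK = evalclusters(meas, 'kmeans', 'silhouette', 'KList', 2:6);
fprintf('k-means + silhouette suggests %d clusters\n', evaK.OptimalK);

% Same idea with hierarchical clustering and a different criterion
evaH = evalclusters(meas, 'linkage', 'CalinskiHarabasz', 'KList', 2:6);
fprintf('linkage + Calinski-Harabasz suggests %d clusters\n', evaH.OptimalK);

% Cluster with the suggested number of groups
idx = kmeans(meas, evaK.OptimalK, 'Replicates', 5);
```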
Learn More: Machine Learning with MATLAB
http://www.mathworks.com/discovery/machine-learning.html
- Data Driven Fitting with MATLAB
- Classification with MATLAB
- Regression with MATLAB
- Multivariate Classification in the Life Sciences
- Electricity Load and Price Forecasting
- Credit Risk Modeling with MATLAB
Questions and answers