Principles of Machine Learning

Lab 5 - Optimization-Based Machine Learning Models

Overview

In this lab you will explore the use of optimization-based machine learning models. Optimization-based models are powerful and widely used in machine learning. Specifically, in this lab you will investigate:
- Neural network models for classification.
- Support vector machine models for classification.

What You'll Need

To complete this lab, you will need the following:
- An Azure ML account
- A web browser and Internet connection
- The lab files for this lab

Note: To set up the required environment for the lab, follow the instructions in the Setup Guide for this course.

In this lab you will build on the classification experiment you created in Lab 4. If you did not complete Lab 4, or if you have subsequently modified the experiment you created, you can copy a clean starting experiment to your Azure ML workspace from the Cortana Intelligence Gallery using the link for your preferred programming language below:
- R: https://aka.ms/edx-dat203.2x-lab5-class-r
- Python: https://aka.ms/edx-dat203.2x-lab5-class-py

Classification with Neural Network Models

Neural networks are a widely used class of machine learning models, and can be used for classification or regression. In this lab, you will perform classification of the diabetes patients using a two-class neural network model, and compare the performance of the neural network classifier to the Ada-boosted classifier you created in the previous lab.

Create a Neural Network Model

1. In Azure ML Studio, open your Boosted Classification experiment (or the corresponding starting experiment from the Cortana Intelligence Gallery, as listed above), and save it as Optimization-Based Classification.

2. Add a Two Class Neural Network module to the experiment, and place it to the right of the existing modules.
3. Configure the Two Class Neural Network module as follows:
   - Create trainer mode: Parameter Range
   - Hidden layer specification: Fully-connected case
   - Number of hidden nodes: 100
   - Use Range Builder (2): Unchecked
   - Learning rate: 0.01, 0.02, 0.04
   - Number of iterations: 20, 40, 80, 160
   - The initial learning weights diameter: 0.1
   - The momentum: 0.01
   - The type of normalizer: Do not normalize
   - Shuffle examples: Checked
   - Random number seed: 123
   - Allow unknown categorical levels: Checked
4. Copy the Train Model, Score Model, and Evaluate Model modules that are currently used for the Boosted Tree model, and paste the copies into the experiment under the Two Class Neural Network module.
5. Edit the comment of the new Train Model module, and change it to Neural Net.
6. Connect the output of the Two Class Neural Network module to the Untrained model (left) input of the new Neural Net Train Model module. Then connect the left output of the Split Data module to the Dataset (right) input of the new Neural Net Train Model module.
7. Connect the output of the new Neural Net Train Model module to the Trained Model (left) input of the new Score Model module. Then connect the right output of the Split Data module to the Dataset (right) input of the new Score Model module.
8. Connect the output of the new Score Model module to the Scored dataset to compare (right) input of the existing Evaluate Model module (the module whose left input is already connected to the Score Model module for the Boosted Tree model).
9. Connect the output of the new Score Model module to the Scored dataset (left) input of the new Evaluate Model module. Then ensure that the bottom portion of your experiment looks like this:
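The parameter-range sweep configured above (a grid over learning rate and iteration count for a single fully-connected hidden layer of 100 nodes) can be sketched outside Azure ML with scikit-learn. This is only an illustrative analogue, not the Azure ML implementation: the synthetic dataset, MLPClassifier, and GridSearchCV here are stand-ins for the Two Class Neural Network and Tune Model Hyperparameters modules.

```python
# Illustrative local analogue of the Two Class Neural Network parameter sweep.
# Assumptions: scikit-learn is available; the data is synthetic, not the
# diabetes patient dataset used in the lab.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=123)

# Mirror the module settings: one hidden layer of 100 nodes, momentum 0.01,
# shuffled examples, and a grid over learning rate and iteration count.
param_grid = {
    "learning_rate_init": [0.01, 0.02, 0.04],
    "max_iter": [20, 40, 80, 160],
}
nn = MLPClassifier(hidden_layer_sizes=(100,), solver="sgd",
                   momentum=0.01, shuffle=True, random_state=123)
search = GridSearchCV(nn, param_grid, scoring="roc_auc", cv=3)
search.fit(X, y)
best_params = search.best_params_  # best (learning rate, iterations) pair
```

As in the lab, the sweep selects the hyperparameter combination with the best area under the ROC curve on held-out folds.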

Compare Model Performance

1. Save and run the experiment.
2. When your experiment has finished running, visualize the output of the Evaluate Model module that is connected to both the Boosted Tree and Neural Net models, and examine the ROC curve. The Scored dataset (blue) curve represents the Boosted Tree model, and the Scored dataset to compare (red) curve represents the Neural Net model. The higher and further to the left the curve, the better the performance of the model.
3. Scroll down further in the visualization and examine the Accuracy, Recall, and AUC model performance metrics, which indicate the accuracy and area under the curve of the Boosted Tree model.
4. Visualize the output of the new Evaluate Model module that is connected only to the Neural Net model, and examine the Accuracy, Recall, and AUC metrics, which indicate the accuracy and area under the curve of the new two-class neural network model. Compare these with the same metrics for the Boosted Tree model; the model with the higher metrics is performing more accurately. In particular, the lower the Recall metric, the higher the number of false negatives, which in this scenario represent an undesirable situation in which patients who need to be readmitted to hospital may not be identified.

Support Vector Machine Classification

In the previous exercise you compared the performance of a neural network classifier to an Ada-boosted classifier. In this exercise, you will apply a support vector machine classifier to the diabetes patient dataset and compare its performance to the Ada-boosted decision tree classifier.

Create a Support Vector Machine Model

1. In your Optimization-Based Classification experiment, add a Two Class Support Vector Machine module to the experiment, and place it to the right of the existing modules.
2.
Configure the Two Class Support Vector Machine module as follows:
   - Create trainer mode: Parameter Range
   - Number of iterations: 1, 10, 100
   - Lambda: 0.00001, 0.0001, 0.001, 0.1
   - Normalize features: Unchecked
   - Project to the unit-sphere: Unchecked
   - Random number seed: 123
   - Allow unknown categorical levels: Checked
3. Copy the Train Model, Score Model, and Evaluate Model modules that are currently used for the Neural Net model, and paste the copies into the experiment under the Two Class Support Vector Machine module.
4. Edit the comment of the new Train Model module, and change it to SVM.
5. Connect the output of the Two Class Support Vector Machine module to the Untrained model (left) input of the new SVM Train Model module. Then connect the left output of the Split Data module to the Dataset (right) input of the new SVM Train Model module.
6. Connect the output of the new SVM Train Model module to the Trained Model (left) input of the new Score Model module. Then connect the right output of the Split Data module to the Dataset (right) input of the new Score Model module.
7. Connect the output of the new Score Model module to the Scored dataset to compare (right) input of the existing Evaluate Model module (the module whose left input is already connected to the Score Model module for the Boosted Tree model). This will replace the connection from the Neural Net model.
8. Connect the output of the new Score Model module to the Scored dataset (left) input of the new Evaluate Model module. Then ensure that the bottom portion of your experiment looks like this:

Compare Model Performance

1. Save and run the experiment.
2. When your experiment has finished running, visualize the output of the Evaluate Model module that is connected to both the Boosted Tree and SVM models, and examine the ROC curve. The Scored dataset (blue) curve represents the Boosted Tree model, and the Scored dataset to compare (red) curve represents the SVM model. The higher and further to the left the curve, the better the performance of the model.
3. Scroll down further in the visualization and examine the Accuracy, Recall, and AUC model performance metrics, which indicate the accuracy and area under the curve of the Boosted Tree model.
4. Visualize the output of the new Evaluate Model module that is connected only to the SVM model, and examine the Accuracy, Recall, and AUC metrics, which indicate the accuracy and area under the curve of the new two-class support vector machine model. Compare these with the same metrics for the Boosted Tree model; the model with the higher metrics is performing more accurately. In particular, the lower the Recall metric, the higher the number of false negatives, which in this scenario represent an undesirable situation in which patients who need to be readmitted to hospital may not be identified.

Summary

In this experiment you have created and evaluated classifiers using two widely used optimization-based machine learning models:
- The neural network classifier.
- The support vector machine classifier.
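The Accuracy, Recall, and AUC metrics that Evaluate Model reports in both comparison exercises can be reproduced from any scored dataset. A minimal sketch with scikit-learn, using made-up labels and scores rather than the lab's diabetes data, also shows why low recall means more false negatives (missed readmissions):

```python
# Computing Accuracy, Recall, and AUC from scored output (illustrative
# values only; Azure ML's Evaluate Model computes these for you).
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             recall_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # actual labels
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1])  # scored probabilities
y_pred = (y_score >= 0.5).astype(int)                          # default threshold

accuracy = accuracy_score(y_true, y_pred)   # 0.75
recall = recall_score(y_true, y_pred)       # 0.75: lower recall = more misses
auc = roc_auc_score(y_true, y_score)        # area under the ROC curve
# False negatives: positive patients the model failed to flag.
fn = confusion_matrix(y_true, y_pred)[1, 0]  # 1
```

Here one positive case scores below the threshold, so recall drops below 1.0 and that patient would not be identified for readmission.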

Note: In this lab, you should have been able to determine the classification model type that worked best for the features and labels in the diabetes classification dataset. However, when you approach any other dataset, there is no reason to believe that any particular machine learning model will have the best performance; testing and comparing multiple machine learning models on a given problem is usually the best approach. The performance achieved with any particular machine learning model can also change after feature engineering, so after performing a feature engineering step it is usually a good idea to test and compare several machine learning models again.
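The "test and compare several models" advice above can be sketched as a simple cross-validated comparison. This is only an illustration under assumptions: scikit-learn estimators stand in for the Azure ML modules, the dataset is synthetic, and accuracy is used as a single stand-in metric.

```python
# Comparing several model families on one dataset via cross-validation
# (scikit-learn stand-ins for the Azure ML modules; synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=400, n_features=12, random_state=123)

models = {
    "boosted_tree": AdaBoostClassifier(random_state=123),
    "neural_net": MLPClassifier(max_iter=300, random_state=123),
    "svm": LinearSVC(random_state=123),
}
# Mean cross-validated score per model; feature engineering can reorder
# this ranking, so the comparison is worth re-running after each change.
scores = {name: cross_val_score(m, X, y, cv=3, scoring="accuracy").mean()
          for name, m in models.items()}
best_model = max(scores, key=scores.get)
```

Which model "wins" here depends entirely on the data, which is the point of the note: rank models empirically on each new problem rather than assuming a favorite.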