Machine Learning & Business Value. By Kush Patel, Data Scientist Resident at Galvanize

Outline
- Machine Learning
- Supervised vs. Unsupervised
- Linear Regression
- Decision Tree Classifier
- Random Forest Classifier
- Cost-Benefit Matrix
- ROC Curve
- Profit Curves

Machine Learning

Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. It is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.

Machine Learning Techniques
Supervised Machine Learning:
- Artificial neural networks
- Random forests
- Boosting
- Naive Bayes classifier
- Support vector machines (SVM)
- Nearest neighbor algorithm
Unsupervised Machine Learning:
- Clustering (k-means, hierarchical clustering)
- Blind signal separation techniques (PCA, SVD, NMF)

Simple Linear Regression

Definitions
Population: The entire pool from which a statistical sample is drawn.
Sample: A group drawn from a larger population and used to estimate the characteristics of the whole population.
Training Set: The sample used to train the model.
Testing Set: The sample used to evaluate the model.

Assumptions
1. Linearity
2. Constant variance
3. Independence of errors
4. Normality of errors
5. Lack of multicollinearity

Simple Linear Regression
β0 is the intercept (a constant)
β1 is the slope (a constant)
e is the error term

Simple Linear Regression
For the population: Y = β0 + β1x + e
For a sample: ŷ = estimated(β0) + estimated(β1) * x
where ŷ is the estimate (prediction) of Y when X = x.
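As an illustration of how the sample estimates of β0 and β1 can be obtained, here is a minimal sketch using ordinary least squares. The slides do not say which fitting method is used, so least squares is an assumption, and the function names and toy data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1): least-squares estimates of the intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict(b0, b1, x):
    """Prediction of Y when X = x."""
    return b0 + b1 * x


# Hypothetical sample generated from y = 1 + 2x with no error term.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1, predict(b0, b1, 5.0))
```

Because the toy data contain no noise, the estimates recover the intercept and slope used to generate them; with real data they would only approximate the population values.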

Evaluation

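R² -- useful? R² computed on the training data tends to overstate how well the model generalizes.
Alternative:
- Use a train/test split and evaluate the model on the test set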
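A rough sketch of the train/test idea: fit on one part of the sample and compute R² on the held-out part. The 80/20 split, the seeds, the synthetic data, and all helper names are assumptions made for illustration, not something the slides specify.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def r_squared(ys, preds):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and hold out the first test_fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: y = 1 + 2x plus Gaussian noise.
noise = random.Random(0)
rows = [(float(x), 1.0 + 2.0 * x + noise.gauss(0, 1.0)) for x in range(50)]

train, test = train_test_split(rows, test_fraction=0.2, seed=1)
b0, b1 = fit_line([x for x, _ in train], [y for _, y in train])
test_r2 = r_squared([y for _, y in test], [b0 + b1 * x for x, _ in test])
print(f"test R^2 = {test_r2:.3f}")
```

The model never sees the test rows while fitting, so the test R² is a fairer estimate of performance on new data than the training R².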

Linear Regression
Benefits:
- Easy to interpret
- Computationally cheap to predict
- Computationally cheap to train
- Performs well when the relationships between the independent variables and the dependent variable are approximately linear
Disadvantages:
- Often used inappropriately to model non-linear relationships
- Limited to predicting numeric output; for categorical outcomes, use logistic regression

Decision Tree

Decision Tree
- Target
- Independent variables
- Gini impurity
- Information gain
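The two split criteria named on this slide can be sketched as follows. The slide gives no formulas, so this uses the standard definitions (Gini impurity as 1 minus the sum of squared class proportions, and information gain as the drop in impurity after a split, entropy-based by default); the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node with a perfectly separating split.
parent = ["yes"] * 4 + ["no"] * 4
left, right = ["yes"] * 4, ["no"] * 4

print(gini_impurity(parent))
print(information_gain(parent, left, right))
print(information_gain(parent, left, right, impurity=gini_impurity))
```

A tree-building algorithm would evaluate candidate splits on each independent variable and keep the one with the largest gain.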

Tradeoffs of Decision Trees
Pros:
- Easily interpretable
- Handles missing values and outliers
- Finds more complex interactions
- Computationally cheap to predict
- Can handle irrelevant features
- Handles mixed data types
Cons:
- Computationally expensive to train
- Greedy algorithm
- Very easy to overfit

Regularization
- Maximum depth of the tree
- Minimum samples to split a node
- Minimum samples at a leaf
- Maximum number of leaf nodes

Random Forest

Definitions
Bootstrap: Any test or metric that relies on random sampling with replacement. (A bootstrap sample of the same size as the original data contains roughly ⅔ of the distinct observations; the expected fraction is about 63%.)
Ensemble method: A technique for combining many weak learners in an attempt to produce a strong learner.
Example: 5 completely independent classifiers, each with 70% accuracy. Majority-vote accuracy is 83.7%.
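The 83.7% figure in the example can be checked with the binomial distribution: the majority vote is correct when at least 3 of the 5 independent classifiers are correct. The sketch below assumes a binary problem and an odd number of classifiers; the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, accuracy):
    """P(more than half of n independent classifiers are correct)."""
    needed = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * accuracy ** k * (1 - accuracy) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


print(round(majority_vote_accuracy(5, 0.7), 3))  # 0.837
```

The gain depends on the independence assumption; classifiers whose errors are correlated improve much less when combined.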

How to build a Random Forest
CreateRandomForest(data, num_trees, num_features):
    Repeat num_trees times:
        Create a random sample of the training data with replacement
        Build a decision tree with that sample (only consider num_features randomly chosen features at each node)
    Return the list of the decision trees created
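The pseudocode above can be sketched in Python as follows. To keep the example short, each "tree" here is a depth-1 decision stump, which is a simplification and not a full decision tree; the helper names, the seed, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def build_stump(rows, labels, num_features, rng):
    """Depth-1 tree: best threshold split among a random subset of features."""
    features = rng.sample(range(len(rows[0])), num_features)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        label = majority(labels)
        return lambda row: label
    _, f, threshold, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= threshold else r_lab


def create_random_forest(rows, labels, num_trees, num_features, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(num_trees):
        # Bootstrap: sample n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(build_stump(sample_rows, sample_labels, num_features, rng))
    return forest


def forest_predict(forest, row):
    """Majority vote over the trees."""
    return majority([tree(row) for tree in forest])


# Hypothetical toy data in which either feature separates the two classes.
rows = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (6.0, 20.0), (7.0, 21.0), (8.0, 22.0)]
labels = ["a", "a", "a", "b", "b", "b"]
forest = create_random_forest(rows, labels, num_trees=25, num_features=1)
print(forest_predict(forest, (1.0, 10.0)), forest_predict(forest, (8.0, 22.0)))
```

Note that the bootstrap sample is drawn from the training data only; the test data stay untouched so they can be used for evaluation.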

Tradeoffs of Random Forests
Pros:
- Handles missing values and outliers
- Finds more complex interactions
- Computationally cheap to predict
- Can handle irrelevant features
- Handles mixed data types
- Better accuracy
- One of the best out-of-the-box algorithms
- Easy to parallelize
- Runs efficiently on large databases
Cons:
- Can overfit
- Feature importance can be biased depending on whether a variable is continuous or categorical

Business Value

Confusion Matrix
                   Predicted positive    Predicted negative
Actual positive    (TP)                  (FN)
Actual negative    (FP)                  (TN)

Sensitivity & Specificity
Sensitivity (also called the true positive rate, or recall in some fields) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
Sensitivity = TP/P = TP/(TP + FN)
Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
Specificity = TN/N = TN/(TN + FP)
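The two formulas translate directly into code. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts: 100 actual positives, 200 actual negatives.
tp, fn, fp, tn = 80, 20, 30, 170
print(sensitivity(tp, fn), specificity(tn, fp))
```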

Receiver Operating Characteristic

Matrix Of Probability

Cost-Benefit Matrix

Expected Profit
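The slide content for this step did not survive in the transcript, so the following is only a sketch of one common formulation: expected profit per case is the sum, over the four confusion-matrix cells, of the probability of that outcome times its value in the cost-benefit matrix. The counts and the dollar values are hypothetical.

```python
def expected_profit(counts, values):
    """Sum over outcomes of P(outcome) * value(outcome)."""
    total = sum(counts.values())
    return sum(counts[k] / total * values[k] for k in counts)


# Hypothetical confusion-matrix counts (200 cases in total).
counts = {"TP": 50, "FN": 10, "FP": 20, "TN": 120}
# Hypothetical cost-benefit matrix: a true positive earns 10,
# a false positive costs 2, and the other outcomes are neutral.
values = {"TP": 10.0, "FN": 0.0, "FP": -2.0, "TN": 0.0}

profit_per_case = expected_profit(counts, values)
print(round(profit_per_case, 2))
```

Repeating this calculation at different classification thresholds, each of which gives a different confusion matrix, yields the points of a profit curve.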

Profit Curve

Questions???