Statistics vs Machine Learning ASC 15th November 2018


Statistics vs Machine Learning ASC 15th November 2018 Jarlath Quinn www.sv-europe.com A SELECT INTERNATIONAL COMPANY

Contents
- Machine Learning is Hot
- How did we get here?
- Techniques and Terms
- An Example
- Why Machine Learning matters
- Why Statistics still matters
- What can we expect in the future?

Machine Learning is Hot


Where is Statistics in all this?

Let's compare search terms on a leading recruitment site: "Machine Learning" vs "Statistics"

How did we get here?

Timeline of Statistics and Machine Learning
- 17th Century: Development of probability theory (Cardano, Pascal, Fermat); John Graunt, Natural and Political Observations upon the Bills of Mortality (1663)
- 18th Century: Introduction of Bayes' theorem
- Late 19th and early 20th Century: Standard deviation, correlation, linear regression (Galton, K. Pearson); null hypothesis, variance (Fisher); Type II error, statistical power, confidence intervals (E. Pearson, Neyman)
All of these tools pre-date modern computing by utilising distributional approaches.


Timeline of Statistics and Machine Learning
- 1951: First neural network machine, the SNARC (Minsky & Edmonds)
- 1957: Invention of the Perceptron (Rosenblatt)
- 1967: Introduction of the Nearest Neighbour algorithm
- 1970: Introduction of backpropagation (Linnainmaa)
- 1975: ID3 decision tree algorithm (Quinlan)
- 1989: Convolutional neural network used to read digits (LeCun)
- 1992: Modern Support Vector Machines developed (Boser, Guyon & Vapnik)
- 1995: Random forest method proposed (Ho)
- 2003: Adaptive Boosting (AdaBoost) wins the Gödel Prize (Freund, Schapire)
- 2012: AlexNet deep learning network (Krizhevsky, Sutskever, Hinton)
- 2016: Google's AlphaGo program beats a professional human player

The New Terms on the Block

"My CPU is a neural-net processor; a learning computer. The more contact I have with humans, the more I learn."

But Statistics is still very important

Techniques and Terms

Statistics vs Machine Learning: Two analytical cultures

Stats Speak vs ML Speak
- Parameters → Weights
- Fitting → Learning
- Covariate → Feature
- Dummy coding → One hot encoding
- Regression/Classification → Supervised Learning
- Density estimation, clustering → Unsupervised Learning
- Dependent variable → Target
- Independent variable → Predictor
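The dummy coding / one hot encoding pair above is the easiest to see in code. A minimal sketch in plain Python (the category names are invented for illustration, not taken from the deck):

```python
categories = ["private", "government", "self-employed"]  # hypothetical field

def one_hot(value, categories):
    # ML-style one hot encoding: one indicator column per category
    return [1 if value == c else 0 for c in categories]

def dummy_code(value, categories):
    # Statistics-style dummy coding: drop the first category as the
    # reference level, leaving k-1 indicator columns
    return [1 if value == c else 0 for c in categories[1:]]

one_hot("government", categories)   # three columns
dummy_code("private", categories)   # two columns; all zeros = reference level
```

Statistical packages typically drop one level as the reference category so the regression design matrix stays full rank; ML pipelines usually keep all k indicator columns.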

4 broad families of mining (Machine Learning) algorithms*

Statistical: Often based on some form of regression originating in traditional statistics. Including: Linear Regression, Logistic Regression, Discriminant Analysis, Structural Equation Modelling/Latent Class (may be confirmatory), Generalized Linear Models (GLM), Cluster/PCA/Factor.

(Core) Machine Learning: Derived from research into Information Science and AI; we used to say only this was ML. Including: Neural networks (Multi-Layer Perceptron (MLP), Radial Basis Function (RBF)), Support Vector Machines (SVM), Deep Learning, Self-organising maps.

Rule Induction (/Decision Trees): Rule induction algorithms use criteria from Statistics and Machine Learning to derive rules and trees that make predictions. Including: Classification and Regression Trees (CaRT), CHAID, C5, Gradient Boosted Trees, Random Forests.

Association/Sequence: Find associations and sequences in the form "IF A and B then C will happen with X% confidence". CARMA, Apriori, SPADE, etc.

*Not an exhaustive list by any means

An Example

Data from the 1994 US Census. Contains a range of demographic and employment-related fields. The goal is to predict whether a respondent earns over $50K. The data is randomly split (50/50) into separate Training and Testing groups.
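The random 50/50 split described above can be sketched in a few lines of plain Python; the row indices here are a stand-in for the actual census records:

```python
import random

random.seed(42)  # fix the seed so the split is reproducible

rows = list(range(1000))  # stand-in for the census record indices
random.shuffle(rows)

half = len(rows) // 2
train, test = rows[:half], rows[half:]
# Models are fitted on `train` and their accuracy is reported on `test`,
# so the comparison is not flattered by overfitting to the training data.
```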

A Statistical Approach: Binary Logistic Regression

A Statistical Approach: Binary Logistic Regression note the optional elements that can be added to the analysis output

A Statistical Approach: Binary Logistic Regression First Output Large table showing how categorical fields have been dummy coded

A Statistical Approach: Binary Logistic Regression An Omnibus test used to check that the final model is an improvement over the baseline Model summary showing increase in fit at each step in the Forward LR method

A Statistical Approach: Binary Logistic Regression Classification table showing overall classification accuracy at final step

A Statistical Approach: Binary Logistic Regression Model Coefficients Table

A Statistical Approach: Binary Logistic Regression note the optional elements that can be added to the analysis Table showing the effect on the model if a term is removed

A Statistical Approach: Binary Logistic Regression note the optional elements that can be added to the analysis Table showing the variables not in the equation at each step
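Outside a stats package, the model behind these output tables can be sketched directly. A minimal pure-Python binary logistic regression fitted by stochastic gradient descent, on invented toy data rather than the census example:

```python
import math

def sigmoid(z):
    # logistic link: maps a linear score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    # stochastic gradient descent on the negative log-likelihood
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient term for this observation
            for j in range(len(w)):
                w[j] -= lr * err * xi[j]
            b -= lr * err
    return w, b

# toy data: one predictor, outcome flips around x = 0.5
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
preds = [1 if sigmoid(w[0] * xi[0] + b) > 0.5 else 0 for xi in X]
```

The coefficients table in the package output corresponds to w and b here; exponentiating a coefficient (exp(w[0])) gives the odds ratio that such tables usually report alongside it.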

A Data Mining/ML Approach: Multiple methods tested using an automatic classifier

A Data Mining/ML Approach: Multiple methods tested using an automatic classifier Browsing the first (boosted) C5 model, we can see that it consists of a series of rules generated in 10 separate passes over the data

A Data Mining/ML Approach: Multiple methods tested using an automatic classifier Each set of rules corresponds to an individual decision tree

A Data Mining/ML Approach: Multiple methods tested using an automatic classifier The LSVM (Linear Support Vector Machine) shows a predictor importance chart and a series of coefficients or feature weights, similar to a statistical model, but not much else in the way of detailed output

A Data Mining/ML Approach: Multiple methods tested using an automatic classifier The (bootstrap aggregated) Neural Network shows information about how often a field was used and the accuracy of the sub-models, but no details about what those models consist of

The boosted C5 model has the highest accuracy on the test group
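The auto-classifier idea, fit several candidate models and keep whichever scores best on the held-out test group, can be sketched in plain Python. The records and rule-based "models" below are hypothetical stand-ins, not the actual census fields or the tool's output:

```python
# hypothetical test records: ((age, education_years), earns_over_50k)
test_data = [
    ((25, 12), 0), ((38, 10), 0), ((45, 16), 1),
    ((52, 14), 1), ((29, 9), 0), ((61, 18), 1),
]

def accuracy(model, data):
    # fraction of records the model classifies correctly
    return sum(model(x) == y for x, y in data) / len(data)

# candidate "models": trivial stand-ins for C5, LSVM, neural net, etc.
models = {
    "baseline: always predict <=50K": lambda x: 0,
    "rule: education_years >= 13": lambda x: 1 if x[1] >= 13 else 0,
}

scores = {name: accuracy(m, test_data) for name, m in models.items()}
best = max(scores, key=scores.get)  # keep the most accurate model
```

Real auto-classifier nodes do the same thing at scale: train each algorithm on the training partition, rank them on the testing partition, and surface the winner.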

Did Statistics get left behind?

"Had we incorporated computing methodology from its inception as a fundamental statistical tool (as opposed to simply a convenient way to apply our existing tools) many of the other data related fields would not have needed to exist. They would have been part of our field." (Jerome H. Friedman, "Data Mining and Statistics: What's the Connection?", 1997)

"The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets." (Leo Breiman, "Statistical Modeling: The Two Cultures", 2001)

Why Machine Learning matters

Why Machine Learning?
- Certain difficult problems can only be properly addressed with ML
- There are many situations where accuracy is king
- It's often much more flexible than statistical approaches, e.g. neural networks are not bound by algebraic equations in the same way that regressions are
- It's where the majority of R&D is focussed
- It is at the heart of hard AI applications
- Because we can: we often have data that is more like the population now, so we don't need samples

Why Statistics still matters

Why Statistics?
- ML is usually a sledgehammer when you need a nutcracker
- There are many situations where transparency is king
- Making inferences about populations from samples is by far the most common analytical problem that people deal with
- To answer a lot of questions we only have samples of data, e.g. most ad-hoc surveys, clinical trials and scientific experiments
- Statistics is still "The Science of Variation": it's how new things are discovered in data

What can we expect in the future?

Future Predictions
- We see a growing issue in the availability of statistical expertise on the supply side
- ML and AI will continue to grow, and the (arguably purist) distinction between them may disappear
- There will be more ML tools available to Citizen Data Scientists, e.g. line-of-business users: cloud-based, often niche, and delivered through smarter/simpler user interfaces
- There will be more productivity and automation; there is currently a preponderance of tools that claim to automate data preparation, for example