The Machine Learning Landscape

The Machine Learning Landscape
Vineet Bansal, Research Software Engineer, Center for Statistics & Machine Learning
vineetb@princeton.edu
Oct 31, 2018

What is ML? A field of study that gives computers the ability to learn without being explicitly programmed. A machine learning system is trained rather than explicitly programmed.

Types of ML Systems: Supervised Learning. The training data contains the desired solutions, or labels, for each set of features.

Types of ML Systems: Unsupervised Learning. The training data is unlabeled.

Types of ML Systems: Reinforcement Learning. The training data does not contain the target output, but instead contains some possible output together with a measure of how good that output is. Supervised: <input>, <correct output>. Reinforcement: <input>, <some output>, <grade for this output>.

Classification vs Regression

ML Landscape

Unsupervised Learning - Clustering. Color clusters of points in a homogeneous cloud of data. Use cases: behavioral segmentation in marketing; also useful as a preprocessing step before applying other classification algorithms, where the cluster ID can be added as a feature for each data point.

Unsupervised Learning - Clustering: the k-means Algorithm. Guess some cluster centers, then repeat until converged: E step: assign each point to the nearest cluster center. M step: set each cluster center to the mean of the points assigned to it.
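The E/M loop above can be sketched in a few lines of NumPy. This is a toy illustration, not the code from the talk; the function name, initialization scheme, and data are mine:

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    # Guess some cluster centers: here, k points spread evenly through the data
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # E step: assign each point to the nearest cluster center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # M step: set each cluster center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs; k-means should recover them as the two clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
centers, labels = kmeans(X, k=2)
```

A real implementation would restart from several random initializations, since k-means only finds a local optimum.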

Unsupervised Learning - Clustering: Choosing k

ML Landscape

Linear Regression

Linear Regression

X is the m x (n+1) design matrix (one row per training instance, with a leading 1 for the bias term), θ is the parameter vector, and y is the vector of targets:

    X = [ 1  x_1^(1)  x_2^(1)  ...  x_n^(1)
          1  x_1^(2)  x_2^(2)  ...  x_n^(2)
          ...
          1  x_1^(m)  x_2^(m)  ...  x_n^(m) ]

    θ = (θ_0, θ_1, θ_2, ..., θ_n)^T,   y = (y^(1), y^(2), ..., y^(m))^T

Define a Hypothesis: h_θ(X) = ŷ = Xθ.
Define a Cost Function (a measure of how badly we're doing):
    MSE(X, h_θ) = (1/m) Σ_{i=1..m} (ŷ^(i) - y^(i))²
Repeat until convergence:
- Calculate the Cost Function for the current θ.
- Calculate the slope (gradient) of the Cost Function.
- Tweak θ so as to move downhill (reduce the Cost Function's value).
θ is now optimized for our training data.
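The hypothesis/cost/descend loop can be written in a few lines of NumPy. A minimal sketch, assuming plain batch gradient descent on the MSE; the names and toy data are illustrative:

```python
import numpy as np

# Batch gradient descent on the MSE cost for a linear hypothesis h(X) = X @ theta
def fit_linear(X, y, lr=0.1, n_iter=1000):
    m = len(y)
    Xb = np.c_[np.ones(m), X]                      # prepend a bias column of 1s
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = (2 / m) * Xb.T @ (Xb @ theta - y)   # slope of the cost surface
        theta -= lr * grad                         # tweak theta downhill
    return theta

# Toy data generated from y = 4 + 3x plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 2, (100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, 100)
theta = fit_linear(X, y)    # should land near [4, 3]
```

The learning rate matters: too small and convergence is slow, too large and the updates overshoot and diverge.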

Logistic Regression Used to estimate the probability that an instance belongs to a particular class.

Logistic Regression

Logistic Regression No closed form solution, but we can use Gradient Descent!
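Since there is no closed-form solution, gradient descent does the work. A hedged sketch (toy one-feature data; names are mine), exploiting the fact that the log-loss gradient has the same X^T(p - y) shape as the linear-regression gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the log-loss for a probability model p = sigmoid(X @ theta)
def fit_logistic(X, y, lr=0.5, n_iter=2000):
    m = len(y)
    Xb = np.c_[np.ones(m), X]             # bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ theta)           # estimated class probabilities
        theta -= lr * Xb.T @ (p - y) / m  # gradient step on the log-loss
    return theta

# Toy 1-D data: the class flips from 0 to 1 around x = 0
X = np.array([[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
theta = fit_logistic(X, y)
predict = sigmoid(np.c_[np.ones(len(X)), X] @ theta) > 0.5
```

Thresholding the estimated probability at 0.5 turns the probability estimate into a class prediction.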

ML Landscape

Overfitting and Underfitting

Bias-Variance Tradeoff

Regularization How do we ensure that we're not overfitting to our training data? Impose a small penalty on model complexity: an l1 penalty on the weights (Lasso Regression) or an l2 penalty (Ridge Regression).
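To see the l2 penalty's effect, here is a small sketch. It assumes ridge regression without a bias term, for which the penalized least-squares problem has a well-known closed form; the data and names are illustrative:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """l2-penalized least squares: minimize ||Xw - y||^2 + alpha * ||w||^2.
    Closed-form solution: w = (X^T X + alpha*I)^-1 X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

# Five features, but only the first one actually matters
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.5, 50)

w_unpenalized = ridge_fit(X, y, alpha=0.0)    # plain least squares
w_penalized = ridge_fit(X, y, alpha=10.0)     # small penalty on complexity
```

The penalty shrinks the weight vector toward zero, trading a little bias for lower variance. Lasso's l1 penalty has no closed form but tends to zero out irrelevant weights entirely.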

Testing and Validation

K-fold Cross Validation
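The idea behind k-fold cross-validation can be sketched as follows: train k times, each time holding out a different fold for validation, then average the scores. A toy illustration (the "model" here just predicts the training mean; all names are mine):

```python
import numpy as np

def k_fold_cv(X, y, fit, predict, k=5):
    """Average validation MSE over k train/validate splits."""
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]                                     # held-out fold
        train = np.concatenate(folds[:i] + folds[i + 1:])  # everything else
        model = fit(X[train], y[train])
        scores.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return np.mean(scores)

# Baseline "model": always predict the mean of the training targets
def fit_mean(X, y):
    return y.mean()

def predict_mean(model, X):
    return np.full(len(X), model)

X = np.arange(20.0).reshape(-1, 1)
y = 2 * X[:, 0] + 1
score = k_fold_cv(X, y, fit_mean, predict_mean, k=5)
```

In practice the data should be shuffled before splitting; with ordered data like this, consecutive folds give a pessimistic estimate.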

Decision Tree Basic Idea Construct a tree that asks a series of questions of your data.

Decision Tree Let's see how it works on a real dataset.

Decision Tree How is the tree built? Define a Cost Function that measures the impurity of a node. A node is pure (impurity = 0) if all the training instances it applies to belong to the same class. One possible impurity measure is Gini: G = 1 - Σ_k p_k², where p_k is the proportion of class-k instances in the node. Search for the feature and threshold that minimize our Cost Function; the Gini scores of the two subsets thus produced are weighted by their size. This greedy algorithm may not produce the optimal tree. This is the CART algorithm; the ID3 algorithm produces non-binary trees.
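One greedy step of this search, on a single feature, looks like the following sketch (toy data; the function names are mine, not CART library code):

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - (p ** 2).sum()

def best_split(x, y):
    """Try each threshold on one feature; pick the one minimizing the
    size-weighted Gini impurity of the two child nodes."""
    best_t, best_cost = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue  # skip degenerate splits
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, cost = best_split(x, y)    # splits cleanly at x <= 3, so the weighted cost is 0
```

A full tree builder would recurse on each child node and also loop over features, stopping when nodes are pure or a depth limit is hit.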

Decision Tree Decision Trees can be used for regression! Minimize MSE instead of impurity.

Decision Tree Advantages: a white box, easily interpretable. Disadvantages: prone to overfitting (regularize by setting a maximum depth), and comes up only with orthogonal decision boundaries, so it is sensitive to rotation of the training set (use PCA!).

ML Landscape

Ensemble Methods Basic Idea Two Decision Trees by themselves may overfit. But combining their predictions may be a good idea!

Bagging Bagging = Bootstrap Aggregation Use the same training algorithm for every predictor, but train them on different random subsets of the training set. Random Forest is an Ensemble of Decision Trees, generally trained via the bagging method.
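A minimal sketch of the bagging idea, with a one-split decision stump standing in for a full decision tree (names and data are illustrative, not from the talk):

```python
import numpy as np

def fit_stump(x, y):
    """Weak base learner: a one-split decision stump predicting (x > t)."""
    best_t, best_err = None, np.inf
    for t in np.unique(x):
        err = ((x > t).astype(int) != y).mean()
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagging_predict(x_train, y_train, x_new, n_trees=25, seed=0):
    """Train the same algorithm on different bootstrap samples, then vote."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(x_new))
    for _ in range(n_trees):
        idx = rng.integers(0, len(x_train), len(x_train))  # bootstrap sample
        t = fit_stump(x_train[idx], y_train[idx])
        votes += (x_new > t)
    return (votes / n_trees > 0.5).astype(int)             # majority vote

x = np.array([0.0, 1.0, 2.0, 3.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pred = bagging_predict(x, y, np.array([1.0, 8.0]))
```

A Random Forest additionally samples a random subset of features at each split, decorrelating the trees further.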

Boosting Basic Idea Train several weak learners sequentially, each trying to correct the errors made by its predecessor. Adaptive Boosting (AdaBoost): give more relative weight to the misclassified instances.
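The reweighting loop can be sketched with decision stumps as the weak learners. A toy illustration of the AdaBoost recipe (uniform weights, then upweight mistakes after each round); the function names and data are mine:

```python
import numpy as np

def weighted_stump(x, y, w):
    """Best stump (threshold + direction) under instance weights w; labels in {-1, +1}."""
    best = None
    for t in np.unique(x):
        for sign in (1, -1):
            pred = sign * np.where(x > t, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(x, y, n_rounds=3):
    w = np.full(len(y), 1.0 / len(y))          # start with uniform weights
    learners = []
    for _ in range(n_rounds):
        err, t, sign = weighted_stump(x, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)  # how much say this learner gets
        pred = sign * np.where(x > t, 1, -1)
        w *= np.exp(-alpha * y * pred)         # boost weight of the misclassified
        w /= w.sum()
        learners.append((alpha, t, sign))
    return learners

def ada_predict(learners, x):
    score = sum(a * s * np.where(x > t, 1, -1) for a, t, s in learners)
    return np.sign(score)

# Not separable by any single stump, but three boosted stumps classify it correctly
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, 1, -1, -1])
pred = ada_predict(adaboost(x, y), x)
```

Each learner's vote is weighted by its alpha, so accurate learners have more say in the final sign.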

Boosting Gradient Boosting Try to fit each new predictor to the residual errors made by the previous predictor. Best performance: Random Forests and gradient boosting methods (implemented in the xgboost library) have been winning most recent Kaggle competitions on structured data, while Deep Learning (especially convolutional networks) is the clear winner on unstructured data (perception: speech, vision, etc.).
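Fitting each new predictor to the residuals can be shown in miniature with regression stumps (a hedged sketch of the idea, not xgboost; names, shrinkage value, and data are mine):

```python
import numpy as np

def fit_stump(x, r):
    """Regression stump: one threshold, predicting the residual mean on each side."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t].mean(), r[x > t].mean()
        sse = ((r - np.where(x <= t, left, right)) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    return best[1:]

def gradient_boost(x, y, n_rounds=100, lr=0.3):
    """Each new stump is fit to the residual errors of the ensemble so far."""
    pred = np.zeros_like(y)
    for _ in range(n_rounds):
        t, left, right = fit_stump(x, y - pred)            # fit the residuals
        pred = pred + lr * np.where(x <= t, left, right)   # shrink and add
    return pred

x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x)
pred = gradient_boost(x, y)
```

The learning rate shrinks each stump's contribution; more rounds with a smaller rate generally generalize better than a few unshrunk ones.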

ML Landscape

Where to go from here?