Ensemble Learning CS534


Ensemble Learning

How to generate ensembles? A wide range of methods has been developed; we will study two popular approaches: Bagging (and Random Forest, a variant that builds de-correlated trees) and Boosting. Both methods take a single (base) learning algorithm and generate an ensemble from it.

Base Learning Algorithm We are given a black-box learning algorithm, Learn, referred to as the base learner.

Bootstrap Aggregating (Bagging) Leo Breiman, Bagging Predictors, Machine Learning, 24, 123-140 (1996). Create many different training sets by sampling from the original training set, and learn a hypothesis for each one. The resulting hypotheses will vary because they are trained on different training sets. Combine these hypotheses using a majority vote.

Bagging Algorithm Given training set S, bagging works as follows: 1. Create T bootstrap samples S_1, ..., S_T of S: for each t, randomly draw |S| examples from S with replacement. 2. For each t, learn h_t = Learn(S_t). 3. Output H = majority vote over h_1, ..., h_T. With large |S|, each S_t will contain about 1 - 1/e (roughly 63.2%) unique examples.
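A minimal sketch of this loop, assuming scikit-learn decision trees as the base learner (the function names and parameters below are illustrative, not part of the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, T=100, seed=0):
    """Train T trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)   # draw |S| examples with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def predict_majority(trees, X, labels=(0, 1)):
    """Combine the T hypotheses by unweighted majority vote (binary labels assumed)."""
    votes = np.stack([t.predict(X) for t in trees])        # shape (T, n_test)
    counts = (votes == labels[1]).sum(axis=0)
    return np.where(counts * 2 >= len(trees), labels[1], labels[0])
```

A bootstrap sample of size |S| omits each example with probability (1 - 1/|S|)^|S|, which approaches 1/e, and this is where the 63.2% figure above comes from.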

[Figure: decision boundaries for the target concept, a single decision tree, and 100 bagged decision trees]

Stability of Learn A learning algorithm is unstable if small changes in the training data can produce large changes in the output hypothesis (otherwise it is stable); instability corresponds to high variance. Bagging has little benefit when used with stable learning algorithms, because most ensemble members will be very similar. Bagging generally works best with unstable yet relatively accurate base learners, i.e., high-variance, low-bias classifiers.

Random Forest An extension of bagging that builds an ensemble of de-correlated decision trees. It is one of the most successful classifiers in current practice: very fast, easy to train, and many good implementations are available.

Random Forest Classifier Given N examples with M features, draw bootstrap samples; each bootstrapped sample is used to build a tree. When building the tree, each node chooses its test only from a randomly sampled subset of the features, and the Gini index is used to select the test.
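The Gini index mentioned here is the standard impurity measure for choosing a split; a small illustrative helper (not from the slides) for scoring a candidate binary split:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(y_left, y_right):
    """Weighted Gini impurity of a binary split; lower is better."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n
```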

Random Forest Classifier The predictions of the individual trees are combined by taking a majority vote.

Random forest learns trees that make de-correlated errors.

Random forest Available package: http://www.stat.berkeley.edu/~breiman/randomforests/cc_home.htm To read more: http://www-stat.stanford.edu/~hastie/papers/eslii.pdf
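In practice one would typically use an off-the-shelf implementation; a minimal scikit-learn example (scikit-learn is one possible choice, and the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# n_estimators = number of bootstrapped trees;
# max_features = size of the random feature subset considered at each node.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            criterion="gini", random_state=0)
rf.fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```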

Boosting Bagging: individual classifiers are learned independently. Boosting is iterative: it looks at the errors of the previous classifiers to decide what to focus on in the next iteration over the data, so each successive classifier depends on its predecessors. The result is more weight on hard examples (the ones on which mistakes were committed in previous iterations).

Some Boosting History The idea of boosting began with a learning-theory question first asked in the late 80s. The question was answered in 1989 by Robert Schapire, resulting in the first theoretical boosting algorithm. Schapire and Freund later developed a practical boosting algorithm called AdaBoost. Many empirical studies show that AdaBoost is highly effective; its ensembles very often outperform those produced by bagging.

Specifying Input Distributions AdaBoost works by invoking Learn many times on different distributions over the training data set. We therefore need to modify the base-learner protocol to accept a training-set distribution D as an input. D(i) can be viewed as indicating to the base learner Learn the importance of correctly classifying the i-th training instance.
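As a hypothetical sketch of this modified protocol (the names below are illustrative, not from the slides), Learn now accepts a distribution D over training indices in addition to the data:

```python
from typing import Callable, Optional

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# A hypothesis maps feature vectors to predicted labels.
Hypothesis = Callable[[np.ndarray], np.ndarray]

def learn(X: np.ndarray, y: np.ndarray,
          D: Optional[np.ndarray] = None) -> Hypothesis:
    """Black-box base learner that honors an example distribution D.

    D[i] is the importance of classifying example i correctly; any
    classifier that supports per-example weights fits this role
    (a shallow decision tree is just one illustrative choice).
    """
    clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
    return clf.predict
```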

AdaBoost (High-level steps) AdaBoost performs L boosting rounds; the operations in round l are: 1. Call Learn on data set S with distribution D_l to produce the l-th ensemble member h_l, where D_l is the distribution of round l. 2. Compute the round-(l+1) distribution D_{l+1} by putting more weight on the instances on which h_l makes mistakes. 3. Compute a voting weight alpha_l for h_l. The ensemble hypothesis returned is H = <(h_1, alpha_1), ..., (h_L, alpha_L)>.
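A compact sketch of these rounds for binary labels y in {-1, +1}, using a depth-1 decision tree as the weighted base learner (a common but illustrative choice; it could equally be any learner following the protocol above):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, L=50):
    """Return ensemble members and voting weights; labels y must be in {-1, +1}."""
    n = len(X)
    D = np.full(n, 1.0 / n)                      # round-1 distribution: uniform
    members, alphas = [], []
    for _ in range(L):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = max(D[pred != y].sum(), 1e-12)     # weighted training error
        if eps >= 0.5:                           # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # voting weight for h
        D = D * np.exp(-alpha * y * pred)        # upweight the mistakes of h
        D = D / D.sum()                          # renormalize to a distribution
        members.append(h)
        alphas.append(alpha)
    return members, alphas

def adaboost_predict(members, alphas, X):
    """Weighted-vote hypothesis H(x) = sign(sum_l alpha_l h_l(x))."""
    score = sum(a * h.predict(X) for h, a in zip(members, alphas))
    return np.sign(score)
```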

Learning with Weights It is often straightforward to convert a base learner to take an input distribution D into account, e.g., decision trees, neural nets, logistic regression. When it is not straightforward, we can resample the training data according to D, as sketched below.
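A tiny illustrative helper (not from the slides) for the resampling fallback:

```python
import numpy as np

def resample_by_distribution(X, y, D, seed=0):
    """Draw |S| examples i.i.d. from the distribution D over training indices."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=len(X), replace=True, p=D)
    return X[idx], y[idx]
```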

[Figure: boosting results on the letter-recognition task (Schapire, 1989)]

(Schapire, Freund, Bartlett, and Lee, 1998)

AdaBoost as an Additive Model We will now derive AdaBoost in a way that can be adapted in various directions. This recipe will let you derive boosting-style algorithms for particular learning settings of interest, e.g., general misprediction costs or semi-supervised learning. These boosting-style algorithms will not generally be boosting algorithms in the theoretical sense, but they often work quite well.

AdaBoost: Iterative Learning of Additive Models Consider the final hypothesis: it takes the sign of an additive expansion of a set of base classifiers, H(x) = sign( sum_l alpha_l h_l(x) ). AdaBoost iteratively finds, at each iteration, a pair (alpha_l, h_l) to add to the expansion. The goal is to minimize a loss function on the training examples: sum_i L(y_i, H(x_i)).

Instead, AdaBoost can be viewed as minimizing an exponential loss function, sum_i exp(-y_i F(x_i)) with F(x) = sum_l alpha_l h_l(x), which is a smooth upper bound on the 0/1 error.

Fix the previously learned members and optimize the newly added pair (alpha_l, h_l) at each round (stagewise optimization); the algebra is sketched below.
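Filling in the standard stagewise algebra for this step (this is the textbook derivation, not reproduced on the slide): with the previous rounds fixed, the exponential loss can be written in terms of per-example weights, and minimizing over the voting weight gives the familiar formula.

```latex
% Stagewise minimization of the exponential loss at round l.
% F_{l-1} is the additive model built so far; w_i = \exp(-y_i F_{l-1}(x_i)).
\begin{aligned}
(\alpha_l, h_l)
  &= \arg\min_{\alpha, h} \sum_i w_i \, e^{-\alpha\, y_i h(x_i)} \\
  &= \arg\min_{\alpha, h} \Big[ e^{-\alpha} \sum_{i:\, h(x_i)=y_i} w_i
     \;+\; e^{\alpha} \sum_{i:\, h(x_i)\neq y_i} w_i \Big].
\end{aligned}
% For a fixed h with weighted error \epsilon_l (the normalized weight of the
% misclassified examples), setting the derivative with respect to \alpha to
% zero yields the AdaBoost voting weight:
\alpha_l = \tfrac{1}{2}\,\ln\frac{1-\epsilon_l}{\epsilon_l}.
```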

Pitfall of Boosting: sensitive to noise and outliers

Summary: Bagging and Boosting Bagging: resamples data points; the weight of each classifier is the same; provides variance reduction only; robust to noise and outliers. Boosting: reweights data points (modifies the data distribution); classifier weights vary depending on accuracy; reduces both bias and variance; can hurt performance in the presence of noise and outliers.