Lecture: Ensemble Learning

Overview
- "Using multiple models/individuals to obtain better predictive performance than could be obtained from the constituent models."
- We use the same technique routinely in our lives by asking the opinion of several experts before making a decision.
- Diversity is key.
- Used for classification or regression (and many other things).
- Combine the predictions of a collection of hypotheses.
- Example: generate a hundred different decision trees from the same training set and have them vote on the best classification for a new example.

Movie Trivia
- One person versus the world.
- Illustrates ensemble learning.
- Who wins? How does the 'world' come to a consensus?

Motivation
- Improving the performance.
- Model selection:
  - Which model/parameters/etc. to use? Low training/testing error alone can be misleading. What about models that all have the same testing performance (possibly infinitely many of them)?
  - Prevent a poor model choice and the consequences of a poor choice.
  - An ensemble may not always beat the best single classifier, but it reduces the risk of a bad choice, and sometimes it will beat it.
- Confidence estimation:
  - Comes naturally from voting.
  - Useful to know how sure the ensemble is about a classification.
- Data surplus or lack:
  - Too much data: partition the data and train each ensemble member on one piece.
  - Too little data: use bootstrapping to train multiple individuals on subsets sampled with replacement.
- Appropriate solution outside the hypothesis space of the model:
  - Linear classifiers cannot learn a circular classification (an ensemble of them can).
  - Circular classifiers cannot learn a blob classification (an ensemble of them can).
  - Example: take three linear threshold hypotheses, each classifying positively on its unshaded side, and classify an example as positive only if all three classify it positively. The triangle is the hypothesis that results, and it is not expressible in the original hypothesis space (see the sketch after this list).
- Data fusion:
  - Data comes from various sources and is unrelated; it cannot be used to train a single classifier because the data has different features.
  - Even if it has the same features, it probably won't yield good results.
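As a concrete illustration of the triangle example above, here is a minimal Python sketch (not part of the lecture). The three half-planes are arbitrary choices for illustration; their unanimous vote is positive exactly inside a triangle, a region no single linear threshold hypothesis can express.

    # Three linear threshold hypotheses; each returns True on its "positive" side.
    # Requiring all three to vote positive carves out the triangle with corners
    # (0, 0), (1, 0), and (0, 1).
    h1 = lambda x, y: x >= 0
    h2 = lambda x, y: y >= 0
    h3 = lambda x, y: x + y <= 1

    def ensemble_positive(x, y):
        return h1(x, y) and h2(x, y) and h3(x, y)

    for point in [(0.2, 0.2), (2.0, 2.0), (-1.0, 0.5)]:
        print(point, ensemble_positive(*point))  # True only for the point inside the triangle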

Example: majority voting
- Ensemble of M = 5 hypotheses, combined by majority voting.
- The ensemble misclassifies an example only if at least three of the five hypotheses misclassify it (hopefully much less likely than a misclassification by a single hypothesis).
- Suppose each hypothesis h_i has error probability e (the probability that a randomly chosen example is misclassified by it is e), and assume the errors are independent.
- If e is small, then the probability of ensemble misclassification is minuscule. For M = 5 and e = 0.1, the chance that three particular hypotheses all err is 1/10 * 1/10 * 1/10 = 1/1000; summing over all ways to get at least three errors out of five gives an ensemble error of about 0.0086, still far below the single-hypothesis error of 0.1 (Exercise 18.14 in Russell & Norvig).
- In practice errors are highly correlated (not independent); many hypotheses are likely misled in the same way by the data.
- The goal is to reduce error correlation.
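A quick Python check of the independence argument above (not part of the lecture); M = 5 and e = 0.1 are the values used in the example:

    # Probability that a majority (3 or more) of M = 5 independent hypotheses,
    # each with error probability e = 0.1, misclassify a given example.
    from math import comb

    M, e = 5, 0.1
    ensemble_error = sum(comb(M, k) * e**k * (1 - e)**(M - k) for k in range(3, M + 1))
    print(ensemble_error)  # ~0.0086, far below the single-hypothesis error of 0.1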

Ingredients
- Diversity (most important):
  - Errors on different examples; uncorrelated errors.
  - Use different models (MLPs, decision trees, nearest neighbor classifiers, support vector machines).
  - Train on different subsets of the data.
  - Use different training parameters (for example, different weight initializations or numbers of neurons in a neural network).
- Simplicity: we normally want to use simple models (for obvious reasons), and ensembles allow us to do so.
- Efficiency: the cost is more individuals to train.

Ensemble Methods

None (a single classifier, shown for comparison)

Original data:
  X  Y  Label
  1  3  0
  2  0  1

A single decision stump, Split (X > 1), fits this data.

New data:
  (1, 4)? = 0
  (3, 2)? = 1

Bagging (bootstrap aggregation)
- Reduces variance and overfitting.
- Usually used with decision trees (but can be used with any model).
- Needs many comparable classifiers.
- One of the first effective methods, and one of the simplest.

Method (a code sketch follows the example below):
1. Create M new subsets of the training data (sampling with replacement / bootstrap).
2. Train M models, one per subset.
3. Combine the outputs by averaging (regression) or voting (classification).

Example: three bootstrap subsets of the original data above yield three different stumps:
  Subset 1: Split (X > 1)
  Subset 2: Split (Y > 1)
  Subset 3: Split (X > 2)

New data (one vote per stump, then majority):
  (1, 4)? = (0, 1, 0) = 0
  (3, 2)? = (1, 1, 1) = 1
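A minimal Python sketch of the bagging recipe above (not part of the lecture). The scikit-learn decision stumps, the synthetic dataset, and the number of models M are illustrative assumptions only.

    # Bagging: bootstrap M subsets, train one model per subset, combine by voting.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                 # toy features (X, Y)
    y = (X[:, 0] > 1).astype(int)                 # toy labels

    M = 25
    models = []
    for _ in range(M):
        idx = rng.integers(0, len(X), size=len(X))             # 1. bootstrap sample
        models.append(DecisionTreeClassifier(max_depth=1)      # 2. one stump per subset
                      .fit(X[idx], y[idx]))

    def bagged_predict(X_new):
        votes = np.stack([m.predict(X_new) for m in models])   # 3. majority vote
        return (votes.mean(axis=0) > 0.5).astype(int)

    print(bagged_predict(np.array([[1.0, 4.0], [3.0, 2.0]])))  # the two "new data" points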

Boosting
- Uses a collection of weak learners to make a strong learner.

General approach:
- Iteratively generate weak learners.
- Each example in the training set has a weight (the higher the weight, the more importance attached to it during training).
- Combine the weak learners, weighted by performance, to make a strong learner.
- Boosting algorithms differ in their method of weighting.

AdaBoost

Method (a code sketch follows the example below):
1. Create a weak classifier (slightly better than random guessing).
2. Iteratively train weak classifiers on a dataset in which the points misclassified by the previous model are weighted more heavily.
3. Weight each model according to its success.
4. Combine the outputs using voting/averaging with those weights.

Information:
- The most popular boosting method; "one of the most powerful learning ideas introduced in the last twenty years."
- Advantages: reuses the same training set (so it can be small); can combine any number of base learners.
- Disadvantage: cannot train in parallel.

Example:

Classifier 1 (Y > 3, weight: 3/5) is correct on the first point and wrong on the second, so the second point's weight is increased:
  X  Y  Label  Weight before  Weight after
  1  3  0      0.5            0.5
  2  0  1      0.5            2

Classifier 2 (X > 1, weight: 5/5) is trained on the re-weighted data and classifies both points correctly, so the weights stay the same:
  X  Y  Label  Weight before  Weight after
  1  3  0      0.5            0.5
  2  0  1      2              2

New data (weighted votes, then the heavier side wins):
  (1, 4)? = (1: 3/5, 0: 5/5) = 0
  (3, 2)? = (0: 3/5, 1: 5/5) = 1

Demo (AdaBoost): http://cseweb.ucsd.edu/~yfreund/adaboost/index.html
- Settings: prediction sum, training set.
- Hypothesis space: decision stumps (a vertical or horizontal threshold/line).
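A minimal Python sketch of the AdaBoost recipe above (not part of the lecture). The scikit-learn decision stumps, the synthetic dataset, the number of rounds, and the standard exponential re-weighting formula are illustrative assumptions.

    # AdaBoost: repeatedly fit a weak learner on weighted data, up-weight the
    # points it gets wrong, and combine the learners by weighted voting.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)        # labels in {-1, +1}

    M = 20
    w = np.full(len(X), 1.0 / len(X))                 # example weights, start uniform
    stumps, alphas = [], []

    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)         # model weight: higher for lower error
        w = w * np.exp(-alpha * y * pred)             # misclassified points get heavier
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def adaboost_predict(X_new):
        scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(scores)                        # weighted vote

    print((adaboost_predict(X) == y).mean())          # training accuracy of the ensemble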

Stacking (Stacked Generalization)

Information:
- Makes good use of the training data.
- Uses a meta learner.
- An attractive idea, but less widely used than bagging and boosting.
- Can be (and normally is) used to combine models of different types, unlike bagging and boosting.

Method (a code sketch appears at the end of these notes):
1. Split the data into two disjoint sets (training and testing).
2. Train several base learners on the first set.
3. Test the base learners on the second set.
4. Using the predictions from step 3 as inputs, and the correct responses as outputs, train a higher-level learner.

Homework & Project
[Described in a separate document.]

Resources
- Wikipedia article: http://en.wikipedia.org/wiki/ensemble_learning
- Scholarpedia article: http://www.scholarpedia.org/article/ensemble_learning
- Russell & Norvig, chapter 18.4
- Ensemble learning survey paper: "Ensemble Learning" by Martin Sewell, 2008
- AdaBoost demo: http://cseweb.ucsd.edu/~yfreund/adaboost/index.html
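A minimal Python sketch of the stacking recipe above (not part of the lecture). The choice of base learners (a decision tree and a k-nearest-neighbor classifier), the logistic-regression meta learner, and the synthetic dataset are illustrative assumptions.

    # Stacking: train base learners on one split, then train a meta learner on
    # their predictions for a second, disjoint split.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 2))
    y = (X[:, 0] ** 2 + X[:, 1] > 0).astype(int)

    # 1. Split the data into two disjoint sets.
    X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=0)

    # 2. Train several (different) base learners on the first set.
    base = [DecisionTreeClassifier(max_depth=3).fit(X_a, y_a),
            KNeighborsClassifier(n_neighbors=5).fit(X_a, y_a)]

    # 3. Collect the base learners' predictions on the second set.
    meta_inputs = np.column_stack([m.predict(X_b) for m in base])

    # 4. Train a higher-level (meta) learner on those predictions.
    meta = LogisticRegression().fit(meta_inputs, y_b)

    # Predicting on new data: base predictions first, then the meta learner.
    X_new = rng.normal(size=(3, 2))
    print(meta.predict(np.column_stack([m.predict(X_new) for m in base])))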