
CptS 570 Machine Learning, School of EECS, Washington State University

No one learner is always best (No Free Lunch).
A combination of learners can overcome individual weaknesses.
How do we choose learners that complement one another?
How do we combine their outputs to maximize accuracy?
Ensemble: a weighted majority vote of several learners.

Different algorithms (e.g., parametric vs. non-parametric).
Different parameter settings (e.g., random initial weights in a neural network).
Different input representations (e.g., feature selection; multi-modal training data such as audio and video).
Different training sets:
Bagging: different samples of the same training set.
Boosting/cascading: weight more heavily the examples missed by the previously learned classifier.
Partitioning: mixture of experts.

All learners generate an output: voting, stacking.
One or a few learners generate the output, chosen by a gating function: mixture of experts.
Learner output weighted by accuracy and complexity: cascading, boosting.

L learners, K outputs; d_{ji}(x) is the prediction of learner j for output i.

Regression:
$$y_i = \sum_{j=1}^{L} w_j \, d_{ji}(x), \qquad w_j \ge 0, \qquad \sum_{j=1}^{L} w_j = 1$$

Classification: choose $C_i$ if $y_i = \max_{k=1}^{K} y_k$.
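A minimal numpy sketch of this weighted combination (the learners, scores, and weights below are invented for illustration): each of L learners outputs a score per class, the ensemble output is the weighted sum, and the predicted class is the argmax.

```python
import numpy as np

# Illustrative values: L = 3 learners, K = 2 classes.
# d[j, i] is learner j's score (e.g., estimated P(C_i | x)) for class i.
d = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.3, 0.7]])

# Convex combination weights: w_j >= 0 and sum to 1.
w = np.array([0.5, 0.3, 0.2])

y = w @ d                             # y_i = sum_j w_j * d[j, i]
predicted_class = int(np.argmax(y))   # choose C_i with the largest y_i

print(y, predicted_class)             # [0.69 0.31] 0
```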

Majority voting: $w_j = 1/L$.
If learner j produces $P(C_i \mid x)$, use these as the weights after normalization.
Weight $w_j$ can be the accuracy of learner j on a validation set.
Or learn the weights (stacked generalization).
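One hedged way to realize the validation-accuracy option above (the accuracies are made up): normalize each learner's validation accuracy so the weights form a convex combination; uniform weights recover plain majority voting.

```python
import numpy as np

# Hypothetical validation accuracies for L = 3 learners.
val_accuracy = np.array([0.82, 0.74, 0.69])

# Normalize to get convex combination weights (w_j >= 0, sum_j w_j = 1).
w = val_accuracy / val_accuracy.sum()

# Simple majority voting corresponds to uniform weights instead.
w_uniform = np.full(len(val_accuracy), 1.0 / len(val_accuracy))

print(w, w_uniform)
```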

Example: [figure not included in the transcription]

Bayesian view:
$$P(C_i \mid x) = \sum_{\text{all models } M_j} P(C_i \mid x, M_j)\, P(M_j)$$
where $d_{ji} = P(C_i \mid x, M_j)$ and $w_j = P(M_j)$.
Majority voting implies a uniform prior over models.
We can't include all models, so choose a few with suspected high probability.
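A small worked instance of this sum (the numbers are invented): with two models having priors $P(M_1)=0.7$, $P(M_2)=0.3$ and class posteriors $P(C_1 \mid x, M_1)=0.9$, $P(C_1 \mid x, M_2)=0.4$,

$$P(C_1 \mid x) = 0.7 \times 0.9 + 0.3 \times 0.4 = 0.63 + 0.12 = 0.75$$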

Assume each learner is independent and better than random.
Then adding more learners maintains the bias but reduces the variance (i.e., the error):
$$E[y] = E\!\left[\frac{1}{L}\sum_{j} d_j\right] = \frac{1}{L}\sum_{j} E[d_j] = E[d_j]$$
$$\mathrm{Var}(y) = \mathrm{Var}\!\left(\frac{1}{L}\sum_{j} d_j\right) = \frac{1}{L^2}\,\mathrm{Var}\!\left(\sum_{j} d_j\right) = \frac{1}{L^2}\, L\, \mathrm{Var}(d_j) = \frac{1}{L}\,\mathrm{Var}(d_j)$$
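A quick numpy simulation of this claim under the independence assumption (all parameters are arbitrary): averaging L independent, identically distributed learner outputs leaves the mean unchanged and shrinks the variance by roughly 1/L.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, L = 100_000, 10
var_d = 4.0                      # variance of a single learner's output

# d[t, j]: output of learner j on trial t, i.i.d. with mean 1 and variance 4.
d = rng.normal(loc=1.0, scale=np.sqrt(var_d), size=(n_trials, L))
y = d.mean(axis=1)               # ensemble output: average of L learners

print(y.mean())                  # ~1.0  (expected value / bias unchanged)
print(y.var(), var_d / L)        # ~0.4 vs. 0.4 (variance reduced by 1/L)
```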

General case:
$$\mathrm{Var}(y) = \frac{1}{L^2}\,\mathrm{Var}\!\left(\sum_{j} d_j\right) = \frac{1}{L^2}\left[\sum_{j}\mathrm{Var}(d_j) + 2\sum_{i<j}\mathrm{Cov}(d_i, d_j)\right]$$
If the learners are positively correlated, the variance (and error) increases.
If the learners are negatively correlated, the variance (and error) decreases, but the bias increases.
Voting is a form of smoothing that maintains low bias while decreasing variance.

Given a training set X of size N.
Generate L different training sets, each of size N, by sampling with replacement from X (called bootstrapping).
Use one learning algorithm to learn L classifiers from the different training sets.
The learning algorithm must be unstable, i.e., small changes in the training set result in different classifiers (e.g., decision trees, neural networks).
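A minimal bagging sketch using scikit-learn decision trees as the unstable base learner (the dataset, L, and hyperparameters are placeholders, not values from the lecture): draw L bootstrap samples of size N, fit one tree per sample, and take a majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
N, L = len(X), 25

trees = []
for _ in range(L):
    idx = rng.integers(0, N, size=N)          # sample N examples with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority vote over the L trees (binary labels 0/1 here).
votes = np.stack([t.predict(X) for t in trees])           # shape (L, N)
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print((ensemble_pred == y).mean())                        # training accuracy
```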

Similar to bagging, but the L training sets are chosen to increase negative correlation.
Use one learning algorithm to learn L classifiers.
The training set for classifier j is biased toward examples missed by classifier j-1.
The learning algorithm should be weak (not too accurate).
Adaptive Boosting (AdaBoost).
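A compact AdaBoost-style sketch, not the exact pseudocode from the lecture (dataset, number of rounds, and the depth-1 tree as the weak learner are all assumptions): each round fits a weak learner with the current example weights, then up-weights the examples it missed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
y_pm = np.where(y == 1, 1, -1)          # labels in {-1, +1} for AdaBoost
N, rounds = len(X), 20

w = np.full(N, 1.0 / N)                 # example weights, start uniform
learners, alphas = [], []

for _ in range(rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y_pm].sum()          # weighted training error
    if err >= 0.5:                       # weak learner must beat random
        break
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    w *= np.exp(-alpha * y_pm * pred)    # up-weight the missed examples
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted vote.
scores = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
print((np.sign(scores) == y_pm).mean())
```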


[Figure: each point represents 1 of 27 test domains. From Dietterich, "Machine Learning Research: Four Current Directions", AI Magazine, Winter 1997.]


Weights depend on the test instance:
$$y = \sum_{j=1}^{L} w_j(x)\, d_j(x)$$
Competitive learning: the weight $w_j(x)$ is driven toward 1 (and the others toward 0) for the learner that is best in the region near x.
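A toy illustration of instance-dependent weights (the softmax gating function, the linear experts, and all parameter values are invented for illustration):

```python
import numpy as np

def gate(x, gate_params):
    """Softmax gating: returns w_j(x), nonnegative, summing to 1, varying with x."""
    logits = gate_params @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Three "experts", each just a linear model here; parameters are arbitrary.
experts = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, -0.5])]
gate_params = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])

x = np.array([0.2, 0.9])
w = gate(x, gate_params)                           # weights depend on this x
y = sum(wj * (ej @ x) for wj, ej in zip(w, experts))
print(w, y)
```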

The combining function f(·) is learned.
Train f on data not used to train the base learners.
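A minimal stacking sketch (the dataset, the tree base learners, and logistic regression as the combiner are placeholders): the combining function f is fit on base-learner predictions made on data held out from base training.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=2)

# Base learners train on one half; the combiner f trains on the other half,
# so f never sees data that was used to fit the base learners.
X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=2)

base_learners = [DecisionTreeClassifier(max_depth=d).fit(X_base, y_base)
                 for d in (1, 3, 5)]

# Meta-features: base-learner class-1 probabilities on the held-out half.
Z_meta = np.column_stack([clf.predict_proba(X_meta)[:, 1] for clf in base_learners])
combiner = LogisticRegression().fit(Z_meta, y_meta)   # the learned f(.)

# To predict on a query x: compute the base outputs, then apply the combiner
# (here the first held-out row stands in for a new example).
z_new = np.array([[clf.predict_proba(X_meta[:1])[0, 1] for clf in base_learners]])
print(combiner.predict(z_new))
```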

The ensemble need not be fixed; we can modify it to improve accuracy or reduce the correlation of the base learners.
Subset selection: add/remove base learners while performance improves (see the sketch below).
Meta-learners: stack learners to construct new features.
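A greedy forward-selection sketch of the subset-selection idea (the validation labels and base-learner predictions are synthetic): keep adding the base learner that most improves the majority vote's validation accuracy, and stop when nothing helps.

```python
import numpy as np

rng = np.random.default_rng(4)
y_val = rng.integers(0, 2, size=200)             # hypothetical validation labels

# Hypothetical base-learner predictions on the validation set (5 learners),
# each agreeing with y_val at a different rate.
val_predictions = [np.where(rng.random(200) < p, y_val, 1 - y_val)
                   for p in (0.85, 0.80, 0.70, 0.60, 0.55)]

chosen, best_acc = [], 0.0
improved = True
while improved:                                   # grow while the vote improves
    improved = False
    for j in range(len(val_predictions)):
        if j in chosen:
            continue
        trial = chosen + [j]
        vote = np.mean([val_predictions[k] for k in trial], axis=0) >= 0.5
        acc = (vote.astype(int) == y_val).mean()
        if acc > best_acc:
            best_acc, best_j, improved = acc, j, True
    if improved:
        chosen.append(best_j)

print(chosen, best_acc)
```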

Use classifier d_j only if the previous classifiers lacked confidence.
Order the classifiers by increasing complexity.
Differs from boosting: both errant and uncertain examples are passed to the next learner.
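A rough sketch of a two-stage cascade (the dataset, the two tree models, and the 0.9 confidence threshold are all invented for illustration): the cheap first-stage classifier answers when it is confident; otherwise the example falls through to a more complex second stage.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Stage 1: simple/cheap model; stage 2: more complex model.
stage1 = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)
stage2 = DecisionTreeClassifier().fit(X_tr, y_tr)

theta = 0.9                                    # confidence threshold for stage 1
p1 = stage1.predict_proba(X_te).max(axis=1)    # stage-1 confidence per example
use_stage1 = p1 >= theta

pred = np.where(use_stage1, stage1.predict(X_te), stage2.predict(X_te))
print(f"stage 1 handled {use_stage1.mean():.0%} of examples,",
      f"accuracy {(pred == y_te).mean():.3f}")
```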

Typically, the hypothesis space H does not contain the target function f.
Weighted combinations of several approximations may represent classifiers outside of H.
[Figure: decision surfaces defined by individual learned decision trees vs. the decision surface defined by a vote over the learned decision trees.]

$1M prize to the team improving Netflix's movie recommender by 10%.
Won by team BellKor's Pragmatic Chaos, which combined classifiers from 3 teams: BellKor, BigChaos, and Pragmatic Theory.
Second place, The Ensemble, combined classifiers from 23 other teams.
The solutions were effectively ensembles of over 800 classifiers.
www.netflixprize.com

[Figure: Töscher et al., "The BigChaos Solution to the Netflix Grand Prize", 2009.]

Combining learners can overcome the weaknesses of individual learners.
Base learners must do better than random and have uncorrelated errors.
Ensembles typically take a majority vote of the base classifiers; boosting and stacking are alternatives.
Application to recommender systems: the Netflix Prize.