Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition


Introduction to Ensemble Learning Featuring Successes in the Netflix Prize Competition Todd Holloway Two Lecture Series for B551 November 20 & 27, 2007 Indiana University

Outline: Introduction; bias and variance problems; the Netflix Prize; success of ensemble methods in the Netflix Prize; why ensemble methods work; algorithms (AdaBoost, BrownBoost, random forests).

1-Slide Intro to Supervised Learning We want to approximate a target function f. Given examples (x, f(x)), find a function h among a fixed subclass of functions for which the error E(h) is minimal. The error has three components: noise, which is independent of h; bias, the distance of the average prediction from f; and the variance of the predictions.
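Spelled out, this is the standard bias-variance decomposition of the expected squared error (reconstructed from the slide's three bullet points; the speaker's exact notation is not preserved in the transcript):

```latex
\mathbb{E}\big[(h(x)-y)^2\big]
  = \underbrace{\sigma^2}_{\text{noise, independent of } h}
  + \underbrace{\big(\mathbb{E}[h(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(h(x) - \mathbb{E}[h(x)])^2\big]}_{\text{variance}}
```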

Bias and Variance Bias problem: the hypothesis space made available by a particular classification method does not include sufficient hypotheses. Variance problem: the hypothesis space made available is too large for the training data, and the selected hypothesis may not be accurate on unseen data.

Bias and Variance: Decision Trees Small trees have high bias. Large trees have high variance. Why? (from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. 2007.)

Definition Ensemble classification: aggregation of the predictions of multiple classifiers with the goal of improving accuracy.

Teaser: How good are ensemble methods? Let's look at the Netflix Prize Competition.

Began October 2006 Supervised learning task Training data is a set of users and ratings (1, 2, 3, 4, 5 stars) those users have given to movies. Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars. $1 million prize for a 10% improvement over Netflix's current movie recommender/classifier (RMSE = 0.9514).
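For reference, the competition's error metric is root mean squared error over the held-out ratings, where r̂_i is the predicted rating and r_i the actual one:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(\hat{r}_i - r_i\big)^2}
```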

Just three weeks after it began, at least 40 teams had bested the Netflix classifier. Top teams showed about 5% improvement.

However, improvement slowed. (progress chart from http://www.research.att.com/~volinsky/netflix/)

Today, the top team has posted an 8.5% improvement. Ensemble methods are the best performers.

Rookies: "Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75" (the figures are percent improvement over the Netflix baseline).

Arek Paterek: "My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post-processed with kernel ridge regression." http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf

U of Toronto: "When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix's own system." http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf

Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf

When Gravity and Dinosaurs Unite: "Our common team blends the result of team Gravity and team Dinosaur Planet." You might have guessed from the name.

BellKor / KorBell And, yes, the top team, which is from AT&T: "Our final solution (RMSE = 0.8712) consists of blending 107 individual results."
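Several of these teams describe the same basic recipe: fit many diverse models, then learn weights for combining their predictions via linear regression. A minimal sketch of that kind of blending on synthetic stand-in data (the two "base models" below are simulated, not any team's actual systems; in practice the blender is fit on held-out predictions, as Paterek describes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y = rng.integers(1, 6, size=1000).astype(float)    # stand-in "true ratings" 1..5
# two hypothetical base models, simulated as noisy versions of the truth
preds = np.column_stack([y + rng.normal(0.0, s, size=1000) for s in (0.9, 1.1)])

blender = LinearRegression().fit(preds, y)         # learn the blending weights
blended = blender.predict(preds)

rmse = lambda p: np.sqrt(np.mean((p - y) ** 2))
print(rmse(preds[:, 0]), rmse(preds[:, 1]), rmse(blended))  # blend beats either model
```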

Some Intuitions on Why Ensemble Methods Work

Intuitions Utility of combining diverse, independent opinions in human decision-making. Protective mechanism (e.g. stock portfolio diversity). Violation of Occam's Razor: identifying the best model requires identifying the proper "model complexity". See Domingos, P. Occam's two razors: the sharp and the blunt. KDD. 1998.

Intuitions: Majority Vote Suppose we have 5 completely independent classifiers, each with 70% accuracy. A majority vote is correct whenever at least 3 of the 5 are correct: 10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + (0.7^5) ≈ 83.7% majority-vote accuracy. With 101 such classifiers, majority-vote accuracy reaches 99.9%. (The short calculation below reproduces both numbers.)
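This is just a binomial tail sum; a minimal sketch of the calculation (assuming an odd number of fully independent classifiers, as the slide does):

```python
from math import comb

def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a majority of n independent classifiers,
    each correct with probability p, votes for the right label (n odd)."""
    k_min = n // 2 + 1                     # smallest winning number of correct votes
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(majority_vote_accuracy(5, 0.7))      # ~0.837
print(majority_vote_accuracy(101, 0.7))    # ~0.9999
```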

Strategies Boosting: make examples currently misclassified more important (or less, in some cases). Bagging: use different samples or attributes of the examples to generate diverse classifiers.

Boosting Make examples currently misclassified more important (or less important, if there is a lot of noise), then combine the hypotheses given. Types: AdaBoost, BrownBoost.

AdaBoost Algorithm 1. Initialize the weights. 2. Construct a classifier and compute its error. 3. Update the weights, and repeat step 2. 4. Finally, sum the hypotheses. (A sketch of these four steps follows below.)
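A minimal from-scratch sketch of the four steps (assuming binary labels in {-1, +1} and scikit-learn decision stumps as the weak learner; this is an illustration, not the lecture's own code):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """Minimal AdaBoost sketch for labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # 1. initialize weights uniformly
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)           # 2. construct a classifier...
        pred = stump.predict(X)
        err = w[pred != y].sum()                   # ...and compute its weighted error
        if err >= 0.5:                             # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1.0 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)             # 3. upweight misclassified examples
        w /= w.sum()                               # renormalize, then repeat step 2
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    """4. Final hypothesis: sign of the alpha-weighted sum of the stump votes."""
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```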

[Figure: classifications (colors) and weights (sizes) after 1, 3, and 20 iterations of AdaBoost; from Elder, John. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. 2007.]

AdaBoost Advantages: very little code; reduces variance. Disadvantages: sensitive to noise and outliers. Why?

BrownBoost Reduces the weight given to examples that are repeatedly misclassified, treating them as likely noise. Good (only) for very noisy data.

Bagging (Constructing for Diversity) 1. Use random samples of the examples to construct the classifiers. 2. Use random attribute sets to construct the classifiers (random decision forests, Leo Breiman). A sketch of the first variant appears below.
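A minimal sketch of variant 1, bootstrap sampling of the examples (assuming X is a NumPy array and y holds non-negative integer labels; illustrative only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Bagging sketch: each tree is trained on a bootstrap sample of the examples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)           # sample n examples with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(X, trees):
    """Majority vote across the bagged trees."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```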

Random forests At every level, choose a random subset of the attributes (not examples) and choose the best split among those attributes. Doesn't overfit.

Random forests Let the number of training cases be M, and the number of variables in the classifier be N. For each tree: 1. Choose a training set by sampling M times with replacement from all M available training cases. 2. For each node, randomly choose n of the N variables (n << N) on which to base the decision at that node, and calculate the best split based on these.
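As a usage sketch (not the lecture's code), scikit-learn's RandomForestClassifier exposes exactly these two knobs: bootstrap resampling of the M cases per tree, and max_features for the per-node variable subset n:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
# bootstrap=True resamples M cases per tree; max_features="sqrt" picks n ~ sqrt(N)
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, max_features="sqrt")
print(cross_val_score(forest, X, y, cv=5).mean())  # accuracy averaged over 5 folds
```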

Breiman, Leo (2001). "Random Forests". Machine Learning 45(1), 5-32.

Questions / Comments?

Sources
David Mease. Statistical Aspects of Data Mining. Lecture. http://video.google.com/videoplay?docid=-4669216290304603251&q=stats+202+engEDU&total=13&start=0&num=10&so=0&type=search&plindex=8
Dietterich, T. G. Ensemble Learning. In The Handbook of Brain Theory and Neural Networks, Second edition (M.A. Arbib, Ed.), Cambridge, MA: The MIT Press, 2002. http://www.cs.orst.edu/~tgd/publications/hbtnn-ensemble-learning.ps.gz
Elder, John and Giovanni Seni. From Trees to Forests and Rule Sets - A Unified Overview of Ensemble Methods. KDD 2007. http://tutorial.videolectures.net/kdd07_elder_ftfr/
Netflix Prize. http://www.netflixprize.com/
Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press. 1995.