Bootstrapping
Giri Iyengar, Cornell University (gi43@cornell.edu)
April 11, 2018

Overview
1. Bias-Variance trade-off and Cross Validation
2. Bootstrapping
3. Little Bag of Bootstraps


Bias-Variance Trade-off
Figure: Image showing the Bias-Variance Trade-off (courtesy of Quora)

Bias-Variance Trade-off
In machine learning, we are trying to learn $y = f(x) + \epsilon$. In addition to the intrinsic noise $\epsilon$, models have their own sources of error:
- Bias: the tendency of the algorithm to be consistently incorrect.
- Variance: the algorithm's tendency to fit the noise in the data in addition to the signal.
Models with high bias tend to underfit, e.g. representing a linear relationship with just the mean. Models with high variance tend to overfit, e.g. representing a linear relationship with a higher-order polynomial.
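A minimal sketch of these two failure modes, assuming NumPy is available; the data-generating line, the degree-15 polynomial, and all variable names are my own illustration, not the lecture's:

```python
# Contrast a high-bias fit (predicting the mean, ignoring x) with a
# high-variance fit (a degree-15 polynomial) on noisy linear data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)   # y = f(x) + eps, f linear

# High bias: represent the relationship with just the sample mean (underfits).
mean_pred = np.full_like(x, y.mean())

# High variance: a degree-15 polynomial chases the noise (overfits).
coeffs = np.polyfit(x, y, deg=15)
poly_pred = np.polyval(coeffs, x)

print("train MSE, mean model :", np.mean((y - mean_pred) ** 2))
print("train MSE, degree-15  :", np.mean((y - poly_pred) ** 2))
# The polynomial's training error is lower, but its predictions would swing
# wildly across fresh draws of the noise -- that instability is the variance.
```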

Bias-Variance Trade-off: Mathematical Definition
We can write the expected error of a model at a point $x$ as $\mathrm{Err}(x) = E[(y - \hat{f}(x))^2]$. This decomposes as
$$\mathrm{Err}(x) = \big(E[\hat{f}(x)] - f(x)\big)^2 + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] + \sigma_\epsilon^2,$$
in other words, $\mathrm{Err}(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \mathrm{Noise}$.
Given infinite data, we could construct models that drive both bias and variance down to zero. However, we live in an imperfect world with finite data, noisy measurement tools, and finite resources. Typically there is a trade-off between bias and variance, and we try to find the best balance between the two.
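As a sanity check on the decomposition, here is a small Monte-Carlo sketch (my own example, not from the slides): the estimator is the "mean model" from the previous slide, and the true function, noise level, and sample sizes are arbitrary choices.

```python
# Numerically verify Err(x0) = Bias^2 + Variance + Noise for the estimator
# fhat = sample mean of y, when data come from y = 2*x + eps.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 2.0 * x
x0, sigma, n, trials = 0.8, 0.3, 25, 20000

preds = np.empty(trials)   # fhat(x0) across independently drawn training sets
errs = np.empty(trials)    # squared error against a fresh noisy observation
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(scale=sigma, size=n)
    preds[t] = y.mean()                       # the mean-model prediction
    y0 = f(x0) + rng.normal(scale=sigma)      # new observation at x0
    errs[t] = (y0 - preds[t]) ** 2

bias2 = (preds.mean() - f(x0)) ** 2
variance = preds.var()
print("Bias^2 + Variance + Noise :", bias2 + variance + sigma**2)
print("Monte-Carlo Err(x0)       :", errs.mean())   # the two agree closely
```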

Bias-Variance Trade-off Example
Figure: Bias and Variance vs. Model Complexity


Some ways to understand and work with the bias-variance trade-off:
- Cross-validation
- Bootstrapping
- Little Bag of Bootstraps
Figure: Cross Validation of a Model

Cross-validation
- Doesn't use the entire training set for model fitting.
- Typically the error estimate is biased upwards.
- Variance estimates of $\Theta$ are not strictly correct (the K splits are not independent).

Cross-validation mistakes
Consider a simple classifier applied to some two-class data:
1. Starting with 5000 predictors and 50 samples, find the 100 predictors having the largest correlation with the class labels.
2. Apply a classifier, such as logistic regression, using only these 100 predictors.
How do we estimate the test-set performance of this classifier? Can we apply cross-validation in Step 2, forgetting about Step 1?

Cross-validation mistakes
Wrong: running CV only on Step 2.
Right: running CV on both Steps 1 and 2, i.e. redoing the predictor selection inside every fold (see the sketch below).
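A sketch of the "right" procedure, assuming scikit-learn is available; the Pipeline, SelectKBest, and LogisticRegression choices are my own stand-ins for the slide's Steps 1 and 2:

```python
# Feature selection (Step 1) lives inside the CV loop via a Pipeline, so each
# fold re-selects its own 100 predictors from its own training portion.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5000))      # 50 samples, 5000 pure-noise predictors
y = rng.integers(0, 2, size=50)      # random two-class labels

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=100)),    # Step 1, refit per fold
    ("clf", LogisticRegression(max_iter=1000)),   # Step 2
])
scores = cross_val_score(pipe, X, y, cv=5)
print("CV accuracy, selection inside CV:", scores.mean())   # ~0.5, chance level
```

If the 100 predictors were instead selected once on the full data and CV were run only on Step 2, the estimated accuracy would look far better than chance, even though the labels here are pure noise.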

Bootstrap
- A powerful technique for estimating bias and variance.
- Very simple, and applies in most situations (exceptions include power-law / non-finite-variance settings).
- Makes inferences about the population from a single sample.
- Approximates the population distribution by the empirical distribution.
Figure: The Bootstrap Method

The Bootstrap
The bootstrap is a flexible and powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method. For example, it can provide an estimate of the standard error of a coefficient, or a confidence interval for that coefficient.


The Bootstrap
- If we had several independent samples from the population, we could compute independent estimates of the parameters.
- In practice, gathering data is hard and expensive.
- The bootstrap starts with the assumption that the empirical distribution of a single sample closely resembles the population distribution.
- Sample with replacement from the original data set to derive as many copies of the data as you want.
- Estimate the coefficients on each copy independently, and use those estimates to derive variance and standard-error estimates.
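A minimal NumPy sketch of this recipe (my own example; the exponential data, the median as the statistic, and B = 2000 are arbitrary choices, not the lecture's):

```python
# Bootstrap estimate of the standard error (and a percentile CI) of the median.
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=200)   # the single observed sample
B = 2000                                      # number of bootstrap copies

boot_medians = np.empty(B)
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)  # with replacement
    boot_medians[b] = np.median(resample)

print("median estimate      :", np.median(data))
print("bootstrap std. error :", boot_medians.std(ddof=1))
print("95% percentile CI    :", np.percentile(boot_medians, [2.5, 97.5]))
```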


Little Bag of Bootstraps
- Each bootstrap round covers only about 63.2% of the distinct data points.
- When N, the sample size, is large, this is a severe limitation: with 1TB of data, each bootstrap round still touches roughly 632GB of distinct data.
- You need to perform several rounds to get good estimates.
- This is difficult to parallelize when you have to move that much data around.
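Where the 63.2% comes from: a given point is missed by one draw with probability (1 - 1/N), so it is missed by an entire size-N resample with probability (1 - 1/N)^N → e^{-1} ≈ 0.368, leaving coverage of about 0.632. A quick empirical check (my own sketch, assuming NumPy):

```python
# Fraction of distinct original points appearing in one bootstrap resample.
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
idx = rng.integers(0, N, size=N)               # one bootstrap resample of indices
print("empirical coverage :", np.unique(idx).size / N)   # ~0.632
print("1 - 1/e            :", 1 - np.exp(-1))            # 0.6321...
```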

Little Bag of Bootstraps
Figure: The Little Bag of Bootstraps Method

Little Bag of Bootstraps
- From your sample of N data points, create s subsamples (without replacement) of size N^0.6.
- On each of these s subsamples, run r bootstrap iterations.
- In the inner bootstraps (the r iterations), data is sampled with replacement and resampled back up to size N.
- Take the average of the averages and return that as your estimate; also return confidence intervals.
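A NumPy sketch of this procedure under the slide's description (the lognormal data, s = 20, r = 50, and the mean as the statistic of interest are my own choices):

```python
# Bag of Little Bootstraps: s outer subsets of size b = N**0.6 (without
# replacement); r inner resamples per subset, each conceptually of size N but
# represented as multinomial counts over the b points.
import numpy as np

rng = np.random.default_rng(3)
data = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
N = data.size
b = int(N ** 0.6)        # subset size (1000 here)
s, r = 20, 50            # outer subsets, inner bootstrap rounds

subset_ses = []
for _ in range(s):
    subset = rng.choice(data, size=b, replace=False)      # outer: no replacement
    stats = np.empty(r)
    for j in range(r):
        # Inner: "resample back to size N" via counts, so we never materialize
        # an N-sized copy of the data.
        counts = rng.multinomial(N, np.full(b, 1.0 / b))
        stats[j] = np.average(subset, weights=counts)      # weighted mean
    subset_ses.append(stats.std(ddof=1))                   # SE estimate per subset

print("BLB std. error of the mean:", np.mean(subset_ses))  # average of averages
print("theoretical SE of the mean:", data.std(ddof=1) / np.sqrt(N))
```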

Little Bag of Bootstraps
Figure: Little Bag of Bootstraps Performance
