Introduction to Machine Learning

582631 Introduction to Machine Learning (5 credits)
Lecturer: Teemu Roos
Assistant: Ville Hyvönen
Department of Computer Science, University of Helsinki
(based in part on material by Patrik Hoyer and Jyrki Kivinen)
November 1st to December 16th, 2016

Lecture 13: Resampling and Ensemble Methods
December 16, 2016

Performance and Generalisation

A fundamental issue in machine learning is that we build models based on training data, but we really care about performance on new, unseen test data. Generalisation refers to the learned model's ability to work well also on unseen data:
- good generalisation: what we learned from the training data also applies to the test data
- poor generalisation: what seemed to work well on the training data is not so good on the test data

Resampling: Not Only Supervised Learning

So far, we have considered supervised learning: learning to predict Y given X.
- resampling (cross-validation) can be used to obtain a number of train-test splits
- averaging reduces the variance of the test error estimate

However, we can apply the same ideas for estimating any parameter (accuracy, coefficient, probability). For example:
- estimate the variance of the least-squares estimate of a regression coefficient (see the Lab in the textbook, pp. 195-197); a sketch of this is given below
- obtain a confidence interval for the median of a variable (see the additional material on bootstrap confidence intervals on the course homepage)
- combine different estimates, such as predictions or even hierarchical clustering solutions, etc.
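As a concrete illustration of the first example, here is a minimal Python sketch (not from the lecture; the textbook Lab does the same in R) of estimating the variability of a least-squares slope by resampling the data with replacement, on simulated data:

import numpy as np

# A minimal sketch: resampling estimate of the standard error of a
# least-squares slope, on simulated data (illustration only).
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)        # true slope is 2

def ls_slope(x, y):
    # least-squares slope of y on x (fit with an intercept)
    return np.polyfit(x, y, deg=1)[0]

K = 1000
slopes = []
for _ in range(K):
    idx = rng.integers(0, n, size=n)    # resample indices with replacement
    slopes.append(ls_slope(x[idx], y[idx]))

print("estimated SE of the slope:", np.std(slopes))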

Performance and Generalisation (2)

Important to notice:
- The test error rate (on a test/validation set that is separate from the training set) is a valid estimator of the error rate.
- The purpose of cross-validation is just to obtain multiple train-test splits.
- Hence, resampling is not necessary to estimate performance; it simply helps to improve estimation accuracy!

Cross-validation

Recall that cross-validation gives us K (e.g., K = 10) train-test splits:
1. Divide the available data into K equal-sized subsets.
2. For j = 1, ..., K:
   2.1 Train the model(s) using all data except that of subset j.
   2.2 Compute the resulting validation error on subset j.
3. Average the K results.

When K = N (i.e., each datapoint is a separate subset) this is known as leave-one-out cross-validation.
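A minimal sketch of this loop in Python, assuming a scikit-learn style model with fit/predict and the misclassification rate as the error measure (the function name and the data are only for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

def cross_val_error(model, X, y, K=10, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), K)   # K roughly equal-sized subsets
    errors = []
    for j in range(K):
        val = folds[j]                                   # subset j is the validation set
        train = np.concatenate([folds[i] for i in range(K) if i != j])
        model.fit(X[train], y[train])                    # train on all data except subset j
        errors.append(np.mean(model.predict(X[val]) != y[val]))
    return np.mean(errors)                               # average the K validation errors

# Illustrative usage on random data:
X = np.random.default_rng(1).normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(cross_val_error(LogisticRegression(max_iter=1000), X, y, K=10))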

Bootstrap

Another popular resampling method is the bootstrap. The idea is to reuse the "training" set to obtain multiple data sets. It is not restricted to supervised learning (hence the quotes around "training"). These bootstrap samples can be used to estimate the variability of an estimate of a parameter θ.

Bootstrap:
1. Let D = (x_1, ..., x_n) be the actual data.
2. Repeat for j = 1, ..., K:
   2.1 Create D_j by drawing n objects from D with replacement.
   2.2 Obtain the estimate θ̂_j from D_j.
3. Use the bootstrap estimates θ̂_1, ..., θ̂_K to estimate variability.
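A minimal Python sketch of this procedure, using the median as the parameter θ (the data and all names are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
D = rng.exponential(size=50)                          # the actual data (illustration only)

K = 1000
theta_hat = []
for _ in range(K):
    D_j = rng.choice(D, size=len(D), replace=True)    # draw n objects from D with replacement
    theta_hat.append(np.median(D_j))                  # estimate theta from the bootstrap sample

print("estimate from D:      ", np.median(D))
print("bootstrap variability:", np.std(theta_hat))    # e.g. the bootstrap standard error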

Bootstrap (2)

Let F be the true underlying distribution. Denote by F̂ the empirical distribution corresponding to the actual data D. For example, if D = (a, a, b, a), then F̂(a) = 0.75 and F̂(b) = 0.25. The bootstrap samples D_j are drawn from F̂.

The bootstrap principle (assumption):
- The empirical distribution F̂ is a good approximation of the true distribution F.
- The bootstrap distribution of the estimator θ̂ is a good approximation of the sampling distribution of θ̂.

The bootstrap principle implies that we can treat D_1, ..., D_K as K replicates from the same distribution as D.
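A tiny Python illustration of F̂ and of drawing one bootstrap sample from it, for the toy data D = (a, a, b, a) above (illustration only):

from collections import Counter
import random

D = ["a", "a", "b", "a"]
n = len(D)
F_hat = {x: c / n for x, c in Counter(D).items()}   # {'a': 0.75, 'b': 0.25}
D_j = random.choices(D, k=n)                        # one bootstrap sample: n draws from D
print(F_hat, D_j)                                   # with replacement, i.e. from F_hat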

Bootstrap (3)

Example (p. 189 in the textbook); figure from James et al. (2013):
[Figure: two histograms. Left: estimates from 1000 simulated data sets (drawn from F). Right: estimates from 1000 bootstrap samples (drawn from F̂).]

Ensemble Method for Supervised Learning

Having several training samples D_1, ..., D_K would clearly be nice also for supervised learning. We can combine the learned models f̂_1, ..., f̂_K into an aggregate model f̂_agg. The aggregate model will have lower variance than the individual models. If a learning method has high variance (but low bias), then f̂_agg may be a very good model.

Bagging = bootstrap aggregation: the models f̂_j are obtained from bootstrap samples D_j (see textbook Sec. 8.2.1).

Bagging

1. Bootstrap to obtain D_1, ..., D_K.
2. Learn models (classifiers or regression models) f̂_1, ..., f̂_K from the bootstrap samples; for example, unpruned regression/decision trees (high variance, low bias).
3a. For regression, combine by averaging: f̂_bag(x) = (1/K) Σ_{j=1}^{K} f̂_j(x).
3b. For classification, combine by voting: f̂_bag(x) = majority(f̂_1(x), ..., f̂_K(x)).
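A minimal sketch of steps 1-3b for classification, assuming integer class labels and scikit-learn decision trees as the base models (illustration only, not the course's reference implementation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, K=100, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(K):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample D_j
        tree = DecisionTreeClassifier()              # unpruned: high variance, low bias
        models.append(tree.fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # combine by voting: majority class over the K trees
    # (assumes integer class labels 0, 1, ...)
    votes = np.array([m.predict(X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)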

Random Forests

For decision trees, bagging tends to improve performance somewhat. However, the trees are highly dependent in cases where the splits that maximize the gain are clearly better than the second-best splits; for example, every tree might split according to X_3 first, then X_4, etc.

The trees can be forced to use different features by only allowing splits based on a random sample of the features. For example, only consider about √p of all p features at each split: the feature with maximum gain is usually outside this set.

Trees constructed in bagging or random forests are usually not pruned, since averaging a large number of trees reduces overfitting.
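In practice one rarely implements this from scratch; for example, scikit-learn's RandomForestClassifier exposes the per-split feature subsampling via its max_features parameter. A sketch on synthetic data (the specific settings are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200,      # number of (unpruned) trees
    max_features="sqrt",   # consider about sqrt(p) of the p features at each split
    random_state=0,
)
rf.fit(X, y)
print(rf.score(X, y))      # training accuracy (for illustration only)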

Stacking and Boosting (not required for the exam)

The bagging approach is to combine the base-learners by averaging or voting. We can usually do better by either
- stacking: apply a meta-level machine learning algorithm to learn how to best combine the base-learners, or
- boosting: an iterative method where new learning problems are constructed based on the errors made by the earlier solutions.
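For completeness, both ideas are available off the shelf in scikit-learn; a minimal sketch on synthetic data (illustration only, and like the slide itself, not exam material):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stacking: a meta-level model (here logistic regression) learns how to
# combine the base-learners' predictions.
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)

# Boosting: each new weak learner focuses on the errors of the earlier ones.
boost = AdaBoostClassifier(n_estimators=100)

for name, model in [("stacking", stack), ("boosting", boost)]:
    print(name, model.fit(X, y).score(X, y))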

Summary of Resampling and Ensemble Methods

Resampling methods:
- Bootstrap and other resampling methods are generic statistical techniques that reuse the data to simulate repeated sampling.
- The bootstrap can be used for various statistical estimation tasks, such as obtaining confidence intervals.

Ensemble methods:
- In supervised machine learning, ensemble methods build multiple hypotheses from multiple training sets obtained by resampling.

Examples: cross-validation, bagging, random forests (a specific variant of bagging for decision trees), stacking, boosting, ...