Theoretical Foundations of Active Learning

Steve Hanneke, Machine Learning Department, Carnegie Mellon University. shanneke@cs.cmu.edu

Passive Learning: the learning algorithm receives labeled data points from the data source (labeled by an expert/oracle) and outputs a classifier.

Active Learning: the learning algorithm repeatedly requests the label of a chosen data point from the expert/oracle and receives that label; finally, the algorithm outputs a classifier.

Active Learning (Sequential Design): how many label requests are required to learn? This quantity is the label complexity. E.g., [Das04, Das05, DKM05, BBL06, Kaa06, Han07a&b, BBZ07, DHM07, BHW08].

Active Learning Sometimes Helps. An example: 1-dimensional threshold functions, labeled - to the left of the threshold and + to the right.

Take m unlabeled examples. Repeatedly request the label of the median point between the known - and + boundaries, then take any threshold consistent with the observed labels. This uses only log(m) label requests, yet yields a classifier consistent with all m examples: an exponential improvement over passive!
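The binary-search strategy above can be sketched in code. This is an illustrative implementation, not from the slides; `learn_threshold` and the oracle interface are hypothetical names.

```python
def learn_threshold(xs, query_label):
    """Actively learn a 1-D threshold over the sorted unlabeled pool xs.

    query_label(x) is the oracle, returning -1 or +1 (separable case).
    Binary search on the -/+ boundary uses only O(log m) label requests,
    yet the returned threshold t is consistent with all m examples.
    """
    m = len(xs)
    neg, pos = -1, m  # xs[:neg+1] known negative, xs[pos:] known positive
    queries = 0
    while pos - neg > 1:
        mid = (neg + pos) // 2  # median of the still-unknown region
        queries += 1
        if query_label(xs[mid]) > 0:
            pos = mid
        else:
            neg = mid
    if pos == m:                      # everything is negative
        t = xs[-1] + 1.0
    elif neg == -1:                   # everything is positive
        t = xs[0] - 1.0
    else:                             # any threshold in the gap is consistent
        t = (xs[neg] + xs[pos]) / 2.0
    return t, queries
```

With a pool of m = 1000 points, the search closes a gap of m + 1 candidate boundary positions, so it makes at most ⌈log₂(m+1)⌉ = 10 label requests.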

Outline:
- Formal Model
- Analysis of Uncertainty-based Active Learning
- Strict Improvements Over Passive Learning
- Open Problems

Formal Model

CAL: a simple idea from Cohn, Atlas & Ladner (1994). Assuming ν = 0 (no label noise), it produces a perfectly labeled data set, which we can feed into any passive algorithm, so we get a natural fallback guarantee. Can we characterize the label complexity achieved by CAL? Can we generalize it to handle label noise or non-separable data?
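A minimal sketch of CAL for a finite hypothesis class, assuming ν = 0 and that the target lies in the class. The explicit version-space list is for illustration only; CAL itself only needs to test whether consistent hypotheses disagree on a point.

```python
def cal(stream, query_label, hypotheses):
    """CAL: query a label only when the version space disagrees.

    stream: iterable of unlabeled points.
    query_label: noise-free oracle returning -1 or +1.
    hypotheses: list of classifiers h(x) -> {-1, +1}; must contain the target.
    Returns a perfectly labeled data set and the number of label queries.
    """
    version_space = list(hypotheses)
    labeled, queries = [], 0
    for x in stream:
        preds = {h(x) for h in version_space}
        if len(preds) > 1:           # disagreement region: must ask the oracle
            y = query_label(x)
            queries += 1
            version_space = [h for h in version_space if h(x) == y]
        else:                        # all consistent hypotheses agree
            y = preds.pop()          # inferred label, no query needed
        labeled.append((x, y))
    return labeled, queries
```

Every point gets a correct label (queried or inferred), so the output can be fed into any passive algorithm, as the slide notes.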

Disagreement Coefficient [Hanneke, 07] (for our purposes, take r_0 = ε). (Figure: DIS(B(f, r)), the region on which concepts in the ball B(f, r) around f disagree.)

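The slide's formula did not survive transcription; as a reconstruction, the standard definition of the disagreement coefficient from [Hanneke, 07] is (the slide's own notation may differ slightly):

```latex
\theta(r_0) \;=\; \sup_{r > r_0} \frac{P\big(\mathrm{DIS}(B(f, r))\big)}{r},
\qquad\text{where}\quad
B(f, r) = \{h \in \mathbb{C} : P(h(X) \neq f(X)) \le r\}
```

and DIS(H) = {x : ∃ h, h' ∈ H with h(x) ≠ h'(x)} is the disagreement region of a set of concepts H.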
θ Characterizes CAL's Performance

What about Noise?

Activized Learning: an activizer meta-algorithm sits between the data source / expert-oracle and a given passive learning algorithm (supervised or semi-supervised). It repeatedly requests labels of data points, receives them, and outputs a classifier.

Are there general-purpose activizers that strictly improve the label complexity of any passive algorithm?

Formal Model

Uncertainty-based Sampling Doesn't Activize. Example: intervals on [0,1], labeled + inside the target interval and - outside.

Suppose the target labels everything -1. Then uncertainty-based sampling requests every label: no improvement over passive.
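A toy illustration of why every unqueried point stays uncertain in this example, assuming the class of closed intervals including degenerate single-point and empty intervals; all names are hypothetical.

```python
def in_disagreement(x, observed_negatives):
    """Is x still in the disagreement region of the interval class,
    given that every label observed so far is -1?

    The empty interval labels x negative, while the degenerate interval
    [x, x] labels x positive and remains consistent with all-negative
    observations as long as x itself has not been queried. So every
    unqueried point is a point of disagreement.
    """
    return x not in observed_negatives

# Any "query only uncertain points" strategy must therefore query everything.
points = [i / 10.0 for i in range(1, 10)]
observed_negatives = set()
queried = 0
for x in points:
    if in_disagreement(x, observed_negatives):
        observed_negatives.add(x)  # oracle: the target labels everything -1
        queried += 1
```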

What's Wrong? (formally)

How Can We Fix It?

A Simple Activizer. Whichever of the 2^k classifications can't be realized by V, look at the label of x and take the opposite.
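The activizer's key subroutine is a realizability (shatterability) test on the version space V; here is a minimal sketch for a finite version space, with hypothetical helper names.

```python
def realizable_labelings(version_space, points):
    """Return the set of sign patterns on `points` that some classifier
    in the (finite) version space can realize."""
    return {tuple(h(x) for x in points) for h in version_space}

def is_shattered(version_space, points):
    """V shatters `points` iff all 2^k labelings are realizable; the
    activizer extracts information from the unrealizable patterns."""
    return len(realizable_labelings(version_space, points)) == 2 ** len(points)
```

For example, with threshold classifiers the pattern (+1, -1) on a pair x1 < x2 is never realizable, and unrealizable patterns of exactly this kind are what the activizer exploits.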

This Works for Any C! The [HLW94] passive algorithm has O(1/ε) sample complexity.

Dealing with Noise and Misspecification. Recall that passive learning gets O(1/ε²) (minimax).

Open Questions:
- What can we activize with noise?
- Can we give more detailed bounds on Λ_a when θ ≫ 1?
- Is there a labeled/unlabeled trade-off under arbitrary D_XY?

Thank You

A Simple Activizer: intervals revisited. Again, suppose the target labels everything -1. The passive algorithm is trained on Ω(n²) samples: improved label complexity.

Efficiency? Let m = the number of unlabeled examples used by the algorithm. Suppose we can test separability of O(n) points in poly(n) time. Then SimpleActivizer runs in poly(n)·m time (plus the running time of the passive algorithm). For most learning problems, we can set a poly(n) limit on m in the algorithm without losing our guarantees.