PAC Learning. Introduction to Machine Learning. Matt Gormley, Lecture 14, March 5, 2018

10-601 Introduction to Machine Learning. Machine Learning Department, School of Computer Science, Carnegie Mellon University. PAC Learning. Matt Gormley, Lecture 14, March 5, 2018. 1

ML Big Picture
Learning Paradigms (what data is available and when? what form of prediction?): supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, active learning, imitation learning, domain adaptation, online learning, density estimation, recommender systems, feature learning, manifold learning, dimensionality reduction, ensemble learning, distant supervision, hyperparameter optimization.
Theoretical Foundations (what principles guide learning?): probabilistic, information theoretic, evolutionary search, ML as optimization.
Problem Formulation (what is the structure of our output prediction?): boolean (binary classification), categorical (multiclass classification), ordinal (ordinal classification), real (regression), ordering (ranking), multiple discrete (structured prediction), multiple continuous (e.g. dynamical systems), both discrete & continuous (e.g. mixed graphical models).
Facets of Building ML Systems (how to build systems that are robust, efficient, adaptive, effective?): 1. data prep, 2. model selection, 3. training (optimization / search), 4. hyperparameter tuning on validation data, 5. (blind) assessment on test data.
Application Areas (key challenges?): NLP, speech, computer vision, robotics, medicine, search.
Big Ideas in ML (which are the ideas driving development of the field?): inductive bias, generalization / overfitting, bias-variance decomposition, generative vs. discriminative, deep nets, graphical models, PAC learning, distant rewards. 2

LEARNING THEORY 3

Questions For Today
1. Given a classifier with zero training error, what can we say about generalization error? (Sample Complexity, Realizable Case)
2. Given a classifier with low training error, what can we say about generalization error? (Sample Complexity, Agnostic Case)
3. Is there a theoretical justification for regularization to avoid overfitting? (Structural Risk Minimization) 4

PAC/SLT Models for Supervised Learning. Data source: a distribution D on X. An expert / oracle labels examples using the target concept c* : X → Y, yielding labeled examples (x_1, c*(x_1)), ..., (x_m, c*(x_m)). The learning algorithm takes these examples as input and outputs a hypothesis h : X → Y. (The slide's figure, not reproduced here, shows h as a small decision tree over features such as x_1 and x_6, and a set of points labeled + / - by c*.) Slide from Nina Balcan. 6
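
To make the data-generation story above concrete, here is a minimal simulation sketch (the distribution, target concept, learner, and all names below are illustrative assumptions, not the lecture's example): examples are drawn i.i.d. from D, an oracle labels them with the unknown target c*, and the algorithm outputs a hypothesis h whose train error and true error we can then compare.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_D(m):
    """Draw m unlabeled examples x ~ D (here: uniform on [0, 1]^2)."""
    return rng.uniform(size=(m, 2))

def c_star(X):
    """Unknown target concept c* : X -> {+1, -1} (here: a halfspace)."""
    return np.where(X[:, 0] + X[:, 1] > 1.0, 1, -1)

def learn(X, y):
    """Toy learner: pick the threshold on the first feature that best fits
    the training sample; its output h is a single decision stump."""
    thresholds = np.linspace(0.0, 1.0, 101)
    errs = [np.mean(np.where(X[:, 0] > t, 1, -1) != y) for t in thresholds]
    t_best = thresholds[int(np.argmin(errs))]
    return lambda X_new: np.where(X_new[:, 0] > t_best, 1, -1)

# Labeled sample (x_1, c*(x_1)), ..., (x_m, c*(x_m)) handed to the learner.
m = 100
X_train = sample_D(m)
y_train = c_star(X_train)
h = learn(X_train, y_train)

train_error = np.mean(h(X_train) != y_train)         # empirical risk on the sample
X_fresh = sample_D(200_000)                          # fresh draws from D
true_error = np.mean(h(X_fresh) != c_star(X_fresh))  # Monte Carlo estimate of expected risk
print(f"train error ~ {train_error:.3f}, true error ~ {true_error:.3f}")
```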

Two Types of Error True Error (aka. expected risk) Train Error (aka. empirical risk) 7
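
The formulas on this slide did not survive transcription; in the standard notation for this setting (0-1 loss, target c*, data distribution D, training sample of size m), the two quantities are presumably:

    R(h) \;=\; \Pr_{x \sim D}\big[\, h(x) \neq c^*(x) \,\big]   (true error / expected risk)
    \hat{R}(h) \;=\; \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\big[\, h(x_i) \neq c^*(x_i) \,\big]   (train error / empirical risk)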

PAC / SLT Model 8

Three Hypotheses of Interest 9
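
The body of this slide is not in the transcript. In the standard treatment, the three hypotheses of interest are presumably the true concept c* (which has zero true error), the best hypothesis available in the class, h^* = \arg\min_{h \in H} R(h), and the hypothesis the learner actually returns, e.g. the empirical risk minimizer \hat{h} = \arg\min_{h \in H} \hat{R}(h).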

PAC LEARNING 10

Probably Approximately Correct (PAC) Learning. Whiteboard: PAC criterion; meaning of "probably approximately correct"; PAC learnable; consistent learner; sample complexity. 11
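
The whiteboard material is not transcribed; the standard statements, assumed here and consistent with the definitions above, are as follows. A learner's output h satisfies the PAC criterion for accuracy parameter \epsilon and confidence parameter \delta if

    \Pr\big[\, R(h) \leq \epsilon \,\big] \;\geq\; 1 - \delta

where the probability is over the draw of the m training examples from D. The sample complexity is the number of examples m(\epsilon, \delta) needed to guarantee this, and a consistent learner is one that outputs a hypothesis with zero training error whenever such a hypothesis exists in H.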

Generalization and Overfitting. Whiteboard: realizable vs. agnostic cases; finite vs. infinite hypothesis spaces. 12

PAC Learning 13

SAMPLE COMPLEXITY RESULTS 14

Sample Complexity Results. Four cases we care about: the realizable and agnostic settings, each with a finite or infinite hypothesis space. We'll start with the finite case. 15

Sample Complexity Results Four Cases we care about Realizable Agnostic 16
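
The formulas in the table on these slides were not transcribed; the standard finite-|H| bounds (assumed here to be what the table contains) are:

    Realizable case:  m \;\geq\; \frac{1}{\epsilon}\Big[\ln|H| + \ln\tfrac{1}{\delta}\Big]  examples suffice so that, with probability at least 1-\delta, every h \in H with \hat{R}(h) = 0 has R(h) \leq \epsilon.
    Agnostic case:  m \;\geq\; \frac{1}{2\epsilon^2}\Big[\ln|H| + \ln\tfrac{2}{\delta}\Big]  examples suffice so that, with probability at least 1-\delta, every h \in H has |R(h) - \hat{R}(h)| \leq \epsilon.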

Example: Conjunctions. In-Class Quiz: Suppose H = the class of conjunctions over x in {0,1}^M. If M = 10, ε = 0.1, and δ = 0.01, how many examples suffice in the realizable case? How many in the agnostic case? 17
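
One way to check the quiz arithmetic is to plug the numbers into the finite-|H| bounds above; the sketch below assumes |H| = 3^M for conjunctions over {0,1}^M (each of the M variables is included positively, included negated, or omitted), so it is an estimate under that assumption rather than the official answer key.

```python
import math

# Quiz parameters from the slide.
M, eps, delta = 10, 0.1, 0.01
H_size = 3 ** M  # assumption: conjunctions over {0,1}^M give |H| = 3^M

# Standard finite-|H| sample complexity bounds (see the previous slide).
m_realizable = (1 / eps) * (math.log(H_size) + math.log(1 / delta))
m_agnostic = (1 / (2 * eps ** 2)) * (math.log(H_size) + math.log(2 / delta))

print(math.ceil(m_realizable))  # ~156 examples suffice in the realizable case
print(math.ceil(m_agnostic))    # ~815 examples suffice in the agnostic case
```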

Sample Complexity Results Four Cases we care about Realizable Agnostic 18

Sample Complexity Results (four cases we care about, annotated).
Realizable case: 1. the bound is inversely linear in epsilon (e.g. halving the error requires double the examples); 2. the bound is only logarithmic in |H| (e.g. quadrupling the hypothesis space only requires double the examples).
Agnostic case: 1. the bound is inversely quadratic in epsilon (e.g. halving the error requires 4x the examples); 2. the bound is only logarithmic in |H| (i.e. the same as the realizable case). 19

Generalization and Overfitting. Whiteboard: sample complexity bounds (agnostic case); corollary (agnostic case); empirical risk minimization; structural risk minimization; motivation for regularization. 20
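
The whiteboard derivation is not transcribed; the usual corollary of the agnostic bound (assumed here) states that, with probability at least 1-\delta, simultaneously for every h \in H,

    R(h) \;\leq\; \hat{R}(h) + \sqrt{\frac{\ln|H| + \ln\frac{2}{\delta}}{2m}}

Empirical risk minimization minimizes only the first term on the right; structural risk minimization instead minimizes the whole right-hand side, trading training error against a complexity penalty on the hypothesis space. Penalizing complexity alongside training error is exactly what regularization does, which is the theoretical motivation referred to above.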

Sample Complexity Results. Four cases we care about: the realizable and agnostic settings with an infinite hypothesis space remain. For these results we need a new definition of complexity for a hypothesis space (see VC dimension). 21
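
The infinite-|H| results themselves are not in the transcript; the standard VC-dimension versions (an assumption about what the corresponding slides state) have the form

    Realizable case:  m \;=\; O\!\left(\frac{1}{\epsilon}\Big[\mathrm{VC}(H)\,\log\tfrac{1}{\epsilon} + \log\tfrac{1}{\delta}\Big]\right)
    Agnostic case:  m \;=\; O\!\left(\frac{1}{\epsilon^2}\Big[\mathrm{VC}(H) + \log\tfrac{1}{\delta}\Big]\right)

i.e. the \ln|H| term is replaced by the VC dimension of H as the measure of hypothesis-space complexity.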

Learning Theory Objectives. You should be able to: identify the properties of a learning setting and the assumptions required to ensure low generalization error; distinguish true error, train error, and test error; define PAC and explain what it means to be approximately correct and what occurs with high probability; apply sample complexity bounds to real-world learning examples; distinguish between a large sample and a finite sample analysis; theoretically motivate regularization. 38