Capacity, Learning, Teaching

Xiaojin Zhu
Department of Computer Sciences, University of Wisconsin-Madison
jerryzhu@cs.wisc.edu

Outline:
- Machine learning vs. human learning
- Learning capacity and generalization bounds
- Beyond supervised learning: semi-supervised, active
- Beyond learning: teaching

Capacity: VC-dimension

F: a family of binary classifiers.
VC-dimension VC(F): the size of the largest set that F can shatter.
With probability at least 1 − δ,

$$\sup_{f \in F}\; R(f) - R_n(f) \;\le\; 2\sqrt{\frac{2\,VC(F)\log n + VC(F)\log\frac{2e}{VC(F)} + \log\frac{2}{\delta}}{n}}$$

R(f): error of f in the future; R_n(f): error of f on a training set of size n.
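To make the bound concrete, here is a minimal sketch (my own, not from the talk) that evaluates the expression above; the helper name vc_bound is hypothetical:

```python
import math

def vc_bound(vc_dim, n, delta=0.05):
    """Uniform deviation bound: 2*sqrt((2d log n + d log(2e/d) + log(2/delta)) / n)."""
    d = vc_dim
    return 2 * math.sqrt((2 * d * math.log(n)
                          + d * math.log(2 * math.e / d)
                          + math.log(2 / delta)) / n)

# 1D threshold classifiers have VC-dimension 1; the bound decays like sqrt(log n / n).
for n in (100, 1000, 10000):
    print(n, round(vc_bound(vc_dim=1, n=n), 3))
```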

Capacity: Rademacher complexity

σ_1, ..., σ_n: independent random signs with P(σ_i = 1) = P(σ_i = −1) = 1/2.
Rademacher complexity:

$$\mathrm{Rad}_n(F) \;=\; E_{\sigma,x}\!\left[\sup_{f \in F}\; \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\right]$$

With probability at least 1 − δ,

$$\sup_{f \in F}\; R(f) - R_n(f) \;\le\; 2\,\mathrm{Rad}_n(F) + \sqrt{\frac{\log(2/\delta)}{2n}}$$
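The expectation over σ can be approximated by Monte Carlo. Below is a sketch, assuming the empirical (fixed-x) variant of the definition and the class of 1D threshold classifiers with both orientations; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher_thresholds(x, n_draws=2000):
    """Monte Carlo estimate of the empirical Rademacher complexity of 1D
    threshold classifiers f_t(x) = sign(x - t), including both orientations."""
    n = len(x)
    xs = np.sort(x)
    # candidate thresholds: outside the data and between consecutive points
    ts = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1.0]))
    preds = np.where(x[None, :] > ts[:, None], 1.0, -1.0)  # (n_thresholds, n)
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        corr = preds @ sigma / n                  # correlation with each rule
        total += max(corr.max(), (-corr).max())   # sup over both orientations
    return total / n_draws

x = rng.uniform(0.0, 1.0, size=20)
print(empirical_rademacher_thresholds(x))  # roughly sqrt(2*log(2n)/n) scale
```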

Machine learning vs. human learning

- f: you categorize x by f(x)
- F: all the classifiers in your mind
- R_n(f): how you did in class
- R(f): how well you can do outside class

Capacity: can we measure it in humans?
- VC(F): too brittle (one must find a single dataset of size n that is shattered) and combinatorial (verifying shattering is expensive)
- Others may behave better, e.g., Rad_n(F)

Measuring human Rademacher complexity

Subjects learn random labels (x_1, σ_1), ..., (x_n, σ_n), e.g., (grenade, B), (skull, A), (conflict, A), (meadow, B), (queen, B). Estimate

$$\mathrm{Rad}_n(F) \;\approx\; \frac{1}{m}\sum_{j=1}^{m} \frac{1}{n}\sum_{i=1}^{n} \sigma_i^{(j)}\, \hat f^{(j)}\big(x_i^{(j)}\big)$$

from the rules f̂ the subjects actually learn:
- f̂ mnemonics: "a queen was sitting in a meadow and then a grenade was thrown (B = before), then this started a conflict ending in bodies & skulls (A = after)"
- f̂ wrong rules: (daylight, A), (hospital, B), (termite, B), (envy, B), (scream, B), "anything related to omitting [sic] light"

[Figure: estimated human Rademacher complexity as a function of n, for the Shape and Word stimulus domains (example Word items: rape, killer, funeral, fun, laughter, joy).]
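A sketch of the plug-in estimator above, assuming each subject's responses are coded as ±1 arrays; the B = −1, A = +1 coding and all names are my choices, not the talk's:

```python
import numpy as np

def human_rademacher_estimate(sigmas, fhats):
    """Plug-in estimate (1/m) sum_j (1/n) sum_i sigma_i^(j) * fhat^(j)(x_i^(j)).
    sigmas, fhats: (m, n) arrays holding the random labels shown to each of m
    subjects and the labels each subject's learned rule assigns to the same n items."""
    sigmas, fhats = np.asarray(sigmas, float), np.asarray(fhats, float)
    return np.mean(np.sum(sigmas * fhats, axis=1) / sigmas.shape[1])

# Toy check: a subject who memorizes the random labels contributes 1;
# a subject answering nearly independently of them contributes close to 0.
sigmas = np.array([[ 1, -1, -1,  1,  1],
                   [-1,  1, -1,  1, -1]])
fhats  = np.array([[ 1, -1, -1,  1,  1],    # memorized everything
                   [ 1,  1, -1, -1, -1]])   # partly wrong rule
print(human_rademacher_estimate(sigmas, fhats))
```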

Overfitting indicator

[Figure: observed overfitting e − ê plotted against estimated human Rademacher complexity, together with the theoretical bound, for the Shape and Word conditions at two set sizes.]

- e: test-set error; ê: training-set error
- the generalization error bound holds
- actual overfitting tracks the bound (nice, but not predicted by theory)
- the study of capacity may constrain cognitive models and help us understand how groups differ by age, health, education, etc.

Human semi-supervised learning

- Humans learn supervised first; then the decision boundary shifts toward the trough of the distribution of the unlabeled test data
- This can be explained by a variety of semi-supervised machine learning models

[Figure: left, test examples drawn from a left-shifted Gaussian mixture relative to the range of the labeled examples; right, percent class-2 responses vs. x for test 1 (all subjects) and test 2 (left-shift vs. right-shift subjects).]
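One such model is a two-component Gaussian mixture fit by EM to the unlabeled test items after a supervised start. A sketch assuming scikit-learn; the means, shift, and initialization are invented for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Supervised phase: labeled examples put the boundary near x = 0.
x_labeled = np.array([[-1.0], [-0.8], [0.8], [1.0]])

# Test phase: an unlabeled, left-shifted mixture whose density trough sits near x = -0.5.
x_test = np.concatenate([rng.normal(-1.5, 0.3, 200),
                         rng.normal(0.5, 0.3, 200)]).reshape(-1, 1)

# Initialize EM at the supervised class means, then let the unlabeled data move them.
gmm = GaussianMixture(n_components=2, means_init=[[-0.9], [0.9]],
                      random_state=0).fit(x_test)
m1, m2 = sorted(gmm.means_.ravel())

# Under roughly equal weights and variances the boundary lands about midway
# between the fitted means, i.e., in the trough of the test distribution.
print("supervised boundary:", x_labeled.mean())
print("after unlabeled data:", (m1 + m2) / 2)
```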

Human semi-supervised learning, the other way around

- Human unsupervised learning comes first ...
- ... and influences a subsequent (identical) supervised learning task

[Figure: p(x) over time for the trough, peak, uniform, and converge conditions, and mean accuracy ± std. err. on the subsequent supervised task in each condition.]

Active learning

Passive learning (slow):

$$\inf_{\hat\theta_n}\ \sup_{\theta\in[0,1]} E\big[|\hat\theta_n-\theta|\big] \;\ge\; \frac{1}{4}\left(1-\frac{2\epsilon}{1+2\epsilon}\right)\frac{2\epsilon}{n+1}$$

Active learning (fast):

$$\sup_{\theta\in[0,1]} E\big[|\hat\theta_n-\theta|\big] \;\le\; \frac{1}{2}\left(\frac{1}{2}+\sqrt{\epsilon(1-\epsilon)}\right)^{n}$$

Here θ is a threshold on [0, 1], θ̂_n its estimate after n labeled examples, and ε the label noise rate: passive error decays at best like 1/n, while active querying drives it down exponentially.
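A noiseless sketch of the contrast (the noisy active case is handled by the Burnashev-Zigangirov algorithm, which the bound above describes; this simplified bisection version is my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 0.637  # true threshold on [0, 1]

def passive(n):
    """n labels at uniformly random points; estimate the midpoint of the bracketing gap."""
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = x > theta
    lo = x[~y].max() if (~y).any() else 0.0
    hi = x[y].min() if y.any() else 1.0
    return (lo + hi) / 2  # error shrinks like 1/n

def active(n):
    """Bisection: each queried label halves the interval containing theta."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mid <= theta else (lo, mid)
    return (lo + hi) / 2  # error shrinks like 2**-n

for n in (5, 10, 20):
    print(n, abs(passive(n) - theta), abs(active(n) - theta))
```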

Active learning: humans

[Figure: human error curves vs. number of queries, one panel per noise level ε = 0, 0.05, 0.1, 0.2, 0.4, for Human Passive (top row) and Human Active (bottom row).]

Machine teaching

Example: a threshold classifier in 1D
- passive learning: (x_i, y_i) ~ p i.i.d., risk O(1/n)
- active learning: risk O(2^{−n})
- taught: n = 2 (two examples placed just on either side of the threshold suffice; see the sketch below)

Related notions: teaching dimension, curriculum learning.
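A sketch of the taught case, assuming the noiseless 1D threshold setting; the function name and constants are illustrative:

```python
# Teaching a 1D threshold: a clairvoyant teacher needs only two examples
# bracketing the true boundary theta to within the desired accuracy eps.
def teaching_set(theta, eps=1e-6):
    return [(theta - eps / 2, 0), (theta + eps / 2, 1)]  # (x, label) pairs

# Any consistent learner must place its threshold between the two examples,
# hence within eps/2 of theta, so n = 2 regardless of the accuracy demanded.
print(teaching_set(0.637))
```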

Human teacher behaviors

strategy      graspability (n = 31)   lines (n = 32)
boundary      10%                     56%
curriculum    48%                      9%
linear        42%                     25%
positive       0%                     10%

A framework for teaching a Bayesian learner

1. World: p(x, y | θ*), loss function ℓ(f(x), y)
2. Learner: Bayesian
   - prior over Θ (θ ∈ Θ), likelihood p(x, y | θ)
   - maintains a posterior p(θ | data) by Bayesian updating
   - makes predictions f(x | data) using the posterior
3. Teacher: clairvoyant, knows everything above
   - can only teach by examples (x, y)
   - goal: choose the least-effort teaching set D = (x, y)_{1:n} to minimize the learner's future loss (risk) plus effort: E_{θ*}[ℓ(f(x | D), y)] + effort(D)
   - if the future loss approaches the Bayes risk, D is a teaching set and n is the (generalized) teaching dimension
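A minimal sketch of this framework on a finite grid of threshold hypotheses; the noiseless likelihood, the loss proxy, and the effort weighting are all my simplifications, not the talk's:

```python
import numpy as np
from itertools import product

thetas = np.linspace(0.0, 1.0, 101)   # finite hypothesis grid of thresholds
theta_star = 0.637                    # the world's true parameter
prior = np.ones_like(thetas) / len(thetas)

def posterior(prior, examples):
    """Bayesian update with a noiseless likelihood: zero out inconsistent hypotheses."""
    post = prior.copy()
    for x, y in examples:
        post = post * ((x > thetas) == y)
    return post / post.sum()

def future_loss(post):
    """Probability mass (x ~ U[0,1]) on which the posterior-mean threshold
    disagrees with theta_star: a stand-in for E_{theta*}[l(f(x|D), y)]."""
    theta_hat = float(np.sum(post * thetas))
    return abs(theta_hat - theta_star)

def effort(D):
    return 0.01 * len(D)  # least-effort term: a fixed price per example

# Teacher: search 2-example candidate sets (one negative, one positive) and
# pick the one minimizing future loss plus effort.
grid = np.linspace(0.0, 1.0, 21)
candidates = [((a, False), (b, True))
              for a, b in product(grid, repeat=2) if a < theta_star < b]
best = min(candidates, key=lambda D: future_loss(posterior(prior, D)) + effort(D))
print("teaching set:", best)
```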

References

R. Castro, C. Kalish, R. Nowak, R. Qian, T. Rogers, and X. Zhu. Human active learning. In Advances in Neural Information Processing Systems (NIPS) 22, 2008.

B. R. Gibson, T. T. Rogers, and X. Zhu. Human semi-supervised learning. Topics in Cognitive Science, 5(1):132–172, 2013.

F. Khan, X. Zhu, and B. Mutlu. How do humans teach: On curriculum learning and teaching dimension. In Advances in Neural Information Processing Systems (NIPS) 25, 2011.

X. Zhu, T. Rogers, R. Qian, and C. Kalish. Humans perform semi-supervised classification too. In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.

X. Zhu, T. T. Rogers, and B. Gibson. Human Rademacher complexity. In Advances in Neural Information Processing Systems (NIPS) 23, 2009.