HAMLET JERRY ZHU UNIVERSITY OF WISCONSIN

HAMLET JERRY ZHU UNIVERSITY OF WISCONSIN Collaborators: Rui Castro, Michael Coen, Ricki Colman, Charles Kalish, Joseph Kemnitz, Robert Nowak, Ruichen Qian, Shelley Prudom, Timothy Rogers

Somewhere, something went terribly wrong.

Learning: improve with experience. Machines, animals, humans. Theory: common mathematical principles. Experiments: behavioral study, computer simulation.

Machine Learning + Cognition Three new case studies of common learning principles in humans, animals and machines: 1. Human semi-supervised learning 2. Human active learning 3. Monkey online learning

HAMLET example #1 Human Semi-Supervised Learning The first work that quantitatively studied humans' ability to utilize both labeled and unlabeled data in concept formation.

A Camping Story

Supervised Learning? x (here, size): input item = stimulus = feature vector. y ∈ {1, 2}: class label = category. Supervised learning: given labeled training examples (x_1, y_1), ..., (x_n, y_n), learn a classifier f: X → Y. In this example, the decision boundary is in the middle.
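A minimal sketch of this setup in code (an assumed toy illustration; the classifier, values, and names below are not from the talk):

```python
import numpy as np

# Toy illustration of the supervised setup above: 10 negative items at x = -1
# and 10 positive items at x = +1, labels in {1, 2}.
x = np.array([-1.0] * 10 + [1.0] * 10)          # stimuli (the "size" feature)
y = np.array([1] * 10 + [2] * 10)               # class labels

# A minimal classifier: threshold at the midpoint of the two class means.
boundary = (x[y == 1].mean() + x[y == 2].mean()) / 2.0

def f(x_new):
    """Classifier f: X -> Y."""
    return np.where(x_new < boundary, 1, 2)

print(boundary)                  # 0.0: the decision boundary is in the middle
print(f(np.array([-0.3, 0.4])))  # [1 2]
```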

Back to the Camp

Semi-Supervised Learning Semi-supervised learning (SSL): given labeled examples (x_1, y_1), ..., (x_n, y_n) and unlabeled examples x_{n+1}, ..., x_{n+m}, learn a better classifier f: X → Y. The cluster assumption (one of many possible assumptions). SSL is well studied in machine learning. (Figure: Vikas Sindhwani, IBM.)

SSL with Gaussian Mixtures p(x) is a Gaussian mixture: p(x) = w_1 N(x; μ_1, σ_1²) + w_2 N(x; μ_2, σ_2²). Parameters: θ = (w_1, μ_1, σ_1², w_2, μ_2, σ_2²). p(y | x) from Bayes rule: p(y | x) = w_y N(x; μ_y, σ_y²) / p(x). Parameter estimation from labeled data alone (easy); parameter estimation from both labeled and unlabeled data (the EM algorithm).

SSL with Gaussian Mixtures Prior on parameters: p(θ). Maximize the objective: log p(θ) + Σ_{i=1..n} log p(x_i, y_i | θ) + Σ_{i=n+1..n+m} log p(x_i | θ).
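A minimal sketch of this estimation (assuming 1D stimuli, two Gaussian classes, and plain maximum-likelihood EM with no prior on the parameters; not the talk's exact code or objective):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_ssl(xl, yl, xu, iters=50):
    """Fit a two-class 1D Gaussian mixture from labeled data (xl, yl in {0, 1})
    plus unlabeled data xu. Initialization uses the labeled data alone (the
    "easy" case); EM then folds in the unlabeled points."""
    w = np.array([np.mean(yl == k) for k in (0, 1)])              # class weights
    mu = np.array([xl[yl == k].mean() for k in (0, 1)])           # class means
    sd = np.array([max(xl[yl == k].std(), 0.1) for k in (0, 1)])  # class std devs
    for _ in range(iters):
        # E-step: responsibilities p(y | x) for the unlabeled points (Bayes rule).
        lik = np.stack([w[k] * norm.pdf(xu, mu[k], sd[k]) for k in (0, 1)], axis=1)
        r = lik / lik.sum(axis=1, keepdims=True)
        # M-step: hard counts from labeled data plus soft counts from unlabeled data.
        for k in (0, 1):
            nk = np.sum(yl == k) + r[:, k].sum()
            w[k] = nk / (len(xl) + len(xu))
            mu[k] = (xl[yl == k].sum() + (r[:, k] * xu).sum()) / nk
            var = (((xl[yl == k] - mu[k]) ** 2).sum()
                   + (r[:, k] * (xu - mu[k]) ** 2).sum()) / nk
            sd[k] = np.sqrt(var) + 1e-6
    return w, mu, sd

def posterior(x, w, mu, sd):
    """p(y = 1 | x) under the fitted mixture."""
    p = np.array([w[k] * norm.pdf(x, mu[k], sd[k]) for k in (0, 1)])
    return p[1] / p.sum(axis=0)
```

The predicted decision boundary is where posterior(x, ...) crosses 0.5; comparing the fit with and without the unlabeled sample gives the boundary shift discussed below.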

Human Semi-Supervised Learning Machine learning predicts a decision boundary shift. Do humans do semi-supervised learning? We are immersed in unlabeled data even in supervised tasks (e.g., deciding whether a piece of luggage contains a bomb).

Materials and Subjects Stimuli x parameterized in 1D, displayed on screen one at a time. Label y: 2-way forced choice. Labeled data: audio feedback. Unlabeled data: no audio feedback. 22 subjects, two conditions: L and R.

Procedure 1. 20 labeled instances, 10 each of (-1, -) and (1, +), in random order. 2. Test 1: x = -1, -0.9, ..., 0.9, 1. 3. 690 unlabeled instances sampled from the bimodal distribution, left- or right-shifted depending on the condition; also range examples. 4. Test 2: x = -1, -0.9, ..., 0.9, 1.
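For concreteness, one condition's stimulus stream could be generated roughly like this (a sketch with illustrative values: the shift size, the spread, and the range examples are not given here, so the numbers below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
shift = 0.4                                    # illustrative L/R shift, not the study's value

# 1. 20 labeled instances: 10 at (-1, -) and 10 at (+1, +), shuffled.
labeled_x = np.array([-1.0] * 10 + [1.0] * 10)
labeled_y = np.array([0] * 10 + [1] * 10)
order = rng.permutation(20)
labeled_x, labeled_y = labeled_x[order], labeled_y[order]

# 2. Test 1: the grid x = -1, -0.9, ..., 0.9, 1 (no feedback).
test_grid = np.round(np.arange(-1.0, 1.01, 0.1), 1)

# 3. 690 unlabeled instances from a bimodal (two-Gaussian) distribution,
#    shifted right for the R condition (use -shift for L); range examples omitted.
centers = np.array([-1.0, 1.0]) + shift
comp = rng.integers(0, 2, size=690)
unlabeled_x = rng.normal(centers[comp], 0.3)   # 0.3 is an illustrative spread

# 4. Test 2: the same grid again.
```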

Results: Decision Boundaries Plots of Prob(y = + | x) for Test 1 and for Test 2 in the L- and R-conditions. Human decision boundaries shift after seeing unlabeled data.

Results: Reaction Time Reaction times for Test 1 and for Test 2 in the L- and R-conditions. The peak of the reaction time shifts accordingly.

SSL Machine Learning Model Fit Prediction of the Gaussian mixture model: the same labeled and unlabeled input, parameters learned with the EM algorithm. Reaction time modeled as RT = a * Entropy(p(y | x)) + b.
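The reaction-time model itself is one line; here is a self-contained version (a and b below are placeholder values; in the talk they are fit to the subjects' data):

```python
import numpy as np

def predicted_rt(p_plus, a=1.0, b=0.5):
    """RT = a * Entropy(p(y | x)) + b, where p_plus = p(y = + | x)."""
    p = np.clip(np.asarray(p_plus, dtype=float), 1e-12, 1 - 1e-12)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # binary entropy, in bits
    return a * entropy + b

print(predicted_rt([0.05, 0.5, 0.95]))  # RT peaks where the classifier is most uncertain
```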

HAMLET example #2 Human Active Learning The first work that quantitatively studied humans' ability to actively select good queries in category learning.

Alien Eggs

Alien Eggs Active learning required 3 queries (in this case, binary search); passive learning with i.i.d. training examples likely needs more.

The Learning Task 1D feature x. Two classes y. Unknown but fixed boundary. Label noise (no more binary search!). Goal: learn from training data (x_1, y_1), ..., (x_n, y_n). Major difference in how x_1, ..., x_n are chosen. Passive learning: x_i i.i.d. (in this case from Uniform[0, 1]). Active learning: at iteration i, the learner selects x_i.

Learning-Theoretic Error Bounds Passive learning: with n random training examples, the minimax lower bound for the boundary estimation error decreases only polynomially, as O(1/n). Active learning: there is a probabilistic bisection algorithm for which the boundary estimation error decreases exponentially in n.
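A small simulation makes the gap concrete (a sketch under assumed settings: a Bayesian grid posterior over the threshold, noise level 0.1, 45 queries; the active learner queries the posterior median, one simple form of probabilistic bisection, while the passive learner queries uniformly at random):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_label(x, theta, eps):
    """True label is 1 if x >= theta, flipped with probability eps."""
    y = 1 if x >= theta else 0
    return y if rng.random() > eps else 1 - y

def run(theta=0.37, eps=0.1, n=45, active=True):
    grid = np.linspace(0, 1, 1001)
    post = np.full_like(grid, 1.0 / grid.size)                    # posterior over the boundary
    for _ in range(n):
        if active:
            x = grid[np.searchsorted(np.cumsum(post), 0.5)]       # query the posterior median
        else:
            x = rng.random()                                      # query uniformly at random
        y = noisy_label(x, theta, eps)
        like = np.where((x >= grid) == (y == 1), 1 - eps, eps)
        post = post * like
        post /= post.sum()
    return abs(np.sum(grid * post) - theta)                       # error of the posterior-mean estimate

print("passive error:", np.mean([run(active=False) for _ in range(200)]))
print("active  error:", np.mean([run(active=True) for _ in range(200)]))
```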

Human Active Learning 33 subjects randomly divided into three conditions. Random (passive): the subject receives i.i.d. (x, y) examples. Active: the subject uses the mouse scroll to choose x, then receives y. Yoked: the subject receives x chosen by a machine active learning algorithm, and its y, as if the machine is teaching the human. 5 sessions of 45 iterations, each with different parameters; subjects report a boundary guess every 3 iterations.

Results Human active learning is better than passive. Noise makes human learning difficult.

Results Human active learning decreases error exponentially, as learning theory predicts. However, the decay constant is smaller than predicted.

Human Active Strategies: "nudge", "just to be sure".

HAMLET example #3 Monkey Online Learning Faced with an adversary, why do monkeys behave so differently from an online learning algorithm?

Wisconsin Card Sort Task (WCST) Three shapes, three colors on each screen. Initial target concept: "red" (shape irrelevant). After 10 consecutive correct trials, the concept drifts to "triangle" (later to "blue" and "star"). How should a learner adjust?

Online Learning Against an Adversary Each object x has d = 6 Boolean features (R, G, B, C, S, T). Repeat: the adversary presents 3 objects, each with two features on (e.g., red circle); the adversary can change the target concept before seeing the learner's pick; the learner picks one object; the adversary says yes/no. Want: the number of mistakes not much larger than the number of concept drifts.
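One simple learner for this game is sketched below (a hedged illustration of an elimination-style strategy; it is not necessarily the algorithm analyzed on the next slide):

```python
FEATURES = ["R", "G", "B", "C", "S", "T"]    # 6 Boolean features (colors and shapes)

class EliminationLearner:
    """Keep a pool of candidate target features; always pick an object that
    contains the current candidate; drop the candidate after a "no"."""

    def __init__(self):
        self.pool = list(FEATURES)
        self.h = self.pool[0]                # current hypothesis feature

    def pick(self, objects):
        """objects: list of feature sets, e.g. [{"R","C"}, {"G","S"}, {"B","T"}]."""
        for obj in objects:
            if self.h in obj:
                return obj
        return objects[0]                    # fallback if the hypothesis is not on screen

    def feedback(self, correct):
        if not correct:                      # mistake: eliminate the hypothesis, move on
            if self.h in self.pool:
                self.pool.remove(self.h)
            if not self.pool:                # everything eliminated: the concept must have drifted
                self.pool = list(FEATURES)
            self.h = self.pool[0]

learner = EliminationLearner()
screen = [{"R", "C"}, {"G", "S"}, {"B", "T"}]
choice = learner.pick(screen)
learner.feedback("B" in choice)              # suppose the hidden target concept is "B"
```

Between drifts such a learner makes roughly at most d mistakes, since each "no" eliminates one candidate feature.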

An Online Learning Algorithm Theorem: for any input sequence with m concept drifts, the algorithm makes at most (2m + 1)(d - 1) mistakes. Specifically, the bound is 35 (m = 3, d = 6). In practice, only 2 to 4 errors per concept drift.

Monkeys Play WCST 7 rhesus monkeys on a restricted diet. Touch screen. Food pellet reward for touching the object matching the target concept.

Results Per-monkey learning curves across trials (accuracy, 30-trial average; reaction time, x10 seconds; concept levels 1-4 marked) for monkeys WCST81010, WCST80088, WCST81092, WCST84076, WCST80160, WCST82057, and WCST85014.

Results
Concept    Trials  Errors  Perseverative
Red        425     242     -
Triangle   249     113     89
Blue       437     247     186
Star       279     132     94
Monkeys adapt to concept drifts slowly: ~300 trials. Perseverative errors (what would be correct under the previous concept) dominate at 75%. No slow-down after concept drifts: do the monkeys realize the change?

A Few Lessons Learned (warning: highly subjective and speculative)

Lessons for Machine Learning 1. Difficulty: Monkeys > Undergrads > Computers 2. There is no train/test split. People always learn and adapt, even on test data. 3. Strong sparsity. People focus on one feature. 4. Motivation. Non-diet monkeys refuse to learn. 5. Making existing ML algorithms dumber to explain natural learning is not very interesting.

References
1. Xiaojin Zhu and Andrew Goldberg. Introduction to Semi-Supervised Learning. Morgan & Claypool, 2009 (to appear).
2. Xiaojin Zhu, Timothy Rogers, Ruichen Qian, and Chuck Kalish. Humans perform semi-supervised classification too. In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
3. Xiaojin Zhu. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, 2005.
4. Rui Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Timothy Rogers, and Xiaojin Zhu. Human active learning. In Advances in Neural Information Processing Systems (NIPS) 22, 2008.
5. Xiaojin Zhu, Michael Coen, Shelley Prudom, Ricki Colman, and Joseph Kemnitz. Online learning in monkeys. In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.

Some Other Work Multi-manifold, online semi-supervised learning. Learning a bigram LM from unigram bag-of-words. New Year's wishes. Text-to-picture synthesis.

Conclusion Machine learning and cognitive science have much to offer to each other. Thank you

What's in a Name A feature ↔ a dimension. Instance x (feature vector, point in feature space) ↔ a stimulus (continuous in this talk; discrete is possible). Label y ↔ a category (two categories in this talk; multiple categories or a continuous prediction are possible). Classification ↔ concept/category learning. Labeled data ↔ supervised experience (e.g., explicit instructions) from a teacher. Unlabeled data ↔ passive experience (including, but not limited to, test instances; be careful).

Learning Paradigms Unsupervised learning: given x_1, ..., x_n, do clustering, outlier detection, etc. Supervised learning: given (x_1, y_1), ..., (x_n, y_n), learn a predictor f: X → Y. Semi-supervised learning (SSL): given (x_1, y_1), ..., (x_n, y_n) and x_{n+1}, ..., x_{n+m}, learn a better predictor f: X → Y.

SSL Model 1: Mixtures Gaussian mixture models, multinomial (bag-of-words) mixtures. Assumption: each class y has a specific parametric conditional distribution p(x | y) for its items (e.g., Gaussian).

SSL Model 2: Large Margin Transductive Support Vector Machines, Gaussian Processes Assumption: instances from different classes are separated by a large gap (the margin).

SSL Model 3: Graph Graph cut, label propagation, manifold regularization, SSL on tree structure Assumption: two instances connected by a strong edge have similar labels.
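A toy illustration of the graph assumption (an assumed numpy sketch of plain label propagation on a dense RBF-weighted graph; not any one of the specific methods listed above):

```python
import numpy as np

def label_propagation(X, y, sigma=0.5, iters=200):
    """Graph-based SSL sketch: build an RBF-weighted graph over all points,
    then repeatedly average each point's label distribution over its
    neighbors while clamping the labeled points (y = -1 marks unlabeled)."""
    n = len(X)
    labeled = y >= 0
    classes = np.unique(y[labeled])
    # Edge weights W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)            # row-normalized transition matrix
    F = np.zeros((n, len(classes)))
    F[labeled, np.searchsorted(classes, y[labeled])] = 1.0
    for _ in range(iters):
        F = P @ F                                   # propagate labels along edges
        F[labeled] = 0.0                            # clamp the labeled points
        F[labeled, np.searchsorted(classes, y[labeled])] = 1.0
    return classes[F.argmax(axis=1)]

# Two 1D clusters, one labeled point per cluster; the rest are unlabeled (-1).
X = np.concatenate([np.random.randn(20) * 0.1 - 1, np.random.randn(20) * 0.1 + 1])[:, None]
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
print(label_propagation(X, y))
```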

When does SSL help? SSL helps if the assumption fits the link between p(x) (what unlabeled data can tell us) and p(y | x) (what the true classification should be). Warning: a wrong SSL assumption can actually lead to worse learning! (But even this can be interesting.)

Results Human passive learning is even slower than the 1/n polynomial rate. Yoked: humans learn to rely on the computer.

Monkey Algorithm? Slow learner: skip steps 3 and 4 with some probability. Stubborn: when h = 0, retain the incorrect h with some probability. With these two probabilities set to 0.93 and 0.96, the algorithm makes 563 errors, of which 67% are perseverative.