Do we need more training data or better models for object detection?

Do we need more training data or better models for object detection? Xiangxin Zhu, Carl Vondrick, Deva Ramanan, Charless Fowlkes. University of California, Irvine. Appeared in BMVC 2012. Slides adapted from Charless Fowlkes. 1

Motivations 2

Current state of object recognition [Plots: average AP vs. year (2006-2011), and average AP vs. average number of training samples per class] The PASCAL VOC detection challenge provides a realistic benchmark of object detection performance. Performance has steadily increased! 3

Current state of object recognition [Same plots as the previous slide] Performance has steadily increased!... but so has the amount of training data?? 4

Bayes Risk [Figure: overlapping class-conditional densities P(X | face) and P(X | background)] The feature space may limit our ultimate classification performance. 5
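
For reference (added here, not on the original slide; the prior notation pi is an assumption), the irreducible error the slide alludes to is the Bayes risk of the face/background problem:

    % Bayes risk with prior \pi = P(face); no classifier operating on
    % the feature X can achieve error below R^*.
    R^* = \int \min\bigl( \pi\, p(x \mid \text{face}),\; (1-\pi)\, p(x \mid \text{background}) \bigr)\, dx

If the HOG feature space maps faces and background onto heavily overlapping densities, R^* itself is large, and no amount of training data or classifier tuning can push the error below it.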

Performance saturation [Schematic: performance vs. amount of data, flattening toward an "Ideal" ceiling] 6

Model Bias The class of models may not be flexible enough. 7

Model Bias 8

[Schematic: performance vs. model complexity, flattening toward the "Ideal" ceiling] 9

Experiments 10

Experiment #1 Single face template: HOG feature vector. Train a linear classifier using an SVM on positive examples, with hard negative mining. 11
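
A minimal sketch of this pipeline (added for illustration; not the authors' code), assuming cropped grayscale positives at a canonical template size and a pool of background patches, with skimage's hog and scikit-learn's LinearSVC standing in for the original implementation:

    # Illustrative: HOG features + linear SVM + hard negative mining.
    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_feat(patch):
        # patch: 2-D grayscale array at the canonical template size
        return hog(patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    def train_hard_neg(pos_patches, neg_patches, C=0.002, rounds=3):
        X_pos = np.array([hog_feat(p) for p in pos_patches])
        X_all_neg = np.array([hog_feat(p) for p in neg_patches])
        X_neg = X_all_neg[:len(X_pos)]          # initial random negatives
        clf = None
        for _ in range(rounds):
            X = np.vstack([X_pos, X_neg])
            y = np.hstack([np.ones(len(X_pos)), -np.ones(len(X_neg))])
            clf = LinearSVC(C=C).fit(X, y)
            # Hard negative mining: rescan the pool and keep negatives
            # the current model scores inside or above the margin.
            hard = X_all_neg[clf.decision_function(X_all_neg) > -1]
            if len(hard) == 0:
                break
            X_neg = hard
        return clf

In a full detector the mining step would rescan whole background images with a sliding window rather than a fixed patch pool; this sketch keeps only the core loop.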

Performance vs #training examples 12

Performance vs #training examples Worse performance with more training data?!?!? 13

Performance vs. # training examples [Plot: average precision (0.3-0.6) vs. number of training samples (0-1000) for the single-template face model; curves: fixed C = 0.002 vs. cross-validation on C] 14

[Plot: average precision vs. SVM regularization C on a log scale (10^-7 to 10^1), one curve per training-set size N = 10, 50, 100, 500, 900] We need to make cross-validation easy for everyday users! 15
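
A sketch of that cross-validation (added; X and y are assumed to be the features and labels from the earlier sketch), using scikit-learn's GridSearchCV as a stand-in:

    # Illustrative: pick C by cross-validated average precision
    # instead of fixing it.
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    param_grid = {"C": [10.0 ** k for k in range(-7, 2)]}   # 1e-7 ... 1e1
    search = GridSearchCV(LinearSVC(), param_grid, cv=5,
                          scoring="average_precision")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)

The message of the plot is that the best C depends on the amount of training data, so any fixed choice can be wrong at other dataset sizes.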

16

Experiment #2 We want to detect faces at many different viewpoints. What positive training data should we use? (a) include all viewpoints in training, or (b) only train on a subset of views (e.g., frontal faces)? 17

Single-template face model [Plot: average precision (0.4-0.6) vs. number of training samples (0-800); curves: All vs. Frontal] Worse performance with more training data?!?!? 18

Single-template face model [Same plot as the previous slide] A single template trained with 200 clean frontal faces outperforms a template trained with 800 images that include all views of faces. This holds for both training and test performance. 19

Learned templates [Figure: the template learned from all views vs. the template learned from frontal views only] 20

SVM is sensitive to outliers Single-template face model [Same plot: average precision vs. number of training samples, All vs. Frontal] "All" has a lower training objective, but a higher 0-1 loss! 21
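
One added line of math (notation assumed) makes the slide's point precise: the SVM minimizes the hinge surrogate, which only upper-bounds the 0-1 loss,

    % 0-1 loss vs. hinge surrogate, for classifier f and label y in {-1, +1}
    \mathbf{1}\bigl[\, y f(x) \le 0 \,\bigr] \;\le\; \max\bigl(0,\; 1 - y f(x)\bigr),

so an outlier with very negative y f(x) contributes a huge hinge penalty (and drags the template toward it) while costing only 1 under the 0-1 loss. Minimizing the surrogate on "All" can therefore lower the training objective yet raise the actual error.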

Experiment #3 Increase model complexity by using mixture components to model different viewpoints. 22

Model a wider range of variability by using a mixture of rigid templates. 23

Discriminative clustering uses mixture components to take care of outliers? [Schematic: AP vs. dataset size] 24
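
A sketch of the mixture-of-rigid-templates idea (added for illustration; the feature matrices and the K-means split are assumptions, not the authors' clustering):

    # Illustrative: cluster positives in HOG space, train one linear
    # SVM template per cluster, score a window by the max over components.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import LinearSVC

    def train_mixture(X_pos, X_neg, K=5, C=0.002):
        labels = KMeans(n_clusters=K, n_init=10).fit_predict(X_pos)
        components = []
        for k in range(K):
            Xk = X_pos[labels == k]
            X = np.vstack([Xk, X_neg])
            y = np.hstack([np.ones(len(Xk)), -np.ones(len(X_neg))])
            components.append(LinearSVC(C=C).fit(X, y))
        return components

    def mixture_score(components, x):
        return max(c.decision_function(x.reshape(1, -1))[0]
                   for c in components)

In practice the component scores need calibration before taking a max, and the human-in-the-loop experiments on the next slides replace the K-means step with manually defined clusters.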

Human-supervised clustering [Figure: example clusterings into 1, 3, 5, and 13 clusters] 25

Human-in-the-loop clustering can boost mixture-model performance [Figure: clusterings into 1, 3, 5, 13; plot: Face category, average precision (0.66-0.74) vs. number of training samples (0-1000); curves: human cluster K=5 vs. K-means cluster K=4] 26

Human-in-the-loop clustering can boost mixture-model performance [Plot: Bus category, average precision (0.35-0.6) vs. number of training samples (0-5000); curves: human cluster K=5 vs. K-means cluster K=4] 27

[Plots: AP (0.4-0.8) vs. number of training examples (0-1000) for mixtures with K = 1, 3, 5, 13, 26; and AP vs. number of mixtures (0-30) for N = 50, 100, 500, 900 training examples. Insets: the schematic performance-vs-data and performance-vs-model-complexity curves approaching the "Ideal" ceiling] 28

Bus category [Plots: AP (0.35-0.6) vs. number of training examples (0-2000) for K = 1, 3, 5, 11, 21; and AP vs. number of mixtures (0-25) for N = 50, 100, 500, 1000, 1898] 29

PASCAL 10x Dataset Collected 10 times as much positive training data as the original PASCAL dataset: images gathered from Flickr and labeled by MTurk users. 30

10x training dataset with DPM [Plot: average precision vs. number of training samples (0-14000) for horse, bicycle, bus, cat, cow, diningtable, motorbike, sheep, sofa, train, and tvmonitor] Cross-validation is used to choose the optimal regularization and number of mixture components for each category. Performance saturates with 10 templates per category and 100 positive training examples per template. 31

Experiment #4: Have we reached the Bayes risk for linear classifiers with HOG features? [Schematic: performance vs. data approaching the "Ideal" ceiling] 32

Deformable part models [Figure: parts, deformation model, detector output] Represent local part appearance with templates, connected by springs that encode relative locations. Trained using an SVM. [Felzenszwalb, McAllester, Ramanan, 2008] 33

Alternate view of DPM Every placement of parts synthesizes a rigid template. The dynamic programming used in DPM is a fast way to index a very large collection of rigid templates. 34
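
To make that concrete (an added sketch, not the authors' code): for a star-structured model, the best part placement under a quadratic spring cost can be found at every root location at once with the Felzenszwalb-Huttenlocher generalized distance transform, which runs in linear time per dimension:

    import numpy as np

    def dt1d(cost):
        # Lower envelope of parabolas: d[p] = min_q cost[q] + (p - q)**2.
        # For DPM, pass cost = -part_score to maximize score minus a
        # quadratic spring penalty (unit spring weight assumed here;
        # real DPMs learn per-part deformation coefficients).
        n = len(cost)
        d = np.empty(n)
        v = np.zeros(n, dtype=int)     # roots of envelope parabolas
        z = np.empty(n + 1)            # boundaries between parabolas
        k = 0
        z[0], z[1] = -np.inf, np.inf
        for q in range(1, n):
            while True:
                s = ((cost[q] + q * q) - (cost[v[k]] + v[k] * v[k])) \
                    / (2 * q - 2 * v[k])
                if s <= z[k]:
                    k -= 1             # q's parabola hides the previous one
                else:
                    break
            k += 1
            v[k] = q
            z[k], z[k + 1] = s, np.inf
        k = 0
        for p in range(n):
            while z[k + 1] < p:
                k += 1
            d[p] = (p - v[k]) ** 2 + cost[v[k]]
        return d

Applying dt1d along rows and then columns of a part's negated response map gives, for every root position, the best deformed placement of that part; summing these maps over parts scores every rigid template the parts could synthesize, without enumerating them.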

Why does DPM do better than rigid mixtures? Part appearances are shared during training, and the model can extrapolate to new, unseen configurations. 35

Rigid Part Model (RPM): part appearance is learned from training; only spatial configurations of parts seen during training are scored; very fast at test time. 36
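
A sketch of that scoring rule (added; the data layout is an assumption based on the slide's one-line description):

    import numpy as np

    def rpm_score(part_score_maps, train_configs):
        # part_score_maps: one 2-D response map per part.
        # train_configs: configurations seen in training, each a list of
        # (dy, dx) part offsets relative to the window origin.
        best = -np.inf
        for config in train_configs:
            s = sum(part_score_maps[i][dy, dx]
                    for i, (dy, dx) in enumerate(config))
            best = max(best, s)
        return best

Because the configurations are enumerated rather than searched with dynamic programming, the RPM is fast but cannot score part arrangements it never saw.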

Faces [Plot: average precision (0.4-0.9) vs. number of training samples (0-1000); curves: supervised DPM, RPM, latent DPM, mixture of HOG templates, K=1 frontal, K=1 all] 37

State-of-the-art face detection with only 100 training examples: DPM with shared parameters [Plot: precision-recall] [Zhu & Ramanan, 2012] 38

39

Do we need more data? 40

Do we need more data? More training data helps, but only if you are careful. Clean training data helps the SVM, which is sensitive to outliers. Having the proper correspondence / alignment / clustering can greatly improve model performance. Better models might provide more bang for the buck. 41

Dataset Bias: Distributions Match 42