Covariate Shift: Consequences and Good Practice. Covariate shift, re-weighting training data, active sampling. Joyce Wang, Software Engineer, Sep 2017

Transcription:

Covariate Shift: Consequences and good practice. Covariate shift, re-weight training data, active sampling. Joyce Wang, Software Engineer, Sep 2017. www.csiro.au

Motivation. Validation accuracy = 0.96, but query accuracy = 0.67. What is going on here?

Outline
- What is covariate shift? Why would it occur? What consequence would it have?
- How to detect covariate shift? Visualization method; quantitative method.
- Strategies to handle covariate shift: training data reweighting; active learning.

Covariate Shift. When the distributions of the training set and the test/query set do not match, we are facing covariate shift, also referred to as sample selection bias. This violates a fundamental assumption: both the training and query data should be drawn from the same population / distribution.
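In symbols, one common way to state this (an assumption added here; the slide gives only the verbal definition) is that the input distributions differ while the relationship between inputs and targets stays the same. Here x denotes the features, y the target, and the subscripts name the two data sets:

```latex
p_{\text{train}}(x) \neq p_{\text{query}}(x),
\qquad
p_{\text{train}}(y \mid x) = p_{\text{query}}(y \mid x)
```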

Distribution Mismatch. [Figure, two panels: in one, training data and query data are drawn from almost the same population; in the other, training data and query data are drawn from completely different populations.]

Covariate Shift - Commonplace. Lack of randomness; inadequate samples; biased sampling rules.

Covariate Shift - Consequence. Overfitting on training examples; unreliable predictions. Example: binary classification. [Figure: training set and query set with points marked as actual label 0 or actual label 1. The boundary obtained from the training set classification is the wrong decision boundary on the query set, shown against the optimal decision boundary.]

Detect Covariate Shift

Detect Covariate Shift. Visualization; membership modelling; uncertainty quantification.

Visualize Training and Query Data. [Figure: query set distribution and training set distribution.] What if I have high-dimensional data? Per-dimension visualization, or dimensionality reduction (PCA, t-SNE). We need more robust methods.
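The two options for high-dimensional data can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the speaker's code: the data, the helper names `standardized_mean_gap` and `pca_project`, and the choice of a mean-gap summary in place of per-dimension histograms are all assumptions. PCA is done with a plain SVD so that only NumPy is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: 5-dimensional data where only dimension 0 is shifted.
X_train = rng.normal(0.0, 1.0, size=(500, 5))
X_query = rng.normal(0.0, 1.0, size=(500, 5))
X_query[:, 0] += 3.0


def standardized_mean_gap(a, b):
    """Per-dimension |mean(a) - mean(b)| in units of the pooled standard deviation."""
    pooled_std = np.sqrt(0.5 * (a.var(axis=0) + b.var(axis=0)))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_std


def pca_project(a, b, n_components=2):
    """Project both sets onto the top principal components of the pooled data."""
    pooled = np.vstack([a, b])
    center = pooled.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(pooled - center, full_matrices=False)
    basis = vt[:n_components].T
    return (a - center) @ basis, (b - center) @ basis


gaps = standardized_mean_gap(X_train, X_query)
Z_train, Z_query = pca_project(X_train, X_query)

print("per-dimension gap:", np.round(gaps, 2))
print("PC1 means (train, query):", Z_train[:, 0].mean(), Z_query[:, 0].mean())
```

The projected arrays `Z_train` and `Z_query` are what one would pass to a scatter plot; the per-dimension gap is a quick numeric stand-in for eyeballing one histogram per feature.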

Membership Modelling. We apply a model to predict the probability of a new point being a member of the training set. For example, a one-class SVM could classify new data as similar or different to the training set.
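A minimal sketch of the one-class SVM idea, using scikit-learn's `OneClassSVM` on synthetic data. The data and the value `nu=0.05` are illustrative assumptions. Note that this estimator returns a label (+1 or -1) and a decision score rather than a calibrated probability.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic stand-ins: one query set resembles the training data, one does not.
X_train = rng.normal(0.0, 1.0, size=(300, 2))
X_query_near = rng.normal(0.0, 1.0, size=(200, 2))
X_query_far = rng.normal(4.0, 1.0, size=(200, 2))

# nu upper-bounds the fraction of training points treated as outliers;
# 0.05 is an illustrative choice, not a recommendation.
membership = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# predict() returns +1 for "similar to the training set" and -1 for "different".
frac_near = np.mean(membership.predict(X_query_near) == 1)
frac_far = np.mean(membership.predict(X_query_far) == 1)

print(f"flagged as similar: near={frac_near:.2f}, far={frac_far:.2f}")
```

A low fraction of query points flagged as similar suggests the query set sits outside the region covered by the training data. `membership.decision_function(X)` gives a continuous score if a ranking is preferred over a hard label.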

Uncertainty Quantification
1. Fit a probabilistic model to the training set.
2. Every prediction has an uncertainty (confidence interval) associated with it.
3. Determine covariate shift from the uncertainty of the predictions.

Uncertainty Quantification. [Figure: predicted value with upper and lower bounds for each query. Low uncertainty: similar to the training dataset. High uncertainty: not similar to the training dataset.]
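The three steps can be sketched with a Gaussian process regressor, which is one possible probabilistic model; the slides do not say which model was used, so the model choice, the kernel, and the synthetic data below are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Synthetic training data covering only the interval [0, 5].
X_train = rng.uniform(0.0, 5.0, size=(40, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0.0, 0.1, size=40)

# Step 1: fit a probabilistic model to the training set.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# Step 2: every prediction comes with a standard deviation.
X_query = np.array([[2.5], [12.0]])  # inside vs. far outside the training range
mean, std = gp.predict(X_query, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # approximate 95% band

# Step 3: a wide band flags a query that is unlike the training data.
for x, lo, hi in zip(X_query[:, 0], lower, upper):
    print(f"x={x:5.1f}  band width={hi - lo:.3f}")
```

What counts as "too wide" is a judgment call; one option is to compare query uncertainties against the uncertainties the same model gives on held-out training data.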

Handle Covariate Shift

Handle Covariate Shift. Training sample reweighting: make the distribution of the training data look like the distribution of the query data. Active sampling: help the model gain understanding about the query data and learn effectively.

Sample Reweighting. Build a classifier, e.g. logistic regression, to distinguish the training set from the query set. [Figure: training set and query set; training points are colored by their probability of being in the query set (low, medium, high).]

Sample Reweighting. Reweight every training point in the learning process.

Training sample   Probability of being in query set   Weight
1                 0.9872                              w1
2                 0.8754                              w2
3                 0.7913                              w3
...               ...                                 wi
n-1               0.2877                              wn-1
n                 0.1867                              wn
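The slide does not show how each probability is turned into a weight. One standard conversion, given here as an assumption, uses the classifier odds p / (1 - p) scaled by the ratio of set sizes, which estimates the density ratio between query and training inputs. The function name `importance_weights`, the optional `clip` argument, and the set sizes of 1000 are hypothetical; the five probabilities are the ones listed in the table.

```python
import numpy as np


def importance_weights(p_query, n_train, n_query, clip=None):
    """Turn P(point is in query set | x) into density-ratio weights.

    By Bayes' rule, p_query(x) / p_train(x) = (n_train / n_query) * p / (1 - p),
    where p is the classifier's probability that x belongs to the query set.
    """
    p = np.asarray(p_query, dtype=float)
    weights = (n_train / n_query) * p / (1.0 - p)
    if clip is not None:
        # Optional cap so that a few extreme points do not dominate training.
        weights = np.minimum(weights, clip)
    return weights


# Probabilities from the table; the set sizes are hypothetical.
probs = np.array([0.9872, 0.8754, 0.7913, 0.2877, 0.1867])
weights = importance_weights(probs, n_train=1000, n_query=1000)
capped = importance_weights(probs, n_train=1000, n_query=1000, clip=10.0)

for p, w in zip(probs, weights):
    print(f"p={p:.4f}  w={w:.3f}")
```

Points that look like query data receive weights above 1 and points that look unlike it receive weights below 1. Probabilities very close to 1 produce very large weights, which is why a cap is sometimes applied.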

Overlap. Overlap between the training and query sets is essential for applying sample re-weighting.

Active Learning
1. Train a probabilistic model.
2. Predict the query set with the trained model.
3. Find the query point that is expected to most improve the model.
4. Get the target value for that most useful point.
5. Put the point into the training set.
[Figure: training set and query set.]
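The loop above can be sketched as follows. This is an illustration under stated assumptions, not the demo from the talk: the model is a Gaussian process with fixed kernel hyperparameters (`optimizer=None`) for brevity, "most improve the model" is approximated by picking the point with the largest predictive standard deviation, and `oracle` is a hypothetical stand-in for whatever experiment or annotator supplies the target value.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def oracle(x):
    """Hypothetical labelling step; in practice an experiment or a human annotator."""
    return np.sin(x[:, 0])


def fit_model(X, y):
    # Fixed kernel hyperparameters (optimizer=None) keep the sketch deterministic.
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
    return GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, y)


rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 2.0, size=(10, 1))      # training set covers only [0, 2]
y_train = oracle(X_train)
X_pool = np.linspace(0.0, 6.0, 61).reshape(-1, 1)  # unlabelled query set covers [0, 6]

model = fit_model(X_train, y_train)                # 1. train a probabilistic model
initial_max_std = model.predict(X_pool, return_std=True)[1].max()

picked = []
for _ in range(5):
    _, std = model.predict(X_pool, return_std=True)  # 2. predict the query set
    i = int(np.argmax(std))                          # 3. most uncertain query point
    x_new = X_pool[i:i + 1]
    picked.append(float(x_new[0, 0]))
    y_new = oracle(x_new)                            # 4. get its target value
    X_train = np.vstack([X_train, x_new])            # 5. add it to the training set
    y_train = np.append(y_train, y_new)
    X_pool = np.delete(X_pool, i, axis=0)
    model = fit_model(X_train, y_train)

final_max_std = model.predict(X_pool, return_std=True)[1].max()
print("picked:", picked)
print("max std before/after:", initial_max_std, final_max_std)
```

In this toy setup the first picks fall outside the original training range, which is the behaviour one wants under covariate shift: labels are requested where the model knows least about the query data.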

Active Learning - Demo. [Figure: demo shown across seven slides.]

Comparison of Strategies for Handling Covariate Shift

Sample Reweighting
  Advantages: achievable if you cannot get more samples.
  Disadvantages: needs overlap between training and query sets; less understanding of the data.

Active Learning
  Advantages: no need for overlap; gain more understanding about the query data.
  Disadvantages: not achievable if you cannot get more samples.

Thank you. Twitter: @joycexinyuewang. Email: joyce.wang@data61.csiro.au. www.csiro.au

Reference
- Density Ratio Estimation in Machine Learning. http://yosinski.com/mlss12/media/slides/mlss-2012-sugiyama-density-ratio-estimation-in-machine-learning.pdf
- Correcting Sample Selection Bias by Unlabeled Data. https://papers.nips.cc/paper/3075-correcting-sample-selection-bias-by-unlabeled-data

Uncertainty Quantification. [Figure: probability of positive label.]

Sample Reweighting. Reweight every training point when minimizing the loss function. [Equation: weighted loss over the training samples; not captured in the transcript.]

Training sample   Probability of being in query set   Weight
1                 0.9872                              w1
2                 0.8754                              w2
3                 0.7913                              w3
...               ...                                 wi
n-1               0.2877                              wn-1
n                 0.1867                              wn
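A standard form of importance-weighted empirical risk minimization, offered as an assumption and not as the slide's exact equation, is shown below. Here f_θ is the model with parameters θ, ℓ is the per-sample loss, (x_i, y_i) are the n training samples, and w_i is the weight of sample i, ideally approximating the ratio of query to training input densities:

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} w_i \, \ell\big(f_{\theta}(x_i),\, y_i\big),
\qquad
w_i \approx \frac{p_{\text{query}}(x_i)}{p_{\text{train}}(x_i)}
```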

Acquisition Function
- Reduce the maximum uncertainty
- Reduce the maximum upper confidence bound
- Reduce the total uncertainty
- Utility function, if the policy is known
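The first two criteria can be written as simple selection rules over a model's predictive mean and standard deviation. The sketch below is illustrative: the function names, the multiplier `kappa=2.0`, and the numbers are assumptions. Reducing the total uncertainty requires simulating how the model would change after each candidate is added, and a utility function depends on the application, so neither is shown.

```python
import numpy as np


def max_uncertainty(mean, std):
    """Pick the candidate with the largest predictive standard deviation."""
    return int(np.argmax(std))


def max_upper_confidence_bound(mean, std, kappa=2.0):
    """Pick the candidate with the largest mean + kappa * std."""
    return int(np.argmax(mean + kappa * std))


# Hypothetical predictions for four candidate query points.
mean = np.array([0.2, 1.2, 0.5, 0.1])
std = np.array([0.05, 0.10, 0.40, 0.30])

print("max uncertainty picks index", max_uncertainty(mean, std))
print("upper confidence bound picks index", max_upper_confidence_bound(mean, std))
```

The two rules can disagree: the first looks only at where the model is least sure, while the second also favours points whose predicted value is high.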

Detect Covariate Shift - Comparison

Visualization
  Advantage: quick.
  Disadvantage: subjective; open to interpretation.

Membership Modelling
  Advantage: informative; quantitative.
  Disadvantage: sensitive to tuning parameters.

Uncertainty Quantification
  Advantage: informative; quantitative; makes predictions.
  Disadvantage: difficult to work with large-size data.

Sample Reweighting. Apply the trained classifier to obtain the probability of each training point being inside the query set. Use cross-validation to avoid over-fitting. [Figure: the samples are split into folds, with each fold serving in turn as the hold-out set while the remaining folds are used for training; the table of training samples lists probabilities of being in the query set of 0.9872, 0.8754, 0.7913, ..., 0.2877, 0.1867.]
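A minimal sketch of this step with scikit-learn, assuming logistic regression as the training-versus-query classifier and 5 folds; the data are synthetic and the variable names are illustrative. `cross_val_predict` returns, for each point, a prediction made by a model that did not see that point during fitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
# Synthetic stand-ins: the query inputs are shifted relative to the training inputs.
X_train = rng.normal(0.0, 1.0, size=(400, 2))
X_query = rng.normal(1.0, 1.0, size=(400, 2))

# Label each point by the set it came from: 0 = training set, 1 = query set.
X = np.vstack([X_train, X_query])
is_query = np.concatenate([
    np.zeros(len(X_train), dtype=int),
    np.ones(len(X_query), dtype=int),
])

# Out-of-fold probabilities: each point is scored by a classifier fitted on the other folds.
proba = cross_val_predict(
    LogisticRegression(), X, is_query, cv=5, method="predict_proba"
)

# Column 1 is P(query set | x); keep the rows that belong to the training samples.
p_query_given_x = proba[: len(X_train), 1]
print(np.round(p_query_given_x[:5], 4))
```

These out-of-fold probabilities are the values that populate the "probability of being in query set" column.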

Glossary
- Training data: split into a training set (90%) and a hold-out / development set (10%). The hold-out set is used to validate the model (optional).
- Test data / query data: the model is applied to predict the y value.

Sample Reweighting. Reweight every training point in the learning process: scale the training points by weight (importance level).
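Many scikit-learn estimators accept per-sample weights through the `sample_weight` argument of `fit`, which is one way to scale training points by importance. The sketch below is a toy setup under stated assumptions: the training inputs are standard normal, the query inputs are assumed to be shifted by 1 along the first feature, and for that particular pair of Gaussians the density ratio is exp(x0 - 0.5), so the weights can be written down directly instead of being estimated.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic training data with a simple linear labelling rule.
X_train = rng.normal(0.0, 1.0, size=(200, 2))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)

# Assumed shift: query x0 ~ N(1, 1) while training x0 ~ N(0, 1).
# For these two Gaussians, p_query(x) / p_train(x) = exp(x0 - 0.5).
weights = np.exp(X_train[:, 0] - 0.5)

unweighted = LogisticRegression().fit(X_train, y_train)
weighted = LogisticRegression().fit(X_train, y_train, sample_weight=weights)

print("unweighted coefficients:", unweighted.coef_.ravel())
print("weighted coefficients:  ", weighted.coef_.ravel())
```

With real data the weights would come from an estimate, such as the classifier-based probabilities, and not from a closed form.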