Cross-Validation. By: Huaicheng Liu, Jiaxin Deng


Overview
1. Model Assessment and Selection
2. Cross-Validation
3. K-Fold Cross-Validation
4. The Application of Cross-Validation
   (1) What value should we choose for K?
   (2) The wrong and right way to do cross-validation

Background: Model Assessment and Selection
Introduction: The generalization performance of a learning method relates to its prediction capability on independent test data. Assessment of this performance is extremely important in practice.

Formulas:
(1) Loss function for measuring errors between $Y$ and $\hat{f}(X)$:
$$L(Y, \hat{f}(X)) = \begin{cases} (Y - \hat{f}(X))^2 & \text{(squared error)} \\ |Y - \hat{f}(X)| & \text{(absolute error)} \end{cases}$$
(2) Test error, conditional on the training set $\mathcal{T}$:
$$\mathrm{Err}_{\mathcal{T}} = \mathrm{E}\big[L(Y, \hat{f}(X)) \mid \mathcal{T}\big]$$
(3) Expected prediction error:
$$\mathrm{Err} = \mathrm{E}\big[L(Y, \hat{f}(X))\big] = \mathrm{E}\big[\mathrm{Err}_{\mathcal{T}}\big]$$
(4) Training error:
$$\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat{f}(x_i)\big)$$

It is important to note that there are in fact two separate goals that we might have in mind:
- Model selection: estimating the performance of different models in order to choose the best one.
- Model assessment: having chosen a final model, estimating its prediction error (generalization error) on new data.

- The training set is used to fit the models.
- The validation set is used to estimate prediction error for model selection.
- The test set is used for assessment of the generalization error of the final chosen model.
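For illustration, here is a minimal Python sketch of this three-way split using NumPy; the 60/20/20 proportions and the toy data are assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (toy data)
y = rng.integers(0, 2, size=100)       # binary labels

# Shuffle indices and split 60% / 20% / 20% (proportions are illustrative)
idx = rng.permutation(len(X))
n_train, n_val = int(0.6 * len(X)), int(0.2 * len(X))
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]

X_train, y_train = X[train_idx], y[train_idx]   # fit the models
X_val,   y_val   = X[val_idx],   y[val_idx]     # estimate error for model selection
X_test,  y_test  = X[test_idx],  y[test_idx]    # assess the final chosen model
```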

Main Contents: Cross-Validation. What is cross-validation?

Concept: Cross-validation is used to assess the classifier performance of a statistical analysis method. The data set is divided into two parts: one part serves as the training set and the other as the test set. The classifier is trained on the training set, and the test set is used to evaluate the model obtained from training, providing an estimate of the classifier's performance.


Cross-validation methods are:
- Hold-Out Method
- Leave-One-Out Cross-Validation
- K-Fold Cross-Validation
- K*2-Fold Cross-Validation

Hold-Out Method: The data set is randomly divided into two groups, one group serving as the training set and the other as the test set. The classifier is trained on the training set and then checked on the test set.
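A minimal sketch of the hold-out method, assuming scikit-learn is available; the logistic-regression classifier, toy data, and 70/30 split are illustrative choices, not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data; the set is randomly divided into a training group and a test group
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)         # train on the training set
print("hold-out accuracy:", clf.score(X_test, y_test))   # check the model on the test set
```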

Leave-One-Out Cross-Validation: Assume the data set has N samples. Each sample in turn is held out as the test set while the remaining N-1 samples form the training set, which yields N models. Finally, we average the prediction errors of these models on their held-out samples.
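A sketch of leave-one-out cross-validation, assuming scikit-learn; the 3-nearest-neighbor classifier and toy data set are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=40, n_features=5, random_state=0)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):        # N splits, one per sample
    clf = KNeighborsClassifier(n_neighbors=3).fit(X[train_idx], y[train_idx])
    errors.append(clf.predict(X[test_idx])[0] != y[test_idx][0])  # 0/1 loss on held-out sample

print("LOOCV error estimate:", np.mean(errors))            # average over the N fitted models
```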

K-Fold Cross-Validation: The data set is split into K roughly equal-sized groups; each subset in turn serves as the test set, while the remaining K-1 subsets form the training set. For the kth part, we fit the model to the other K-1 parts of the data and calculate the prediction error of the fitted model when predicting the kth part of the data. We do this for k = 1, ..., K and combine the K estimates of prediction error.

Details: Denote by $\hat{f}^{-k}(x)$ the fitted function, computed with the kth part of the data removed, and let $\kappa(i)$ be the fold containing observation $i$. Then the cross-validation estimate of prediction error is
$$\mathrm{CV}(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat{f}^{-\kappa(i)}(x_i)\big)$$
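The formula above can be computed directly. Below is a minimal NumPy sketch for a least-squares regression with squared-error loss; the model, data, and K = 5 are assumptions for the example, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 100, 5
X = rng.normal(size=(N, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=N)

folds = rng.permutation(N) % K            # kappa(i): random fold assignment for each sample
sq_err = np.empty(N)

for k in range(K):
    train, test = folds != k, folds == k
    # f^{-k}: least-squares fit with the kth part of the data removed
    beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    sq_err[test] = (y[test] - X[test] @ beta) ** 2   # L(y_i, f^{-kappa(i)}(x_i))

cv_estimate = sq_err.mean()               # CV(f) = (1/N) * sum of the N losses
print("5-fold CV estimate of prediction error:", cv_estimate)
```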

Given a set of models $f(x, \alpha)$ indexed by a tuning parameter $\alpha$, denote by $\hat{f}^{-k}(x, \alpha)$ the $\alpha$th model fit with the kth part of the data removed. Then for this set of models we define
$$\mathrm{CV}(\hat{f}, \alpha) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat{f}^{-\kappa(i)}(x_i, \alpha)\big)$$
The function $\mathrm{CV}(\hat{f}, \alpha)$ provides an estimate of the test error curve, and we find the tuning parameter $\hat{\alpha}$ that minimizes it. Our final chosen model is $f(x, \hat{\alpha})$, which we then fit to all the data.
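As a sketch of this tuning procedure, the example below selects the penalty of a ridge regression by minimizing the cross-validated squared error and then refits on all the data; the ridge model, the alpha grid, and scikit-learn's KFold are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=80)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]     # candidate tuning parameters (illustrative grid)
cv_err = []
for a in alphas:
    fold_losses = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = Ridge(alpha=a).fit(X[train], y[train])        # f^{-k}(x, alpha)
        fold_losses.append((y[test] - model.predict(X[test])) ** 2)
    cv_err.append(np.concatenate(fold_losses).mean())          # CV(f, alpha)

best_alpha = alphas[int(np.argmin(cv_err))]                    # alpha minimizing the error curve
final_model = Ridge(alpha=best_alpha).fit(X, y)                # refit chosen model on all the data
print("chosen alpha:", best_alpha)
```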

K*2-Fold Cross-Validation: A variant of K-fold cross-validation. In each of the K repetitions, the data are split evenly into two sets, S1 and S2. We train on S1 and test on S2, and then train on S2 and test on S1.
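A sketch of this K*2 scheme, assuming scikit-learn; each repetition is implemented here as a freshly shuffled stratified 2-fold split, and the classifier and K = 5 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, n_features=8, random_state=0)

K = 5
scores = []
for rep in range(K):                                   # K repetitions of a 2-fold split
    splitter = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
    for train, test in splitter.split(X, y):           # train on S1/test on S2, then swap
        clf = LogisticRegression().fit(X[train], y[train])
        scores.append(clf.score(X[test], y[test]))

print("K*2-fold CV accuracy:", np.mean(scores))        # average of the 2K test-set scores
```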

What does the K-fold cross-validation estimate measure? With K = 5 or 10, we might guess that it estimates the expected error $\mathrm{Err}$. If K = N, we might guess that cross-validation estimates the conditional error $\mathrm{Err}_{\mathcal{T}}$. So what value should we choose for K?

The Application of Cross-Validation
1. What value should we choose for K?
2. The wrong and right way to do cross-validation

Section 1: What value should we choose for K?

Section 1: What value should we choose for K? In cross-validation with a given K, we consider: Err, the average prediction error; the variance of the estimate; the computational burden; and so on.

Section 1: What value should we choose for K? FIGURE 1. Hypothetical learning curve for a classifier on a given task: a plot of 1 - Err versus the size of the training set N.

Section 1: What value should we choose for K? Another situation: what if we have only 50 samples with which to fit the model?

Section 1: What value should we choose for K? If the learning curve has a considerable slope at the given training set size, five- or tenfold cross-validation will overestimate the true prediction error.

Section 1: What value should we choose for K?

Section 2: The Wrong and Right Way to Do Cross-Validation

Section 2: The wrong and right way to do cross-validation. The predictor: an input variable (feature) used by our classifier.

Section 2: The wrong and right way to do cross-validation Example: Consider a classification problem with N=50 samples in two equal-sized classes, and p=5000 predictors that are independent of the class labels. The true error rate of any classifier is 50%.

Section 2: The wrong and right way to do cross-validation. A typical strategy for analysis might be as follows:
1. Screen the predictors: find a subset of predictors that show fairly strong correlation with the class labels.
2. Using just this subset of predictors, build a multivariate classifier.
3. Use cross-validation to estimate the prediction error of the final model.

Section 2: The wrong and right way to do cross-validation. First, we choose the 100 predictors having the highest correlation with the class labels over all 50 samples. Then we use a 1-nearest-neighbor classifier based on just these 100 predictors. Over 50 simulations of this setting, cross-validation carried out in this way gives an average CV error rate of 3%, far lower than the true error rate of 50%.
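The sketch below reproduces this effect with NumPy and scikit-learn, using the sample sizes and the 1-NN classifier from the example; it runs a single simulation rather than 50, and the screening and CV helper calls are our own choices.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
N, p = 50, 5000
X = rng.normal(size=(N, p))                 # predictors independent of the labels
y = np.repeat([0, 1], N // 2)               # two equal-sized classes; true error rate is 50%

# WRONG: screen predictors using ALL samples, then cross-validate
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
top = np.argsort(corr)[-100:]               # 100 predictors most correlated with the labels
acc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X[:, top], y, cv=5)
print("apparent CV error:", 1 - acc.mean()) # far below the true 50% error
```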

Section 2: The wrong and right way to do cross-validation

Section 2: The wrong and right way to do cross-validation. Review what we have done: we selected the 100 predictors having the largest correlation with the class labels over all 50 samples, and only then left samples out to do the cross-validation. Here lies the problem: the classifier is not independent of the test set, because these predictors have already seen the left-out samples.

Section 2: The wrong and right way to do cross-validation. FIGURE 2: Histogram showing the correlation of the class labels, in 10 randomly chosen samples, with the 100 predictors chosen using the incorrect version of cross-validation.

Section 2: The wrong and right way to do cross-validation. Here is the correct way to carry out cross-validation in this example:
1. Divide the samples into K cross-validation folds at random.
2. For each fold k = 1, 2, ..., K:
   (a) Find a subset of good predictors, using all of the samples except those in fold k.
   (b) Using just this subset of predictors, build a multivariate classifier, again using all of the samples except those in fold k.
   (c) Use the classifier to predict the class labels for the samples in fold k.
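A sketch of this correct procedure, assuming scikit-learn: the screening step is placed inside a pipeline, so it is refit on each fold's training samples only. The F-test-based SelectKBest stands in for the correlation screening described above and is our own substitution.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5000))             # predictors still independent of the labels
y = np.repeat([0, 1], 25)

# RIGHT: screening lives inside the pipeline, so it is re-run on each fold's
# training samples only, and the left-out fold stays truly unseen.
pipe = make_pipeline(SelectKBest(f_classif, k=100), KNeighborsClassifier(n_neighbors=1))
acc = cross_val_score(pipe, X, y, cv=5)
print("honest CV error:", 1 - acc.mean())   # now close to the true 50% error
```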

Section 2: The wrong and right way to do cross-validation. FIGURE 3: Histogram showing the correlation of the class labels, in 10 randomly chosen samples, with the 100 predictors chosen using the correct version of cross-validation.