Learning From Data
Yaser Abu-Mostafa, Caltech
http://work.caltech.edu/telecourse
Self-paced version

Homework # 7

All questions have multiple-choice answers ([a], [b], [c], ...). You can collaborate with others, but do not discuss the selected or excluded choices in the answers. You can consult books and notes, but not other people's solutions. Your solutions should be based on your own work. Definitions and notation follow the lectures.

Note about the homework

The goal of the homework is to facilitate a deeper understanding of the course material. The questions are not designed to be puzzles with catchy answers. They are meant to make you roll up your sleeves, face uncertainties, and approach the problem from different angles. The problems range from easy to difficult, and from practical to theoretical. Some problems require running a full experiment to arrive at the answer. The answer may not be obvious or numerically close to one of the choices, but one (and only one) choice will be correct if you follow the instructions precisely in each problem. You are encouraged to explore the problem further by experimenting with variations on these instructions, for the learning benefit.

You are also encouraged to take part in the forum

    http://book.caltech.edu/bookforum

where there are many threads about each homework set. We hope that you will contribute to the discussion as well. Please follow the forum guidelines for posting answers (see the "BEFORE posting answers" announcement at the top there).

© 2012-2015 Yaser Abu-Mostafa. All rights reserved. No redistribution in any format. No translation or derivative products without written permission.

Validation

In the following problems, use the data provided in the files in.dta and out.dta from Homework # 6. We are going to apply linear regression with a nonlinear transformation for classification (without regularization). The nonlinear transformation is given by φ₀ through φ₇, which transform (x₁, x₂) into

    (1, x₁, x₂, x₁², x₂², x₁x₂, |x₁ − x₂|, |x₁ + x₂|)

To illustrate how taking out points for validation affects the performance, we will consider the hypotheses trained on D_train (without restoring the full D for training after validation is done). A code sketch of this experiment follows Problem 5.

1. Split in.dta into training (first 25 examples) and validation (last 10 examples). Train on the 25 examples only, using the validation set of 10 examples to select between five models that apply linear regression to φ₀ through φₖ, with k = 3, 4, 5, 6, 7. For which model is the classification error on the validation set smallest?

2. Evaluate the out-of-sample classification error using out.dta on the 5 models to see how well the validation set predicted the best of the 5 models. For which model is the out-of-sample classification error smallest?

3. Reverse the roles of the training and validation sets; now train with the last 10 examples and validate with the first 25 examples. For which model is the classification error on the validation set smallest?

4. Once again, evaluate the out-of-sample classification error using out.dta on the 5 models to see how well the validation set predicted the best of the 5 models. For which model is the out-of-sample classification error smallest?

5. What values are closest in Euclidean distance to the out-of-sample classification errors obtained for the models chosen in Problems 1 and 3, respectively?

[a] 0.0, 0.1
[b] 0.1, 0.2
[c] 0.1, 0.3
[d] 0.2, 0.2
[e] 0.2, 0.3
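A minimal NumPy sketch of this experiment (Problems 1 and 2 as written; Problem 3 just swaps the two slices). It assumes in.dta and out.dta are whitespace-separated files with columns x₁, x₂, y, and solves linear regression in one shot with the pseudo-inverse:

    import numpy as np

    def transform(X, k):
        """Apply phi_0 through phi_k to the points in X."""
        x1, x2 = X[:, 0], X[:, 1]
        phi = np.column_stack([
            np.ones(len(X)),   # phi_0
            x1,                # phi_1
            x2,                # phi_2
            x1 ** 2,           # phi_3
            x2 ** 2,           # phi_4
            x1 * x2,           # phi_5
            np.abs(x1 - x2),   # phi_6
            np.abs(x1 + x2),   # phi_7
        ])
        return phi[:, : k + 1]

    def class_error(w, Z, y):
        """Fraction of points misclassified by sign(Z w)."""
        return np.mean(np.sign(Z @ w) != y)

    data_in = np.loadtxt("in.dta")
    data_out = np.loadtxt("out.dta")
    X_train, y_train = data_in[:25, :2], data_in[:25, 2]  # first 25 examples
    X_val, y_val = data_in[25:, :2], data_in[25:, 2]      # last 10 examples
    X_out, y_out = data_out[:, :2], data_out[:, 2]

    for k in (3, 4, 5, 6, 7):
        Z = transform(X_train, k)
        w = np.linalg.pinv(Z) @ y_train   # linear regression weights
        e_val = class_error(w, transform(X_val, k), y_val)
        e_out = class_error(w, transform(X_out, k), y_out)
        print(f"k={k}: validation error={e_val:.3f}, out-of-sample error={e_out:.3f}")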

Validation Bias

6. Let e₁ and e₂ be independent random variables, distributed uniformly over the interval [0, 1]. Let e = min(e₁, e₂). The expected values of e₁, e₂, e are closest to

[a] 0.5, 0.5, 0
[b] 0.5, 0.5, 0.1
[c] 0.5, 0.5, 0.25
[d] 0.5, 0.5, 0.4
[e] 0.5, 0.5, 0.5
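The expectation of e can be worked out analytically, and a Monte Carlo estimate is a cheap cross-check. A minimal sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000  # number of Monte Carlo samples

    e1 = rng.uniform(0.0, 1.0, n)
    e2 = rng.uniform(0.0, 1.0, n)
    e = np.minimum(e1, e2)

    print(f"E[e1] ≈ {e1.mean():.3f}, E[e2] ≈ {e2.mean():.3f}, E[e] ≈ {e.mean():.3f}")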

Cross Validation

7. You are given the data points (x, y): (−1, 0), (ρ, 1), (1, 0), where ρ ≥ 0, and a choice between two models: constant { h₀(x) = b } and linear { h₁(x) = ax + b }. For which value of ρ would the two models be tied using leave-one-out cross-validation with the squared error measure?

[a] √(√3 + 4)
[b] √(√3 − 1)
[c] √(9 + 4√6)
[d] √(9 − √6)
[e] None of the above
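This one is meant to be solved analytically, but you can sanity-check the algebra by scanning ρ numerically for the point where the two leave-one-out errors tie. A minimal sketch (the grid value is approximate; solve the tie equation exactly to match it against the choices):

    import numpy as np

    def loocv_error(points, fit):
        """Leave-one-out cross validation with squared error."""
        errors = []
        for i in range(len(points)):
            train = [p for j, p in enumerate(points) if j != i]
            h = fit(train)                  # fit on the other two points
            x, y = points[i]
            errors.append((h(x) - y) ** 2)  # squared error on the held-out point
        return np.mean(errors)

    def fit_constant(train):
        b = np.mean([y for _, y in train])
        return lambda x: b

    def fit_linear(train):
        (x1, y1), (x2, y2) = train
        a = (y2 - y1) / (x2 - x1)
        return lambda x: a * (x - x1) + y1

    candidates = []
    for rho in np.linspace(0.0, 5.0, 1001):
        if abs(rho - 1.0) < 1e-9:
            continue  # linear fit undefined when two x-values coincide
        points = [(-1.0, 0.0), (rho, 1.0), (1.0, 0.0)]
        diff = loocv_error(points, fit_constant) - loocv_error(points, fit_linear)
        candidates.append((abs(diff), rho))

    print(f"LOOCV errors closest to tied near ρ ≈ {min(candidates)[1]:.3f}")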

PLA vs. SVM

Notice: Quadratic Programming packages sometimes need tweaking and have numerical issues, and this is characteristic of packages you will use in practical ML situations. Your understanding of support vectors will help you get to the correct answers.

In the following problems, we compare PLA to SVM with hard margin¹ on linearly separable data sets. For each run, you will create your own target function f and data set D. Take d = 2 and choose a random line in the plane as your target function f (do this by taking two random, uniformly distributed points on [−1, 1] × [−1, 1] and taking the line passing through them), where one side of the line maps to +1 and the other maps to −1. Choose the inputs xₙ of the data set as random points in X = [−1, 1] × [−1, 1], and evaluate the target function on each xₙ to get the corresponding output yₙ. If all data points are on one side of the line, discard the run and start a new run.

Start PLA with the all-zero vector and pick the misclassified point for each PLA iteration at random. Run PLA to find the final hypothesis g_PLA and measure the disagreement between f and g_PLA as P[f(x) ≠ g_PLA(x)] (you can either calculate this exactly, or approximate it by generating a sufficiently large, separate set of points to evaluate it). Now, run SVM on the same data to find the final hypothesis g_SVM by solving

    min_{w,b} (1/2) wᵀw    s.t.    yₙ (wᵀxₙ + b) ≥ 1

using quadratic programming on the primal or the dual problem. Measure the disagreement between f and g_SVM as P[f(x) ≠ g_SVM(x)], and count the number of support vectors you get in each run. A code sketch of one possible setup appears after Problem 10.

8. For N = 10, repeat the above experiment for 1000 runs. How often is g_SVM better than g_PLA in approximating f? The percentage of time is closest to:

[a] 20%
[b] 40%
[c] 60%
[d] 80%
[e] 100%

9. For N = 100, repeat the above experiment for 1000 runs. How often is g_SVM better than g_PLA in approximating f? The percentage of time is closest to:

[a] 10%
[b] 30%
[c] 50%
[d] 70%
[e] 90%

10. For the case N = 100, which of the following is the closest to the average number of support vectors of g_SVM (averaged over the 1000 runs)?

[a] 2
[b] 3
[c] 5
[d] 10
[e] 20

¹ For hard margin in SVM packages, set C → ∞ (in practice, a very large value).
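The following is a minimal sketch of one possible setup for Problems 8-10, shown for N = 10. It assumes scikit-learn's SVC with a very large C as the hard-margin QP solver (any primal or dual QP formulation works equally well), and approximates each disagreement probability with 10,000 fresh test points:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    def random_target():
        """Random line through two uniform points in [-1, 1] x [-1, 1]."""
        p, q = rng.uniform(-1, 1, (2, 2))
        w = np.array([p[0] * q[1] - p[1] * q[0],  # bias term
                      p[1] - q[1], q[0] - p[0]])  # normal to the line
        return lambda X: np.sign(X @ w[1:] + w[0])

    def run_pla(X, y):
        """PLA from the zero vector, picking a random misclassified point each step."""
        Z = np.column_stack([np.ones(len(X)), X])
        w = np.zeros(3)
        while True:
            mis = np.flatnonzero(np.sign(Z @ w) != y)
            if len(mis) == 0:
                return w
            i = rng.choice(mis)
            w += y[i] * Z[i]

    def disagreement(f, predict, n_test=10_000):
        """Estimate P[f(x) != predict(x)] on fresh uniform test points."""
        X = rng.uniform(-1, 1, (n_test, 2))
        return np.mean(f(X) != predict(X))

    N, target_runs = 10, 1000          # use N = 100 for Problems 9 and 10
    svm_wins, sv_total, runs = 0, 0, 0
    while runs < target_runs:
        f = random_target()
        X = rng.uniform(-1, 1, (N, 2))
        y = f(X)
        if np.all(y == y[0]):          # all points on one side: discard the run
            continue
        w_pla = run_pla(X, y)
        e_pla = disagreement(
            f, lambda T: np.sign(np.column_stack([np.ones(len(T)), T]) @ w_pla))
        svm = SVC(kernel="linear", C=1e10).fit(X, y)  # huge C approximates hard margin
        e_svm = disagreement(f, svm.predict)
        svm_wins += e_svm < e_pla
        sv_total += len(svm.support_)
        runs += 1

    print(f"g_SVM beats g_PLA in {100 * svm_wins / runs:.1f}% of runs; "
          f"average number of support vectors = {sv_total / runs:.2f}")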