University of Wisconsin-Madison
Computer Sciences Department

CS 760 Machine Learning
Fall 1997 Midterm Exam
(one page of notes allowed)

100 points, 90 minutes
December 3, 1997

Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work.

Notice that not all questions have the same point value. Divide your time appropriately.

Before starting, write your name on this and all other pages of this exam. Also, make sure your exam contains five (5) problems on ten (10) pages.

    Problem    Score    Max Score
       1                    25
       2                    25
       3                    25
       4                    15
       5                    10
     Total                 100

1. Learning from Labelled Examples (25 pts)

Part A

Assume you are given the following three features with the possible values shown. The first two are nominally valued, while the third is real-valued.

    F1 ∈ {v1, v2}
    F2 ∈ {v3, v4, v5}
    F3 ∈ [0, 9]

Using ID3 and its max-gain formula, produce a decision tree that accounts for the following training examples. Show all your work.

    F1 = v1   F2 = v3   F3 = 7   +
    F1 = v2   F2 = v4   F3 = 8   +
    F1 = v1   F2 = v5   F3 = 9   -
    F1 = v1   F2 = v3   F3 = 2   -

Part B

Discuss how one could apply the Naive Bayes algorithm to the above training data. Explain how the resulting classifier would categorize the following test-set example:

    F1 = v1   F2 = v4   F3 = 3
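For checking the max-gain arithmetic, a short script can help. The sketch below assumes entropy measured in bits (log base 2) and handles the real-valued feature F3 with candidate threshold splits at midpoints between its sorted values; both are conventional choices rather than something the question fixes.

    import math
    from collections import Counter

    # The four training examples from Part A: (F1, F2, F3, label).
    examples = [
        ("v1", "v3", 7, "+"),
        ("v2", "v4", 8, "+"),
        ("v1", "v5", 9, "-"),
        ("v1", "v3", 2, "-"),
    ]

    def entropy(labels):
        """Entropy (base 2) of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_nominal(examples, index):
        """Information gain of splitting on the nominal feature at position `index`."""
        labels = [e[3] for e in examples]
        total, n = entropy(labels), len(examples)
        for value in set(e[index] for e in examples):
            subset = [e[3] for e in examples if e[index] == value]
            total -= (len(subset) / n) * entropy(subset)
        return total

    def gain_threshold(examples, index, threshold):
        """Information gain of the binary split F <= threshold on a real-valued feature."""
        labels = [e[3] for e in examples]
        total, n = entropy(labels), len(examples)
        for subset in ([e[3] for e in examples if e[index] <= threshold],
                       [e[3] for e in examples if e[index] > threshold]):
            if subset:
                total -= (len(subset) / n) * entropy(subset)
        return total

    print("gain(F1) =", gain_nominal(examples, 0))
    print("gain(F2) =", gain_nominal(examples, 1))
    for t in (4.5, 7.5, 8.5):   # midpoints between the sorted F3 values
        print(f"gain(F3 <= {t}) =", gain_threshold(examples, 2, t))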

Part C

Explain how a two-nearest-neighbor algorithm would categorize the test example of Part B, given the above training set.

Part D

According to the Bayesian interpretation of what a neural network should optimize that was discussed in lecture, which error function is most appropriate for categorical problems like the above? For this case, what statistical interpretation should we give to the network's output?

How would your answers to the above questions change if the task were to learn a real-valued function?
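Part C hinges on how distances are measured over mixed nominal and real-valued features, which the question leaves open. A minimal sketch, assuming one common choice (a 0/1 mismatch on each nominal feature plus a range-scaled absolute difference on F3), is:

    from collections import Counter

    # Two-nearest-neighbor lookup for the Part B test example, under the
    # assumed distance: 0/1 mismatch on F1 and F2, |difference| / 9 on F3.
    examples = [
        (("v1", "v3", 7), "+"),
        (("v2", "v4", 8), "+"),
        (("v1", "v5", 9), "-"),
        (("v1", "v3", 2), "-"),
    ]
    query = ("v1", "v4", 3)

    def distance(a, b):
        d = int(a[0] != b[0]) + int(a[1] != b[1])   # nominal features: 0/1 mismatch
        return d + abs(a[2] - b[2]) / 9.0           # real-valued feature, range-scaled

    neighbors = sorted(examples, key=lambda e: distance(e[0], query))[:2]
    print("two nearest neighbors:", neighbors)
    print("their votes:", dict(Counter(label for _, label in neighbors)))
    # How a split vote between the two neighbors is resolved (e.g., nearest wins)
    # is itself part of what the question is asking about.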

2. Reinforcement Learning and Neural Networks (25 pts)

Part A

Imagine an environment like the Agent World, but where the learner can only move LEFT or RIGHT. The agent's sensors report the distances to the left and right walls. The world is 3 meters wide, the agent's step size is 1 meter, and the agent always starts 1 meter from the right wall. Finally, the agent gets a reward of +2 when moving left and of +1 when moving right, unless it tries to move into a wall, in which case its reward is -10. (Assume the agent is dimensionless, i.e., has zero width, and that it can abut a wall without penalty.)

Apply the one-step Q-learning algorithm to this problem, using a table to represent your Q-function (all entries in the table should initially be zero); let gamma = 0.9. In the space below, show the state of the Q-table after the first two (2) steps of the learner. For simplicity, always follow the current policy during learning (i.e., no exploration) and break ties by moving to the right. Briefly explain why the steps were chosen.

    initial state of the Q-table:

    state of the Q-table after the agent's first step:

    state of the Q-table after the agent's second step:
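The Part A dynamics are small enough to trace in code. A minimal sketch, assuming positions indexed 0 through 3 meters from the left wall, that a blocked move leaves the agent in place, and the deterministic-world update Q(s, a) <- r + gamma * max over a' of Q(s', a'):

    # Tabular one-step Q-learning in the 3-meter corridor of Part A,
    # under the assumptions stated above.
    GAMMA = 0.9
    ACTIONS = ("LEFT", "RIGHT")
    Q = {(s, a): 0.0 for s in range(4) for a in ACTIONS}

    def step(state, action):
        """Return (reward, next_state) for the deterministic corridor."""
        if action == "LEFT":
            return (-10, state) if state == 0 else (2, state - 1)
        else:
            return (-10, state) if state == 3 else (1, state + 1)

    def greedy(state):
        """Follow the current policy; break ties by moving RIGHT, as the exam asks."""
        return max(reversed(ACTIONS), key=lambda a: Q[(state, a)])

    state = 2                                  # 1 meter from the right wall
    for t in range(2):                         # the first two steps of the learner
        action = greedy(state)
        reward, next_state = step(state, action)
        Q[(state, action)] = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        print(f"step {t + 1}: state={state}, action={action}, reward={reward}")
        state = next_state

    print("nonzero Q-table entries:", {k: v for k, v in Q.items() if v != 0.0})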

Part B

Instead of using a Q-table, imagine you used a perceptron to learn the Q function. Assume your perceptron has two inputs, the (unnormalized) distances to the left and right walls, and that all the free parameters in your perceptron are initialized to 1. Under the assumptions of Part A, what would be the first training example given to this perceptron? Explain your answer. Finally, show the changes (if any) in the perceptron that result from this training example (using the delta rule with eta = 0.1).

Part C

Describe one (1) important strength and one (1) major weakness of using a compact representation like neural networks, rather than complete tables, to represent Q functions.

    strength:

    weakness:
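The delta-rule bookkeeping in Part B can likewise be sketched. The target value below is only a placeholder (working out the actual first training example is what the question asks for), and the bias/weight layout is one plausible reading of "all free parameters initialized to 1".

    # One delta-rule update for a linear unit approximating Q, under the
    # assumptions stated above (linear output, bias weight, placeholder target).
    ETA = 0.1

    def delta_rule_update(weights, inputs, target):
        """w_i <- w_i + eta * (target - output) * x_i, with inputs[0] = 1 for the bias."""
        output = sum(w * x for w, x in zip(weights, inputs))
        return [w + ETA * (target - output) * x for w, x in zip(weights, inputs)]

    weights = [1.0, 1.0, 1.0]   # bias weight, weight on each wall distance
    inputs = [1.0, 2.0, 1.0]    # bias input, then the start-state distances in meters
    target = 0.0                # placeholder for the Q-learning target
    print(delta_rule_update(weights, inputs, target))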

Part D

In HW 4's Agent World simulator, agents get a reward (penalty) of -2 immediately upon pushing a mineral and a reward of +25 if this mineral later hits another player. Assume that a pushed mineral always hits one (and only one) other player after exactly N time steps. Using gamma = 0.9, for what range of values of N would the optimal policy never involve pushing a mineral?

Obviously, in terms of the real (undiscounted) score of the game, under the above assumptions it would always be a good idea to push minerals. Why, then, do we use discounting in our Q function?

3. Experimental Methodology and Computational Learning Theory (25 pts)

Part A

Assume that you have drawn (with replacement) 1000 examples from some fixed distribution for a two-category problem, and that after dealing with the overfitting problem, your learning algorithm categorizes 898 of these training examples correctly. You next draw (again, with replacement) another 100 examples and measure the accuracy of your learned concept (without doing any further learning, i.e., without adjusting the concept learned from the first 1000 examples). Of this second set of examples, your algorithm categorizes 85 correctly.

Within what interval can you say, with 95% confidence, that the true accuracy of your learned concept lies?
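For Problem 3, Part A, the usual normal-approximation interval, p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) with z = 1.96 at the 95% level, can be evaluated directly; only the 100 freshly drawn examples feed into it.

    import math

    # Normal-approximation confidence interval for the accuracy measured on the
    # 100 held-out examples (85 correct); z = 1.96 for 95% confidence.
    p_hat, n, z = 85 / 100, 100, 1.96
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    print(f"95% interval: [{p_hat - half_width:.3f}, {p_hat + half_width:.3f}]")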

State and briefly explain one (1) major assumption underlying the above calculation.

Part B

Assume that it is known that the true concept for the above problem is in the following family of functions: circles of radius R centered at the point (X, Y), where

    R ∈ {1, 2, 3, 4, 5}
    X ∈ {3, 4, 5, 6}
    Y ∈ {-1, 0, 1}

Provide an upper bound on the number of training examples needed so that, with probability 0.95, a concept that is consistent with the training examples has an error rate of no more than 15%.

Part C

Briefly discuss what you believe to be the important difference between the analyses of expected future error rate in Parts A and B.
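For Part B, one standard bound for a consistent learner over a finite hypothesis space is m >= (1/epsilon) * (ln|H| + ln(1/delta)). The sketch below applies it under the assumption that this is the bound the question intends, with |H| = 5 * 4 * 3 = 60 circles, epsilon = 0.15, and delta = 0.05.

    import math

    # Consistent-learner sample-complexity bound for a finite hypothesis space:
    # m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    H_size = 5 * 4 * 3            # choices for R, X, and Y in the circle family
    epsilon, delta = 0.15, 0.05
    m = (1 / epsilon) * (math.log(H_size) + math.log(1 / delta))
    print("upper bound on examples needed:", math.ceil(m))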

Part D

Assume your preferred learning algorithm has one problem-specific parameter (with 7 possible values) to set. Imagine you are given a new dataset of 500 labelled examples. Briefly discuss:

    (i) how you would go about choosing a good setting for this parameter

    (ii) how you would estimate the future performance of the concept learned by your algorithm on the task represented by these 500 examples

    (iii) the most important assumption about this dataset and its acquisition that you are making when you apply the experimental methodology you described
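One common shape for the methodology Part D asks about is sketched below. The 100/100/300 split sizes are illustrative choices only, and train_and_evaluate is a hypothetical stand-in for whatever learner and parameter are actually being tuned.

    import random

    def train_and_evaluate(parameter, train_set, eval_set):
        """Hypothetical stand-in: train with this parameter value, return accuracy on eval_set."""
        raise NotImplementedError

    def choose_parameter_and_estimate(data, parameter_values, seed=0):
        """Pick a parameter value on a tuning set, then estimate performance on an untouched test set."""
        data = list(data)
        random.Random(seed).shuffle(data)
        test_set = data[:100]                 # reserved solely for the final estimate
        tune_set = data[100:200]
        train_set = data[200:]
        best = max(parameter_values,
                   key=lambda p: train_and_evaluate(p, train_set, tune_set))
        # Retrain on train + tune with the chosen value; the accuracy on the held-out
        # test set is the estimate of future performance on this task.
        return best, train_and_evaluate(best, train_set + tune_set, test_set)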

4. Short Essays (15 pts)

Briefly explain the importance in machine learning of the following:

    VC dimension

    ensembles

    building-block hypothesis

    decision-tree pruning

    policy iteration

5. Longer Essays (10 pts)

IF YOU WISH, YOU MAY RIP OFF THIS FINAL QUESTION, TAKE IT HOME WITH YOU, AND RETURN IT IN CLASS ON MONDAY. HOWEVER, DO NOT DISCUSS YOUR ANSWERS WITH ANYONE ELSE UNTIL AFTER MONDAY'S CLASS (this constraint holds even if you turn in your answer to this question today). You may type your answers on a separate sheet of paper, but do not use more than one side of a normal sheet of paper, and use a reasonably large font.

Part A

Describe what you believe to be the most important idea in machine learning (other than those topics listed in Question 4). Justify your answer.

Part B

Describe what you believe to be the most important open issue in machine learning. Briefly sketch an approach for addressing it.