CSEP 573 Final Exam March 12, 2016


Name:

This exam is take-home and is due on Sunday, March 20th at 11:45 pm. You can submit it to the online Dropbox or hand it to the course staff.

This exam should not take significantly longer than 3 hours to complete if you have already carefully studied all of the course material. Studying while taking the exam may take longer. :)

This exam is open book and open notes, but you must complete all of the work yourself with no help from others. Please feel free to post clarification questions to the course message board, but please do not discuss solutions.

If you show your work and *briefly* describe your approach to the longer questions, we will happily give partial credit where possible.

There are 8 pages in this exam.

Scores

Q.1 (30)   Q.2 (30)   Q.3 (20)   Q.4 (20)   Q.5 (20)   Total (120)

Question 1: True/False (30 points)

Circle the correct answer for each True/False question. If you think a question is ambiguous, please add a very short explanation of the interpretation you are making, and we will do our best to grade accordingly.

1. True / False  Adding more edges to a Bayesian network can restrict the space of possible distributions it can represent. (3 pts)

2. True / False  For answering conditional queries in Bayesian networks, rejection sampling has generally been observed to provide worse estimates than likelihood weighting (when given the same number of samples). (3 pts)

3. True / False  Naive Bayes models always encode incorrect independence assumptions. (3 pts)

4. True / False  The Perceptron will always converge if the data is linearly separable. (3 pts)

5. True / False  Overfitting occurs when the test error is higher than the training error. (3 pts)

6. True / False  Inference by enumeration can produce incorrect results if the Bayes network is dense (has many edges). (3 pts)

7. True / False  The HMM forward inference algorithm takes time that is polynomial in the number of observations that have been received. (3 pts)

8. True / False  The choice of variable ordering in variable elimination does not change the correctness of the algorithm (you will always get the correct answer for any ordering). (3 pts)

9. True / False  Naive Bayes, as presented in class, is an online learning algorithm. (3 pts)

10. True / False  The number of parameters in a Bayesian network grows exponentially with the highest out degree of a node in the network. (3 pts)

Question 2: Short Answer (30 points)

These short answer questions can be answered with a few sentences each. Please be brief; we will subtract points for very long responses (e.g., more than a sentence or two for each part of the question).

1. Short Answer  Briefly describe how you would decide which algorithm to use for answering queries to a Bayesian network. What is the key property of the network that, if known, would best help you make the appropriate decision? (5 pts)

2. Short Answer  In machine learning, explain generalization and overfitting. Describe an experimental setup that correctly measures generalization. Assume that your algorithm has one hyperparameter that must be set. (5 pts)

3. Short Answer  Briefly describe a situation in which you would use Bayes' rule, and why, from the examples we saw in class. (5 pts)

4. Short Answer  Briefly describe a sign of overfitting in Naive Bayes learning, and how it can be avoided. (5 pts)

5. Short Answer  Briefly describe when you would prefer to report precision and recall for a learned classifier, instead of accuracy. (5 pts)

6. Short Answer  Briefly describe the difference between outcomes and events in joint probability models. (5 pts)

Question 3: Hidden Markov Models: Tricky Coins (20 points)

Consider the following random process. A magician has two coins, each of which has an unknown type. A coin can either be a fair coin (50/50 odds of heads vs. tails), or a trick coin that either (1) has heads on both sides or (2) has tails on both sides. A priori, each coin is equally likely to be any of the three possible types. At every time step, the magician randomly picks a coin (without showing you which one was selected), flips it, and shows you the result. However, unfortunately, the magician only shows you the coin very briefly, and 10% of the time you make a mistake when reading which side of the coin came up (e.g., you see heads when it was actually tails).

1. Model this process as an HMM. Specify all of the necessary parameters. You do not have to write out all of the probability distributions explicitly, but be careful to specify what values they would have if you did the full enumeration. [15 pts]

2. Consider the Markov model that would result if you ran the process above and always observed heads. What is the stationary distribution of this model? [5 pts]
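As a concrete illustration of the generative process described above, here is a minimal simulation sketch. The uniform coin-type prior, the uniform coin pick, and the 10% observation noise follow the description in the question; the function and variable names are purely illustrative and are not part of the exam.

    import random

    COIN_TYPES = ["fair", "two-heads", "two-tails"]  # each equally likely a priori

    def sample_coin_types():
        """Sample the (hidden) type of each of the magician's two coins."""
        return [random.choice(COIN_TYPES) for _ in range(2)]

    def flip(coin_type):
        """Flip a coin of the given type and return the true face."""
        if coin_type == "fair":
            return random.choice(["H", "T"])
        return "H" if coin_type == "two-heads" else "T"

    def observe(true_face, noise=0.1):
        """You misread the shown face 10% of the time."""
        if random.random() < noise:
            return "T" if true_face == "H" else "H"
        return true_face

    def simulate(num_steps=10):
        """Run the magician's process and return the noisy observation sequence."""
        coin_types = sample_coin_types()
        observations = []
        for _ in range(num_steps):
            picked = random.choice([0, 1])           # magician picks a coin uniformly
            true_face = flip(coin_types[picked])     # flips it
            observations.append(observe(true_face))  # you see a noisy reading
        return coin_types, observations

    if __name__ == "__main__":
        print(simulate())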

Question 4: Bayesian Networks (20 points)

Consider the following two Bayesian networks, which are variations on the alarm network we discussed in class:

[Figure: two Bayesian network structures, (a) and (b), each over the random variables MaryCalls, JohnCalls, Alarm, Earthquake, and Burglary.]

1. Based on the network structure alone, which network above makes more independence (marginal or conditional) assumptions? [3 pts]

2. Draw a new Bayesian network with the same set of random variables that makes as many independence assumptions as possible. [5 pts]

3. Write down two conditional independence assumptions encoded by the structure of network (a). If there are not two, write as many as possible. [6 pts]

4. Write down two conditional independence assumptions encoded by the structure of network (b). If there are not two, write as many as possible. [6 pts]

Question 5: Perceptron (20 points)

Consider the following training set, where the x and y axes represent the values of two features, and the examples are marked with + for the positive class and − for the negative class:

[Figure: a plot of the training examples in the two-dimensional feature space, labeled + and −.]

1. In the figure above, draw a decision boundary that the Perceptron could learn. [5 pts]

2. Briefly describe why you drew the line you did for the previous question. Are other separators possible? [5 pts]

3. The Perceptron is known to not converge in some situations. In the data above, circle one datapoint that, if you were to change its class, would cause the Perceptron to no longer converge. [5 pts]

4. Now, given your new dataset from the last question, briefly describe a change that you could make which would, again, allow the Perceptron to converge. You cannot change the number of training examples or the labels they are assigned, but anything else is fair game. [5 pts]
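For reference when reasoning about convergence in the questions above, here is a minimal sketch of the standard Perceptron update rule on a two-dimensional dataset with a bias feature. The tiny dataset and all names are illustrative assumptions, not the figure from the exam.

    import numpy as np

    def perceptron(X, y, max_epochs=100):
        """Standard Perceptron: w <- w + y_i * x_i whenever example i is misclassified.

        X: (n, d) feature matrix (a bias column appended by the caller).
        y: length-n array of labels in {+1, -1}.
        Returns a separating weight vector, or raises if none is found in max_epochs.
        """
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for x_i, y_i in zip(X, y):
                if y_i * np.dot(w, x_i) <= 0:   # misclassified (or on the boundary)
                    w += y_i * x_i              # update toward the correct side
                    mistakes += 1
            if mistakes == 0:                   # no mistakes in a full pass: converged
                return w
        raise RuntimeError("did not converge; the data may not be linearly separable")

    # Illustrative 2-D dataset (third column is a constant bias feature).
    X = np.array([[1.0, 2.0, 1.0], [2.0, 1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
    y = np.array([1, 1, -1, -1])
    print(perceptron(X, y))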