Foundations of Intelligent Systems CSCI-630-01 (Spring 2014)

Foundations of Intelligent Systems CSCI-630-01 (Spring 2014)
Final Examination, Wed. May 21, 2014
Instructor: Richard Zanibbi
Duration: 120 Minutes
Name:

Instructions

The exam questions are worth a total of 100 points.

After the exam has started, once a student leaves the exam room, they may not return to the exam room until the exam has finished. Remain in the exam room if you finish during the final five minutes of the exam. Close the door behind you quietly if you leave before the end of the examination.

The exam is closed book and notes. Place any coats or bags at the front of the exam room. If you require clarification of a question, please raise your hand.

You may use pencil or pen, and write on the backs of pages in the booklets. Additional pages are provided at the back of the exam - clearly indicate where answers to each question may be found.

Questions

1. True/False (5 points)

(a) ( T / F ) The minimax algorithm is guaranteed to be optimal (i.e. achieves the highest payoff) only against optimal opponents in two-player strategic games.

(b) ( T / F ) Random Forests and Multi-Layer Perceptrons are able to represent complex regression functions using combinations of simple models.

(c) ( T / F ) Neural network research was slowed substantially in the late 1960s with the publication of Minsky and Papert's book Perceptrons, which demonstrated that a standard perceptron was incapable of learning when only one of its two inputs was on.

(d) ( T / F ) Nearly all optimization algorithms considered in our course are incremental search algorithms.

(e) ( T / F ) Entropy is measured in bits, the number of binary decisions needed to predict the answer to a question with n uncertain (probabilistic) outcomes. For outcomes v_1, v_2, ..., v_n, entropy is defined as: H(P(v_1), ..., P(v_n)) = -Σ_i P(v_i) log2 P(v_i). (A worked instance follows this list.)

(f) ( T / F ) Predicate logic is decidable.

(g) ( T / F ) P(A, B | C) = P(A | C) P(B | C) is an example of absolute independence.

(h) ( T / F ) It is always possible to convert a predicate logic knowledge base to a finite propositional logic knowledge base.

(i) ( T / F ) In practice, for problems with small search spaces and high certainty in observed data, a brute-force solution may be preferable to an intelligent solution.

(j) ( T / F ) Random Forests as discussed in class are generative models, while Multi-Layer Perceptrons are discriminative.
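As a worked instance of the entropy definition in (e): a fair coin flip has two equally likely outcomes, giving H(1/2, 1/2) = -(1/2 log2(1/2) + 1/2 log2(1/2)) = 1 bit, matching the intuition that one binary decision resolves the outcome.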

2. Miscellaneous Topics (10 points)

(a) (2) Name the four key components of a formal search problem definition.

(b) (4) Name the four (increasingly complex) agent types discussed in lecture and the textbook. For each agent type after the simplest one, in a single sentence identify which capability the agent type adds relative to simpler models. All four of these agent types may become learning agents, so do not include the general learning agent.

(c) (4) Give a concrete example of a problem whose solution requires a combination of logic, search, and machine learning. Briefly identify how each is needed to address the problem.

3. Logic (30 points)

(a) (4) Briefly describe how facts are represented in propositional versus first-order logic.

(b) (6) Define and provide an example for each of the following.

i. A sound inference rule.
ii. A complete inference algorithm.
iii. A satisfiable statement.

(c) The following is a propositional knowledge base representing relationships between available flavors at an ice cream store.

1. Vanilla ∨ Chocolate
2. Vanilla ⇒ Strawberry
3. Chocolate ⇒ (CookieDough ∨ Pistachio)
4. ¬Mint ∧ ¬Pistachio
5. Strawberry ⇒ CookieDough

i. (4) Convert the knowledge base to conjunctive normal form (CNF).

ii. (6) Prove that CookieDough ice cream is available using resolution. (Hint: resolution proofs are a form of proof by contradiction.) You may use a proof tree or a list of statements.
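For reference, the mechanics of a single resolution step (a generic illustration, independent of the knowledge base above): resolving (P ∨ Q) with (¬P ∨ R) on P yields (Q ∨ R). A resolution refutation adds the negation of the goal to the CNF knowledge base and derives the empty clause, which establishes the goal by contradiction.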

(d) The Prolog program below represents a Canadian legal matter.

ally(spain,china).
ally(china,belgium).
ally(X,Z) :- not(X=Z), ally(X,Y), ally(Y,Z).
has(spain,beer).
canadian(colonel_molson).
criminal(X) :- sold(X,beer,Y), canadian(X), ally(Y,belgium).
sold(colonel_molson,beer,Y) :- has(Y,beer), ally(Y,belgium).

i. (3) Given this knowledge base, will Prolog say that the query ally(canada,belgium) is true, false or unknown, and why?

ii. (7) Show how Prolog would process the query criminal(A) for the program. You may use a tree such as the ones seen in class to illustrate the execution and unifications (variable bindings).
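One way to experiment with this program is to load it into a Prolog interpreter such as SWI-Prolog (the file name allies.pl is a hypothetical choice):

    $ swipl allies.pl
    ?- ally(spain,belgium).
    true.

This query succeeds through the recursive ally/2 clause with the binding Y = china; how queries with no derivation behave under Prolog's depth-first search is worth considering for part i.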

4. Decision Trees, AdaBoost and Random Forests (20 points)

(a) (6) Provide the formulas for entropy and information gain, and explain how they are used to select which attribute to split on at a node in a decision tree.

(b) (2) Why is the decision tree learning algorithm prone to over-fitting the training data?

(c) (3) Chi-squared pruning may be used to prevent over-fitting by pruning a decision tree after its construction. At a node whose children are being considered for pruning, what difference does the Chi-squared statistic measure?

(d) AdaBoost creates an ensemble (a set) of classifiers that work together to make classification decisions, where classifiers are trained one at a time. (One common combination scheme is sketched after part (e) for reference.)

i. (2) What is different about how AdaBoost handles training samples versus other machine learning algorithms such as regular decision trees or the backpropagation algorithm?

ii. (3) How are the decisions of the individual classifiers (e.g. decision trees) combined to make a final classification decision?

(e) (4) Identify one (meaningful) similarity and one (meaningful) difference between the Random Forest construction algorithm and AdaBoost.
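The following minimal Python sketch illustrates one common combination scheme, a weighted majority vote; the stump classifiers and weights here are illustrative assumptions, not the course's reference implementation:

    # Weighted majority vote over an ensemble of weak classifiers.
    # Each member is a (classifier, weight) pair; in boosting, a member's
    # weight typically grows as its weighted training error shrinks.
    def ensemble_predict(members, x):
        score = sum(weight * clf(x) for clf, weight in members)
        return 1 if score >= 0 else -1  # labels in {-1, +1}

    # Illustrative decision stumps on a two-feature input (hypothetical).
    stump_a = lambda x: 1 if x[0] > 0.5 else -1
    stump_b = lambda x: 1 if x[1] > 0.2 else -1

    members = [(stump_a, 0.9), (stump_b, 0.4)]  # weights are made up
    print(ensemble_predict(members, [0.7, 0.1]))  # stump_a outweighs stump_b: prints 1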

5. Linear Regression and Classification (20 points)

(a) EZRide prices its cars based on interior size ($50 per cubic foot) and top speed ($100 per mile per hour). The base price of a car, before considering interior size and top speed, is $500.

i. (2) Provide a linear model for the cost of an EZRide car.

ii. (4) Now suppose that over a few years, EZRide changes its base price, interior size and speed costs. Given a sufficient set of (cubic feet, top speed, car price) triples, we can use linear regression to estimate the new parameters of the cost function. Provide a diagram showing the inputs and outputs of a linear regressor that can be used to learn the price model parameter weights using gradient descent.

iii. (4) Provide pseudocode for the gradient descent algorithm that will be used to learn the new weights. Make sure to identify how weights are initialized and updated. (A generic sketch follows below for reference.)
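For reference, a minimal batch gradient descent sketch for a two-input linear model under squared error, written in Python; the learning rate, epoch count, and zero initialization are illustrative choices rather than prescribed values:

    # Batch gradient descent for y ≈ w0 + w1*x1 + w2*x2 under squared error.
    # Hyperparameters (lr, epochs) and zero initialization are illustrative.
    def fit_linear(data, lr=1e-4, epochs=1000):
        # data: list of ((x1, x2), y) pairs, e.g. ((cubic_feet, top_speed), price)
        w = [0.0, 0.0, 0.0]  # [bias, weight for x1, weight for x2]
        for _ in range(epochs):
            grad = [0.0, 0.0, 0.0]
            for (x1, x2), y in data:
                err = (w[0] + w[1] * x1 + w[2] * x2) - y  # prediction error
                # Gradient of 0.5 * err**2 with respect to each weight.
                grad[0] += err
                grad[1] += err * x1
                grad[2] += err * x2
            # Step each weight against its accumulated gradient.
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
        return w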

iv. We would like to buy a car from EZRide, and have exactly $7,000 available (and no more). The current pricing has a base cost of $500, plus $100 per cubic foot and $100 per mile/hour of top speed.

A. (2) Provide a formula to determine whether we can afford to buy an EZRide car, given the size of the interior and the top speed of the car.

B. (6) Sketch the weight vector of the linear model and the decision boundary between the affordable and unaffordable classes in 2D, using labeled axes.

v. (2) Which error function is commonly used for linear regression?

6. Machine Learning (15 points)

(a) (2) Define over-fitting.

(b) (4) Explain how over-fitting can be prevented when training a Multi-Layer Perceptron.

(c) (4) Explain how over-fitting is avoided in Random Forests.

(d) (5) Define regression and classification functions, and discuss their relationship.

(e) Bonus (2) Why is it important not to evaluate the performance of a machine learning algorithm using its training data?

Additional Space

Additional Space

Additional Space

Additional Space