
UNIVERSITY OF TORONTO
FACULTY OF ARTS AND SCIENCE
FINAL EXAMINATION, APRIL 2017
DURATION: 3 hours
CSC 411 H1S Machine Learning and Data Mining

Aids allowed: Non-programmable calculators and aid sheets distributed with the exam
Examiner(s): M. Guerzhoy

Student Number:
Family Name(s):
Given Name(s):

Do not turn this page until you have received the signal to start. In the meantime, please read the instructions below carefully.

This final examination paper consists of 7 questions on 28 pages (including this one), printed on both sides of the paper. When you receive the signal to start, please make sure that your copy is complete, fill in the identification section above, and write your student number where indicated at the bottom of every odd-numbered page (except page 1).

Answer each question directly on this paper, in the space provided, and use the reverse side of the previous page for rough work. If you need more space for one of your solutions, use the reverse side of a page or the pages at the end of the exam, and indicate clearly the part of your work that should be marked.

Write up your solutions carefully! In particular, use notation and terminology correctly and explain what you are trying to do; part marks will be given for showing that you know some aspects of the answer, even if your solution is incomplete.

A mark of at least 40% (after adjustment, if there is an adjustment) on this exam is required to obtain a passing grade in the course.

Marking Guide: #1: /10   #2: /15   #3: /20   #4: /10   #5: /10   #6: /15   #7: /20   TOTAL: /100

Page 1 of 28

Good Luck!

Use this page for rough work; clearly indicate any section(s) to be marked.

Question 1. [10 marks]
Draw a design of a small neural network that takes in two inputs, x1 and x2, and outputs a number close to 1 if x1 < 0 and x2 < 0, and a number close to 0 if x1 > 0 and x2 > 0. You may only use sigmoid activation functions. Include the weights you used. Briefly explain why your network computes what it's required to compute.
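One family of networks satisfying these constraints, sketched numerically (an illustrative sketch, not the unique answer; the weight magnitude k = 20 is an arbitrary choice, and the output necessarily approaches 0.5 as both inputs approach 0):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def net(x1, x2, k=20.0):
    # Hidden layer: each unit detects that one input is negative.
    h1 = sigmoid(-k * x1)   # ~1 when x1 < 0, ~0 when x1 > 0
    h2 = sigmoid(-k * x2)
    # Output unit fires (~1) only when both hidden units are on:
    # h1 + h2 must exceed 1.5 for the pre-activation to be positive.
    return sigmoid(k * (h1 + h2) - 1.5 * k)
```

With both inputs clearly negative the output saturates near 1; with both clearly positive it saturates near 0.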


Question 2. [15 marks]
In this question, we consider generating a dataset, which will then be randomly split into a test set and a training set. The dataset will consist of N 2-dimensional vectors, with each vector having the label 0 or 1.

Part (a) [5 marks] Describe a dataset for which 3-Nearest-Neighbours will perform substantially better than Linear Regression on the test set. Explain your reasoning.

Part (b) [5 marks] Describe a dataset for which Linear Regression will perform better than a one-hidden-layer neural network on the test set. Explain your reasoning.


Part (c) [5 marks] Describe how to generate a dataset on which a 5-hidden-layer neural network could be expected to perform better than a single-hidden-layer neural network (if trained appropriately). Explain your reasoning. Use pseudocode to accompany your description of how to generate the dataset.
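One way to generate such a dataset is to label points with a function built by repeatedly composing a simple nonlinearity, which deep networks can represent more compactly than shallow ones. A minimal sketch (the composed-sine construction, the constant 3.0, and the depth of 5 compositions are all illustrative choices, not the required answer):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(n=1000):
    # Points drawn uniformly from the square [-1, 1]^2; the label
    # depends on a highly compositional function of the coordinates.
    X = rng.uniform(-1, 1, size=(n, 2))
    g = X[:, 0] + X[:, 1]
    for _ in range(5):          # compose a simple nonlinearity repeatedly
        g = np.sin(3.0 * g)
    y = (g > 0).astype(int)     # binary labels 0/1
    return X, y
```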


Question 3. [20 marks]
Consider the Convolutional Neural Network below.

[Figure: x (119 x 1 input) --W^C--> CONV1 (1 feature map; filter size 2x1, stride 1, pad 1) --> MEDIANPOOL1 (spatial extent 3, stride 3) --W^M--> FC1 (20 units) --W^F--> z]

The network takes in an input of dimension 119 x 1, and its output is of dimension 1 x 1. The network consists of the input layer X (with a 0-pad of width 1), a convolutional layer CONV1 which consists of one feature map with a 2 x 1 filter and uses the ReLU nonlinearity, a median-pooling layer MEDIANPOOL1, a fully-connected layer FC1 which uses a ReLU nonlinearity, and an output layer Z of size 1 x 1, which is fully connected to the FC1 layer and uses a sigmoid nonlinearity. Recall that σ'(t) = σ(t)(1 − σ(t)).

Denote the weight that connects the i-th unit in FC1 to Z by W^F_i and the bias for Z by b^F. Denote the weight that connects the j-th unit in MEDIANPOOL1 to the i-th unit in FC1 by W^M_ji and the bias of the i-th unit in FC1 by b^M_i. Let W^C = [W^C_1, W^C_2], and let the bias for the CONV1 layer be b^C. A unit in a median-pooling layer outputs the median value of the neurons in its receptive field (i.e., the neurons connected to the unit).

Part (a) [4 marks] How many parameters (i.e., values that specify how the network computes its output) are there in this network? Briefly show your work.
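The layer sizes implied by the architecture can be sanity-checked numerically. The sketch below assumes the 0-pad of width 1 is applied to both ends of the input (an assumption; with two-sided padding the layer sizes come out to whole numbers):

```python
# Layer sizes and parameter count under the two-sided-padding assumption.

def conv_out(n, filt, stride, pad):
    # Standard 1-D convolution output-length formula.
    return (n + 2 * pad - filt) // stride + 1

n_in = 119
n_conv = conv_out(n_in, filt=2, stride=1, pad=1)   # (119 + 2 - 2) + 1 = 120
n_pool = (n_conv - 3) // 3 + 1                     # extent 3, stride 3 -> 40

params_conv = 2 + 1                # W^C_1, W^C_2 and the bias b^C
params_pool = 0                    # median-pooling has no parameters
params_fc1 = 20 * (n_pool + 1)     # 20 units, each with n_pool weights + a bias
params_z = 20 + 1                  # W^F_1..W^F_20 and the bias b^F

total = params_conv + params_pool + params_fc1 + params_z
```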


Let the training set inputs be X = [X^(1), X^(2), ..., X^(N)] and the expected outputs be Y = [Y^(1), Y^(2), ..., Y^(N)]. Let the outputs of the layers in the network be denoted using c(x^(i)), m(x^(i)), f(x^(i)), and z(x^(i)) for the CONV1, MEDIANPOOL1, FC1, and Z layers, respectively (you may use notation such as z_i, f_j, etc.). You may use those without explicitly telling us how to compute them. The cost function is

    Cost(X, Y) = Σ_n cost(X^(n), Y^(n)) = Σ_n ( −Y^(n) log(z(X^(n))) − (1 − Y^(n)) log(1 − z(X^(n))) ).

Part (b) [8 marks] Compute ∂Cost/∂W^M_ji for the entire training set. Show the details of the computation. Use Backpropagation to obtain the final answer: show how you would compute the gradients layer-by-layer. You may not use matrix multiplication in your answer.


Part (c) [8 marks] Compute ∂Cost/∂W^C_1 for the entire training set. Show the details of the computation. Note: the padding is significant. Use Backpropagation to obtain the final answer: show how you would compute the gradients layer-by-layer. You may not use matrix multiplication in your answer.


Question 4. [10 marks]
Describe how to learn word2vec vectors using negative sampling. Be specific. Use pseudocode. You do not need to compute any gradients, but you do need to specify which gradients need to be computed.
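As a point of comparison, skip-gram training with negative sampling can be sketched as below. This is an illustrative implementation, not the expected exam answer: it draws negatives uniformly rather than from the unigram^0.75 distribution used by word2vec, and all hyperparameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_sgns(corpus, vocab_size, dim=16, window=2, k=5, lr=0.05, epochs=2):
    """Skip-gram with negative sampling; corpus is a list of int word ids."""
    W_in = rng.normal(scale=0.1, size=(vocab_size, dim))    # centre-word vectors
    W_out = rng.normal(scale=0.1, size=(vocab_size, dim))   # context-word vectors
    for _ in range(epochs):
        for i, w in enumerate(corpus):
            lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                c = corpus[j]
                # One positive pair (w, c) plus k uniformly drawn negatives.
                targets = [c] + list(rng.integers(0, vocab_size, size=k))
                labels = [1.0] + [0.0] * k
                for t, lab in zip(targets, labels):
                    p = sigmoid(W_in[w] @ W_out[t])
                    grad = p - lab                 # d(loss)/d(score)
                    g_in = grad * W_out[t]         # gradient w.r.t. W_in[w]
                    W_out[t] -= lr * grad * W_in[w]
                    W_in[w] -= lr * g_in
    return W_in
```

The gradients that must be computed are exactly the two updated here: the gradient of the logistic loss with respect to the centre-word vector and with respect to each (positive or negative) context-word vector.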


Question 5. [10 marks]
A Mixture of Gaussians model is defined using

    mus = np.array([[0, 5], [1, 1]])
    sigmas = np.array([[[1, 0], [0, 2]],
                       [[2, 0], [0, 3]]])
    pis = np.array([0.2, 0.8])

Write code to generate ten datapoints using the model. To generate random numbers, you may only use the following function, which returns one float:

    def rnorm(loc, scale):
        """Return a sample from the normal distribution N(mu=loc, sigma=scale).
        loc and scale are both floats."""
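A sketch of one possible answer. Since rnorm is the only sampler allowed, a Uniform(0, 1) for the component choice is obtained by passing a standard normal sample through its own CDF (via math.erf); the sketch also assumes the diagonal entries of sigmas are variances, hence the square roots, and substitutes a numpy-based stand-in for the supplied rnorm:

```python
import math

import numpy as np

mus = np.array([[0, 5], [1, 1]])
sigmas = np.array([[[1, 0], [0, 2]],
                   [[2, 0], [0, 3]]])
pis = np.array([0.2, 0.8])

_rng = np.random.default_rng(0)

def rnorm(loc, scale):
    """Stand-in for the supplied sampler: one float from N(mu=loc, sigma=scale)."""
    return float(_rng.normal(loc, scale))

def sample_mog(n=10):
    points = []
    for _ in range(n):
        # A Uniform(0, 1) built from rnorm alone: pass a standard normal
        # sample through the standard normal CDF.
        u = 0.5 * (1.0 + math.erf(rnorm(0.0, 1.0) / math.sqrt(2.0)))
        k = 0 if u < pis[0] else 1         # component chosen w.p. pis[k]
        # Both covariance matrices are diagonal, so the coordinates are
        # independent; diagonal entries are variances, hence the sqrt.
        x = rnorm(mus[k][0], math.sqrt(sigmas[k][0][0]))
        y = rnorm(mus[k][1], math.sqrt(sigmas[k][1][1]))
        points.append((x, y))
    return points
```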


Question 6. [15 marks]
Write code that uses the Metropolis algorithm to fit a linear regression model to the data (x_raw, y), and to then output the predictions using this model for the data x_new. Your code's output should be the predictions made for x_new. You can use the supplied functions. Assume that the data is generated using y ~ N(a·x_raw + b, σ²) for σ² = 4, and that the prior for the unknown parameters a and b is N(0, 1). You should use a Gaussian distribution as the proposal distribution. The probability density function of a univariate Gaussian distribution is (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)). Annotate the code to show what you are doing.

    x_raw.shape == (20,)
    x = vstack((ones_like(x_raw), x_raw))
    y.shape == (20,)
    x_new.shape == (30,)

    def loglik(x, mu, sigma):
        return sum(-.5*log(2*pi*sigma**2) - (x-mu)**2/(2*sigma**2))

    def rnorm(loc, scale, size):
        """Return an array of size independent samples from the normal
        distribution N(mu=loc, sigma=scale)."""
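A sketch of one possible answer, using a symmetric Gaussian random-walk proposal (the step size prop_sd, the iteration count, and the choice to predict with the post-burn-in posterior mean are all arbitrary choices, and numpy stand-ins replace the supplied helpers):

```python
import numpy as np

rng = np.random.default_rng(0)

def loglik(x, mu, sigma):
    # Log density of N(mu, sigma^2), summed over the array x.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

def log_post(theta, x_raw, y):
    a, b = theta
    # Likelihood: y ~ N(a*x_raw + b, sigma=2), i.e. sigma^2 = 4.
    # Prior: a, b ~ N(0, 1).
    return (loglik(y, a * x_raw + b, 2.0)
            + loglik(a, 0.0, 1.0) + loglik(b, 0.0, 1.0))

def metropolis_predict(x_raw, y, x_new, n_iter=5000, prop_sd=0.1):
    theta = np.zeros(2)                  # start (a, b) at the prior mean
    lp = log_post(theta, x_raw, y)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian random-walk proposal.
        prop = theta + rng.normal(0.0, prop_sd, size=2)
        lp_prop = log_post(prop, x_raw, y)
        # Metropolis accept/reject in log space.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    # Discard the first half as burn-in; predict with the posterior mean.
    a, b = np.mean(samples[n_iter // 2:], axis=0)
    return a * x_new + b
```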


Question 7. [20 marks]
We would like to use REINFORCE to train an agent that plays Rock Paper Scissors against the computer. The game is played as follows: both the agent and the computer pick an action from the set {0, 1, 2}. The reward is +1 if the tuple of (agent, computer) actions is one of (0, 1), (1, 2), or (2, 0). The reward is −1 if the tuple of (agent, computer) actions is one of (1, 0), (2, 1), or (0, 2). The reward is 0 otherwise. (For simplicity, we substitute the integers 0, 1, 2 for Rock, Paper, and Scissors from the familiar game.) The computer is using an unknown strategy.

For a computer action c_{t−1}, taken at time t−1, the policy function that defines the probability of agent action a_t is π(a_t = a | c_{t−1}) = p_{a, c_{t−1}}. That is, the policy function is parametrized using 9 coefficients.

You may use the function rps(act) as follows:

    computer_act, reward = rps(act)

The function takes in the agent's action, and returns the computer's action and the reward the agent gets (so that you do not need to compute the reward yourself).

For reference, the Policy Gradient Theorem is:

    ∇η(θ) = Σ_s d_π(s) Σ_a q_π(s, a) ∇_θ π(a | s, θ).

The REINFORCE algorithm is as follows:

    Repeat:
        Generate an episode S_0, A_0, R_1, ..., S_{T−1}, A_{T−1}, R_T, following π(·|·, θ)
        For each step of the episode t = 0, ..., T−1:
            G_t ← return from step t
            θ ← θ + α γ^t G_t ∇_θ log π(A_t | S_t, θ)

Write pseudocode to use REINFORCE to learn the parameters of the policy function. Make clear how you obtained that pseudocode. You must provide all the details of the computation of each variable, and you must provide all the necessary derivations in your answer. You do not need to justify the REINFORCE algorithm itself. Please start your answer on the next page. Neatness and logical structure count!
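For comparison, a runnable sketch of the kind of answer expected. Two substitutions are made for concreteness: the 9 probabilities p_{a,c} are parametrized through a softmax over logits theta[a, c], so that ∇ log π takes the simple form 1[a' = a] − π(a' | c), and the supplied environment rps is replaced by a stand-in in which the computer plays uniformly at random (its real strategy is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

def rps(act):
    """Stand-in for the supplied environment; the computer plays uniformly."""
    comp = int(rng.integers(0, 3))
    wins, losses = {(0, 1), (1, 2), (2, 0)}, {(1, 0), (2, 1), (0, 2)}
    r = 1 if (act, comp) in wins else -1 if (act, comp) in losses else 0
    return comp, r

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def reinforce(n_episodes=200, T=20, alpha=0.1, gamma=0.9):
    # theta[a, c]: logit of agent action a given previous computer action c.
    theta = np.zeros((3, 3))
    for _ in range(n_episodes):
        c_prev = 0                      # arbitrary initial "previous" action
        states, actions, rewards = [], [], []
        for _ in range(T):              # generate one episode following pi
            probs = softmax(theta[:, c_prev])
            a = int(rng.choice(3, p=probs))
            c_next, r = rps(a)
            states.append(c_prev); actions.append(a); rewards.append(r)
            c_prev = c_next
        for t in range(T):
            # G_t: discounted return from step t.
            G = sum(gamma**(k - t) * rewards[k] for k in range(t, T))
            s, a = states[t], actions[t]
            probs = softmax(theta[:, s])
            grad = -probs               # d log pi(a|s) / d theta[:, s]
            grad[a] += 1.0              #   = 1[a' = a] - pi(a'|s)
            theta[:, s] += alpha * gamma**t * G * grad
    return theta
```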


Total Marks = 100

End of Final Examination