CS 760 Machine Learning Spring 2017


University of Wisconsin-Madison
Department of Computer Sciences
CS 760 Machine Learning, Spring 2017
Final Examination

Duration: 1 hour 15 minutes. One set of handwritten notes and a calculator allowed.

Instructions: Write your answers in the space provided. Show your calculations LEGIBLY. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. Use the backs of the sheets for scratch work ONLY. Write all final answers BELOW the questions. Answers written on the scratch sheets will NOT be considered.

Name:                              UW ID:

Problem    Score    Max Score
1                    20
2                    30
3                    20
4                    30
Total                100

Problem 1: Decision trees and instance-based learning (20 points)

1. Which of the following statements are true for BOTH decision trees and Naive Bayes classifiers (you may choose more than one statement)? Explain. (4 points)

   a) In both classifiers, a pair of features is assumed to be independent.
   b) In both classifiers, a pair of features is assumed to be dependent.
   c) In both classifiers, a pair of features is assumed to be independent given the class label.
   d) In both classifiers, a pair of features is assumed to be dependent given the class label.

2. Consider the following training set in 2-dimensional Euclidean space (a code sketch of the nearest-neighbor computation appears at the end of this problem): (6 points)

    x    y   Class
   -1    1     -
    0    1     +
    0    2     -
    1   -1     -
    1    0     +
    1    2     +
    2    2     -
    2    3     +

   a) What is the prediction of a 3-nearest-neighbor classifier at the point (1,1)?

   b) What is the prediction of a 5-nearest-neighbor classifier at the point (1,1)?

   c) What is the prediction of a 7-nearest-neighbor classifier at the point (1,1)?

3. What is the biggest advantage/disadvantage of decision trees when compared to logistic regression classifiers? (5 points)

4. Show a decision tree that would perfectly classify the following data set: (5 points)

               A   B   Class
   Instance 1  2   3     +
   Instance 2  4   4     +
   Instance 3  4   5     -
   Instance 4  6   3     +
   Instance 5  8   3     -
   Instance 6  8   4     -
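
For reference, here is a minimal sketch of the nearest-neighbor computation in question 2, written in plain Python. The training points and the query point (1,1) are taken from the table above; Euclidean distance and majority voting follow the standard k-NN definition, and for these values of k no distance ties at the cutoff arise.

# k-NN majority-vote sketch for Problem 1, question 2.
from collections import Counter

train = [((-1, 1), '-'), ((0, 1), '+'), ((0, 2), '-'), ((1, -1), '-'),
         ((1, 0), '+'), ((1, 2), '+'), ((2, 2), '-'), ((2, 3), '+')]

def knn_predict(query, k):
    # Sort training points by squared Euclidean distance to the query point.
    by_dist = sorted(train, key=lambda p: (p[0][0] - query[0]) ** 2 +
                                          (p[0][1] - query[1]) ** 2)
    # Majority vote among the k closest points.
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

for k in (3, 5, 7):
    print(k, knn_predict((1, 1), k))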

Problem 2: Neural Networks (30 points)

a) State whether the following statements are true or false and explain why. (12 points)

   i) A Perceptron can learn to correctly classify the following data, where each example consists of three binary input values and a binary classification value: (111, 1), (110, 1), (011, 1), (010, 0), (000, 0).

   ii) The Perceptron Learning Rule is a sound and complete method for a Perceptron to learn to correctly classify any two-class problem.

   iii) Training neural networks has the potential problem of overfitting the training data.

b) Answer the following: (12 points)

   i) What is the search space and what is the search method used by the backpropagation algorithm for training neural networks?

   ii) What quantity does backpropagation minimize?

   iii) Does the backpropagation algorithm, when run until a minimum is achieved, always find the same solution no matter what the initial set of weights is? Briefly explain why or why not.

c) Demonstrate how the perceptron without bias (i.e., we set the parameter b = 0 and keep it fixed) updates its parameters given the following training sequence:

   x1 = (0,0,0,1,0,0,1)   y1 = 1
   x2 = (1,1,0,0,0,1,0)   y2 = -1
   x3 = (0,0,1,1,0,0,0)   y3 = 1
   x4 = (1,0,0,0,1,1,0)   y4 = -1
   x5 = (1,0,0,0,0,1,0)   y5 = -1

Assume the initial weights are 0 and the learning rate is 1.0. (6 points)
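
As an illustration of the update rule asked about in part c), here is a minimal perceptron sketch in plain Python, using the data, zero initial weights, and learning rate 1.0 given above. One assumption is the mistake condition y * (w . x) <= 0, i.e., a zero activation is treated as a mistake; the exam does not state which convention to use.

# Perceptron without bias (b fixed at 0): single pass over the sequence.
examples = [((0, 0, 0, 1, 0, 0, 1),  1),
            ((1, 1, 0, 0, 0, 1, 0), -1),
            ((0, 0, 1, 1, 0, 0, 0),  1),
            ((1, 0, 0, 0, 1, 1, 0), -1),
            ((1, 0, 0, 0, 0, 1, 0), -1)]

eta = 1.0          # learning rate
w = [0.0] * 7      # initial weights

for x, y in examples:
    activation = sum(wi * xi for wi, xi in zip(w, x))
    if y * activation <= 0:                           # mistake: apply the perceptron update
        w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    print(x, y, w)                                    # weights after seeing this example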

Problem 3 (20 points)

Briefly describe the following:

   i) Pruning a decision tree
   ii) Autoencoders
   iii) Bagging (a small code sketch follows this list for reference)
   iv) Regularization
   v) Markov Blanket
   vi) Occam's razor
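
As a concrete illustration of item iii) above, here is a minimal bagging sketch in plain Python. The base learner is left abstract: fit_base_learner is a placeholder for any learning procedure (e.g., one that fits a decision tree and returns a function mapping an input to a label), and the number of bootstrap rounds is an arbitrary choice.

# Bagging sketch: train one model per bootstrap sample, predict by majority vote.
import random
from collections import Counter

def bagging_fit(data, fit_base_learner, n_models=25):
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) examples with replacement.
        sample = [random.choice(data) for _ in data]
        models.append(fit_base_learner(sample))
    return models

def bagging_predict(models, x):
    # Each model is a callable mapping an input x to a predicted label.
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]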

Problem 4: Support Vector Machines (20 points)

1. What are the advantages/disadvantages of a non-linear SVM? Give examples to justify your reasoning.

2. What is a kernel function? Why do we need it? (A small code sketch at the end of this problem illustrates one common kernel.)

3. Given the following data samples (squares and triangles denote the two classes), which one(s) of the following kernels can we use in an SVM to separate the two classes?

   [Figure: data samples from the two classes; not reproduced here.]

   a) Linear kernel
   b) Polynomial kernel
   c) Gaussian RBF (radial basis function) kernel
   d) None of the above

4. How does the margin ρ relate to the weight vector w? Express the relation using a formula.
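
To make the idea behind question 2 concrete, here is a minimal sketch of one common kernel, the Gaussian RBF, in plain Python. The bandwidth gamma = 0.5 and the two example points are arbitrary choices, not part of the exam; the point of a kernel is that it returns the inner product of its arguments in an implicit feature space, so an SVM can learn a non-linear decision boundary without computing that feature mapping explicitly.

# Gaussian RBF kernel sketch: k(x, z) = exp(-gamma * ||x - z||^2).
import math

def rbf_kernel(x, z, gamma=0.5):
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel((1.0, 2.0), (2.0, 0.0)))   # example evaluation on two arbitrary points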