University of Wisconsin-Madison Computer Sciences Department
CS 760 Machine Learning
Spring 1992 Midterm Exam (one page of notes allowed)
100 points, 90 minutes
April 28, 1992

Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work.

Note that not all questions have the same point value. Divide your time appropriately.

Before starting, write your name on this and all other pages of this exam. Also, make sure your exam contains four (4) problems on 9 pages.

   Problem   Score   Max Score
   1                 35
   2                 35
   3                 20
   4                 10
   Total             100

1. Empirical Learning (35 points)

i) Assume you are given the following three nominal features with the possible values shown.

   Shape {triangle, diamond}
   Font  {small, large}
   Color {red, blue, green}

Also assume that ID3 is given the following set of classified examples. Using Quinlan's max-gain formula, produce a decision tree that accounts for these examples. Show all your work. You may use the abbreviations used to specify the examples.

   S = t   F = l   C = g   +
   S = d   F = s   C = r   +
   S = t   F = s   C = g   +
   S = t   F = l   C = r   -
   S = d   F = s   C = g   -

(page 2 of 9)
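For reference, here is a minimal Python sketch of the entropy and information-gain ("max-gain") computation that ID3 uses to pick a split. The dictionary encoding of the five examples is my own illustration, not part of the exam, and which feature ID3 selects is left to the reader.

    # Sketch of Quinlan's information-gain computation for choosing a split.
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def info_gain(examples, feature):
        """examples: list of (feature-value dict, class label) pairs."""
        labels = [y for _, y in examples]
        remainder = 0.0
        for value in {x[feature] for x, _ in examples}:
            subset = [y for x, y in examples if x[feature] == value]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    data = [({'S': 't', 'F': 'l', 'C': 'g'}, '+'),
            ({'S': 'd', 'F': 's', 'C': 'r'}, '+'),
            ({'S': 't', 'F': 's', 'C': 'g'}, '+'),
            ({'S': 't', 'F': 'l', 'C': 'r'}, '-'),
            ({'S': 'd', 'F': 's', 'C': 'g'}, '-')]
    best_root = max(['S', 'F', 'C'], key=lambda f: info_gain(data, f))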

ii) Briefly discuss the relations, if any, between the G and S sets in Mitchell's Version Spaces algorithm and Michalski's concepts of discriminant and characteristic descriptions.

iii) Define, in terms of concept space, the following types of inductive biases.

(a) preference biases

(b) restricted hypothesis-space biases

(page 3 of 9)

iv) Which type of bias (preference, restricted hypothesis space, both, or neither) do ID3, Version Space, and AQ have? Briefly justify your answers.

(a) ID3

(b) Version Space

(c) AQ

v) Briefly define the experimental question addressed by t-tests. (Note: you need not describe the calculations of a t-test.)

(page 4 of 9)
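For context only (the question above does not require the calculation): the experimental question a paired t-test addresses is whether the difference in mean performance between two learners, measured on the same data splits, is too large to be explained by chance. A hypothetical sketch, assuming scipy is available and with made-up per-fold accuracies purely for illustration:

    # Hypothetical illustration of comparing two learners on the same
    # cross-validation folds; the accuracy numbers are invented for the example.
    from scipy.stats import ttest_rel

    acc_learner_a = [0.81, 0.78, 0.84, 0.80, 0.79]   # made-up per-fold accuracies
    acc_learner_b = [0.85, 0.80, 0.86, 0.83, 0.82]
    t_stat, p_value = ttest_rel(acc_learner_a, acc_learner_b)
    # A small p_value suggests the difference in mean accuracy is unlikely
    # to be due to chance alone.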

2. Artificial Neural Networks (35 points)

i) Consider using an artificial neural network (without hidden units) to learn, using the delta rule, the following examples:

   Input    Output
   0 1 1    1
   1 1 0    0
   1 1 1    1
   1 0 1    0

Draw the network after processing (once) each of the above four (4) training examples. Use a linear threshold unit as the output and initialize its threshold to 0 (assume the node is "active" if its net input equals or exceeds its threshold). Also assume all weights are initially 0 and that η = 0.25. What output does your final network give for the input "0 0 1"?

(page 5 of 9)
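A minimal Python sketch of the update involved (the delta rule w_i <- w_i + η(t - o)x_i, with the threshold treated as a weight on a constant -1 input). This is an illustration of the mechanics under those assumptions, not the exam's answer key.

    # One pass of the delta rule over the four examples with a linear threshold
    # unit (active when net input >= threshold).  Sketch for illustration only.
    eta = 0.25
    examples = [([0, 1, 1], 1), ([1, 1, 0], 0), ([1, 1, 1], 1), ([1, 0, 1], 0)]
    w = [0.0, 0.0, 0.0]
    theta = 0.0

    for x, target in examples:
        net = sum(wi * xi for wi, xi in zip(w, x))
        out = 1 if net >= theta else 0
        w = [wi + eta * (target - out) * xi for wi, xi in zip(w, x)]
        theta -= eta * (target - out)        # threshold moves opposite the weights

    net = sum(wi * xi for wi, xi in zip(w, [0, 0, 1]))
    final_output = 1 if net >= theta else 0  # the network's answer for "0 0 1"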

ii) Discuss two (2) useful ways of using tuning sets in neural network training.

(a)

(b)

iii) Briefly describe one (1) major problem that can arise from using a neural network with:

(a) too few hidden units

(b) too many hidden units

(page 6 of 9)

iv) Assume you are given a linearly-separable data set and have trained a perceptron so that it properly classifies all the examples in your training set. Consider the two network alterations listed below. For each, either (1) argue that it would have no impact on the perceptron's accuracy on the training set or (2) show (mathematically or with a specific case) that the accuracy can change.

(a) you multiply all the weights and the output unit's threshold by α, a positive constant

(b) you add α to all the weights and to the output unit's threshold

v) Repeat part (iv-a), but assume now that your data is not linearly separable and that you have to use a neural network with one layer of sigmoidal hidden units to completely separate (i.e., learn) the training data.

(a) you multiply all the weights and each unit's bias by α

(page 7 of 9)
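For numeric intuition only (not a substitute for the argument the question asks for), here is a small check of how a single unit's behavior responds to scaling by a positive constant, for a threshold unit versus a sigmoid unit; the particular weights, input, and α are arbitrary illustrative values.

    # Scaling check for a single unit; alpha > 0 and the values are arbitrary.
    import math, random

    random.seed(0)
    alpha = 3.7
    w = [random.uniform(-1, 1) for _ in range(3)]
    theta = 0.2
    x = [random.uniform(-1, 1) for _ in range(3)]

    net = sum(wi * xi for wi, xi in zip(w, x))
    same_decision = (net >= theta) == (alpha * net >= alpha * theta)   # True for any alpha > 0

    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    changed_activation = sigmoid(net - theta) != sigmoid(alpha * (net - theta))
    print(same_decision, changed_activation)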

3. Explanation-Based Learning (20 points)

i) Consider the following EBL domain theory. Terms beginning with a "?" are implicitly universally-quantified variables.

   A(?x,?y) and B(?y,1,?z)  ->  C(1,?y,?z)
   D(?x,?x) and E(?x,?y)    ->  A(?x,?y)
   F(?x,?z,?z)              ->  B(?x,?y,?z)
   G(?y,?x)                 ->  D(?x,?y)

Assume the following problem-specific facts are asserted:

   E(1,2)   E(3,2)   F(2,3,3)   F(3,1,2)   G(1,2)   D(3,3)

Explain, with a proof tree, that C(1,2,3) is true. Draw to the right of your proof tree the corresponding explanation structure (before pruning at operational nodes). Clearly indicate the necessary unifications. Assuming that predicate B is operational, what rule would Mooney's EGGS algorithm learn?

ii) Define the utility problem in EBL and briefly explain Minton's solution to it. Be sure to discuss why this is a central problem to EBL.

(page 8 of 9)
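To make the chaining mechanics concrete, here is a compact backward-chaining sketch over the rules and facts above. The tuple/string representation and the chainer itself are my own illustration; this is not Mooney's EGGS, and it does not produce the proof tree or the learned rule the question asks for.

    # Minimal backward chainer over the domain theory; variables are strings
    # beginning with '?'.  Representation and code are illustrative assumptions.
    import itertools

    RULES = [  # (consequent, [antecedents])
        (('C', '1', '?y', '?z'),  [('A', '?x', '?y'), ('B', '?y', '1', '?z')]),
        (('A', '?x', '?y'),       [('D', '?x', '?x'), ('E', '?x', '?y')]),
        (('B', '?x', '?y', '?z'), [('F', '?x', '?z', '?z')]),
        (('D', '?x', '?y'),       [('G', '?y', '?x')]),
    ]
    FACTS = [('E', '1', '2'), ('E', '3', '2'), ('F', '2', '3', '3'),
             ('F', '3', '1', '2'), ('G', '1', '2'), ('D', '3', '3')]

    fresh = itertools.count()

    def is_var(t):
        return t.startswith('?')

    def walk(t, s):                       # follow a chain of variable bindings
        while is_var(t) and t in s:
            t = s[t]
        return t

    def unify(lit1, lit2, s):             # unify two literals under substitution s
        if lit1[0] != lit2[0] or len(lit1) != len(lit2):
            return None
        for a, b in zip(lit1[1:], lit2[1:]):
            a, b = walk(a, s), walk(b, s)
            if a == b:
                continue
            if is_var(a):
                s = {**s, a: b}
            elif is_var(b):
                s = {**s, b: a}
            else:
                return None
        return s

    def rename(rule):                     # standardize a rule's variables apart
        i = str(next(fresh))
        r = lambda lit: tuple(t + i if is_var(t) else t for t in lit)
        head, body = rule
        return r(head), [r(lit) for lit in body]

    def prove(goals, s):
        if not goals:
            yield s
            return
        goal, rest = goals[0], goals[1:]
        for fact in FACTS:                # close a leaf with an asserted fact
            s2 = unify(goal, fact, s)
            if s2 is not None:
                yield from prove(rest, s2)
        for rule in RULES:                # or expand the goal with a rule
            head, body = rename(rule)
            s2 = unify(goal, head, s)
            if s2 is not None:
                yield from prove(body + rest, s2)

    print(any(True for _ in prove([('C', '1', '2', '3')], {})))   # True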

4. Learning without a Teacher (10 points)

Fisher's COBWEB system is designed to work incrementally: it assumes examples arrive periodically, and it adjusts its current hierarchy following the receipt of each new example. Sketch the design of a batch version of COBWEB that also does hill-climbing search. That is, assume all the training examples are available at the start of learning, and describe how your approach would heuristically search for a good concept hierarchy. (Do not merely convert this problem to that addressed by the standard, incremental COBWEB algorithm.)

(page 9 of 9)
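For reference, the evaluation heuristic COBWEB hill-climbs on is category utility, and a batch variant could use it to score whole candidate partitions. Below is a minimal sketch under my own data representation (each example a dict of nominal attribute values); it is not Fisher's code and not an answer to the design question.

    # Category utility of a partition of examples into clusters.
    from collections import Counter

    def sum_sq_value_probs(examples):
        """Sum over attributes and values of P(attribute = value)^2."""
        total, n = 0.0, len(examples)
        for attr in examples[0]:
            counts = Counter(e[attr] for e in examples)
            total += sum((c / n) ** 2 for c in counts.values())
        return total

    def category_utility(partition):
        """partition: list of non-empty clusters (lists of example dicts)."""
        everything = [e for cluster in partition for e in cluster]
        base = sum_sq_value_probs(everything)
        n, k = len(everything), len(partition)
        score = 0.0
        for cluster in partition:
            score += (len(cluster) / n) * (sum_sq_value_probs(cluster) - base)
        return score / k

    # e.g. compare candidate groupings of the same examples:
    #   category_utility([group_a, group_b]) vs. category_utility([group_c, group_d])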