Machine Learning (Decision Trees and Intro to Neural Nets) CSCI 3202, Fall 2010

Assignments To read this week: Chapter 18, sections 1-4 and 7. Problem Set 3 is due next week!

Learning a Decision Tree We look at someone making classification decisions and try to infer the rule they are using (e.g., we might watch someone choosing videos and try to predict whether they think a particular title is a good video or not). We assume that their rule can be written as a tree in which each node represents a local decision based on an attribute.

Some Attributes for Mike's Video Choices
Do things blow up? (Tends to be good.)
Is the title written in script? (Tends to be bad.)
Is it a sequel? (Tends to be bad.)
Is there a monster? (Tends to be good.)
Is it based on a TV show? (Tends to be bad.)

A Training Set of Examples: We Watch Mike Make Lots of Video Choices

Good?  Blowup?  Script?  Sequel?  Monster?  TV?
Y      Yes      No       Yes      Yes       No
Y      Yes      Yes      No       Yes       No
N      Yes      Yes      Yes      No        Yes
... and others

Building a Tree from Our Examples Suppose we have 12 attributes and 200 examples of Mike's video choices: 100 positive and 100 negative. Now, the crucial question: which attribute is most important to Mike?

Which is More Important?
Attribute A divides the set as follows: (70 yes, 30 no) for A true; (30 yes, 70 no) for A false.
Attribute B divides the set as follows: (100 yes, 90 no) for B true; (0 yes, 10 no) for B false.

Information as a Criterion for Attribute Values
Information value (in bits) for a set of probabilities P_1, ..., P_n: Σ_i -P_i log2(P_i)
So, for a standard coin flip, the information is 2 * (-(1/2) log2(1/2)) = 1 bit.
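As a minimal sketch, this measure is easy to compute in Python (the function name `information` is our own, not from the lecture):

```python
import math

def information(probs):
    """Information value, in bits, of a set of probabilities:
    sum over i of -P_i * log2(P_i), treating 0 * log2(0) as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A standard coin flip carries exactly 1 bit of information.
print(information([0.5, 0.5]))  # 1.0
```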

Attribute A
Information at start: -(1/2) log2(1/2) + -(1/2) log2(1/2) = 1 bit
Information remaining after splitting on Attribute A:
Choice 1, weighted by 0.5: -0.7 log2(0.7) + -0.3 log2(0.3)
Choice 2, weighted by 0.5: -0.3 log2(0.3) + -0.7 log2(0.7)
Total information remaining: 0.4406 + 0.4406 = 0.881

Attribute B
Choice 1, weighted by 0.95: -(10/19) log2(10/19) + -(9/19) log2(9/19)
Choice 2, weighted by 0.05: -0 log2(0) + -1 log2(1)
Total information remaining: 0.948 + 0 = 0.948
Splitting on A leaves less information behind (0.881 vs. 0.948), so the change in information from using Attribute A is greater: A is the more informative attribute.
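Reusing the same `information` helper (a self-contained sketch; the name `remaining_info` is our own), we can reproduce these numbers and confirm that A wins:

```python
import math

def information(probs):
    # Information in bits; the p > 0 filter treats 0 * log2(0) as 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

def remaining_info(splits, total):
    # Weighted average of the information left in each branch of a split.
    return sum(
        (yes + no) / total * information([yes / (yes + no), no / (yes + no)])
        for yes, no in splits
    )

start = information([0.5, 0.5])                      # 1 bit
after_a = remaining_info([(70, 30), (30, 70)], 200)  # ~0.881
after_b = remaining_info([(100, 90), (0, 10)], 200)  # ~0.948
print(start - after_a)  # gain from A: ~0.119
print(start - after_b)  # gain from B: ~0.052
```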

Splitting Examples into Training and Test Sets Given our initial set of examples, split it into a (randomly chosen) training set and a test set. Once the algorithm has generated a tree from the training set, use the test set to gauge the tree's accuracy: measure the percentage of the test set that is correctly classified. For sufficiently large and sufficiently representative training sets, we converge on high accuracy on the test set.
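A sketch of that procedure (the `learn` and `classify` parameters stand in for whatever tree-building code is plugged in; they are assumptions for illustration, not code from the lecture):

```python
import random

def evaluate(examples, learn, classify, train_fraction=0.8):
    """Randomly split (attributes, label) examples into training and test
    sets, learn a tree from the training set, and return the fraction of
    the test set the tree classifies correctly."""
    shuffled = examples[:]          # copy so the caller's list is untouched
    random.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    tree = learn(train)             # hypothetical tree learner
    correct = sum(classify(tree, attrs) == label for attrs, label in test)
    return correct / len(test)
```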

Things We've Swept Under the Rug Doesn't this strategy lead us toward attributes with many possible values (e.g., day of the year)? Are we sure that all new examples will be completely classified? Aren't there some functions that are hard to express using decision trees?

A Bad Concept for Decision Trees to Learn The majority function (classified as true whenever a majority of the attributes are positive, false otherwise). Each attribute is equally important, and none is very effective at dividing the set, as the sketch below illustrates.
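A quick sketch of why (11 boolean attributes is our own choice for illustration): splitting on any one attribute leaves both branches heavily mixed, and by symmetry every attribute produces exactly the same split, so no attribute stands out.

```python
from itertools import product

n = 11
examples = list(product([0, 1], repeat=n))
labels = [sum(x) > n // 2 for x in examples]   # majority function

# Split on attribute 0 and count positive examples in each branch.
pos_when_true = sum(lab for x, lab in zip(examples, labels) if x[0] == 1)
pos_when_false = sum(lab for x, lab in zip(examples, labels) if x[0] == 0)
print(pos_when_true, pos_when_false)  # 638 vs. 386 of 1024 each: both branches stay mixed
```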

Neural Networks: Some First Concepts Each neural element is loosely based on the structure of a biological neuron. A neural net is a collection of neural elements connected by weighted links. We think of some set of neurons as "input elements"; these are linked to "output elements" whose activity can be interpreted as a classification of the input pattern. Standard formats: perceptrons and multilayer feedforward networks.

Structure of an (artificial) neuron Think of this as a very simple computational element: it receives numeric input values, sums those values, and compares the sum to a threshold. If the sum of the inputs is greater than the threshold, the neuron outputs a 1; otherwise it outputs a 0. The output of this neuron is connected (as usual, via weighted links) to subsequent neurons in the net.
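A minimal sketch of such a threshold unit (names and numbers are our own; the weights model the incoming links):

```python
def neuron_output(inputs, weights, threshold):
    """Sum the weighted inputs; output 1 if the sum exceeds the
    threshold, 0 otherwise."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Example: a two-input unit that behaves like logical AND.
print(neuron_output([1, 1], [0.6, 0.6], threshold=1.0))  # 1
print(neuron_output([1, 0], [0.6, 0.6], threshold=1.0))  # 0
```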

Perceptrons: the Simplest Neural Network Perceptrons are two-layer networks: one layer of inputs directly connected to a layer of outputs. For simplicity, we can look at a perceptron with a single output node.

Training a Perceptron by Adjusting its Weights The overall error is the squared difference between what we wanted and what we got from our perceptron. Once we see that our perceptron is in error, we can adjust each of the weights leading to the output node. We'll adjust each weight in such a way as to make the error value smaller.
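One standard realization of this idea is the classic perceptron update, which moves each weight in proportion to the error and to that weight's input; the slides don't fix a specific rule, so the learning rate and names here are our assumptions:

```python
def train_step(weights, threshold, inputs, target, rate=0.1):
    """Apply one perceptron update so the error (target - output) shrinks."""
    output = 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0
    error = target - output
    # Each weight moves in proportion to the error and its own input.
    return [w + rate * error * x for w, x in zip(weights, inputs)]
```

Repeatedly presenting training examples and applying this step drives the weights toward values that classify the training set correctly (when the examples are linearly separable).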