CSC384 Lecture Slides (Steve Engels, 2005)

Slide 1: HCAI

We have AI that can search, represent knowledge, plan actions, and play games. So where does the human factor come into all this? AI has practical applications for human-computer interaction (HCI), as well as for autonomous behaviour. For example, Bell's automated directory service: "For what city?" "For what name?" More interesting, though, is the creation of an agent that can represent expert knowledge.

Slide 2: Expert Systems

Programs that represent a human expert's knowledge in a certain domain, with the ability to analyze a situation and possibly recommend a course of action. First devised in the 1970s, in vogue in industry applications throughout the 1980s, and still used today in specialized applications.

Example #1: HelloYellow (310-YELO)
- a voice-driven Yellow Pages search application ("conversational marketing")
- uses business types and location to narrow down recommendations for restaurants, shops, etc.

Slide 3: More Expert Systems

Example #2: MYCIN (1970s)
- a medical expert system that diagnosed infectious blood diseases
- correct diagnosis rate of about 65%, above most non-specialists and only slightly below specialist rates (~80%)
- never actually used in practice, due to liability issues

Example #3: Microsoft troubleshooter
- solves problems by working with the user to diagnose symptoms
- the product's effectiveness is sometimes questionable, but it allows the Help Center to reduce the number of trivial support cases it has to deal with

Example #4: Autopilots

Slide 4: Expert System Components

Knowledge base
- stores the attributes that affect the problem domain, as well as the possible classifications of and solutions to the situation
- stores rules that connect factors with solutions, usually in conjunctive if-then form
- rules are either set manually by a domain expert or generated automatically from data

Interface
- obtains information about the current situation from the user or the world
- usually prompts the user for information on the factors that would narrow down the possible situations most effectively
- continues to prompt for information until the remaining possibilities all belong to the same class of problems

(A small sketch of both components appears below.)
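
As an illustration only, here is a minimal Python sketch of these two components: a hypothetical rule set in conjunctive if-then form, plus an interface loop that prompts for facts until one conclusion remains. The rules and attribute names are invented for this example, not taken from any real system.

```python
# Minimal expert-system sketch: a knowledge base of if-then rules
# plus an interface that prompts until one conclusion remains.
# The rules below are hypothetical examples, not from a real system.

RULES = [
    ({"green": True, "smooth": True}, "vegetable"),
    ({"green": True, "smooth": False}, "fruit"),
    ({"green": False, "hollow": True}, "vegetable"),
    ({"green": False, "hollow": False}, "fruit"),
]

def consult():
    facts = {}
    candidates = list(RULES)
    while len({concl for _, concl in candidates}) > 1:
        # Ask about an attribute that still appears in a live rule.
        attr = next(a for conds, _ in candidates for a in conds if a not in facts)
        facts[attr] = input(f"Is it {attr}? (y/n) ").strip().lower() == "y"
        # Keep only rules consistent with everything learned so far.
        candidates = [(conds, concl) for conds, concl in candidates
                      if all(facts.get(a, v) == v for a, v in conds.items())]
    print("Conclusion:", candidates[0][1] if candidates else "unknown")

if __name__ == "__main__":
    consult()
```

Note how the interface only asks about attributes that can still discriminate between the remaining rules, which is exactly the "narrow down most effectively" behaviour described above.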

Slide 5: Expert System Tools

Expert systems are typically stored as a set of rules, through which a satisfiability search is performed after each new piece of information is obtained.

Example: CLIPS (C Language Integrated Production System)
- NASA-sponsored expert system software, which automatically creates an expert system based on user-defined facts and rules

Question: what information should be obtained first, to classify the problem the fastest? The organization of the questions can be represented as a decision tree.

Slide 6: Decision Trees

Example decision tree for deciding whether to wait for a table (from Russell & Norvig, p. 654).

Slide 7: Decision Trees (cont'd)

Decision tree components:
- internal nodes of the decision tree represent tests of one of the attributes of the situation
- branches of the tree represent the possible values of the test used in making the decision
- leaf nodes represent the classification of the problem

Simplification rules:
- assume that branches represent discrete values (continuous values are an extension); assuming boolean values is a further simplification
- classifications are either positive or negative (multiple assessment possibilities are also an extension)

(A minimal data structure for these components follows.)
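
To make the components concrete, here is a minimal sketch of this structure in Python (the class and attribute names are my own): an internal node holds an attribute test and one subtree per value, and a leaf holds a classification.

```python
# Minimal decision-tree structure: internal nodes test an attribute,
# branches are attribute values, leaves carry a classification.

class Leaf:
    def __init__(self, label):
        self.label = label          # e.g. "Fruit" or "Veg"

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute  # attribute tested at this node
        self.branches = branches    # value -> subtree (Leaf or Node)

def classify(tree, example):
    """Walk from the root to a leaf, following the example's values."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

# The shallow tree from the fruit/vegetable example on a later slide:
tree = Node("Green?", {
    "Yes": Node("Smooth?", {"Yes": Leaf("Veg"), "No": Leaf("Fruit")}),
    "No":  Node("Hollow?", {"Yes": Leaf("Veg"), "No": Leaf("Fruit")}),
})
print(classify(tree, {"Green?": "Yes", "Smooth?": "No"}))  # Lime -> Fruit
```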

Slide 8: Decision Tree Features

Advantages of expert systems & decision trees:
- industrial benefits: reduced worker demand, and less downtime while waiting on scarce expert resources
- simple to comprehend and interpret (a white-box model)
- robustness: can process large datasets without pre-processing, and can be verified statistically against other test datasets

Disadvantages of expert systems & decision trees:
- the data needs to be very well specified
- the ordering of tests can lead to very bad decision trees

Slide 9: Bad Decision Trees

Decision trees can be bad for much the same reason that binary search trees can be bad.

Example: given the following data,

Example    Smooth?  Green?  Hollow?  Type
Lime       No       Yes     No       Fruit
Cucumber   Yes      Yes     No       Veg
Apple      Yes      No      No       Fruit
Pepper     Yes      No      Yes      Veg

the tree could be either a depth-two tree (Green? at the root, with Smooth? and Hollow? below it) or a deeper chain that tests the attributes one after another before reaching its leaves.

[Figure: the two alternative trees, shown side by side on the slide.]

Slide 10: Bad Decision Trees (cont'd)

The other risk with decision trees is overfitting the data:
- sometimes it's better to have one or two misclassified values than to have the decision tree branch too far down just to capture the data
- sparse data problem: some categories might contain only one or two elements, and are very prone to error or noise
- Occam's Razor: the solution to a situation is usually the simplest one available (within reason)

Slide 11: Decision Tree Strategies

One strategy is to keep the most informative nodes at the root (nodes whose attribute splits the data the best). The measurement of information at a node is entropy:

I(P(C_1), P(C_2), ..., P(C_n)) = Σ_{i=1}^{n} -P(C_i) log_2 P(C_i)

This gives a measurement in bits. A node with two equally probable outcomes (a fair coin toss) transmits 1 bit of information:

I(½, ½) = -½ log_2 ½ - ½ log_2 ½ = 1 bit

A node with a 99% chance of one value (e.g. heads) transmits only about 0.08 bits of information from a decision.
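
A direct transcription of this formula into Python, checked against the two figures above (1 bit for a fair coin, roughly 0.08 bits for a 99/1 split):

```python
from math import log2

def entropy(probs):
    """I(P(C1), ..., P(Cn)) = sum of -P(Ci) * log2 P(Ci), in bits."""
    return sum(-p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit
print(entropy([0.99, 0.01]))  # lopsided coin: ~0.081 bits
```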

Slide 12: Choosing Attribute Tests

As the probabilities of the possible classification categories approach 0 and 1, the entropy approaches 0 overall (the highest entropy value for two classes is 1 bit).

Selection strategy:
- keep attributes that minimize the entropy in the nodes that result from the data split (a greedy selection strategy)
- stop selecting attributes when the entropy is zero (the leaf node condition)

In the fruit/vegetable example:

entropy(G) = 1 bit
entropy(S) = entropy(H) = -¼ log_2 ¼ - ¾ log_2 ¾ = ¼(2) + ¾(0.415) = 0.811 bits

Only choose G for the root attribute if you need to guarantee a two-attribute decision tree. Otherwise, S or H are better.

Slide 13: Information Gain

Another attribute test is the gain in information that comes from choosing an attribute to split the decision tree cases. The gain is the difference between the information needed at the node where an attribute is chosen and the information needed by the nodes that result from choosing the attribute:

Gain(C, A) = entropy(C) - Σ_{v ∈ V(A)} P(A=v) · entropy(C | A=v)

Here C is the classification category, A is the variable for the attribute, V(A) is the set of attribute values, and v is a particular value from this set.
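
A minimal sketch of this formula in Python, applied to the fruit/vegetable table from the earlier slide (the variable names are my own); the attribute with the highest gain is the best root choice:

```python
from math import log2
from collections import Counter

# Fruit/vegetable data from the earlier slide: (Smooth, Green, Hollow) -> Type
DATA = [
    ({"S": "No",  "G": "Yes", "H": "No"},  "Fruit"),  # Lime
    ({"S": "Yes", "G": "Yes", "H": "No"},  "Veg"),    # Cucumber
    ({"S": "Yes", "G": "No",  "H": "No"},  "Fruit"),  # Apple
    ({"S": "Yes", "G": "No",  "H": "Yes"}, "Veg"),    # Pepper
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in counts.values())

def gain(data, attr):
    """Gain(C, A) = entropy(C) - sum over v of P(A=v) * entropy(C | A=v)."""
    labels = [label for _, label in data]
    total = entropy(labels)
    for v in {ex[attr] for ex, _ in data}:
        subset = [label for ex, label in data if ex[attr] == v]
        total -= (len(subset) / len(data)) * entropy(subset)
    return total

for attr in ("S", "G", "H"):
    print(attr, round(gain(DATA, attr), 3))  # S: 0.311, G: 0.0, H: 0.311
```

Splitting on Green gains nothing here (both children stay half fruit, half vegetable), which matches the previous slide's advice that S or H make the better root.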

Slide 14: Entropy Examples

Picking a restaurant to go to:

[Figure: two candidate splits for the restaurant example, labelled "bad entropy" (classes still mixed after the split) and "good entropy" (nearly pure groups after the split).]

Slide 15: Training & Testing

To show how more data eliminates the problems of sparse data and noise, separate the data into training and test sets. After creating the model from the examples in the training set, run the test cases through the decision tree and record the percentage that are classified accurately. The result is that performance improves as the training size increases, although this might partly be a result of peeking during training (allowing the test set to gradually influence the training set).
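
One common way to run this procedure today is with scikit-learn, a modern library that postdates these slides; a minimal sketch, using its bundled iris dataset purely as stand-in data:

```python
# Train/test evaluation of a decision tree, sketched with scikit-learn
# (a modern library, not part of the original 2005 slides).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Hold the test set out completely to avoid "peeking" during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
model = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```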

Slide 16: Decision Tree Pruning

To keep the decision tree simple and avoid overfitting the data, we can prune the less relevant attributes from the tree:
1. Put the tree into rule-based form. (Rules from the fruit example would be: if (green && smooth) then Vegetable; if (green && !smooth) then Fruit; and so on.)
2. Construct a contingency table for the rules, which measures the number of occurrences of each attribute value in each rule.
3. Calculate the expected count for each value of an attribute, and see how much the observed occurrences of these values deviate from the expected counts: the chi-squared (χ²) test.
4. Attributes whose values show low deviation can be eliminated from the decision tree.
5. Rebuild the tree, using the modified attribute list.

(A sketch of the chi-squared step follows.)
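
A sketch of the chi-squared step using scipy (the contingency counts are invented for illustration): low deviation between observed and expected counts suggests the attribute is independent of the class, making it a pruning candidate.

```python
# Chi-squared relevance test for one attribute, using scipy
# (the contingency counts below are invented for illustration).
from scipy.stats import chi2_contingency

# Rows: attribute value (e.g. green = yes / no); columns: class (Fruit, Veg).
observed = [[20, 5],
            [6, 19]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small chi2 (large p) means the observed counts barely deviate from
# the expected ones, so the attribute carries little information about
# the class and is a candidate for pruning.
```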

Slide 17: Decision Tree Algorithms

ID3
- the basic algorithm; uses the entropy measurement to select attributes for decision tree nodes
- chooses attributes to minimize the entropy in the resulting nodes

CART (Classification and Regression Trees)
- relies on the Gini impurity test (1 - Σ frequencies²) to check whether the leaf categories are homogeneous or not

C4.5 & C5.0
- based on the ID3 algorithm
- prunes trees to lower the decision tree height
- also considers cases with missing attribute data, varying costs, and continuous values
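
The Gini impurity formula from the CART bullet, transcribed directly into Python:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class frequencies."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["Fruit", "Veg", "Fruit", "Veg"]))  # maximally mixed node: 0.5
print(gini(["Fruit", "Fruit"]))                # homogeneous node: 0.0
```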

Slide 18: Decision Tree Variations

Branch costs:
- attribute tests aren't always 100% certain
- by placing confidence values on each branch, the expert system can model uncertainty in its decision-making: a key ability for when data doesn't classify neatly into categories
- the result is a confidence value for each leaf category, based on an overall calculation

[Figure: the fruit/vegetable tree with confidences on its branches: 0.9 and 0.1 on the Green? test at the root, 0.8/0.2 on the Veg/Fruit leaves of the Smooth? subtree, and 0.75/0.25 on the Veg/Fruit leaves of the Hollow? subtree.]

e.g. the object is green, smooth, and not hollow:
C(V) = (0.9)(0.8) + (0.1)(0.75) = 0.795
C(F) = (0.9)(0.2) + (0.1)(0.25) = 0.205
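
A minimal sketch of that calculation in Python, with the branch confidences stored on a small hand-built structure (the layout is assumed from the slide's figure):

```python
# Confidence propagation down a decision tree with uncertain branches.
# Each entry pairs a branch confidence with that branch's leaf
# confidences. Numbers follow the slide's figure.

tree = [
    (0.9, {"V": 0.8, "F": 0.2}),    # Green? believed yes -> Smooth? subtree
    (0.1, {"V": 0.75, "F": 0.25}),  # Green? possibly no -> Hollow? subtree
]

def confidence(tree, label):
    """Sum branch confidence times leaf confidence over all paths."""
    return sum(p * leaves[label] for p, leaves in tree)

print(confidence(tree, "V"))  # 0.9*0.8 + 0.1*0.75 = 0.795
print(confidence(tree, "F"))  # 0.9*0.2 + 0.1*0.25 = 0.205
```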

Slide 19: Decision Tree Variations (cont'd)

Continuous values:
- attribute divisions are more difficult to determine for continuous values than for discrete values, but still possible
- rather than testing the attribute's one or two possible values, intervals are chosen along the continuous range of the attribute, to see which one reduces the entropy of the system the most
- that attribute division is then compared to the other attribute values of the system

Slide 20: Ensemble Learning

To help train the decision tree faster, we can create several decision trees and have them build a stronger model by weighting them. The training examples can be weighted as well, so that examples that were misclassified earlier are weighted more heavily in later training runs. This technique is called boosting. The idea behind it is that a single weak decision tree might misclassify a situation, but several classifiers are less likely to misclassify in exactly the same way: take the majority opinion when applying a label.
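
A sketch of boosting with scikit-learn (again a modern library, not part of the original slides): AdaBoost combines many shallow decision trees, re-weighting misclassified examples between rounds and taking a weighted vote at the end.

```python
# Boosting sketch with scikit-learn: AdaBoost over shallow decision trees.
# Misclassified examples get more weight in each successive round, and
# the final label is a weighted vote over all the weak trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The default weak learner is a depth-1 decision tree (a "stump").
ensemble = AdaBoostClassifier(n_estimators=50, random_state=0)
ensemble.fit(X_train, y_train)
print("test accuracy:", ensemble.score(X_test, y_test))
```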