SUPERVISED LEARNING. We've finished Part I: Problem Solving. We've finished Part II: Reasoning with Uncertainty. Part III: (Machine) Learning.

Progress Report
- We've finished Part I: Problem Solving
- We've finished Part II: Reasoning with Uncertainty
- Part III: (Machine) Learning
  - Supervised Learning
  - Unsupervised Learning
  - Overlaps quite a bit with Part II

Today
- Reading
  - We're skipping to AIMA Chapter 18!
  - AIMA 18.1-18.2, skim 20.2.2
- Goals
  - Intro. to Machine Learning
  - Supervised learning terminology
  - Naïve Bayes
  - (Decision Trees)

Machine Learning
- The term "machine learning" is a bit misleading
  - Pattern recognition
- We can use machine learning to
  - learn the probabilities for a BN (Bayesian network)
  - learn the topology of a BN
  - learn a heuristic function for games

Subfields of Machine Learning
- Supervised learning: learning with labels
  - classification, regression, structured prediction
- Unsupervised learning: learning without labels
  - clustering, projection methods
- Reinforcement learning: learning with rewards
  - planning

Supervised Learning Terminology
- data set
- instance, input
- features
- label, output
- hypothesis
- hypothesis class
- realizable, consistent

Types of Supervised Learning Tasks
- Regression: y is a (vector of) real-valued number(s)
  - e.g. price of a commodity, pollution levels, brain activity
- Classification: y is a discrete (categorical) value
  - e.g. spam or not spam, 5-star ratings
- Structured prediction: y is a structured object
  - e.g. given a sentence, predict its parse tree; given the words in a sentence, predict their POS tags

Types of Supervised Learning Tasks
- Examples of supervised learning problems:
  - Spam
  - Digit recognition
  - Rainfall levels in India
  - Pollution index
  - Stock returns
  - Users' ratings of movies
  - Genre classification
  - Sentiment analysis
  - Document classification
  - Image recognition
  - Part-of-speech tagging
  - Storm trajectories

So what is learning?
- Learning is the process of finding (constructing, searching for) a hypothesis that performs well on the training data and generalizes well to unseen data (the test data)
- [Figure: the data set D is split into D_TRAIN and D_TEST; training on D_TRAIN yields a hypothesis h; testing h on D_TEST yields a measure of performance]

Ockham's Razor (inductive bias)
- Ockham's Razor: prefer the simplest consistent hypothesis
- Example: Curve fitting
  - x is the x-coordinate
  - y is the y-coordinate
  - [Figure: two plots of f(x) against x, panels (a) and (b)]
  - Both hypotheses are consistent
  - Which is better?
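The training/testing picture above can be sketched in a few lines. This is only an illustration under stated assumptions: the 1-D data, the candidate thresholds, and the names `make_hypothesis`, `accuracy`, and `learn` are all hypothetical and do not come from the slides. The hypothesis class is the set of threshold rules h(x) = (x > t), learning is a search over that class using D_TRAIN, and the measure of performance is accuracy on D_TEST.

```python
# Hypothetical 1-D data set: (x, label) pairs.
D_TRAIN = [(1.0, False), (2.0, False), (3.0, False),
           (6.0, True), (7.0, True), (8.0, True)]
D_TEST = [(0.5, False), (2.5, False), (5.5, True), (9.0, True)]


def make_hypothesis(threshold):
    """Return the hypothesis h(x) = (x > threshold)."""
    return lambda x: x > threshold


def accuracy(h, data):
    """Measure of performance: fraction of examples h labels correctly."""
    return sum(h(x) == y for x, y in data) / len(data)


def learn(train, candidate_thresholds):
    """Search the hypothesis class for the threshold that fits the training data best."""
    best = max(candidate_thresholds,
               key=lambda t: accuracy(make_hypothesis(t), train))
    return best, make_hypothesis(best)


# The hypothesis class: one threshold rule per candidate value (an assumption).
candidates = [0.0, 1.5, 4.5, 7.5, 10.0]
best_threshold, h = learn(D_TRAIN, candidates)

train_acc = accuracy(h, D_TRAIN)  # performance on the data used for training
test_acc = accuracy(h, D_TEST)    # performance on unseen data
print(best_threshold, train_acc, test_acc)
```

Here the chosen hypothesis is consistent with D_TRAIN (it labels every training example correctly), and the test accuracy is the estimate of how well it generalizes.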

Overfitting (phenomenon)
- Overfitting: the learner fits itself to noise in the training data, failing to generalize well
- Causes: noisy data (too little data), overly complex models
- Example: Curve fitting
  - Which is better?

Common Supervised Learning Algorithms
- Graphical models
  - Naïve Bayes classifiers
  - Bayesian networks
- Decision trees
  - Random forests (many decision trees)
- Neural networks
  - Perceptrons
  - Artificial neural networks
  - Deep belief nets
- Max margin classifiers
  - Support vector machines
- Regression analysis
  - Logistic regression
  - Linear regression
- Each of these algorithms makes assumptions; these assumptions are known as the inductive bias of the classifier
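Returning to the curve-fitting example used for overfitting: the sketch below fits two hypotheses to the same noisy points. Everything in it is an assumption made for illustration (the true function 2x + 1, the alternating noise, and the function names). The straight line is the simple model; the Lagrange interpolating polynomial is the overly complex one, since it passes through every training point, noise included.

```python
def fit_line(points):
    """Least-squares straight line: the simple hypothesis."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(points):
    """Lagrange polynomial through every training point: the complex hypothesis."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h


def mean_squared_error(h, points):
    return sum((h(x) - y) ** 2 for x, y in points) / len(points)


def true_f(x):
    """Assumed underlying function (not from the slides)."""
    return 2 * x + 1


# Training data: the true function plus hypothetical alternating noise.
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train = [(float(x), true_f(x) + e) for x, e in zip(range(6), noise)]
# Test data: noise-free points between the training inputs.
test = [(x + 0.5, true_f(x + 0.5)) for x in range(5)]

line = fit_line(train)
poly = fit_interpolating_polynomial(train)

line_train, line_test = mean_squared_error(line, train), mean_squared_error(line, test)
poly_train, poly_test = mean_squared_error(poly, train), mean_squared_error(poly, test)
print("line:", line_train, line_test)
print("poly:", poly_train, poly_test)
```

The polynomial reaches zero training error by fitting the noise, yet its test error is much larger than the line's, which is the overfitting pattern the slide describes.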

Naïve Bayes Classifier
- Used for classification
  - x_i are symptoms and y = {Flu, Appendicitis, ...}
  - x_i are word frequencies and y = {Politics, Sports, Finance, ...}
- Inductive bias: features are conditionally independent given the label
- [Figure: graphical model with y and the features x_1, x_2, ..., x_F]

Naïve Bayes Classifier
- Training: learn p(y) and p(x_f | y) from data set D
  - Think of D as a set of samples we observed
  - Use these samples to estimate distributions
- Testing: once we estimate these probabilities from D, we want to compute p(y = k | x) for a new instance x
  - Assign x to whichever class has the highest probability
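The training and testing steps above can be sketched as follows. This is a minimal illustration, not the course's implementation: the toy symptom data is made up, the function names are hypothetical, and the add-one (Laplace) smoothing is an assumption added so that unseen feature values do not produce zero probabilities. Training estimates p(y) and p(x_f | y) by counting; testing multiplies them under the conditional-independence assumption and normalizes to get p(y = k | x).

```python
from collections import Counter, defaultdict


def train_naive_bayes(data, smoothing=1.0):
    """Estimate p(y) and p(x_f | y) from (features, label) pairs by counting."""
    label_counts = Counter(label for _, label in data)
    num_features = len(data[0][0])
    value_counts = defaultdict(Counter)           # (f, label) -> counts of values
    values_seen = [set() for _ in range(num_features)]
    for features, label in data:
        for f, value in enumerate(features):
            value_counts[(f, label)][value] += 1
            values_seen[f].add(value)

    prior = {y: c / len(data) for y, c in label_counts.items()}

    def likelihood(f, value, y):
        # Smoothed estimate of p(x_f = value | y); smoothing is an assumption.
        return ((value_counts[(f, y)][value] + smoothing)
                / (label_counts[y] + smoothing * len(values_seen[f])))

    return prior, likelihood


def posterior(x, prior, likelihood):
    """Return p(y = k | x) for every class k."""
    scores = {}
    for y, p_y in prior.items():
        score = p_y
        for f, value in enumerate(x):
            score *= likelihood(f, value, y)      # independence given the label
        scores[y] = score
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


def classify(x, prior, likelihood):
    """Assign x to whichever class has the highest posterior probability."""
    post = posterior(x, prior, likelihood)
    return max(post, key=post.get)


# Hypothetical data set D: (fever, abdominal pain) -> diagnosis.
D = [
    (("fever", "no_pain"), "Flu"),
    (("fever", "no_pain"), "Flu"),
    (("no_fever", "no_pain"), "Flu"),
    (("fever", "pain"), "Appendicitis"),
    (("no_fever", "pain"), "Appendicitis"),
]
prior, likelihood = train_naive_bayes(D)
print(classify(("fever", "pain"), prior, likelihood))
print(posterior(("fever", "no_pain"), prior, likelihood))
```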

[Figure: "The Economic Meltdown: Should you be concerned?" (PhD Comics)]

Decision Tree Classifier
- [Figure: tree diagram labeled x_1, x_2, x_3 and y_1, y_2, y_3]

Decision Tree Classifier
- [Figure: decision tree]

Decision Tree Classifier
- Decision trees are best suited to problems where
  - Each attribute is discrete
  - The label y is discrete
  - The hypothesis can be expressed using disjunctions (OR) of conjunctions (AND)
  - The training data may contain errors
  - The training data may contain missing attribute values

Decision Tree Classifier
- If the features are continuous, internal nodes may test the value of a feature against a threshold

Decision Tree Classifier
- Learns axis-parallel decision boundaries, i.e. divides feature space into hyper-rectangles
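To see why threshold tests give axis-parallel boundaries, the sketch below writes one small hand-built tree twice: once as nested threshold tests and once as a list of rectangles. The thresholds, labels, and names are hypothetical, chosen only for illustration.

```python
import math


def tree_predict(x1, x2):
    """A hand-built tree over two continuous features (hypothetical thresholds)."""
    if x1 <= 3.0:        # internal node: test feature x1 against a threshold
        return "A"
    elif x2 <= 1.5:      # internal node: test feature x2 against a threshold
        return "B"
    else:
        return "C"


# The same classifier as axis-parallel rectangles.
# Each entry is (x1_low, x1_high, x2_low, x2_high, label), covering
# x1_low < x1 <= x1_high and x2_low < x2 <= x2_high.
REGIONS = [
    (-math.inf, 3.0, -math.inf, math.inf, "A"),
    (3.0, math.inf, -math.inf, 1.5, "B"),
    (3.0, math.inf, 1.5, math.inf, "C"),
]


def region_predict(x1, x2):
    """Look up the rectangle that contains the point (x1, x2)."""
    for lo1, hi1, lo2, hi2, label in REGIONS:
        if lo1 < x1 <= hi1 and lo2 < x2 <= hi2:
            return label
    raise ValueError("point not covered by any region")


points = [(1.0, 0.0), (3.0, 5.0), (4.0, 1.5), (4.0, 2.0), (10.0, -3.0)]
for p in points:
    print(p, tree_predict(*p), region_predict(*p))
```

Each root-to-leaf path corresponds to one rectangle, and every rectangle edge is parallel to a feature axis because each test looks at a single feature.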

Learning a Decision Tree
- [Figure: decision tree]

Learning a Decision Tree

function DECISION-TREE-LEARNING(examples, attributes, parents) returns a tree
    if examples is empty
        return MAJORITY_VOTE(parents)
    else if all examples have same classification
        return classification
    else if attributes is empty
        return MAJORITY_VOTE(examples)
    else
        A ← CHOOSE-BEST-ATTRIBUTE(examples)
        tree ← a new decision tree with root A
        for each value v_k of A
            S_k ← examples with value v_k for attribute A
            subtree ← DECISION-TREE-LEARNING(S_k, attributes - A, examples)
            add branch to tree with label (A = v_k) and subtree
        return tree
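The pseudocode above translates fairly directly into Python. The sketch below makes several assumptions that the slide does not state: CHOOSE-BEST-ATTRIBUTE is implemented with information gain (the slide does not say which criterion is used) and also receives the remaining attributes, an extra `domains` argument lists the possible values of each attribute, a tree is a nested dict, and the weather-style data is invented for illustration.

```python
import math
from collections import Counter


def majority_vote(examples):
    """Most common label among (attribute_dict, label) examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]


def entropy(examples):
    counts = Counter(label for _, label in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def choose_best_attribute(examples, attributes):
    """Assumed criterion: highest information gain, i.e. lowest remaining entropy."""
    def remainder(a):
        groups = {}
        for ex in examples:
            groups.setdefault(ex[0][a], []).append(ex)
        return sum(len(g) / len(examples) * entropy(g) for g in groups.values())
    return min(attributes, key=remainder)


def decision_tree_learning(examples, attributes, parents, domains):
    if not examples:
        return majority_vote(parents)
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all examples have same classification
        return labels.pop()
    if not attributes:
        return majority_vote(examples)
    a = choose_best_attribute(examples, attributes)
    tree = {"attribute": a, "branches": {}}   # new decision tree with root a
    remaining = [b for b in attributes if b != a]
    for v in domains[a]:
        s_k = [ex for ex in examples if ex[0][a] == v]
        tree["branches"][v] = decision_tree_learning(s_k, remaining, examples, domains)
    return tree


def predict(tree, x):
    """Follow branches until a leaf (a label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["attribute"]]]
    return tree


# Hypothetical training data.
examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
domains = {"outlook": ["sunny", "rainy", "overcast"], "windy": ["yes", "no"]}

tree = decision_tree_learning(examples, ["outlook", "windy"], examples, domains)
print(tree)
print(predict(tree, {"outlook": "overcast", "windy": "no"}))
```

The "overcast" branch has no training examples, so it exercises the first case of the pseudocode: the leaf is the majority vote of the parent node's examples.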