
Machine-Learning Tasks and Feature Space Representations

Mark Craven and David Page
Computer Sciences 760
Spring 2018
www.biostat.wisc.edu/~craven/cs760/

Some of the slides in these lectures have been adapted/borrowed from materials developed by Tom Dietterich, Pedro Domingos, Tom Mitchell, David Page, and Jude Shavlik.

Goals for the lecture

- define the supervised and unsupervised learning tasks
- consider how to represent instances as fixed-length feature vectors
- understand the concepts:
  - instance (example)
  - feature (attribute)
  - feature space
  - feature types
  - model (hypothesis)
  - training set
  - supervised learning
  - classification (concept learning)
  - regression
  - batch vs. online learning
  - i.i.d. assumption
  - generalization

Goals for the lecture (continued)

- understand the concepts:
  - unsupervised learning
  - clustering
  - anomaly detection
  - dimensionality reduction

Can I eat this mushroom?

I don't know what type it is; I've never seen it before. Is it edible or poisonous?

Can I eat this mushroom?

suppose we're given examples of edible and poisonous mushrooms (we'll refer to these as training examples or training instances)

can we learn a model that can be used to classify other mushrooms?

Representing instances using feature vectors

we need some way to represent each instance

one common way to do this: use a fixed-length vector to represent the features (a.k.a. attributes) of each instance, and also represent the class label of each instance

x(1) = ⟨bell, fibrous, gray, false, musty, …⟩     y(1) = edible
x(2) = ⟨convex, scaly, purple, false, foul, …⟩    y(2) = poisonous
x(3) = ⟨bell, smooth, red, true, musty, …⟩        y(3) = edible
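To make the representation concrete, here is a minimal Python sketch of the three instances above (the tuple ordering of cap-shape, cap-surface, cap-color, bruises?, odor is an assumption based on the feature list later in the lecture):

```python
# Each instance is a fixed-length tuple of feature values in a fixed order:
# (cap-shape, cap-surface, cap-color, bruises?, odor); this ordering is assumed.
instances = [
    ("bell",   "fibrous", "gray",   False, "musty"),  # x(1)
    ("convex", "scaly",   "purple", False, "foul"),   # x(2)
    ("bell",   "smooth",  "red",    True,  "musty"),  # x(3)
]

# the class label y for each instance, in the same order
labels = ["edible", "poisonous", "edible"]
```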

Standard feature types

nominal (including Boolean): no ordering among possible values
  e.g. color ∈ {red, blue, green} (vs. color = 1000 Hertz)

ordinal: possible values of the feature are totally ordered
  e.g. size ∈ {small, medium, large}

numeric (continuous)
  e.g. weight ∈ [0, 500]

hierarchical: possible values are partially ordered in an ISA hierarchy
  e.g. shape
         closed
           polygon
             square
             triangle
           continuous
             circle
             ellipse

Feature hierarchy example

[Lawrence et al., Data Mining and Knowledge Discovery 5(1-2), 2001]

Structure of one feature: a product hierarchy with Product at the root, 99 product classes (e.g. Pet Foods, Tea), 2,302 product subclasses (e.g. Dried Cat Food, Canned Cat Food), and ~30K products (e.g. Friskies Liver, 250g) at the leaves.
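Nominal and ordinal features are often encoded differently when building a feature vector; the sketch below (my illustration, not part of the lecture) shows one common convention: integer codes for ordinal values, one-hot vectors for nominal ones.

```python
# One common encoding convention (an illustration, not the lecture's method):
# ordinal values keep their order as integers; nominal values become one-hot
# vectors so that no spurious ordering is introduced.
size_order = {"small": 0, "medium": 1, "large": 2}   # ordinal: order preserved
colors = ["red", "blue", "green"]                    # nominal: no order

def one_hot(value, domain):
    """Return a one-hot list with a 1 at the position of `value` in `domain`."""
    return [1 if v == value else 0 for v in domain]

print(size_order["medium"])       # 1
print(one_hot("blue", colors))    # [0, 1, 0]
```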

Feature space

we can think of each instance as representing a point in a d-dimensional feature space, where d is the number of features

example: optical properties of oceans in three spectral bands [Traykovski and Sosik, Ocean Optics XIV Conference Proceedings, 1998]

Another view of the feature-vector representation: a single database table

             feature 1   feature 2   ...   feature d   class
instance 1   0.0         small       ...   red         true
instance 2   9.3         medium      ...   red         false
instance 3   8.2         small       ...   blue        false
...
instance n   5.7         medium      ...   green       true

The supervised learning task

problem setting
- set of possible instances: X
- unknown target function: f: X → Y
- set of models (a.k.a. hypotheses): H = { h | h: X → Y }

given
- training set of instances of the unknown target function f:
  (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))

output
- model h ∈ H that best approximates the target function

The supervised learning task

- when y is discrete, we term this a classification task (or concept learning)
- when y is continuous, it is a regression task
- later in the semester, we will consider tasks in which each y is a more structured object (e.g. a sequence of discrete labels)
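A minimal sketch of the two task types (the use of scikit-learn and decision trees here is my choice for illustration; the lecture does not prescribe a library or model class):

```python
# The same "fit a model h to (x, y) pairs" framing covers both task types:
# classification (discrete y) and regression (continuous y).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0.0], [1.0], [2.0], [3.0]]    # instances with one numeric feature each

clf = DecisionTreeClassifier().fit(X, ["edible", "poisonous", "poisonous", "edible"])
reg = DecisionTreeRegressor().fit(X, [0.1, 0.9, 1.8, 3.2])    # continuous y

print(clf.predict([[1.5]]))   # a discrete class label
print(reg.predict([[1.5]]))   # a real-valued prediction
```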

Batch vs. online learning

In batch learning, the learner is given the training set as a batch (i.e. all at once):
  (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m))

In online learning, the learner receives instances sequentially, and updates the model after each (for some tasks it might have to classify/make a prediction for each x(i) before seeing y(i)):
  (x(1), y(1)), (x(2), y(2)), …, (x(i), y(i)), …   →  time

(see the code sketch contrasting the two regimes after the i.i.d. notes below)

i.i.d. instances

we often assume that training instances are independent and identically distributed (i.i.d.): sampled independently from the same unknown distribution

later in the course we'll consider cases where this assumption does not hold
- cases where sets of instances have dependencies
  - instances sampled from the same medical image
  - instances from time series
  - etc.
- cases where the learner can select which instances are labeled for training (active learning)
- cases where the target function changes over time (concept drift)
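A sketch contrasting the two regimes (scikit-learn and SGDClassifier are my choices for illustration; `partial_fit` requires the set of classes up front):

```python
# Batch learning fits on the whole training set at once; online learning
# updates the model one (x(i), y(i)) pair at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 1, 1, 0])

batch_model = DecisionTreeClassifier().fit(X, y)        # all at once

online_model = SGDClassifier()
for x_i, y_i in zip(X, y):                              # sequentially, over time
    online_model.partial_fit([x_i], [y_i], classes=np.array([0, 1]))
```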

Generalization

The primary objective in supervised learning is to find a model that generalizes: one that accurately predicts y for previously unseen x.

Can I eat this mushroom that was not in my training set?

Model representations

throughout the semester, we will consider a broad range of representations for learned models, including
- decision trees
- neural networks
- support vector machines
- Bayesian networks
- logic clauses
- ensembles of the above
- etc.

Mushroom features (from the UCI Machine Learning Repository)

(sunken is one possible value of the cap-shape feature)

cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises?: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
stalk-shape: enlarging=e, tapering=t
stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
veil-type: partial=p, universal=u
veil-color: brown=n, orange=o, white=w, yellow=y
ring-number: none=n, one=o, two=t
ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d

A learned decision tree

if odor = almond, predict edible
if odor = none ∧ spore-print-color = white ∧ gill-size = narrow ∧ gill-spacing = crowded, predict poisonous
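Read as code, the learned tree above amounts to two rules; here is a direct (assumed) Python translation, where `instance` is a hypothetical dict keyed by feature name:

```python
# A direct translation of the two rules shown above; `instance` is a
# hypothetical dict mapping feature names to values, e.g. {"odor": "almond", ...}.
def classify(instance):
    if instance["odor"] == "almond":
        return "edible"
    if (instance["odor"] == "none"
            and instance["spore-print-color"] == "white"
            and instance["gill-size"] == "narrow"
            and instance["gill-spacing"] == "crowded"):
        return "poisonous"
    return None   # the slide shows only these two branches of the tree
```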

Classification with a learned decision tree

once we have a learned model, we can use it to classify previously unseen instances

x = ⟨bell, fibrous, brown, false, foul, …⟩
y = edible or poisonous?

Unsupervised learning

in unsupervised learning, we're given a set of instances without y's:
  x(1), x(2), …, x(m)

goal: discover interesting regularities that characterize the instances

common unsupervised learning tasks
- clustering
- anomaly detection
- dimensionality reduction

Clustering

given
- training set of instances: x(1), x(2), …, x(m)

output
- model h ∈ H that divides the training set into clusters such that there is intra-cluster similarity and inter-cluster dissimilarity

Clustering example

Clustering irises using three different features (the colors represent clusters identified by the algorithm, not y's provided as input).
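As a concrete sketch (k-means and scikit-learn are my choices here; the slide does not name the algorithm used in the iris example):

```python
# Cluster the iris instances using three features; the resulting labels are
# discovered by the algorithm, not provided as y's.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data[:, :3]              # three features, as in the example
h = KMeans(n_clusters=3, n_init=10).fit(X)
print(h.labels_[:10])                    # cluster assignment for each instance
```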

Anomaly detection

learning task
- given: training set of instances x(1), x(2), …, x(m)
- output: model h ∈ H that represents "normal" x

performance task
- given a previously unseen x, determine if x looks normal or anomalous

Anomaly detection example

Let's say our model is represented by the 1979-2000 average, ±2 stddev. Does the data for 2012 look anomalous?
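A minimal sketch of the "average ± 2 stddev" style of model from the example (the numbers are placeholders, not the actual 1979-2000 measurements):

```python
# Learn a band of "normal" from historical data; anything outside
# mean +/- 2 standard deviations looks anomalous.
import numpy as np

history = np.array([7.2, 7.0, 7.4, 7.1, 6.9, 7.3])   # hypothetical normal data

mean, std = history.mean(), history.std()

def looks_anomalous(x):
    """Flag x if it falls outside mean +/- 2 standard deviations."""
    return abs(x - mean) > 2 * std

print(looks_anomalous(3.6))   # True: far below the historical band
```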

Dimensionality reduction

given
- training set of instances: x(1), x(2), …, x(m)

output
- model h ∈ H that represents each x with a lower-dimension feature vector while still preserving key properties of the data

Dimensionality reduction example

We can represent a face using all of the pixels in a given image. A more effective method (for many tasks): represent each face as a linear combination of eigenfaces.
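A sketch of the idea (PCA is the classic method behind eigenfaces; the data here is random placeholder pixels, not real face images, and scikit-learn is my choice of library):

```python
# Compress each 4,096-pixel "image" down to 20 coefficients, mirroring the
# eigenface example worked through on the next slide.
import numpy as np
from sklearn.decomposition import PCA

faces = np.random.rand(100, 64 * 64)     # 100 fake images of 64x64 pixels
h = PCA(n_components=20).fit(faces)

x = h.transform(faces[:1])               # each face -> 20 coefficients
print(x.shape)                           # (1, 20): 20 features, not 4,096
```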

Dimensionality reduction example

represent each face as a linear combination of eigenfaces:

face(1) = α1(1)·eigenface1 + α2(1)·eigenface2 + … + α20(1)·eigenface20
x(1) = ⟨α1(1), α2(1), …, α20(1)⟩

face(2) = α1(2)·eigenface1 + α2(2)·eigenface2 + … + α20(2)·eigenface20
x(2) = ⟨α1(2), α2(2), …, α20(2)⟩

the number of features is now 20 instead of the number of pixels in the images

Other learning tasks

later in the semester we'll cover other learning tasks that are not strictly supervised or unsupervised
- reinforcement learning
- semi-supervised learning
- etc.