Welcome to CMPS 142 and 242: Machine Learning

Instructor: David Helmbold, dph@soe.ucsc.edu
Office hours: Monday 1:30-2:30, Thursday 4:15-5:00
TA: Aaron Michelony, amichelo@soe.ucsc.edu
Web page: www.soe.ucsc.edu/classes/cmps242/fall13/01
Text: Pattern Recognition and Machine Learning, by Bishop

Administrivia
Sign-up sheet (enrollment)
Evaluation:
  Group homework: 20%
  Late midterm exam: 40%
  Projects (group): 40%
  Must pass both exam and project
Expectations/Style
Reading assignments
Attendance/participation
My hearing/writing
Academic honesty

Topics:
  Introduction
  Bayesian learning and parameter estimation
  Instance-based methods
  Linear regression
  Linear classification
  Decision trees and neural networks
  Graphical models
  Support vector machines
  Clustering, EM algorithm
  Boosting (AdaBoost)
  On-line prediction
  Reinforcement learning

Lecture Slides for Introduction to Machine Learning
Ethem Alpaydın, The MIT Press, 2004 (modified by DPH 2006-2011)
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 1: Introduction

Why Learn?
Machine learning is programming computers to optimize a performance criterion using example data or past experience (inference in statistics).
There is no need to learn to calculate payroll.
Learning is used when:
  Human expertise does not exist (navigating on Mars)
  Humans are unable to explain their expertise (speech recognition, object detection)
  The solution changes over time (routing on a computer network)
  The solution needs to be adapted or customized to particular cases (or users)

What We Talk About When We Talk About Learning
Learning general models from a set of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail, from customer transactions to consumer behavior:
  People who bought "Da Vinci Code" also bought "The Five People You Meet in Heaven" (www.amazon.com)
Build a model that is a good and useful approximation to the data.

What is Machine Learning?
Optimize a performance criterion using example data or past experience.
Role of statistics: inference from a sample
Role of computer science: efficient algorithms to
  Solve the optimization problem
  Represent and evaluate the model for inference

Statistical machine learning is not:
  Cognitive science (how people think/learn)
  Teaching computers to think
But it is related to:
  Statistics
  Data mining (KDD)
  Control theory
It is part of AI, but not traditional AI.

Supervised Batch Learning
Assume an (unknown) distribution over things.
Things have measurable attributes or features.
Get instances x by drawing things from the distribution and recording observations.
A teacher labels instances, making examples (x, y), or (x, t) in Bishop's notation.
The set of labeled examples is the training set or sample.
Create a hypothesis (rule) from the sample.
The hypothesis predicts on new random instances and is evaluated using a loss function.

Supervised Learning Framework
[Diagram with two stages: learning, prediction]

Supervised Learning (cont.)
Classification: labels are nominal (an unordered set, e.g. {ham, spam} or {democrat, republican, indep.})
  Binary classification
Regression: labels are numeric (e.g. price of a used car)
Sometimes predictions are probabilities

Examples
Thing            Observations                Prediction
Written digit    Pixel array                 Which digit?
Email message    Words, subject, sender      Ham or spam?
Customer         Recent purchases            Interest level in a new product
Used car         Year, make, mpg, options    Price or value

Regression
Example: price of a used car
x: car attributes
t: price
Assume $t = g(x \mid \theta)$
  g(·): model (e.g. linear)
  θ: parameters $(w, w_0)$
Linear model: $y(x) = wx + w_0$
[Plot: price t against attribute x, with the line y(x)]
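To make the linear model concrete, here is a minimal sketch that estimates w and w_0 for a single numeric attribute. The slide only says the model is linear; fitting it by least squares is an assumption made here, and the ages and prices are made-up illustration data, not from the course.

```python
# Hypothetical toy data (not from the slides): car age in years, price in $1000s.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [18.0, 16.0, 14.0, 12.0, 10.0]

def fit_line(xs, ts):
    """Least-squares estimate of (w, w0) in y(x) = w*x + w0 for scalar x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    cov_xt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xt / var_x
    w0 = mean_t - w * mean_x
    return w, w0

def predict(x, w, w0):
    """Evaluate the linear model y(x) = w*x + w0."""
    return w * x + w0

w, w0 = fit_line(ages, prices)
price_at_six_years = predict(6.0, w, w0)
```

With this toy data the fitted slope is negative, so the predicted price falls as the car gets older.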

Batch Assumption: iid Examples
The distribution of things and measurements defines some unknown (but fixed) P(x,t) over domain-label pairs.
Find a hypothesis h that is close to the truth.
A loss function $L(t, t')$ measures the error of predictions; often $L(t, t') = 0$ if $t = t'$ and $L(t, t') = 1$ otherwise (classification).
Want to minimize the expected loss $\sum_{(x,t)} P(x,t)\, L(t, h(x))$, which is the probability of error for 0-1 loss.
Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning The MIT Press (V1.1)
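Since P(x,t) is unknown, the expected loss can only be estimated from a sample. The sketch below averages the 0-1 loss of a hypothesis over labeled examples; the keyword rule `h` and the four messages are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def zero_one_loss(t, t_pred):
    """0-1 loss: 0 when the prediction matches the label, 1 otherwise."""
    return 0 if t == t_pred else 1

def empirical_error(h, examples, loss=zero_one_loss):
    """Average loss of hypothesis h over a list of (x, t) examples."""
    return sum(loss(t, h(x)) for x, t in examples) / len(examples)

def h(x):
    """Hypothetical rule: call a message spam if it contains the word 'free'."""
    return "spam" if "free" in x else "ham"

# Hypothetical labeled sample of (message, label) pairs.
sample = [
    ("free money now", "spam"),
    ("lunch at noon", "ham"),
    ("free for lunch?", "ham"),
    ("claim your prize", "spam"),
]

error_rate = empirical_error(h, sample)  # fraction of the sample that h gets wrong
```

The rule misclassifies the last two messages, so its error rate on this sample is one half.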

Supervised Learning: Uses
Prediction of future cases: use the rule to predict the output for future inputs
Knowledge extraction: the rule is easy to understand
Compression: the rule is simpler than the data it explains
Outlier detection: exceptions that are not covered by the rule, e.g., fraud and data entry errors

Can we Generalize?
Learning is an ill-posed problem: if we assume nothing else, any label t could be likely for an unseen x.
Need an inductive bias limiting the possible P(x,t).
Often assume some kind of simplicity (e.g. linearity) based on domain knowledge.
Bayesian approach: put a prior on rules, and balance the prior with evidence (data).

Noise
Data is not always perfect:
  Unmeasured features
  Attribute noise (random or systemic)
  Label noise (random or systemic)
  Noise associated with inductive bias errors

Overfitting and Underfitting
Overfitting happens when the hypothesis is too complex for the truth.
Underfitting happens when the hypothesis is too simple.

[Figure: Bishop, Fig. 1.4]
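A small numeric illustration of the two failure modes, offered as a sketch rather than as the course's own example. It compares three hypotheses on made-up data whose truth is t = x: a constant (too simple), a least-squares line, and a "memorizer" that returns the label of the nearest training point (too complex, because it reproduces the label noise). All data and function names are hypothetical.

```python
# Hypothetical data: the truth is t = x; training labels carry noise of +/-0.5.
train = [(0.0, 0.5), (1.0, 0.5), (2.0, 2.5), (3.0, 2.5)]
test = [(0.2, 0.2), (1.2, 1.2), (1.8, 1.8), (2.8, 2.8)]  # noise-free held-out points

def mse(h, examples):
    """Mean squared error of hypothesis h on (x, t) examples."""
    return sum((t - h(x)) ** 2 for x, t in examples) / len(examples)

def fit_constant(examples):
    """Too simple: always predict the mean training label."""
    mean_t = sum(t for _, t in examples) / len(examples)
    return lambda x: mean_t

def fit_line(examples):
    """Least-squares line t = w*x + w0."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_t = sum(t for _, t in examples) / n
    w = (sum((x - mean_x) * (t - mean_t) for x, t in examples)
         / sum((x - mean_x) ** 2 for x, _ in examples))
    w0 = mean_t - w * mean_x
    return lambda x: w * x + w0

def fit_memorizer(examples):
    """Too complex: return the label of the nearest training instance."""
    def h(x):
        nearest = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return h

fitters = {"constant": fit_constant, "line": fit_line, "memorizer": fit_memorizer}
# For each hypothesis, record (training error, held-out error).
errors = {}
for name, fit in fitters.items():
    h = fit(train)
    errors[name] = (mse(h, train), mse(h, test))
```

On this data the memorizer has zero training error but a larger held-out error than the line, while the constant does poorly on both.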

Supervised Learning as Parameter Estimation
Model (hypothesis set or class): $h_\theta(x)$
Error/loss function (empirical error plus regularization): $E(\theta) = \sum_{n=1}^{N} L(t_n, h_\theta(x_n)) + \text{regularization}(\theta)$
Optimization procedure: $\hat{\theta} = \arg\min_{\theta} E(\theta)$
Regularization penalizes complex θ.
Model choice + regularization = inductive bias!
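As one possible instance of this recipe, the sketch below uses a one-parameter model h(x) = w·x, squared loss, and an L2 penalty λw². These three choices are assumptions made for illustration; the slide leaves the loss and the regularizer unspecified. With them, the minimizer has a closed form obtained by setting the derivative of the error to zero.

```python
def regularized_error(w, xs, ts, lam):
    """E(w) = sum_n (t_n - w*x_n)^2 + lam * w^2  (squared loss, L2 penalty)."""
    data_term = sum((t - w * x) ** 2 for x, t in zip(xs, ts))
    return data_term + lam * w ** 2

def argmin_w(xs, ts, lam):
    """Closed-form minimizer of E(w): w = sum(x*t) / (sum(x^2) + lam)."""
    return sum(x * t for x, t in zip(xs, ts)) / (sum(x * x for x in xs) + lam)

# Hypothetical toy data lying exactly on t = 2x.
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]

w_plain = argmin_w(xs, ts, lam=0.0)   # no penalty: recovers the slope of the data
w_reg = argmin_w(xs, ts, lam=14.0)    # heavy penalty: the weight shrinks toward 0
```

Raising `lam` pulls the estimate toward zero, which is the sense in which regularization penalizes complex θ.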

Don't rely on training error!
To estimate generalization error, we need data unseen during training.
Data is often split into:
  Training set (50%)
  Validation set (25%): did training work? Use for parameter selection/model complexity.
  Final test (publication) set (25%)
Resampling when there are few examples: cross validation (describe)
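A minimal sketch of both ideas, using only the Python standard library. The 50/25/25 proportions come from the slide; the function names, the fixed shuffle seed, and the choice of k-fold as the form of cross validation are assumptions.

```python
import random

def split_train_val_test(examples, seed=0):
    """Shuffle, then split into 50% training, 25% validation, 25% test."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = len(items) // 2
    n_val = len(items) // 4
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

def k_fold_splits(examples, k):
    """Yield (train, held_out) pairs so each example is held out exactly once."""
    items = list(examples)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out

data = list(range(100))  # stand-in for 100 labeled examples
train, val, test = split_train_val_test(data)
folds = list(k_fold_splits(data, k=5))
```

In k-fold cross validation the model is trained k times, each time evaluated on the held-out fold, and the k error estimates are averaged; this reuses every example for both training and evaluation when data is scarce.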

Other kinds of supervised learning
Reinforcement learning: learning a policy for influencing or reacting to the environment
  No supervised output, but delayed rewards
  Credit assignment problem
  Game playing, robot in a maze, etc.
On-line learning: predict on each instance in turn
Semi-supervised learning: uses both labeled and unlabeled data
Active learning: request labels for particular instances
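The on-line protocol differs from the batch setting in that the learner must commit to a prediction before each label is revealed, and performance is measured by the number of mistakes. The sketch below shows only that protocol. The learner is a deliberately trivial, hypothetical one that ignores the instance and predicts the majority label seen so far; it is not an algorithm from the course.

```python
def run_online(stream):
    """Predict on each instance in turn, then observe its label and update.

    stream: iterable of (x, t) pairs with t in {0, 1}.
    Returns the total number of prediction mistakes.
    """
    counts = {0: 0, 1: 0}
    mistakes = 0
    for x, t in stream:
        # 1. Predict before seeing the label (this toy learner ignores x).
        prediction = 1 if counts[1] > counts[0] else 0
        # 2. The true label is revealed; count a mistake if we were wrong.
        if prediction != t:
            mistakes += 1
        # 3. Update the learner's state with the new example.
        counts[t] += 1
    return mistakes

# Hypothetical stream of (instance, label) pairs.
stream = [("a", 1), ("b", 1), ("c", 0), ("d", 1), ("e", 1), ("f", 0), ("g", 1)]
mistakes = run_online(stream)
```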

Unsupervised Learning
Learning what normally happens
No labels
Clustering: grouping similar instances
Example applications:
  Segmentation in customer relationship management
  Image compression: color quantization
  Bioinformatics: learning motifs
  Identifying unusual airplane landings
Deep learning: learn the features
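To illustrate clustering applied to color quantization, here is a sketch that groups gray levels with k-means (Lloyd's iterations) and replaces each pixel by its cluster center. Choosing k-means is an assumption; the slide names clustering and quantization but no specific algorithm. The pixel values and initial centers are made up.

```python
def k_means_1d(values, centers, max_iters=100):
    """Alternate assignment and mean-update steps on scalar values."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def quantize(values, centers):
    """Replace each value by its nearest center."""
    return [min(centers, key=lambda c: abs(v - c)) for v in values]

gray_levels = [10, 12, 11, 200, 205, 198]  # hypothetical pixel intensities
centers = k_means_1d(gray_levels, centers=[0.0, 255.0])
quantized = quantize(gray_levels, centers)
```

After quantization the six pixels take only two distinct values, so each can be stored as a one-bit index into a two-entry palette.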

Resources: Datasets
UCI Repository: http://www.ics.uci.edu/~mlearn/mlrepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
MLcomp: http://mlcomp.org

Resources: Journals
Journal of Machine Learning Research (www.jmlr.org)
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Annals of Statistics
Journal of the American Statistical Association
...

Resources: Conferences
International Conference on Machine Learning (ICML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
Computational Learning Theory (COLT)
European Conference on Machine Learning (ECML)
Knowledge Discovery and Data Mining (KDD)
International Joint Conference on Artificial Intelligence (IJCAI)
International Conference on Neural Networks (ICANN)
...