Welcome to CMPS 142 Machine Learning

Welcome to CMPS 142 Machine Learning
Instructor: David Helmbold, dph@soe.ucsc.edu
Office hours: tentatively after class, Tu-Th 12-1:30
TA: Keshav Mathur, kemathur@ucsc.edu
Web page: https://courses.soe.ucsc.edu/courses/cmps142/spring15/01
Text: Andrew Ng's lecture notes: http://cs229.stanford.edu/materials.html

Administrivia
Sign-up sheet (enrollment)
Evaluation: group homework 30%, late midterm exam 40%, group projects 30%; must pass the exam
Expectations/style: reading assignments, attendance/participation, my hearing/writing, academic honesty
Topics: introduction; regression and multiclass (ch. 3); logistic regression; perceptron; naive Bayes and generative models; nearest neighbor; support vector machines; decision trees; model and feature selection; ensemble methods; learning theory; unsupervised learning

Lecture Slides for INTRODUCTION TO Machine Learning, Ethem Alpaydın, The MIT Press, 2004 (modified by DPH 2006-2011). alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 1: Introduction

Why Learn?
Machine learning is programming computers to optimize a performance criterion using example data or past experience (inference in statistics).
There is no need to learn to calculate payroll. Learning is used when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition, object detection)
The solution changes over time (routing on a computer network)
The solution needs to be adapted or customized to particular cases or users

What We Talk About When We Talk About Learning
Learning general models from a set of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail, from customer transactions to consumer behavior: "People who bought 'The Da Vinci Code' also bought 'The Five People You Meet in Heaven'" (www.amazon.com)
Build a model that is a good and useful approximation to the data.

What is Machine Learning?
Optimize a performance criterion using example data or past experience.
Role of statistics: inference from a sample.
Role of computer science: efficient algorithms to solve the optimization problem, and to represent and evaluate the model for inference.

Statistical machine learning is not:
Cognitive science (how people think/learn)
Teaching computers to think
But it is related to:
Statistics
Data mining and knowledge discovery
Control theory
It is part of AI, but not traditional AI.

Supervised Batch Learning
Assume an (unknown) distribution over things.
Things have measurable attributes or features.
Get instances (feature vectors) x by drawing things from the distribution and recording observations.
A teacher labels instances, making examples (x, y).
The set of labeled examples is the training set or sample.
Create a hypothesis (rule or function) from the sample.
The hypothesis predicts on new random instances and is evaluated using a loss function (e.g. number of mistakes).
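The batch setup above (draw labeled examples, fit a hypothesis, evaluate on fresh instances) can be sketched end to end in a few lines. Everything here, the single spam/ham feature and the midpoint-threshold learner, is invented purely for illustration; it is not the course's method:

```python
import random

random.seed(0)

def draw_example():
    """Draw a thing from the (here, known) distribution and have the
    teacher label it: everything with feature x > 5 is spam."""
    x = random.uniform(0.0, 10.0)       # one measured feature
    y = "spam" if x > 5.0 else "ham"    # teacher-supplied label
    return x, y

# The training set: a sample of labeled examples (x, y)
train = [draw_example() for _ in range(100)]

# Learn a threshold hypothesis: split halfway between the largest
# ham feature and the smallest spam feature seen in the sample
ham_max = max(x for x, y in train if y == "ham")
spam_min = min(x for x, y in train if y == "spam")
threshold = (ham_max + spam_min) / 2

def hypothesis(x):
    return "spam" if x > threshold else "ham"

# Evaluate on fresh random instances with the "number of mistakes" loss
fresh = [draw_example() for _ in range(50)]
mistakes = sum(1 for x, y in fresh if hypothesis(x) != y)
```

The hypothesis is judged only on instances it has never seen, which is the point of the batch setup.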

Supervised Learning (cont.)
Classification: labels are nominal (an unordered set, e.g. {ham, spam} or {democrat, republican, independent}); binary classification is the two-label case.
Regression: labels are numeric (e.g. price of a house).
Ranking problems: order a set of objects.

Examples
Thing          Observations              Prediction
Written digit  Pixel array               Which digit?
Email message  Words, subject, sender    Ham or spam?
Customer       Recent purchases          Interest level in a new product
Used car       Year, make, mpg, options  Price or value

Batch Assumption: iid Examples
The distribution of things and measurements defines some unknown (but fixed) P(x,y) or D(x,y) over domain-label pairs.
Find a hypothesis or function f(x) that is close to the truth.
A loss function L(y, y') measures the error of predictions; often L(y, y') = 0 if y = y' and L(y, y') = 1 otherwise (0-1 loss for classification).
We want to minimize the expected loss Σ_{(x,y)} P(x,y) L(y, f(x)), e.g. the probability of error for 0-1 loss.
Lecture Notes for E. Alpaydın 2004 Introduction to Machine Learning, The MIT Press (V1.1)
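As a small illustration (the sample and the hypothesis f below are made up), the 0-1 loss and its average over a finite sample, which approximates the expected loss under P(x,y), look like this:

```python
def zero_one_loss(y_true, y_pred):
    """L(y, y') = 0 if y == y', 1 otherwise."""
    return 0 if y_true == y_pred else 1

def empirical_risk(examples, f):
    """Average loss of hypothesis f over a finite sample; this
    approximates the expectation of L(y, f(x)) under P(x, y)."""
    return sum(zero_one_loss(y, f(x)) for x, y in examples) / len(examples)

# Toy hypothesis: predict "spam" whenever the feature exceeds 5
f = lambda x: "spam" if x > 5 else "ham"
sample = [(2.0, "ham"), (7.5, "spam"), (6.0, "ham"), (9.1, "spam")]
print(empirical_risk(sample, f))  # 0.25: one of four examples is misclassified
```

With 0-1 loss the empirical risk is just the fraction of mistakes, i.e. an estimate of the probability of error.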

Supervised Learning: Uses
Prediction of future cases: use the rule to predict the output for future inputs.
Knowledge extraction: the rule is easy to understand.
Compression: the rule is simpler than the data it explains.
Outlier detection: exceptions that are not covered by the rule, e.g. fraud and data entry errors.

Can We Generalize?
Learning is an ill-posed problem: if we assume nothing else, any label y could be right for an unseen x.
We need an inductive bias limiting the possible P(x,y).
Often we assume some kind of simplicity (e.g. linearity) based on domain knowledge.
Bayesian approach: put a prior on rules, and balance the prior against the evidence (data).

Noise
Data is not always perfect:
Unmeasured features
Attribute noise (random or systematic)
Label noise (random or systematic)
Inductive bias errors may look like noise.

Overfitting and Underfitting
Overfitting happens when the hypothesis is too complex for the truth.
Underfitting happens when the hypothesis is too simple.
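Both failure modes can be seen on a toy problem (all data here is invented): fit the same noisy linear data with a hypothesis that is too simple (a constant) and one that is too complex (an exact interpolating polynomial):

```python
import random

random.seed(1)

def truth(x):
    """The unknown target: a simple line."""
    return 2.0 * x + 1.0

# Noisy training data and clean held-out points in between
train = [(float(x), truth(x) + random.gauss(0, 0.5)) for x in range(6)]
held_out = [(x + 0.5, truth(x + 0.5)) for x in range(6)]

# Underfit: a constant hypothesis that ignores x entirely
mean_y = sum(y for _, y in train) / len(train)
constant = lambda x: mean_y

# Overfit: the degree-5 Lagrange polynomial that passes exactly
# through every noisy training point
def interpolant(x):
    total = 0.0
    for i, (xi, yi) in enumerate(train):
        term = yi
        for j, (xj, _) in enumerate(train):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(points, h):
    return sum((y - h(x)) ** 2 for x, y in points) / len(points)
```

The interpolant drives training error to zero by fitting the noise, which says nothing about its behavior between the training points; the constant is too simple to fit even the training data.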

Bishop, fig. 1.4

Don't rely on training error!
To estimate generalization error, we need data unseen during training. Data is often split into:
Training set (70%)
Validation set (10%): did training work? Used for parameter selection / model complexity.
Final test (publication) set (20%)
When there are few examples, use resampling: cross-validation.
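The split and cross-validation can be sketched with the standard library alone; the 70/10/20 fractions and the fold count are just illustrative defaults, and the integer "examples" stand in for labeled data:

```python
import random

def split(examples, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle and split into training / validation / final test sets."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_val = int(val_frac * len(data))
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

def k_fold(examples, k=5):
    """Yield (train, held_out) pairs for k-fold cross-validation:
    each example is held out exactly once."""
    for i in range(k):
        held_out = examples[i::k]
        train = [e for j, e in enumerate(examples) if j % k != i]
        yield train, held_out

data = list(range(100))                 # stand-ins for labeled examples
train_set, val_set, test_set = split(data)
# 70 / 10 / 20 examples respectively
```

Cross-validation reuses every example for both training and evaluation, which is why it helps when data is scarce.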

Other kinds of supervised learning
Reinforcement learning: learning a policy for influencing or reacting to an environment (game playing, a robot in a maze, etc.). No supervised output, but delayed rewards; the credit assignment problem.
On-line learning: predict on each instance in turn.
Semi-supervised learning: uses both labeled and unlabeled data.
Active learning: request labels for particular instances.

Unsupervised Learning
Learning what normally happens; no labels.
Clustering: grouping similar instances.
Example applications:
Segmentation in customer relationship management
Image compression: color quantization
Bioinformatics: learning motifs
Identifying unusual airplane landings
Deep learning: learning the features
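As a sketch of clustering (the points and starting centers below are made up), a tiny one-dimensional k-means groups similar instances without ever seeing a label:

```python
def kmeans_1d(points, centers, iters=20):
    """Lloyd's algorithm on scalars: repeatedly assign each point to its
    nearest center, then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two visually obvious groups; the initial centers are deliberately poor
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
# centers move to (roughly) the two group means, near 1.0 and 9.1
```

The same assign-then-average idea scales to feature vectors by replacing the absolute difference with a vector distance.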