CSE 446 Machine Learning

Daniel Weld, Xiao Ling, Congle Zhang

What is Machine Learning?
Study of algorithms that improve their performance at some task with experience.

Why?
Data -> Machine Learning -> Understanding

Is this topic important?

Exponential Growth in Data
Data -> Machine Learning -> Understanding

Supremacy of Machine Learning
Machine learning is the preferred approach to:
- Speech recognition, natural language processing
- Web search result ranking
- Computer vision
- Medical outcomes analysis
- Robot control
- Computational biology
- Sensor networks
This trend is accelerating:
- Improved machine learning algorithms
- Improved data capture, networking, faster computers
- Software too complex to write by hand
- New sensors / IO devices
- Demand for self-customization to user, environment

Syllabus
- Covers a wide range of machine learning techniques, from basic to state-of-the-art
- You will learn about the methods you heard about: Naïve Bayes, logistic regression, nearest-neighbor, decision trees, boosting, neural nets, overfitting, regularization, dimensionality reduction, error bounds, loss function, VC dimension, SVMs, kernels, margin bounds, K-means, EM, mixture models, semi-supervised learning, HMMs, graphical models, active learning
- Covers algorithms, theory and applications
- It's going to be fun and hard work

Logistics

Prerequisites
- Probabilities: distributions, densities, marginalization
- Basic statistics: moments, typical distributions, regression
- Algorithms: dynamic programming, basic data structures, complexity
- Programming: mostly your choice of language, but Python (NumPy) / Matlab will be very useful
- We provide some background, but the class will be fast paced
- Ability to deal with abstract mathematical concepts

Staff
Two great TAs: a fantastic resource for learning, interact with them!
- Xiao Ling, CSE 610, xiaoling@cs. Office hours: TBA
- Congle Zhang, CSE 524, clzhang@cs. Office hours: TBA
Administrative Assistant: Alicen Smith, CSE 546, asmith@cs

Text Books
Required text:
- Pattern Recognition and Machine Learning; Chris Bishop
Optional:
- Machine Learning; Tom Mitchell
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Trevor Hastie, Robert Tibshirani, Jerome Friedman
- Information Theory, Inference, and Learning Algorithms; David MacKay
- Website: Andrew Ng's AI class videos
- Website: Tom Mitchell's AI class videos

Grading
- 4 homeworks (55%). First one goes out Fri 1/6/12. Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early, Start early
- Midterm (15%): circa Feb 10, in class
- Final (30%): TBD by registrar

Homeworks
- Homeworks are hard, start early
- Due at the beginning of class
- Minus 33% credit for each day (or part of day) late
- All homeworks must be handed in, even for zero credit
Collaboration
- You may discuss the questions
- Each student writes their own answers
- Write on your homework anyone with whom you collaborate
- Each student must write their own code for the programming part
- Please don't search for answers on the web, Google, previous years' homeworks, etc.
- Ask us if you are not sure if you can use a particular reference

Communication
- Main discussion board: https://catalyst.uw.edu/gopost/board/xling/25219/
- Urgent announcements: cse446@cs. Subscribe: http://mailman.cs.washington.edu/mailman/listinfo/cse446
- To email instructors, always use: cse446_instructor@cs

Space of ML Problems
What is being learned, by type of supervision (e.g., experience, feedback):
- Labeled examples:
    Discrete: Classification
    Continuous: Regression
    Policy: Apprenticeship Learning
- Reward: Reinforcement Learning
- Nothing: Clustering

Classification: from data to discrete classes

Spam filtering
data -> prediction

Text classification
Company home page vs Personal home page vs University home page vs ...

Object detection (Prof. H. Schneiderman)
Example training images for each orientation

Weather prediction

Reading a noun (vs verb) [Rustandi et al., 2005]

The classification pipeline
- Training
- Testing

Regression: predicting a numeric value
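The training and testing stages of the classification pipeline can be sketched in a few lines. This is only an illustration under stated assumptions: the toy data, the labels "spam" and "ham", and the helper names `fit` and `predict` are hypothetical, and 1-nearest-neighbor is used simply because nearest-neighbor is one of the methods listed in the syllabus.

```python
# Hypothetical toy data: (feature vector, label) pairs; not from the course.
train = [((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"),
         ((4.0, 4.2), "ham"), ((3.8, 4.1), "ham")]
test = [((0.9, 1.1), "spam"), ((4.1, 3.9), "ham")]

def fit(examples):
    """Training: a 1-nearest-neighbor learner simply stores the labeled examples."""
    return list(examples)

def predict(model, x):
    """Testing: return the label of the stored example closest to x."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(model, key=lambda ex: sq_dist(ex[0], x))
    return label

model = fit(train)                                   # training stage
predictions = [predict(model, x) for x, _ in test]   # testing stage
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

The point of keeping the test examples separate is that accuracy is measured on data the learner never stored.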

Stock market

Weather prediction revisited
Temperature

Modeling sensor data [Guestrin et al. 04]
- Measure temperatures at some locations
- Predict temperatures throughout the environment

Clustering: discovering structure in data

Clustering Data: Group similar things

Clustering images [Goldberger et al.]
Set of Images
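"Group similar things" can be made concrete with K-means, which the syllabus lists among the course topics. The sketch below is a minimal version of Lloyd's algorithm; the six 2-D points and the choice of initial centers are assumptions made up for illustration, not data from the slides.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points with given initial centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the grouping comes only from distances between points, which is what distinguishes clustering from classification.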

Clustering web search results

Reinforcement Learning: training by feedback

Reinforcement Learning: learning to act [Ng et al. 05]

Reinforcement learning
An agent:
- Makes sensor observations
- Must select action
- Receives rewards: positive for good states, negative for bad states

In Summary
What is being learned, by type of supervision (e.g., experience, feedback):
- Labeled examples:
    Discrete: Classification
    Continuous: Regression
    Policy: Apprenticeship Learning
- Reward: Reinforcement Learning
- Nothing: Clustering
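The observe / act / receive-reward loop described for reinforcement learning can be sketched as follows. Everything concrete here is an assumption for illustration: the five-cell corridor environment, the reward values, and the use of tabular Q-learning, which is a standard technique but is not named on these slides.

```python
import random

GOOD, BAD = 4, 0           # terminal states of a hypothetical 5-cell corridor
ACTIONS = (-1, +1)         # step left or step right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = state + action
    if nxt == GOOD:
        return nxt, 1.0, True      # positive reward for the good state
    if nxt == BAD:
        return nxt, -1.0, True     # negative reward for the bad state
    return nxt, 0.0, False

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random exploration policy."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in (1, 2, 3)}
    for _ in range(episodes):
        state, done = 2, False
        while not done:
            action = rng.choice(ACTIONS)              # select an action
            nxt, reward, done = step(state, action)   # observe outcome and reward
            future = 0.0 if done else max(q[nxt].values())
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = nxt
    return q

q = q_learning()
# Greedy policy: in each state, pick the action with the highest learned value.
policy = {s: max(ACTIONS, key=lambda a: q[s][a]) for s in q}
print(policy)
```

The agent is never told which direction is correct; it only sees rewards, and the learned values end up favoring steps toward the good state.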

Key Concepts

Classifier: a hypothesis for labeling examples
[Figure: examples plotted on two numeric axes, some carrying one of two labels and others marked ? as unlabeled]

Generalization
Hypotheses must generalize to correctly classify instances not in the training data. Simply memorizing training examples is a consistent hypothesis that does not generalize.

ML = Approximation
- May not be any perfect fit
- Classification ~ discrete functions
- h(x) = contains('nigeria', x) ∧ contains('wire-transfer', x)
[Figure: target c(x) and hypothesis h(x) plotted against x]

Why is Learning Possible?
Experience alone never justifies any conclusion about any unseen instance.
Learning occurs when PREJUDICE meets DATA!

Bias
The nice word for prejudice is bias (different from bias in statistics).
- What kind of hypotheses will you consider? What is the allowable range of functions you use when approximating?
- What kind of hypotheses do you prefer?

Learning a Frobnitz
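The spam hypothesis h(x) and the point about memorization can be illustrated together. This sketch assumes the two `contains` tests are combined with a conjunction (the connective is not legible in the slide text) and that `contains` means a case-insensitive substring match; the example messages are hypothetical.

```python
def contains(word, text):
    """True if `word` occurs in `text` (case-insensitive substring match; an assumption)."""
    return word.lower() in text.lower()

def h(x):
    """Hypothesis: flag x as spam if it mentions both terms."""
    return contains("nigeria", x) and contains("wire-transfer", x)

# Hypothetical training messages with their true labels c(x).
training = {
    "Urgent wire-transfer request from Nigeria": True,
    "Lunch at noon?": False,
}

def memorizer(x):
    """Consistent with the training data, but answers False for anything unseen."""
    return training.get(x, False)

# Both hypotheses agree with every training label ...
consistent = all(h(x) == y and memorizer(x) == y for x, y in training.items())

# ... but only h generalizes to a message outside the training data.
unseen = "Prince in NIGERIA needs a wire-transfer today"
print(consistent, h(unseen), memorizer(unseen))
```

Both functions are consistent hypotheses; the difference only shows up on instances that were not in the training data.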

Some Typical Biases
- Occam's razor: "It is needless to do more when less will suffice" (William of Occam, died 1349 of the Black plague)
- MDL: Minimum description length
- Concepts can be approximated by...
    ... conjunctions of predicates
    ... by linear functions
    ... by short decision trees

ML as Optimization
- Specify preference bias, aka loss
- Solve using optimization: combinatorial, convex, linear, nasty

Overfitting
Hypothesis H is overfit when there exists another hypothesis H' such that H has smaller error than H' on training examples, but H has bigger error than H' on test examples.
Causes of overfitting:
- Training set is too small
- Large number of features
Big problem in machine learning.
One solution: validation set

[Figure: Overfitting. Accuracy on training data and on test data, plotted against model complexity (e.g., number of nodes in decision tree)]

The Road Ahead
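The overfitting definition and the validation-set remedy can be checked on a toy example. The data, the noisy label, and the two hypotheses below (a memorizing 1-nearest-neighbor rule as H and a simple threshold as H') are assumptions chosen for illustration; a held-out validation set stands in for the test examples.

```python
# Hypothetical 1-D data as (x, label) pairs; (3, 1) is a deliberately noisy label.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
validation = [(0.5, 0), (2.9, 0), (3.2, 0), (5.6, 1), (9.5, 1)]

def memorize(x):
    """H: 1-nearest-neighbor lookup, which reproduces every training label."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def threshold(x):
    """H': a much simpler rule."""
    return 1 if x >= 5 else 0

def error(h, data):
    """Fraction of examples in `data` that h labels incorrectly."""
    return sum(h(x) != y for x, y in data) / len(data)

def is_overfit(h, h_alt):
    """H is overfit relative to H' if it wins on training data but loses on held-out data."""
    return (error(h, train) < error(h_alt, train)
            and error(h, validation) > error(h_alt, validation))

print(error(memorize, train), error(memorize, validation))
print(error(threshold, train), error(threshold, validation))
print(is_overfit(memorize, threshold))
```

Choosing the hypothesis with the lower validation error would select the threshold rule here, even though the memorizer looks perfect on the training set.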