Applied Machine Learning


Applied Machine Learning. Spring 2018, CS 519. Prof. Liang Huang, School of EECS, Oregon State University. liang.huang@oregonstate.edu

Machine Learning is Everywhere. "A breakthrough in machine learning would be worth ten Microsofts." (Bill Gates)

AI subfields and breakthroughs. Artificial Intelligence spans AI search, data mining, planning, machine learning (including deep learning, DL, and reinforcement learning, RL), robotics, information retrieval, natural language processing (NLP), and computer vision. Landmark systems: IBM Deep Blue (1997): AI search, no learning. IBM Watson (2011): NLP + very little ML. Google DeepMind AlphaGo (2017): deep reinforcement learning + AI search.

The Future of Software Engineering. "See, when AI comes, I'll be long gone (replaced by autonomous cars), but the programmers in those companies will be too, replaced by automatic program generators." --- an Uber driver to an ML prof. Uber uses tons of AI/ML: route planning, speech/dialog, recommendation, etc.

Failures (several slides of real-world examples). Liang's rule: if you see "X carefully" in China, just don't do it. These failures are clear evidence that AI/ML is used in real life.

Part II: Basic Components of Machine Learning Algorithms; Different Types of Learning

What is Machine Learning? ML = Automating Automation: getting computers to program themselves; let the data do the work instead! Traditional programming (e.g., rule-based translation of "I love Oregon", 1950-2000): Input + Program → Output. Machine learning (2003-now): Input + Output → Program.

Magic? No, more like gardening. Seeds = algorithms; nutrients = data; gardener = you; plants = programs. There is no better data than more data.

ML in a Nutshell. There are tens of thousands of machine learning algorithms, and hundreds of new ones every year. Every machine learning algorithm has three components: representation, evaluation, and optimization.

Representation: separating hyperplanes, support vectors, decision trees, sets of rules / logic programs, instances (nearest neighbor), graphical models (Bayes/Markov nets), neural networks, model ensembles, etc.

Evaluation: accuracy, precision and recall, squared error, likelihood, posterior probability, cost/utility, margin, entropy, K-L divergence, etc.
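To make a few of these metrics concrete, here is a minimal sketch (an illustration, not from the slides) that computes accuracy, precision, and recall for binary predictions:

    # Minimal sketch: accuracy, precision, and recall for binary labels.
    def evaluate(y_true, y_pred):
        tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
        fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
        fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
        accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return accuracy, precision, recall

    print(evaluate([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # accuracy 0.6, precision 2/3, recall 2/3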

Optimization: combinatorial optimization (e.g., greedy search, dynamic programming); convex optimization (e.g., gradient descent, coordinate descent); constrained optimization (e.g., linear programming, quadratic programming).

Gradient Descent. If the learning rate is too small, it will converge very slowly; if the learning rate is too big, it will diverge.
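A minimal sketch of this effect (an illustration, not from the slides), minimizing f(w) = (w - 3)^2, whose gradient is 2(w - 3):

    # Gradient descent on f(w) = (w - 3)^2; the gradient is 2 * (w - 3).
    def gradient_descent(lr, steps=50, w=0.0):
        for _ in range(steps):
            w -= lr * 2 * (w - 3)  # step against the gradient
        return w

    print(gradient_descent(lr=0.01))  # too small: still far from the optimum 3
    print(gradient_descent(lr=0.5))   # well chosen: reaches 3.0
    print(gradient_descent(lr=1.1))   # too big: |w| explodes (diverges)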

Types of Learning. Supervised (inductive) learning: training data includes desired outputs. Unsupervised learning: training data does not include desired outputs. Semi-supervised learning: training data includes a few desired outputs. Reinforcement learning: rewards from a sequence of actions.

Supervised Learning. Given examples (x, f(x)) of an unknown function f, find a good approximation of f. Discrete f(x): classification (binary, multiclass, structured). Continuous f(x): regression.

When is Supervised Learning Useful? When there is no human expert (input x: bond graph for a new molecule; output f(x): predicted binding strength to the AIDS protease). When humans can perform the task but can't describe how (computer vision: face recognition, OCR). When the desired function changes frequently (stock price prediction, spam filtering). When each user needs a customized function (speech recognition, spam filtering).

Supervised Learning: Classification. Input x: a feature representation (the "observation"). The slides contrast candidate features: some are not good (they fail to separate the classes), while others are good.

Supervised Learning: Regression. Linear and non-linear regression; overfitting and underfitting arise here just as in classification.

What We'll Cover. Supervised learning: nearest neighbors (week 1); linear classification (perceptron and extensions) (weeks 2-3); support vector machines (weeks 4-5); kernel methods (week 5); structured prediction (weeks 7-8); neural networks and deep learning (week 10). Unsupervised learning (week 9): clustering (k-means, EM); dimensionality reduction (PCA, etc.).

Part III: Training, Test, and Generalization Errors; Underfitting and Overfitting; Methods to Prevent Overfitting; Cross-Validation and Leave-One-Out

Training, Test, & Generalization Errors. In general, as training progresses, training error decreases, while test error initially decreases but eventually increases! At that point the model has overfit to the training data (it memorizes noise or outliers). In reality, you don't know the test data a priori ("blind test"). Generalization error is the error on previously unseen data: the expectation of test error under an assumed test-data distribution. We often use a held-out set to simulate test error and do early stopping.
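A sketch of early stopping against a held-out set (an illustration, not from the slides; the "training" and "error" here are toy stand-ins for your actual routines):

    # Early stopping: keep the model with the lowest dev error, and stop
    # once dev error has not improved for `patience` epochs.
    def train_one_epoch(w):
        return w + 0.5  # hypothetical: pretend each epoch moves w by 0.5

    def dev_error(w):
        return abs(w - 2.0)  # hypothetical dev error, minimized at w = 2

    def train_with_early_stopping(w=0.0, patience=3):
        best_err, best_w, bad_epochs = float('inf'), w, 0
        while bad_epochs < patience:
            w = train_one_epoch(w)
            err = dev_error(w)
            if err < best_err:
                best_err, best_w, bad_epochs = err, w, 0
            else:
                bad_epochs += 1  # dev error rose: likely starting to overfit
        return best_w

    print(train_with_early_stopping())  # 2.0, the model from the best dev epoch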

Under/Over-fitting due to Model Complexity. Underfitting and overfitting occur due to under/over-training (last slide), but also because of model complexity. Underfitting comes from an oversimplified model ("as simple as possible, but not simpler!"); overfitting comes from an overcomplicated model (it memorizes noise or outliers in the data!). Extreme case: the model memorizes the training data, but there is no generalization! (The slide's figure sweeps a model-complexity axis from underfitting to overfitting.)
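One way to see the complexity axis directly (an illustration, not from the slides) is to fit polynomials of different degrees to a few noisy points; a degree-9 fit can drive training error to (near) zero by memorizing the noise:

    # Fit degree-1 (underfit) vs. degree-9 (overfit) polynomials to 10
    # noisy samples of a sine curve and compare training errors.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

    for degree in (1, 9):
        coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
        train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
        print(degree, train_err)  # degree 9 interpolates the 10 points: error ~ 0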

Ways to Prevent Overfitting. Use held-out training data to simulate test data (early stopping): reserve a small subset of the training data as a development set (aka validation set, dev set, etc.). Regularization: explicit control of model complexity. More training data: overfitting is more likely on small data, assuming the same model complexity (the slide illustrates this with polynomials of degree 9).
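As a sketch of regularization (an illustration, not from the slides), ridge regression adds a penalty lam * ||w||^2 to the squared error, shrinking the weights of a degree-9 fit toward a simpler curve:

    # Ridge regression in closed form: w = (X^T X + lam * I)^(-1) X^T y.
    import numpy as np

    def ridge_fit(X, y, lam):
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
    X = np.vander(x, 10)  # degree-9 polynomial features
    for lam in (0.0, 1e-3, 1.0):
        w = ridge_fit(X, y, lam)
        print(lam, np.abs(w).max())  # larger lam -> smaller (tamer) weights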

Leave-One-Out Cross-Validation. What is the best held-out set? Random? What if it is not representative? What if we use every subset in turn? That is leave-one-out cross-validation: train on all but one sample, test on that sample, repeat for every sample, and average the validation errors. Alternatively, divide the data into N folds, train on folds 1..(N-1), test on fold N, and rotate. This is the best approximation of the generalization error.
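A minimal leave-one-out sketch (an illustration, not from the slides), using a 1-nearest-neighbor classifier on a tiny made-up 1-D dataset:

    # Leave-one-out CV: hold out each point in turn, train on the rest.
    def nn_predict(train, x):
        return min(train, key=lambda p: abs(p[0] - x))[1]  # 1-NN label

    data = [(0.0, 'A'), (0.1, 'A'), (0.2, 'A'), (1.0, 'B'), (1.1, 'B')]
    errors = sum(nn_predict(data[:i] + data[i+1:], x) != y
                 for i, (x, y) in enumerate(data))
    print(errors / len(data))  # average held-out error: 0.0 here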

Part IV: k-Nearest Neighbor Classifier

Nearest Neighbor Classifier. Assign the label of a test example according to the majority of its closest neighbors in the training set. Extremely simple: no training procedure! 1-NN: extreme overfitting; k-NN is better. As k increases, the decision boundaries become smoother; at k = +∞, the prediction is a global majority vote (extreme underfitting). In the slide's example, the test point is labeled red at k=1 and k=3, but blue at k=5.
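A minimal k-NN sketch (an illustration, not from the slides): majority vote among the k closest training points under Euclidean distance:

    # k-NN: sort training points by distance to the query, vote among top k.
    from collections import Counter
    import math

    def knn_predict(train, x, k):
        neighbors = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    train = [((0, 0), 'red'), ((1, 0), 'red'), ((0, 1), 'blue'),
             ((1, 1), 'blue'), ((2, 2), 'blue')]
    print(knn_predict(train, (0.6, 0.2), k=1))  # 'red' (nearest point wins)
    print(knn_predict(train, (0.6, 0.2), k=5))  # 'blue' (global majority)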

Quiz Question. What are the leave-one-out cross-validation errors for the data set shown on the slide, using 1-NN and 3-NN? Answer: 1-NN: 5/10; 3-NN: 1/10.