Introduction to Machine Learning Stephen Scott, Dept of CSE

What is Machine Learning?
- Building machines that automatically learn from experience
- A sub-area of artificial intelligence
- A (very) small sampling of applications:
  - Detection of fraudulent credit card transactions
  - Filtering spam email
  - Autonomous vehicles driving on public highways
  - Self-customizing programs: a web browser that learns what you like and where you are and adjusts; autocorrect
- Applications we can't program by hand, e.g., speech recognition
- You've used it today already!

What is Learning?
- Many different answers, depending on the field you're considering and whom you ask
- Artificial intelligence vs. psychology vs. education vs. neurobiology vs. ...

Does Memorization = Learning?
Test #1: Thomas learns his mother's face
Sees: (images omitted) But will he recognize: (new images omitted)?

Thus he can generalize beyond what he's seen!

Does Memorization = Learning? (cont'd)
Test #2: Nicholas learns about trucks
Sees: (images omitted) But will he recognize others?

So learning involves the ability to generalize from labeled examples.
In contrast, memorization is trivial, especially for a computer.

What is Machine Learning? (cont'd)
When do we use machine learning?
- Human expertise does not exist (navigating on Mars)
- Humans are unable to explain their expertise (speech recognition; face recognition; driving)
- The solution changes over time (routing on a computer network; browsing history; driving)
- The solution needs to be adapted to particular cases (biometrics; speech recognition; spam filtering)
In short, when one needs to generalize from experience in a non-obvious way.

What is Machine Learning? (cont'd)
When do we not use machine learning?
- Calculating payroll
- Sorting a list of words
- Web servers
- Word processing
- Monitoring CPU usage
- Querying a database
In short, when we can definitively specify how all cases should be handled.

More Formal Definition
From Tom Mitchell's 1997 textbook: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
There are wide variations in how T, P, and E manifest.

One Type of Task T: Classification
- Given several labeled examples of a concept
  - E.g., trucks vs. non-trucks (binary); height (real-valued)
  - These labeled examples are the experience E
- Examples are described by features
  - E.g., number-of-wheels (int), relative-height (height divided by width), hauls-cargo (yes/no)
- A machine learning algorithm uses these examples to create a hypothesis (or model) that will predict the label of new (previously unseen) examples (see the sketch below)
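To make the setup concrete, here is a minimal sketch using scikit-learn, with made-up feature values for a few vehicles; the feature names follow the slide (number-of-wheels, relative-height, hauls-cargo).

```python
from sklearn.tree import DecisionTreeClassifier

# Each example: [num_wheels, relative_height, hauls_cargo (1=yes, 0=no)]
X_train = [
    [6, 1.4, 1],    # truck
    [4, 0.6, 0],    # car
    [18, 1.6, 1],   # truck
    [2, 1.1, 0],    # motorcycle
]
y_train = ["truck", "non-truck", "truck", "non-truck"]

# The learning algorithm produces a hypothesis (here, a decision tree)
hypothesis = DecisionTreeClassifier().fit(X_train, y_train)

# The hypothesis predicts labels for new, previously unseen examples
print(hypothesis.predict([[4, 1.5, 1]]))
```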

Classification (cont'd)
The pipeline: labeled training data (labeled examples with features) go into the machine learning algorithm, which outputs a hypothesis; unlabeled data (unlabeled examples) go into the hypothesis, which outputs predicted labels.
Hypotheses can take on many forms.

Hypothesis Type: Decision Tree
- Very easy for humans to comprehend
- Compactly represents if-then rules
The tree from the slide (see the code sketch below):
  hauls-cargo = no -> non-truck
  hauls-cargo = yes -> test num-of-wheels:
    num-of-wheels < 4 -> non-truck
    num-of-wheels >= 4 -> test relative-height:
      relative-height < 1 -> non-truck
      relative-height >= 1 -> truck
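The same tree written out as if-then rules in Python (a sketch assuming the thresholds above):

```python
def classify(hauls_cargo, num_wheels, relative_height):
    """Decision tree from the slide, as nested if-then rules."""
    if not hauls_cargo:
        return "non-truck"
    if num_wheels < 4:
        return "non-truck"
    if relative_height < 1:
        return "non-truck"
    return "truck"

print(classify(hauls_cargo=True, num_wheels=6, relative_height=1.4))  # truck
```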

Learning Decision Trees
While not done:
- Choose an attribute A to test at the current node
  - Choice is based on which attribute results in the most homogeneous (in label) subsets of training data
  - Homogeneity is often measured with entropy or the Gini index (see the sketch below)
- Partition the training set into subsets based on the value of A
- Recursively process the subsets to choose attributes for the subtrees
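One common way to score a candidate attribute is information gain, the reduction in label entropy after partitioning on that attribute; a minimal sketch (the function names are my own):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from partitioning on one attribute's values."""
    n = len(labels)
    subsets = {}
    for x, y in zip(examples, labels):
        subsets.setdefault(x[attribute], []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# e.g., pick the attribute index with the highest information gain:
# best = max(range(num_attributes),
#            key=lambda a: information_gain(X_train, y_train, a))
```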

Hypothesis Type: k-Nearest Neighbor
- Compare a new (unlabeled) example x_q with the training examples
- Find the k training examples most similar to x_q
- Predict the label as the majority vote of those k neighbors (e.g., non-truck); see the sketch below
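A minimal k-NN sketch using squared Euclidean distance and majority vote (names are illustrative):

```python
from collections import Counter

def knn_predict(x_q, train, k=3):
    """train: list of (feature_vector, label) pairs."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # Find the k training examples most similar to x_q
    neighbors = sorted(train, key=lambda ex: dist2(ex[0], x_q))[:k]
    # Predict the label as the majority vote among those neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```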

Hypothesis Type: Artificial Neural Network
- Designed to simulate brains
- Neurons (processing units) communicate via connections, each with a numeric weight
- Learning comes from adjusting the weights

Artificial Neural Networks (cont'd)
- ANNs are the basis of deep learning; "deep" refers to the depth of the architecture: more layers => more processing of inputs
- Each input to a node is multiplied by a weight, and the weighted sum S is sent through an activation function:
  - Rectified linear: max(0, S)
  - Convolutional + pooling: the weights represent, e.g., a 3x3 convolutional kernel that identifies features in, e.g., images that are translation invariant
  - Sigmoid: tanh(S) or 1/(1 + exp(-S))
- Often trained via stochastic gradient descent
A single node's computation is sketched below.
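A sketch of one unit's computation, assuming a plain weighted sum followed by either a rectified-linear or logistic activation:

```python
import math

def node_output(inputs, weights, activation="relu"):
    """One ANN unit: weighted sum S of the inputs, then an activation."""
    S = sum(w * x for w, x in zip(weights, inputs))
    if activation == "relu":
        return max(0.0, S)               # rectified linear: max(0, S)
    return 1.0 / (1.0 + math.exp(-S))    # logistic sigmoid: 1/(1+exp(-S))

print(node_output([1.0, 0.5], [0.2, -0.4]))  # -> 0.0 (ReLU clips negative S)
```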

Other Hypothesis Types for Classification
- Support vector machines: a major variation on artificial neural networks
- Bagging and boosting: performance enhancers for learning algorithms via re-sampling of the training data
- Bayesian methods: build probabilistic models of the data
- Many more
Variations on the task T: regression (real-valued labels) and predicting probabilities.

Example Performance Measures P
Let X be a set of labeled instances.
- Classification error: the number of instances of X that hypothesis h predicts incorrectly, divided by |X|
- Squared error: sum of (y_i - h(x_i))^2 over all x_i
  - If labels are from {0, 1}, same as classification error
  - Useful when labels are real-valued
- Cross-entropy: -sum over all x_i from X of [y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i))]
  - Generalizes to more than 2 classes
  - Effective when h predicts probabilities
(see the sketch below)
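The three measures as straightforward Python functions (a sketch; the cross-entropy assumes binary labels y_i in {0, 1} and predicted probabilities strictly between 0 and 1):

```python
import math

def classification_error(y, preds):
    """Fraction of instances predicted incorrectly."""
    return sum(yi != pi for yi, pi in zip(y, preds)) / len(y)

def squared_error(y, h):
    """Sum of (y_i - h(x_i))^2 over all instances."""
    return sum((yi - hi) ** 2 for yi, hi in zip(y, h))

def cross_entropy(y, h):
    """-sum of [y_i ln h(x_i) + (1 - y_i) ln(1 - h(x_i))]; h_i in (0, 1)."""
    return -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
                for yi, hi in zip(y, h))
```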

Another Type of Task T: Unsupervised Learning
- E is now a set of unlabeled examples
- Examples are still described by features
- We still want to infer a model of the data, but instead of predicting labels, we want to understand its structure
- E.g., clustering, density estimation, feature extraction

k-means Clustering
- Randomly choose k cluster centers m_1, ..., m_k
- Assign each instance x in X to its nearest center
- While not done:
  - Re-compute m_i to be the center of cluster i
  - Re-assign each x to the cluster of its nearest center
(see the sketch below)
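A minimal k-means sketch following the steps above (a fixed iteration count stands in for the "while not done" test):

```python
import random

def kmeans(X, k, iters=10):
    """A minimal k-means sketch; X is a list of numeric feature vectors."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    centers = random.sample(X, k)   # randomly choose k cluster centers
    for _ in range(iters):
        # Assign each instance to its nearest center
        clusters = [[] for _ in range(k)]
        for x in X:
            nearest = min(range(k), key=lambda j: dist2(x, centers[j]))
            clusters[nearest].append(x)
        # Re-compute each center as the mean of its cluster
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers
```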

k-means Clustering Example
(Figure: k-means on two-dimensional data, axes x_1 and x_2, showing the initial centers and the clusters after 1, 2, and 3 iterations.)

Hierarchical Clustering
- Group instances into clusters, then group clusters into larger ones
- As with k-means, requires a similarity measure
  - E.g., Euclidean distance, Manhattan distance, dot product, biological sequence similarity
- Common in bioinformatics
(see the sketch below)
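As an illustration, SciPy's agglomerative clustering can build such a hierarchy from a distance measure (Euclidean here); the data points are made up:

```python
from scipy.cluster.hierarchy import linkage, fcluster

X = [[1.0, 2.0], [1.1, 2.1], [8.0, 9.0], [8.2, 9.1]]
# Build the hierarchy by repeatedly merging the most similar clusters
Z = linkage(X, method="average", metric="euclidean")
# Cut the hierarchy into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```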

Feature Extraction via Autoencoding
- Can train an ANN with unlabeled data
- Goal: have the output (the reconstruction) match the input x
- Results in an embedding z of the input x
- Can pre-train a network to identify features; later, replace the decoder with a classifier
- This is a form of semi-supervised learning
(see the sketch below)
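A minimal autoencoder sketch in tf.keras (the input and layer sizes are arbitrary): the encoder maps x to an embedding z, and the decoder reconstructs x from z; training uses the input as its own target.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))                          # input x
z = tf.keras.layers.Dense(32, activation="relu")(inputs)       # encoder -> embedding z
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(z)  # decoder -> reconstruction
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# Train with the input as its own target: autoencoder.fit(X, X, epochs=...)
```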

Another Type of Task T: Reinforcement Learning
- An agent A interacts with its environment
- At each step, A perceives the state s of its environment and takes action a
- Action a results in some reward r and changes the state to s'
  - This is a Markov decision process (MDP)
- The goal is to maximize expected long-term reward
- Applications: backgammon, Go, video games, self-driving cars

Reinforcement Learning (cont'd)
- RL differs from the previous tasks in that the feedback (reward) is typically delayed
  - Often takes several actions before a reward is received
  - E.g., no reward in checkers until the game ends
- Need to decide how much each action contributed to the final reward
  - The credit assignment problem
- Also, limited sensing ability can make distinct states look the same
  - Partially observable MDP (POMDP)

Q-Learning
- A popular RL algorithm that estimates the value of taking action a in state s
- Q(s, a) is the total reward from taking action a in state s and acting optimally from then on
- If we know this, then in state s we can choose the action a that maximizes Q
- Algorithm to learn Q: each iteration, observe s, take action a, receive immediate reward r, observe the new state s'
  - Update: Q(s, a) = r + γ max_a' Q(s', a') (γ is a discount parameter)
(see the sketch below)
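A sketch of the tabular update; common implementations add a learning rate alpha, and the slide's update is the alpha = 1 special case (the names and the value of gamma here are illustrative):

```python
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, action)], initialized to 0
gamma = 0.9             # discount parameter γ (value chosen for illustration)

def q_update(s, a, r, s_next, actions, alpha=0.1):
    """One tabular Q-learning step: move Q(s, a) toward
    r + γ max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```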

Q-Learning (cont'd)
- The Q-learning algorithm is guaranteed to converge to the correct values if every state is visited infinitely often
  - Good to know, but not very practical: we cannot expect to even visit every state once
  - Chess has about 10^45 states; Go has about 10^170
- Need to generalize beyond states already seen. Where have we heard this before?
- Machine learning (especially ANNs) is very effective at learning Q functions (or other value functions)
- Deep learning has been very effective recently: Atari games and Go at better-than-human levels

Yet Other Variations
- Missing attributes: must somehow estimate the values or tolerate them
- Sequential data, e.g., genomic sequences, speech
  - Hidden Markov models
  - Recurrent neural networks
- Much unlabeled data and/or missing attributes, but some labels/attributes can be purchased for a price
  - Active learning approaches try to minimize the cost
- Outlier detection, e.g., intrusion detection in computer systems

Issue: Model Complexity
- In classification and regression, it is possible to find a hypothesis that perfectly classifies all the training data
- But should we necessarily use it?

Model Complexity (cont'd)
Label: Football player? (images omitted)
=> To generalize well, we need to balance training accuracy with simplicity.

Relevant Disciplines
- Artificial intelligence: learning as a search problem; using prior knowledge to guide learning
- Probability theory: computing probabilities of hypotheses
- Computational complexity theory: bounds on the inherent complexity of learning
- Control theory: learning to control processes to optimize performance measures
- Philosophy: Occam's razor (everything else being equal, the simplest explanation is best)
- Psychology and neurobiology: practice improves performance; biological justification for artificial neural networks
- Statistics: estimating generalization performance

Conclusions
- The idea of intelligent machines has been around a long time
  - Early on it was primarily of academic interest
  - In the past few decades, improvements in processing power plus very large data sets have allowed highly sophisticated (and successful!) approaches
- Machine learning is prevalent in modern society; you've probably used it several times today
- There is no single best approach for any problem; it depends on the requirements, the type of data, and the volume of data

Example References
- Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron, O'Reilly, 2017, ISBN 9781491962299
  - Good introduction for the practitioner; to be used in CSCE 496/896 Spring 2018
- Introduction to Machine Learning, Third Edition by Ethem Alpaydin, MIT Press, 2014, ISBN 9780262028189
  - Used in CSCE 478/878
- Machine Learning by Tom Mitchell, McGraw-Hill, 1997, ISBN 0070428077
  - Excellent, highly accessible introduction, but dated

Thank you! Any questions? Any ideas?