Machine Learning: Summary

Greg Grudic, CSCI-4830

What is Machine Learning?
The goal of machine learning is to build computer systems that can adapt and learn from their experience. (Tom Dietterich)

A Generic System
Inputs x_1, x_2, ..., x_N feed a system with hidden variables h_1, h_2, ..., h_K, producing outputs y_1, y_2, ..., y_M.
Input variables: x = (x_1, x_2, ..., x_N)
Hidden variables: h = (h_1, h_2, ..., h_K)
Output variables: y = (y_1, y_2, ..., y_M)

Another Definition of Machine Learning
Machine Learning algorithms discover the relationships between the variables of a system (input, output, and hidden) from direct samples of the system. These algorithms originate from many fields: statistics, mathematics, theoretical computer science, physics, neuroscience, etc.

When are ML algorithms NOT needed?
When the relationships between all system variables (input, output, and hidden) are completely understood! This is NOT the case for almost any real system!

The Sub-Fields of ML
- Supervised Learning
- Reinforcement Learning
- Unsupervised Learning

Supervised Learning
Given: training examples {(x_1, f(x_1)), (x_2, f(x_2)), ..., (x_P, f(x_P))} for some unknown function (system) y = f(x).
Find: an approximation of f(x).
Predict: y = f(x'), where x' is not in the training set.

Supervised Learning Algorithms
Classification: y ∈ {1, ..., C}
Regression: y ∈ R

1-R (A Decision Tree Stump)
Main Assumptions:
- Only one attribute is necessary.
- Finite number of splits on the attribute.
Hypothesis Space: fixed size (parametric), so limited modeling potential.
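To make the stump concrete, here is a minimal sketch (not from the original slides) of a 1-R learner over a single numeric attribute; the midpoint threshold search and non-negative integer class labels are assumptions of this illustration:

```python
import numpy as np

def fit_stump(x, y):
    # Candidate thresholds: midpoints between consecutive distinct values.
    xs = np.unique(x)
    thresholds = (xs[:-1] + xs[1:]) / 2.0
    best_t, best_err, best_labels = None, np.inf, None
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        l_lab = np.bincount(left).argmax()   # majority class on each side
        r_lab = np.bincount(right).argmax()
        err = np.sum(left != l_lab) + np.sum(right != r_lab)
        if err < best_err:
            best_t, best_err, best_labels = t, err, (l_lab, r_lab)
    return best_t, best_labels

def predict_stump(t, labels, x):
    return np.where(x <= t, labels[0], labels[1])
```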

Naïve Bayes
Main Assumptions:
- All attributes are equally important.
- All attributes are statistically independent (given the class value).

Pr[y | x] = Pr[x_1 | y] Pr[x_2 | y] ... Pr[x_d | y] Pr[y] / Pr[x]

Hypothesis Space: fixed size (parametric), so limited modeling potential.
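A hedged sketch of the idea for integer-coded categorical attributes; the Laplace smoothing constant `alpha` is an addition of this illustration, not from the slides:

```python
import numpy as np

def fit_nb(X, y, alpha=1.0):
    # X: (n, d) integer-coded categorical attributes; y: (n,) integer classes.
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    likelihoods = {}  # (class, attribute) -> smoothed value probabilities
    for c in classes:
        Xc = X[y == c]
        for j in range(X.shape[1]):
            vals = np.arange(X[:, j].max() + 1)
            counts = np.array([(Xc[:, j] == v).sum() for v in vals])
            likelihoods[(c, j)] = (counts + alpha) / (counts.sum() + alpha * len(vals))
    return classes, priors, likelihoods

def predict_nb(model, x):
    classes, priors, likelihoods = model
    # Work in log space; Pr[x] is shared by all classes and drops out of argmax.
    scores = [np.log(priors[c]) + sum(np.log(likelihoods[(c, j)][v])
              for j, v in enumerate(x)) for c in classes]
    return classes[int(np.argmax(scores))]
```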

Linear Regression
Main Assumptions:
- Linear weighted sum of attribute values.
- Data is linearly separable.
- Attributes and target values are real valued.

y = a_1 x_1 + a_2 x_2 + ... + a_d x_d + b

Hypothesis Space: fixed size (parametric), so limited modeling potential.

Linear Regression (Continued)
[Figure: examples of linearly separable vs. not linearly separable data]
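As an illustration (assuming a plain least-squares fit, which the slides do not specify), the weights a and intercept b can be recovered with NumPy's least-squares solver:

```python
import numpy as np

def fit_linear(X, y):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # minimize ||Xb·coef - y||²
    return coef[:-1], coef[-1]                     # weights a, intercept b

a, b = fit_linear(np.array([[0.], [1.], [2.]]), np.array([1., 3., 5.]))
print(a, b)  # ≈ [2.] 1.0, recovering y = 2x + 1
```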

Decision Trees
Main Assumption: data is effectively modeled via decision splits on attributes.
Hypothesis Space: variable size (nonparametric), can model any function.

Neural Networks
Main Assumption: many simple functional units, combined in parallel, produce effective models.
Hypothesis Space: variable size (nonparametric), can model any function.

Neural Networks (Continued)
[Figure: network diagram, not transcribed]

Neural Networks (Continued)
Learn by modifying the weights in a sigmoid unit.
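A minimal sketch of gradient-descent training for a single sigmoid unit on squared error; the learning rate and epoch count are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_unit(X, y, lr=0.5, epochs=1000):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # inputs plus bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        out = sigmoid(Xb @ w)
        # Gradient of E = ½ Σ (out - y)² through the sigmoid nonlinearity.
        grad = Xb.T @ ((out - y) * out * (1 - out))
        w -= lr * grad                               # move against the gradient
    return w
```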

K Nearest Neighbor
Main Assumption: an effective distance metric exists.
Hypothesis Space: variable size (nonparametric), can model any function.
Classify according to the nearest neighbor(s); this separates the input space into regions.
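A minimal K-nearest-neighbor classifier, assuming Euclidean distance as the "effective distance metric" and integer class labels:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = y_train[np.argsort(d)[:k]]      # labels of the k closest points
    return np.bincount(nearest).argmax()      # majority vote among neighbors
```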

Bagging
Main Assumption: combining many unstable predictors produces an ensemble (stable) predictor.
Unstable predictor: small changes in the training data produce large changes in the model (e.g. neural nets, trees).
Stable predictors: SVMs, nearest neighbor.
Hypothesis Space: variable size (nonparametric), can model any function.

Bagging (Continued)
Each predictor in the ensemble is created by taking a bootstrap sample of the data. A bootstrap sample of N instances is obtained by drawing N examples at random, with replacement. On average, each bootstrap sample contains about 63% of the distinct instances. This encourages the predictors to have uncorrelated errors.
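The sampling step itself is short with NumPy; this sketch assumes `X` and `y` are arrays of equal length:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_sample(X, y):
    # Draw N indices uniformly with replacement, then index both arrays.
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

# On average a bootstrap sample covers ~63.2% (1 - 1/e) of the distinct rows.
```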

Boosting
Main Assumption: combining many weak predictors (e.g. tree stumps or 1-R predictors) produces an ensemble predictor.
Hypothesis Space: variable size (nonparametric), can model any function.

Boosting (Continued)
Each predictor is created using a biased sample of the training data: instances (training examples) with high error are weighted higher than those with lower error, so difficult instances get more attention.
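The reweighting step can be sketched as an AdaBoost-style update (an example, not taken from the slides); `fit_weak_learner` is a hypothetical helper supplied by the caller that fits a weak learner under instance weights `w`:

```python
import numpy as np

def boost(X, y, fit_weak_learner, rounds=10):
    # y in {-1, +1}; fit_weak_learner(X, y, w) -> h with h(X) in {-1, +1}
    # (hypothetical helper, assumed to respect the instance weights w).
    n = len(y)
    w = np.full(n, 1.0 / n)                   # start with uniform weights
    ensemble = []
    for _ in range(rounds):
        h = fit_weak_learner(X, y, w)
        pred = h(X)
        err = w[pred != y].sum() / w.sum()    # weighted training error
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)        # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, h))
    return ensemble                           # predict with sign(Σ alpha·h(x))
```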

Support Vector Machines
Main Assumption: build a model using a minimal number of training instances (the Support Vectors).
Hypothesis Space: variable size (nonparametric), can model any function.
Based on PAC (probably approximately correct) learning theory: minimize the probability that the model error is greater than ε (a small number).

Linear Support Vector Machines
[Figure: linear decision boundary with the support vectors highlighted]

Nonlinear Support Vector Machines
Project into a kernel space (kernels constitute a distance metric in the input space).
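One common kernel choice (an example, not prescribed by the slides) is the Gaussian / RBF kernel, k(x, z) = exp(-||x - z||² / (2σ²)):

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # Similarity decays with squared Euclidean distance between x and z.
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
```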

Competing Philosophies in Supervised Learning
The goal is always to minimize the probability of model errors on future data!
A single model: motivation is to build a single good model.
- Models that don't adhere to Occam's razor: Minimax Probability Machine (MPM), trees, neural networks, nearest neighbor, radial basis functions.
- Occam's razor models (the best model is the simplest one!): Support Vector Machines, Bayesian methods, and other kernel-based methods such as Kernel Matching Pursuit.

Competing Philosophies in Supervised Learning (Continued)
An ensemble of models: motivation is that a good single model is difficult (impossible?) to compute, so build many and combine them. Combining many uncorrelated models produces better predictors.
- Models that don't use randomness, or use directed randomness:
  - Boosting: specific cost function.
  - Gradient Boosting: derive a boosting algorithm for any cost function.
- Models that incorporate randomness:
  - Bagging: bootstrap sample, i.e. uniform random sampling with replacement.
  - Stochastic Gradient Boosting: bootstrap sample, uniform random sampling with replacement.
  - Random Forests: uniform random sampling with replacement, plus randomized choice of inputs for splitting at tree nodes.

Evaluating Models
Infinite data would be best; in practice, use N-fold cross-validation (typically N = 10):
- Create N folds, i.e. subsets of the training data, approximately equally distributed and with approximately the same number of instances.
- Build N models, each trained on a different set of N-1 folds, and evaluate each model on the remaining fold.
- The error estimate is the average error over all N models.
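A plain sketch of the procedure, assuming the caller supplies `fit(X, y)` returning a model that is callable on new inputs:

```python
import numpy as np

def cross_val_error(X, y, fit, n_folds=10, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))  # shuffle once
    folds = np.array_split(idx, n_folds)
    errors = []
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = fit(X[train], y[train])                    # train on N-1 folds
        errors.append(np.mean(model(X[test]) != y[test]))  # error on held-out fold
    return np.mean(errors)                                 # average over N models
```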

Bootstrap Estimate
[Slide details not transcribed]

Reinforcement Learning (RL)
- An autonomous agent learns to act optimally without human intervention.
- The agent learns by stochastically interacting with its environment, receiving infrequent rewards.
- Goal: maximize the infrequent reward.

Q Learning
[Slide equations not transcribed]
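Since the slide's own equations were not transcribed, here is the standard tabular Q-learning update for reference; `Q` is assumed to be a (states × actions) array:

```python
import numpy as np

# Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    target = r + gamma * np.max(Q[s_next])  # reward plus discounted best future value
    Q[s, a] += lr * (target - Q[s, a])      # move the estimate toward the target
```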

Agent's Learning Task
[Slide details not transcribed]

Unsupervised Learning
Studies how input patterns can be represented to reflect the statistical structure of the overall collection of input patterns. No outputs are used (unlike in supervised learning and reinforcement learning). The unsupervised learner brings to bear prior biases as to which aspects of the input's structure should be captured in the output.

Expectation Maximization (EM) Algorithm
- Clustering of data (e.g. K-Means)
- Estimating unobserved or hidden variables
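A minimal K-Means sketch, the hard-assignment special case of EM mentioned above; initialization from randomly chosen data points is an assumption of this illustration:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(iters):
        # E-step: assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # M-step: move each center to the mean of its assigned points.
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break                                      # converged
        centers = new
    return centers, labels
```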