MACHINE LEARNING. Slides adapted from the Learning from Data book and course, and from Berkeley CS188 by Dan Klein and Pieter Abbeel.

Machine Learning? Learning from data. Tasks: prediction, classification, recognition. Focus on supervised learning only. Classification: Naïve Bayes. Regression: linear regression.

Example: Digit Recognition. Input: images / pixel grids. Output: a digit 0-9. Setup: get a large collection of example images, each labeled with a digit. Note: someone has to hand-label all this data. We want to learn to predict the labels of new, future digit images.

Other Classification Tasks. Classification: given inputs x, predict labels (classes) y. Examples: spam detection (input: document/email, classes: spam or not); medical diagnosis (input: symptoms, classes: diseases); automatic essay grading (input: document, classes: grades); movie rating (input: a movie, classes: rating); credit approval (input: user profile, classes: accept/reject); and many more.

The essence of machine learning: a pattern exists, we cannot pin it down mathematically, and we have data on it. In short: a pattern exists, we don't know it, and we have data to learn it. Learning from data gives us information that can be used to make predictions.

Credit Approval Classification. Applicant information (approve credit?): Age: 23 years; Gender: male; Annual salary: $30,000; Years in residence: 1 year; Years in job: 1 year; Current debt: $15,000.

Credit Approval Classification. There is no credit approval formula. Banks have lots of data: customer information (checking status, employment, etc.) and whether or not the customer defaulted on their credit (good or bad).

Components of learning. Formalization: Input: x (customer application). Output: y (good/bad customer?). Target function: f: X → Y (ideal credit approval formula). Data: (x1, y1), (x2, y2), ..., (xn, yn) (historical records). Hypothesis: g: X → Y (formula/classifier to be used).

The learning diagram: unknown target function f: X → Y (ideal credit approval function); training examples (x1, y1), ..., (xn, yn) (historical records of credit customers); learning algorithm A; hypothesis set H (set of candidate formulas); final hypothesis g (final credit approval formula).

Solution Components. The same diagram: unknown target function f (ideal credit approval function); training examples (x1, y1), ..., (xn, yn) (historical records of credit customers); learning algorithm A; hypothesis set H (set of candidate formulas); final hypothesis g (final credit approval formula). The learning algorithm and the hypothesis set are the pieces we choose: together they form the learning model.

The general supervised learning problem: an unknown target function; an unknown input distribution generating x1, x2, ..., xn; training examples (x1, y1), ..., (xn, yn); an error measure; and a learning algorithm A that picks a final hypothesis out of a hypothesis set.

Model-Based Classification. Model-based approach: build a model (e.g. a Bayes net) where both the label and the features are random variables; instantiate any observed features; query for the distribution of the label conditioned on the features. Challenges (solution components): how to answer the query, how to learn the model's parameters, and what structure the BN should have.

Naïve Bayes for Digits. Naïve Bayes: assume all features are independent effects of the label; in other words, the features are conditionally independent given the class/label. Simple digit recognition version: one feature (variable) Fij for each grid position <i,j>. Feature values are on/off, based on whether the intensity is more or less than 0.5 in the underlying image. Each input maps to a feature vector, e.g. -> <F0,0 = 0, F0,1 = 0, ..., F15,15 = 0>. Naïve Bayes model: label Y with features F1, F2, ..., Fn, so P(Y, F1, ..., Fn) = P(Y) ∏i P(Fi | Y).
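As a side note, here is a minimal Python sketch of how such on/off pixel features could be computed; the 16x16 grid and the 0.5 threshold follow the slide, while the function name and the use of numpy are our own choices:

```python
import numpy as np

def binarize_pixels(image, threshold=0.5):
    """Map a grayscale image (values in [0, 1]) to on/off Naive Bayes features.

    Each grid position <i, j> becomes one binary feature F_ij: 1 when the pixel
    intensity exceeds the threshold, 0 otherwise.
    """
    image = np.asarray(image, dtype=float)
    return (image > threshold).astype(int).ravel()  # flatten to a feature vector

# Example: a random 16x16 "digit" image mapped to 256 binary features
rng = np.random.default_rng(0)
features = binarize_pixels(rng.random((16, 16)))
print(features.shape)  # (256,)
```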

General Naïve Bayes. A general Naïve Bayes model with label Y and features F1, F2, ..., Fn: P(Y, F1, ..., Fn) = P(Y) ∏i P(Fi | Y). Specifying the full joint directly would take |Y| × |F|^n values, while the Naïve Bayes model needs only |Y| values for P(Y) plus n × |F| × |Y| values for the conditional tables P(Fi | Y). We only have to specify how each feature depends on the class, so the total number of parameters is linear in n. The model is very simplistic, but it often works anyway.

Inference for Naïve Bayes. Goal: compute the posterior distribution over the label variable Y. Step 1: get the joint probability of label and evidence for each label, P(y, f1, ..., fn) = P(y) ∏i P(fi | y). Step 2: sum over labels to get the probability of the evidence, P(f1, ..., fn) = Σy P(y, f1, ..., fn). Step 3: normalize by dividing Step 1 by Step 2, P(Y | f1, ..., fn) = P(Y, f1, ..., fn) / P(f1, ..., fn).
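A minimal sketch of these three steps in Python; the table layout (P(Fi = 1 | Y) stored as an array) and the function name are our own choices, not from the slides:

```python
import numpy as np

def naive_bayes_posterior(prior, cond_tables, evidence):
    """Posterior P(Y | f1..fn) for a Naive Bayes model with binary features.

    prior:       shape (num_labels,), the table P(Y)
    cond_tables: shape (num_features, num_labels), the tables P(F_i = 1 | Y)
    evidence:    shape (num_features,), observed 0/1 feature values
    """
    # Step 1: joint probability of label and evidence, P(y, f1..fn)
    likelihood = np.where(evidence[:, None] == 1, cond_tables, 1.0 - cond_tables)
    joint = prior * likelihood.prod(axis=0)
    # Step 2: probability of the evidence, P(f1..fn), summing over labels
    evidence_prob = joint.sum()
    # Step 3: normalize
    return joint / evidence_prob

# Tiny example with 2 labels and 3 binary features (numbers are made up)
prior = np.array([0.5, 0.5])
cond_tables = np.array([[0.9, 0.1],
                        [0.7, 0.3],
                        [0.2, 0.6]])
print(naive_bayes_posterior(prior, cond_tables, np.array([1, 0, 1])))
```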

General Naïve Bayes. What do we need in order to use Naïve Bayes? An inference method (we just saw this part): start with a bunch of probabilities, P(Y) and the P(Fi | Y) tables, and use standard inference to compute P(Y | F1, ..., Fn); nothing new here. Estimates of the local conditional probability tables: P(Y), the prior over labels, and P(Fi | Y) for each feature (evidence variable). These probabilities are collectively called the parameters of the model and denoted by θ. Up until now we assumed they appeared by magic, but they typically come from training data counts.

Example: conditional probabilities for the digits model, showing P(Y) and two example pixel features (call them Fa and Fb):

y   P(Y=y)   P(Fa=on|Y=y)   P(Fb=on|Y=y)
1   0.1      0.01           0.05
2   0.1      0.05           0.01
3   0.1      0.05           0.90
4   0.1      0.30           0.80
5   0.1      0.80           0.90
6   0.1      0.90           0.90
7   0.1      0.05           0.25
8   0.1      0.60           0.85
9   0.1      0.50           0.60
0   0.1      0.80           0.80

Parameter Estimation. Estimating the distribution of a random variable (CPTs). Elicitation: ask a human (why is this hard?). Empirically: use training data (learning!). E.g.: for each outcome x, look at the empirical rate of that value, P_ML(x) = count(x) / total number of samples. This is the estimate that maximizes the likelihood of the data: relative frequencies are the maximum likelihood estimate.

Unseen Events and Laplace Smoothing. What happens if you've never seen an event or feature for a given class? Laplace's estimate: pretend you saw every outcome once more than you actually did, P_LAP(x) = (count(x) + 1) / (N + |X|), where N is the number of samples and |X| is the number of possible outcomes.
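A minimal Python sketch of both estimates computed from raw counts; the function names are ours, and the 'r'/'b' outcomes are just a stand-in for a two-outcome example (e.g. red/blue):

```python
from collections import Counter

def ml_estimate(samples):
    """Relative-frequency (maximum likelihood) estimate P_ML(x)."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}

def laplace_estimate(samples, outcomes):
    """Laplace-smoothed estimate: add one imaginary count to every outcome."""
    counts = Counter(samples)
    total = len(samples) + len(outcomes)
    return {x: (counts[x] + 1) / total for x in outcomes}

samples = ['r', 'r', 'b']
print(ml_estimate(samples))                   # {'r': 0.667, 'b': 0.333}
print(laplace_estimate(samples, ['r', 'b']))  # {'r': 0.6, 'b': 0.4}
```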

Summary. Bayes' rule lets us do diagnostic queries with causal probabilities. The naïve Bayes assumption takes all features to be independent given the class label. We can build classifiers out of a naïve Bayes model using training data. Smoothing the estimates is important in real systems.

Input representation and features. Raw input: x = <F0,0 = 0, F0,1 = 0, ..., F15,15 = 0>, or equivalently x = (x0, x1, x2, ..., x256). Features: extract useful information. Before, feature values were on/off, based on whether the intensity is more or less than 0.5 in the underlying image; instead, use intensity and symmetry, x = (x0, x1, x2).
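A minimal sketch of intensity and symmetry features; exact definitions vary, so assume here that intensity is the mean pixel value and symmetry is the negative mean absolute difference between the image and its left-right mirror (one common choice), with x0 = 1 as the bias coordinate:

```python
import numpy as np

def intensity_symmetry_features(image):
    """Reduce a grayscale digit image to (x0, x1, x2) = (1, intensity, symmetry)."""
    image = np.asarray(image, dtype=float)
    intensity = image.mean()
    symmetry = -np.abs(image - np.fliplr(image)).mean()  # 0 means perfectly symmetric
    return np.array([1.0, intensity, symmetry])          # x0 = 1 is the bias coordinate

# Example on a random 16x16 image
rng = np.random.default_rng(0)
print(intensity_symmetry_features(rng.random((16, 16))))
```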

Illustration of features

Linear Regression

Credit Approval Again. Classification: credit approval (yes/no). Regression: credit line (dollar amount). Input x: Age: 23 years; Annual salary: $30,000; Years in job: 1 year; Current debt: $15,000. Idea: assign a weight to each attribute/feature based on how important it is. Linear regression output: h(x) = Σi wi xi = w^T x.

How to measure the error. How well does h approximate f? In classification, count the number of misclassified examples. In linear regression, we use the squared error (h(x) − f(x))^2. In-sample error: E_in(h) = (1/N) Σn (h(xn) − yn)^2.

Illustration of linear regression

The expression for Ein: with the data matrix X (rows xn^T) and target vector y, E_in(w) = (1/N) Σn (w^T xn − yn)^2 = (1/N) ||Xw − y||^2.

Minimizing Ein: set the gradient to zero, ∇E_in(w) = (2/N) X^T (Xw − y) = 0, which gives the normal equations X^T X w = X^T y, so w = X† y with the pseudo-inverse X† = (X^T X)^{-1} X^T.

The linear regression algorithm: 1) construct the matrix X and the vector y from the data set; 2) compute the pseudo-inverse X† = (X^T X)^{-1} X^T; 3) return w = X† y.
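A minimal numpy sketch of this one-step algorithm; it uses numpy's least-squares solver instead of forming (X^T X)^{-1} explicitly, which is numerically safer, and the toy data are made up:

```python
import numpy as np

def linear_regression(X, y):
    """One-step linear regression: return the weight vector w = X_dagger y."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # solves min_w ||Xw - y||^2
    return w

def in_sample_error(X, y, w):
    """Squared in-sample error E_in(w) = (1/N) ||Xw - y||^2."""
    residual = X @ w - y
    return (residual @ residual) / len(y)

# Toy data: y is roughly 2*x1 + 1 plus noise; the first column of X is the bias x0 = 1
rng = np.random.default_rng(0)
x1 = rng.random(50)
X = np.column_stack([np.ones(50), x1])
y = 2 * x1 + 1 + 0.05 * rng.standard_normal(50)
w = linear_regression(X, y)
print(w, in_sample_error(X, y, w))
```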

Linear regression for classification: train on binary targets yn = ±1 as if they were real values, then classify new points with sign(w^T x); the regression weights can also serve as initial weights for a classification algorithm.

Linear regression boundary

Overfitting happens when a classifier fits the training data too tightly and makes many errors when predicting on new data; in other words, fitting the data more than is warranted. Overfitting is a general problem because: there is noise in the data, and trying to fit the noise is not a good idea; and the true model f may be very complex, so our training data cannot really represent it well.

Training and Testing. Divide the data set into two sets: a training set and a test set (sometimes there is one more set, called the held-out set, for tuning parameters). Experimentation cycle: learn parameters (e.g. model probabilities or weights) on the training set; compute accuracy on the test set. Very important: never peek at the test set and never let the test set influence your learning. Evaluation: accuracy or error on the test set (an estimate of the out-of-sample error).
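A minimal numpy sketch of this cycle, reusing linear regression with a sign threshold as the classifier; the split helper and toy data are our own, and scikit-learn (listed in the resources below) provides ready-made equivalents:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Randomly split the data; the test part is never used for learning."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, train = order[:n_test], order[n_test:]
    return X[train], y[train], X[test], y[test]

# Toy binary problem: the label is the sign of x1 + x2; X includes a bias column
rng = np.random.default_rng(1)
features = rng.standard_normal((200, 2))
X = np.column_stack([np.ones(200), features])
y = np.sign(features.sum(axis=1))

X_tr, y_tr, X_te, y_te = train_test_split(X, y)
w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)      # learn on the training set only
test_accuracy = np.mean(np.sign(X_te @ w) == y_te)   # evaluate on the held-back test set
print(test_accuracy)
```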

Resources:
Learning from Data: http://work.caltech.edu/telecourse.html
Andrew Ng, Machine Learning: https://www.coursera.org/learn/machine-learning and https://www.youtube.com/watch?v=uzxylbk2c7e&list=pla89dcfa6adace599
In-depth introduction to machine learning in 15 hours of expert videos: https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
Python ML library: http://scikit-learn.org/stable/
WekaMOOC: https://weka.waikato.ac.nz/explorer