CS340 Machine learning Lecture 2


What is machine learning?
"Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time." -- Herbert Simon
Closely related to:
- Statistics (fitting models to data and testing them)
- Data mining/exploratory data analysis (discovering models)
- Adaptive control theory
- AI (building intelligent machines by hand)

Types of machine learning
- Supervised Learning
  - Classification (pattern recognition)
  - Regression
- Unsupervised Learning
- Reinforcement Learning

Classification
Example: credit scoring. Differentiating between low-risk and high-risk customers from their income and savings.
Discriminant: IF income > θ_1 AND savings > θ_2 THEN low-risk ELSE high-risk
Input data is two-dimensional; output is binary.
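
A minimal sketch of this kind of threshold discriminant in Python (my own illustration, not code from the lecture; the threshold values are made up):

```python
# Hypothetical thresholds theta_1, theta_2; in practice learned from data.
THETA_1 = 30_000  # income threshold (made-up value)
THETA_2 = 10_000  # savings threshold (made-up value)

def credit_risk(income: float, savings: float) -> str:
    """The slide's discriminant: low-risk iff income and savings
    both exceed their thresholds."""
    if income > THETA_1 and savings > THETA_2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 12_000))  # -> low-risk
print(credit_risk(45_000, 5_000))   # -> high-risk
```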

Classification
p features (attributes), n cases.
Training set: X is n × p, y is n × 1.

  Color   Shape     Size   | Label
  Blue    Square    Small  | Yes
  Red     Ellipse   Small  | Yes
  Red     Ellipse   Large  | No

Test set:
  Blue    Crescent  Small  | ?
  Yellow  Ring      Small  | ?

Notation
- The Alpaydin book uses x^t (d-dimensional) to denote the t'th training input and r^t to denote the t'th training output (response), for t = 1:n.
- The Bishop book uses x_n (d-dimensional) for the n'th input and t_n for the n'th output (target), for n = 1:N.
- The Hastie book uses x_i (p-dimensional) for the i'th covariate and y_i for the i'th output, for i = 1:n.
We will often omit the vector notation on x_i. Please do not let notation obscure the ideas!

Hypothesis (decision tree)
[Figure: a decision tree that first tests "blue?", with one branch then testing "oval?" and the other testing "big?", and yes/no class labels at the leaves. The next two slides step through the same tree.]
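
As a side illustration (mine, not from the slides), a decision-tree hypothesis is just nested if-statements; since the figure's leaf labels did not survive transcription, the class labels below are hypothetical:

```python
def tree_hypothesis(color: str, shape: str, size: str) -> str:
    """Decision tree as nested ifs. The "yes"/"no" leaf labels are
    hypothetical placeholders for the figure's classes."""
    if color == "blue":
        # This branch tests "oval?".
        return "yes" if shape == "oval" else "no"
    else:
        # The other branch tests "big?".
        return "yes" if size == "big" else "no"

print(tree_hypothesis("blue", "oval", "small"))
```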

What's the right hypothesis?

What's the right hypothesis? Linearly separable data

How about now?

How about now? Quadratically separable data

Noisy/mislabeled data

Overfitting: memorizes irrelevant details of the training set.

Underfitting: ignores essential details of the training set.

Larger data set

Now a more complex hypothesis is OK.

No free lunch theorem
Unless you know something about the distribution of problems your learning algorithm will encounter, any hypothesis that agrees with all your data is as good as any other. You have to make assumptions about the underlying function that generated the data. These assumptions are implicit in the choice of hypothesis space (and maybe the algorithm). Hence learning is inductive, not deductive.

Supervised learning methods
Methods differ in terms of:
- the form of hypothesis space they use
- the method they use to find the best hypothesis given data
There are many successful approaches: neural networks, decision trees, support vector machines (SVMs), Gaussian processes, boosting, etc.

Handwritten digit recognition
x^t ∈ R^{16×16}, y^t ∈ {0, ..., 9}

Face Recognition
[Figure: training examples of a person, and test images.]
AT&T Laboratories, Cambridge UK: http://www.uk.research.att.com/facedatabase.html

Linear regression
Example: price of a used car. x: car attributes; y: price.
Model: y = g(x | θ), where g(·) is the model and θ = (w, w_0) are its parameters:
y = w x + w_0
Regression is like classification except the output is a real-valued scalar.
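
A quick sketch (my own illustration, with made-up toy data) of fitting w and w_0 by ordinary least squares with NumPy:

```python
import numpy as np

# Toy data: car age (years) vs. price (in thousands); made-up numbers.
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
y = np.array([20.0, 17.5, 15.2, 11.0, 6.3])

# Design matrix with a column of ones so the intercept w_0 is learned too.
X = np.column_stack([x, np.ones_like(x)])
(w, w0), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"y = {w:.2f} * x + {w0:.2f}")
print("prediction at x = 4:", w * 4 + w0)
```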

Polynomial regression Polynomial regression is linear regression with polynomial basis functions
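
For instance (a sketch under my own toy setup, not from the lecture), a degree-3 polynomial fit is still linear least squares once x is expanded into the basis [1, x, x², x³]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(30)  # noisy toy target

degree = 3
# Basis expansion: columns [1, x, x^2, x^3].
Phi = np.vander(x, degree + 1, increasing=True)

# Still ordinary linear least squares, just on the expanded features.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("coefficients:", np.round(w, 3))
```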

Piecewise linear 2D regression
Now the basis functions φ(x_1, x_2) must be learned from data: how many pieces? where to put them? flat or curved? Much harder problem!

Regression Applications
- Navigating a car: angle of the steering wheel (CMU NavLab)
- Kinematics of a robot arm: given a target position (x, y), predict the joint angles α_1 = g_1(x, y) and α_2 = g_2(x, y)
- Response surface design

Supervised Learning: Uses
- Prediction of future cases: use the rule to predict the output for future inputs
- Knowledge extraction: the rule is easy to understand
- Compression: the rule is simpler than the data it explains
- Outlier detection: exceptions that are not covered by the rule, e.g., fraud

Unsupervised Learning
Learning what normally happens. No output. Can be formalized in terms of probability density estimation.
Examples:
- clustering
- dimensionality reduction
- abnormality detection
- latent variable estimation

K-means clustering
[Figure: input data, and the desired output as a hard labeling and as a soft labeling.]
K = 3 is the number of clusters, here chosen by hand.
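
A minimal k-means sketch in NumPy (my own illustration of the algorithm, not course code):

```python
import numpy as np

def kmeans(X, K=3, iters=20, seed=0):
    """Plain k-means: alternate assigning each point to its nearest
    centroid (a hard labeling) and moving each centroid to the mean
    of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # Distances from every point to every center: shape (n, K).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Toy data: three well-separated blobs.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(m, 0.3, size=(30, 2)) for m in (0, 3, 6)])
labels, centers = kmeans(X, K=3)
print(centers.round(2))
```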

Hierarchical agglomerative clustering
Greedily build a dendrogram.
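
A short sketch using SciPy's hierarchical clustering routines (an illustration on made-up data, assuming SciPy is available; not course code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])

# Greedily merge the two closest clusters (average linkage) until one
# cluster remains; Z encodes the resulting dendrogram.
Z = linkage(X, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)
```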

Clustering art

Principal components analysis (PCA)
Project high-dimensional data onto a linear subspace which captures most of the variance of the data.
[Figure: input data and its low-dimensional projection.]
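
A compact PCA sketch via the SVD (my own illustration, not from the slides):

```python
import numpy as np

def pca(X, k):
    """Project X (n x d) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)            # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                       # d x k basis of the subspace
    Z = Xc @ W                         # low-dimensional coordinates
    explained = (S[:k] ** 2).sum() / (S ** 2).sum()
    return Z, W, explained

# Toy data: 3-D points that lie close to a single line.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.05 * rng.normal(size=(100, 3))
Z, W, frac = pca(X, k=1)
print(f"variance captured: {frac:.1%}")  # close to 100% for this toy data
```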

Image denoising with Markov random fields
[Figure: grid-structured MRF; hidden pixels x have pairwise compatibility potentials Ψ with their neighbors, and local evidence potentials φ tying each x to its observed pixel y.]
Popular in:
- computer vision
- language modeling
- information extraction
- sequence prediction
- graphics
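
A tiny illustration (mine, not from the course) of those two kinds of terms, using iterated conditional modes on a binary image with an Ising-style prior: the lam term plays the role of the local evidence φ and the beta term the neighbor compatibility Ψ:

```python
import numpy as np

def icm_denoise(y, beta=2.0, lam=1.0, n_sweeps=5):
    """y: noisy image with pixels in {-1, +1}. Greedily set each pixel to
    the value that best trades off agreement with the observation (lam,
    local evidence) against agreement with its 4 neighbors (beta)."""
    x = y.copy()
    H, W = y.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                nbr = sum(x[a, b]
                          for a, b in [(i-1, j), (i+1, j), (i, j-1), (i, j+1)]
                          if 0 <= a < H and 0 <= b < W)
                # Pick the sign that maximizes evidence + neighbor agreement.
                x[i, j] = 1 if lam * y[i, j] + beta * nbr >= 0 else -1
    return x
```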

People tracking
[Figure: state-space model; hidden states X_1, X_2, X_3 are the unknown player locations, and Y_1, Y_2, Y_3 are the observed video frames.]

Active learning: asking the right questions

Robots that ask questions and learn

Reinforcement Learning
Learning a policy: a sequence of outputs. There is no supervised output, only delayed reward. Credit assignment problem: which action led to me winning the game of chess? This topic is covered in CS422 (AI II), not in CS340.