Course Overview and Introduction. CE-717: Machine Learning, Sharif University of Technology. M. Soleymani, Fall 2016.



Course Info
Instructor: Mahdieh Soleymani
Email: soleymani@sharif.edu
Lectures: Sun-Tue (13:30-15:00)
Website: http://ce.sharif.edu/cources/95-96/1/ce717-2

Text Books
Pattern Recognition and Machine Learning, C. Bishop, Springer, 2006.
Machine Learning, T. Mitchell, McGraw-Hill, 1997.
Additional readings will be made available when appropriate.
Other books:
The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Second Edition, Springer, 2009.
Machine Learning: A Probabilistic Perspective, K. Murphy, MIT Press, 2012.

Marking Scheme
Midterm Exam: 25%
Final Exam: 30%
Project: 5-10%
Homeworks (written & programming): 20-25%
Mini-exams: 15%

Machine Learning (ML) and Artificial Intelligence (AI)
ML first appeared as a branch of AI.
ML is now also a preferred approach to other subareas of AI: Computer Vision, Speech Recognition, Robotics, Natural Language Processing.
ML is a strong driver in Computer Vision and NLP.

A Definition of ML
Tom Mitchell (1998): well-posed learning problem
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Using the observed data to make better decisions
Generalizing from the observed data

ML Definition: Example
Consider an email program that learns how to filter spam according to emails you do or do not mark as spam.
T: Classifying emails as spam or not spam.
E: Watching you label emails as spam or not spam.
P: The number (or fraction) of emails correctly classified as spam/not spam.

The essence of machine learning
A pattern exists.
We do not know it mathematically.
We have data on it.

Example: Home Price
Housing price prediction
[Figure: housing price ($ in 1000s) vs. size (feet²)]
Figure adopted from slides of Andrew Ng, Machine Learning course, Stanford.

Example: Bank loan
Input: the applicant's form.
Output: approving or denying the request.

Components of (Supervised) Learning
Unknown target function: $f: X \to Y$
Input space: $X$; output space: $Y$
Training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$
Pick a formula $g: X \to Y$ that approximates the target function $f$, selected from a set of hypotheses $H$.

Example
Training data:

x1   x2    y
0.9  2.3   1
3.5  2.6   1
2.6  3.3   1
2.7  4.1   1
1.8  3.9   1
6.5  6.8  -1
7.2  7.5  -1
7.9  8.3  -1
6.9  8.3  -1
8.8  7.9  -1
9.1  6.2  -1

[Figure: the training points plotted in the (x1, x2) plane]

Components of (Supervised) Learning
[Figure: the learning model]

Solution Components
Learning model composed of:
Learning algorithm
Hypothesis set
Example: the perceptron

Perceptron classifier
Input: $x = (x_1, \ldots, x_d)$
Classifier: if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$ then output 1, else output $-1$.
The linear formula $g \in H$ can be written:
$g(x) = \text{sign}\left(\left(\sum_{i=1}^{d} w_i x_i\right) - \text{threshold}\right)$
If we add a coordinate $x_0 = 1$ to the input and set $w_0 = -\text{threshold}$:
$g(x) = \text{sign}\left(\sum_{i=0}^{d} w_i x_i\right)$
Vector form: $g(x) = \text{sign}(w^T x)$
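As a minimal sketch (not from the slides), the vector-form decision rule can be coded directly; the weight vector below is hypothetical, chosen to separate small inputs from large ones:

```python
import numpy as np

def perceptron_predict(w, x):
    """g(x) = sign(w^T x); x[0] = 1, so w[0] plays the role of -threshold."""
    return 1 if np.dot(w, x) > 0 else -1

# Hypothetical weights: output 1 when x1 + x2 < 10, else -1
w = np.array([10.0, -1.0, -1.0])
print(perceptron_predict(w, np.array([1.0, 0.9, 2.3])))  # -> 1
print(perceptron_predict(w, np.array([1.0, 6.5, 6.8])))  # -> -1
```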

Perceptron learning algorithm: linearly separable data
Given the training data $(x^{(1)}, y^{(1)}), \ldots, (x^{(N)}, y^{(N)})$:
A data point $(x^{(n)}, y^{(n)})$ is misclassified when $\text{sign}(w^T x^{(n)}) \neq y^{(n)}$.
Repeat:
  Pick a misclassified point $(x^{(n)}, y^{(n)})$ from the training data and update $w$:
  $w \leftarrow w + y^{(n)} x^{(n)}$
Until all training data points are correctly classified by $g$.
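The update loop above can be sketched in Python on the training data from the earlier example slide; this is an illustrative implementation, not code from the course:

```python
import numpy as np

# Training data from the example slide: columns (x1, x2, y)
data = np.array([
    [0.9, 2.3,  1], [3.5, 2.6,  1], [2.6, 3.3,  1], [2.7, 4.1,  1],
    [1.8, 3.9,  1], [6.5, 6.8, -1], [7.2, 7.5, -1], [7.9, 8.3, -1],
    [6.9, 8.3, -1], [8.8, 7.9, -1], [9.1, 6.2, -1],
])
X = np.hstack([np.ones((len(data), 1)), data[:, :2]])  # prepend x0 = 1
y = data[:, 2]

w = np.zeros(3)
while True:
    misclassified = np.where(np.sign(X @ w) != y)[0]
    if len(misclassified) == 0:
        break                            # all points correctly classified
    n = misclassified[0]                 # pick any misclassified point
    w = w + y[n] * X[n]                  # PLA update: w <- w + y^(n) x^(n)

print(np.all(np.sign(X @ w) == y))       # -> True (the data are separable)
```

The loop is guaranteed to terminate here because the data are linearly separable, which is exactly the assumption in the slide's title.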

Perceptron learning algorithm: Example of weight update
[Figure: decision boundary in the (x1, x2) plane before and after one weight update]

Experience (E) in ML
Basic premise of learning: using a set of observations to uncover an underlying process.
Different paradigms of ML methods obtain their observations in different ways.

Paradigms of ML
Supervised learning (regression, classification): predicting a target variable for which we get to see examples.
Unsupervised learning: revealing structure in the observed data.
Reinforcement learning: partial (indirect) feedback, no explicit guidance; given rewards for a sequence of moves, learn a policy and utility functions.
Other paradigms: semi-supervised learning, active learning, online learning, etc.

Supervised Learning: Regression vs. Classification
Regression: predict a continuous target variable, e.g., $y \in [0, 1]$.
Classification: predict a discrete target variable, e.g., $y \in \{1, 2, \ldots, C\}$.

Data in Supervised Learning
Data are usually considered as vectors in a d-dimensional space.
For now, we make this assumption for illustrative purposes; we will see that it is not necessary.
[Table: rows Sample 1, Sample 2, ..., Sample n-1, Sample n; columns x1, x2, ..., xd, y (Target)]
Columns: features/attributes/dimensions
Rows: data points/instances/examples/samples
y column: target/outcome/response/label

Regression: Example
Housing price prediction
[Figure: housing price ($ in 1000s) vs. size (feet²)]
Figure adopted from slides of Andrew Ng
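As an illustration (not from the slides), a least-squares line can be fit to a few hypothetical (size, price) pairs shaped roughly like the figure's axes:

```python
import numpy as np

# Hypothetical data loosely matching the figure's axis ranges
sizes  = np.array([500, 1000, 1500, 2000, 2500], dtype=float)  # feet^2
prices = np.array([100, 150, 200, 270, 330], dtype=float)      # $ in 1000s

# Least-squares fit of price = w0 + w1 * size
A = np.vstack([np.ones_like(sizes), sizes]).T
w0, w1 = np.linalg.lstsq(A, prices, rcond=None)[0]
print(round(w0 + w1 * 1750))   # predicted price for a 1750 ft^2 house -> 239
```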

Classification: Example
Predicting (Cat, Dog) from weight
[Figure: weight on the horizontal axis; label 1 (Dog) vs. 0 (Cat)]

Supervised Learning vs. Unsupervised Learning
Supervised learning
Given: training set, a labeled set of N input-output pairs $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$
Goal: learn a mapping from $x$ to $y$.
Unsupervised learning
Given: training set $\{x^{(i)}\}_{i=1}^{N}$
Goal: find groups or structures in the data; discover the intrinsic structure in the data.

Supervised Learning: Samples
[Figure: labeled points in the (x1, x2) plane with a classification boundary]

Unsupervised Learning: Samples
[Figure: unlabeled points in the (x1, x2) plane grouped by clustering into Type I, Type II, and Type III]

Sample Data in Unsupervised Learning
[Table: rows Sample 1, Sample 2, ..., Sample n-1, Sample n; columns x1, x2, ..., xd, with no target column]
Columns: features/attributes/dimensions
Rows: data points/instances/examples/samples

Unsupervised Learning: Example Applications
Clustering docs based on their similarities
Grouping news stories on the Google News site
Market segmentation: grouping customers into different market segments given a database of customer data
Social network analysis
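The slides name no particular clustering algorithm; as one hedged illustration, a minimal k-means sketch on synthetic 2-D data recovers two planted groups:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups of 20 points each
pts = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

# Minimal k-means with k = 2, initialized from two data points
centers = pts[[0, -1]].copy()
for _ in range(10):
    d = np.linalg.norm(pts[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)            # assign each point to nearest center
    centers = np.array([pts[labels == k].mean(axis=0) for k in range(2)])

# Each planted group ends up in a single cluster
print(len(set(labels[:20])), len(set(labels[20:])))  # -> 1 1
```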

Reinforcement Learning
Provides only an indication as to whether an action is correct or not.
Data in supervised learning: (input, correct output)
Data in reinforcement learning: (input, some output, a grade of reward for this output)

Reinforcement Learning
Typically, we need to make a sequence of decisions; it is usually assumed that reward signals refer to the entire sequence.
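A toy sketch (not from the slides) of learning from graded feedback alone: an epsilon-greedy learner estimating two hypothetical actions' rewards without ever seeing the "correct output":

```python
import random

random.seed(0)
true_means = [0.2, 0.8]   # hypothetical reward rates; hidden from the learner
counts = [0, 0]
values = [0.0, 0.0]       # running estimate of each action's reward

for t in range(2000):
    # epsilon-greedy: mostly exploit the best estimate, sometimes explore
    a = random.randrange(2) if random.random() < 0.1 else values.index(max(values))
    r = 1.0 if random.random() < true_means[a] else 0.0  # graded feedback only
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]             # incremental mean

print(values.index(max(values)))   # -> 1 (the better action is discovered)
```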

Is learning feasible?
Learning an unknown function is impossible in general: the function can assume any value outside the data we have.
However, learning is feasible in a probabilistic sense.

Example

Generalization
We don't intend to memorize the data; we need to figure out the pattern.
A core objective of learning is to generalize from the experience.
Generalization: the ability of a learning algorithm to perform accurately on new, unseen examples after having experienced the training data.

Components of (Supervised) Learning
[Figure: the learning model]

Main Steps of Learning Tasks
Selection of hypothesis set (or model specification): which class of models (mappings) should we use for our data?
Learning: find a mapping f (from the hypothesis set) based on the training data.
  Which notion of error should we use? (loss functions)
  Optimization of the loss function to find the mapping f.
Evaluation: how well does f generalize to yet unseen examples?
  How do we ensure that the error on future data is minimized? (generalization)
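The middle step, minimizing a loss over the training data, can be illustrated with a toy squared-error loss and gradient descent (an illustrative sketch, not course code):

```python
import numpy as np

# Toy 1-D data and squared-error loss L(w) = mean((w*x - y)^2)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])      # generated by y = 2x, so the best w is 2

w = 0.0
lr = 0.05
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)   # dL/dw
    w -= lr * grad                        # gradient-descent step
print(round(w, 3))   # -> 2.0
```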

Some Learning Applications
Face, speech, and handwritten character recognition
Document classification and ranking in web search engines
Photo tagging
Self-customizing programs (recommender systems)
Database mining (e.g., medical records)
Market prediction (e.g., stock/house prices)
Computational biology (e.g., annotation of biological sequences)
Autonomous vehicles

ML in Computer Science
Why are ML applications growing?
Improved machine learning algorithms
Availability of data (increased data capture, networking, etc.)
Demand for self-customization to user or environment
Software too complex to write by hand

Handwritten Digit Recognition Example
Data: labeled samples of the digits 0-9
[Figure: example digit images]

Example: Input representation

Example: Illustration of features

Example: Classification boundary

Main Topics of the Course
Supervised learning (most of the lectures are on this topic)
  Regression
  Classification (our main focus)
  Learning theory
Unsupervised learning
Reinforcement learning
Some advanced topics & applications

Resource
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, Learning from Data, AMLBook, 2012.