ECS171: Machine Learning

Lecture 1: Overview of class, LFD 1.1, 1.2
Cho-Jui Hsieh
UC Davis
Jan 8, 2018

Course Information
Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ECS171_Winter2018/main.html and Canvas
My office: Mathematical Sciences Building (MSB) 4232
Office hours: Tuesday 1pm-2pm, MSB 4232 (starting next week)
TAs: Patrick Chen (phpchen@ucdavis.edu) and Xuanqing Liu (xqliu@ucdavis.edu)
TA office hour: Thursday 10am-11am, Kemper 55 (starting next week)
My email: chohsieh@ucdavis.edu

Course Information
Course material:
Part I (before the midterm exam): uses the book "Learning from Data" (LFD) by Abu-Mostafa, Magdon-Ismail, and Hsuan-Tien Lin
- Foundations of machine learning: why can we learn from data? Overfitting, underfitting, training vs. testing, regularization
- 11 lectures
- Most slides are based on Yaser Abu-Mostafa (Caltech): http://work.caltech.edu/lectures.html#lectures and Hsuan-Tien Lin (NTU): https://www.csie.ntu.edu.tw/~htlin/course/mlfound17fall/
Part II: introduces some practical machine learning models: deep learning, kernel methods, boosting, tree-based approaches, clustering, dimensionality reduction

Grading Policy
- Midterm (30%): written exam for Part I
- Homework (30%): 2 or 3 homeworks
- Final project (40%): competition?

Final Project
- Groups of 4 students
- We will announce the dataset and task
- Kaggle-style competition: upload your model/prediction online, and our website will report the accuracy
- Final report: report the algorithms you have tested and the implementation details; discuss your findings

The Learning Problem

From Learning to Machine Learning
What is learning? observations → learning → skill
Machine learning: data → machine learning → skill
Machine learning automates the learning process!
Skill: how to make a decision (action), e.g., classify an image, predict the Bitcoin price, ...

Example: Movie Recommendation
- Data: user-movie ratings
- Skill: predict how a user would rate an unrated movie
Known as the Netflix problem, a competition held by Netflix in 2006:
- 100 million ratings, 480K users, 17K movies
- 10% improvement over the baseline wins a 1 million dollar prize

Movie Rating: A Solution
Each viewer and each movie is associated with a latent factor vector
- Prediction: viewer/movie factors → rating
- Learning: known ratings → viewer/movie factors
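To make the latent-factor idea concrete, here is a minimal Python sketch (my own illustration, not the actual prize-winning system): each viewer and each movie gets a k-dimensional vector, learned from the known ratings, and the predicted rating is their inner product.

```python
import numpy as np

# Minimal latent-factor sketch (illustrative only). Each viewer and movie
# gets a k-dimensional factor vector; the predicted rating is their inner
# product. Sizes and names are made up for the example.
rng = np.random.default_rng(0)
k, n_viewers, n_movies = 8, 1000, 500

U = rng.normal(scale=0.1, size=(n_viewers, k))  # viewer factors (would be learned)
V = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors (would be learned)

def predict_rating(viewer_id: int, movie_id: int) -> float:
    """Predicted rating = inner product of viewer and movie factor vectors."""
    return float(U[viewer_id] @ V[movie_id])

print(predict_rating(0, 0))
```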

Credit Approval Problem
Customer record: application attributes such as age, annual salary, years in residence, years in job, and current debt
To be learned: is approving the credit card good for the bank?

Formalize the Learning Problem
- Input: x ∈ X (customer application), e.g., x = [23, 1, 1000000, 1, 0.5, 200000]
- Output: y ∈ Y (good/bad after approving the credit card)
- Target function to be learned: f : X → Y (ideal credit approval formula)
- Data (historical records in the bank): D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}
- Hypothesis (function): g : X → Y (learned formula to be used)
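In code, this setup is nothing more than pairs of feature vectors and labels. A small sketch (the attribute interpretation follows the credit example above; names are mine):

```python
import numpy as np

# One customer application x in X; the components are assumed to correspond
# to attributes like [age, gender, annual salary, years in residence,
# years in job, current debt] from the credit-approval example.
x1 = np.array([23, 1, 1_000_000, 1, 0.5, 200_000])
y1 = +1  # y in Y = {+1 (good), -1 (bad)}

# The data D = {(x_1, y_1), ..., (x_N, y_N)}: historical records in the bank.
D = [(x1, y1)]  # in practice, N such (x_n, y_n) pairs
```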

Basic Setup of Learning Problem
unknown target function f → training examples D → learning algorithm (searching the hypothesis set H) → final hypothesis g ≈ f

Learning Model
A learning model has two components:
- The hypothesis set H: the set of candidate hypotheses (functions)
- The learning algorithm: picks a hypothesis (function) from H; usually an optimization algorithm (choose the best function to minimize the training error)

Perceptron
Our first ML model: the perceptron (1957)
- Learns a linear function
- A single-layer neural network
Next, we introduce the two components of the perceptron: What is the hypothesis space? What is the learning algorithm?

Perceptron Hypothesis Space
Define the hypothesis set H. For input x = (x_1, ..., x_d), the attributes of a customer:
- Approve credit if ∑_{i=1}^d w_i x_i > threshold
- Deny credit if ∑_{i=1}^d w_i x_i < threshold
Define Y = {+1 (good), −1 (bad)}. The linear hypothesis space H is all h of the form
h(x) = sign(∑_{i=1}^d w_i x_i − threshold)   (perceptron hypothesis)

Perceptron Hypothesis Space (cont'd)
Introduce an artificial coordinate x_0 = 1 and set w_0 = −threshold:
h(x) = sign(∑_{i=1}^d w_i x_i − threshold) = sign(∑_{i=0}^d w_i x_i) = sign(w^T x)   (vector form)
- Customer features x: points in R^d (d-dimensional space)
- Labels y: +1 or −1
- Hypothesis h: linear hyperplanes
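A concrete sketch of the vector form in NumPy (my own illustration; the tie-breaking choice at w^T x = 0 is arbitrary):

```python
import numpy as np

def h(w: np.ndarray, x: np.ndarray) -> int:
    """Perceptron hypothesis h(x) = sign(w^T x); returns -1 on ties (w^T x = 0)."""
    return 1 if w @ x > 0 else -1

# The artificial-coordinate trick: prepend x_0 = 1 so that w_0 = -threshold.
x = np.array([2.0, -1.0, 0.5])        # raw features in R^d
x_aug = np.concatenate(([1.0], x))    # (1, x_1, ..., x_d)
w = np.array([-0.5, 1.0, 0.0, 1.0])   # w_0 = -threshold, then w_1, ..., w_d
print(h(w, x_aug))                    # prints +1 or -1
```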

Select g from H
H: all possible linear hyperplanes. How do we select the best one?
We want g(x_n) ≈ f(x_n) = y_n for most of n = 1, ..., N.
Naive approach: test every h ∈ H and choose the one minimizing the training error
train error = (1/N) ∑_{n=1}^N I(h(x_n) ≠ y_n)   (I(·): indicator function)
Difficulty: H is of infinite size.
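The training error translates directly into code; a sketch consistent with the conventions above:

```python
import numpy as np

def train_error(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """(1/N) * sum_n I(h(x_n) != y_n), where h(x) = sign(w^T x).

    X is an (N, d+1) matrix whose rows already include x_0 = 1.
    """
    preds = np.where(X @ w > 0, 1, -1)  # h(x_n) for all n at once
    return float(np.mean(preds != y))   # fraction of misclassified points
```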

Perceptron Learning Algorithm (PLA)
- Initialize w (e.g., w = 0)
- For t = 1, 2, ...
  - Find a misclassified point n(t): sign(w^T x_{n(t)}) ≠ y_{n(t)}
  - Update the weight vector: w ← w + y_{n(t)} x_{n(t)}
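A direct implementation sketch of PLA (the max_iter safeguard is my addition; the algorithm itself only stops when no point is misclassified):

```python
import numpy as np

def pla(X: np.ndarray, y: np.ndarray, max_iter: int = 10_000) -> np.ndarray:
    """Perceptron Learning Algorithm.

    X: (N, d+1) matrix with x_0 = 1 in the first column; y: labels in {+1, -1}.
    If the data are linearly separable, returns w with zero training error.
    """
    w = np.zeros(X.shape[1])                  # initialize w = 0
    for _ in range(max_iter):
        preds = np.where(X @ w > 0, 1, -1)    # sign(w^T x_n) for every n
        wrong = np.flatnonzero(preds != y)    # indices of misclassified points
        if wrong.size == 0:                   # no misclassified point: stop
            break
        n = wrong[0]                          # pick a misclassified point n(t)
        w = w + y[n] * X[n]                   # w <- w + y_{n(t)} x_{n(t)}
    return w

# Usage on a tiny linearly separable example:
X = np.array([[1, 2.0, 1.0], [1, -1.0, -2.0], [1, 3.0, 0.5], [1, -2.0, -1.0]])
y = np.array([1, -1, 1, -1])
w = pla(X, y)
```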

PLA, Iteratively
- Find a misclassified point
- Rotate the hyperplane according to the misclassified point

Perceptron Learning Algorithm: Convergence
PLA converges in the linearly separable case:
- Linearly separable: there exists a perceptron (linear) hypothesis f with 0 training error
- PLA is guaranteed to find such a hypothesis (stop when there is no more misclassified point)

Binary Classification
Data:
- Features for each training example: {x_n}_{n=1}^N, each x_n ∈ R^d
- Labels for each training example: y_n ∈ {+1, −1}
Goal: learn a function f : R^d → {+1, −1}
Examples: credit approve/disapprove, email spam/not spam, patient sick/not sick, ...

Other Types of Labels: Multi-class
Multi-class classification: y_n ∈ {1, ..., C} (C-way classification)
Example: coin recognition
- Classify coins by two features (size, mass), so x_n ∈ R^2
- y_n ∈ Y = {1c, 5c, 10c, 25c} (equivalently Y = {1, 2, 3, 4})
Other examples: hand-written digits, ...

Other Types of Labels: Regression
Regression: y_n ∈ R (the output is a real number)
Examples: stock price prediction, movie rating prediction

Other Types of Labels: Structure Prediction
Example (part-of-speech tagging): I (pronoun) love (verb) ML (noun)
- Multiclass classification for each word (word → word class) does not use information from the whole sentence
- Structure prediction problem: sentence → structure (class of each word)
Other examples: speech recognition, image captioning, ...

Machine Learning Problems
Machine learning problems can usually be categorized into:
- Supervised learning: every x_n comes with a label y_n (variant: semi-supervised learning)
- Unsupervised learning: only x_n, no y_n
- Reinforcement learning: examples contain (input, some output, grade for this output)

Unsupervised Learning (no y_n)
Clustering: given examples x_1, ..., x_N, classify them into K classes
Other unsupervised learning problems:
- Outlier detection: {x_n} → unusual(x)
- Dimensionality reduction
- ...
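As a tiny clustering illustration, here is k-means with scikit-learn (an assumed choice of algorithm and library; the lecture does not prescribe one):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unsupervised setting: examples x_1, ..., x_N with no labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # one blob of points
               rng.normal(5, 1, (50, 2))])   # another blob

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])  # the K = 2 "classes" assigned to the first points
```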

Semi-supervised Learning
Only some (few) of the x_n have labels y_n
Motivation: labeled data is much more expensive than unlabeled data

Reinforcement Learning
Used a lot in game AI and robotic control
- The agent observes state S_t
- The agent takes action A_t (the ML model, based on input S_t)
- The environment gives the agent reward R_t
- The environment gives the agent the next state S_{t+1}
We only observe the grade for the action taken (the best action is not revealed)
Example, an ads system: (customer, ad choice, click or not)

Conclusions
- Two components of ML: set up a hypothesis space (potential functions) and develop an algorithm to choose a good hypothesis based on the training examples
- The perceptron algorithm (linear classification)
- Supervised vs. unsupervised learning
Next class: LFD 1.3, 1.4
Questions?